Make post-review less sensitive to CVS server name aliases

Review Request #576 — Created Oct. 1, 2008 and submitted

Information

Review Board SVN (deprecated)

Reviewers

You might have many aliases for your CVS server.  People can set up their sandboxes using any of these names.  Unfortunately, if the hostname used in the reviewboard repository definition doesn't match what the user used in their cvs checkout command, post-review will barf.  This patch converts the local hostname to the canonical name for the machine.  When setting up the CVS repository object all you have to do is use the canonical name for the server.
Confirmed that it worked for our setup at work.  Note however that we have a kind of odd behind-a-firewall network setup.  The canonical name doesn't include any domain information.  This patch might need adjustment to be more robust in a more normal network setup.
JH
  1. The code looks fine, for what it tackles.  But revmaps often are dicey, so the whole business of doing fwd-map => rev-map makes me a little nervous.  I understand that something is causing perceived pain; let's explore that.
    
    (1)  Would you maybe be willing to view hostname mismatches as something that will yield fatal errors for the end user and will therefore motivate folks to advertise and stick to a canonical hostname if they want things to work?  (That is, maybe "un-ask" the feature request?  Sigh, ok, didn't think so.)
    
    (2)  Next thing that makes me nervous is that byname() and byaddr() may get stuck for 30 or 120 seconds before timing out on distant nameservers and eventually returning to you.  Probably that's OK, here, but sprinkling debug()s might help end users diagnose such delays.  Delayed return from the oft-troublesome byaddr() could be avoided if the code is able to just use a dotted-quad instead.  No?  Sigh, didn't think so.  Do bear in mind that IPs change from time to time, sometimes changing into a /24 with broken revmaps.  Also, some servers have zero or more A-records plus zero or more AAAA's; I verified just now that running your code several times will yield several distinct IPs.
    
    (3)  Ok, we're committed to doing byname() and then byaddr().  Perhaps you could have the code insist that double checking the result (via socket.gethostbyname(canon)) yields the original "host" string?  That is, insist on 3 successful mappings, byname byaddr byname, and insist that we be able to map back and forth between name and number before deciding that we truly do have the canonical name of the server.  This is a pretty reasonable requirement, which seems pretty closely aligned with your original motivation for the feature request.
    
    (4)  The canon string came back from byname(), and optionally passed the check suggested by item (3).  At this point the content might still be uglier than we would like.  DNS is "case preserving but case insensitive".  You probably want to case smash to lower.  You really don't want to accept characters outside of [\.\w_-]+.  This is an extremely reasonable requirement, which I encourage you to enforce.  Validating data from the outside is just good security hygiene.
    
    (5)  A repo with an AAAA but no A (an IPv6 repo) will work fine, but will not be normalized until python2.6.  Might be worth a comment in the code.  This detail tends to motivate item (1), above.
    
    (6)  Asking gethostbyaddr for the revmap of 82.210.242.149 (or 68.87.76.178) might yield a string like "buecher.ch" or "bücher.ch" or "xn--bcher-kva.ch".  At this point, the notion of "canonical" starts to make my head hurt -- see item(1), above.
    
    
    The bottom line is that if you don't really really need to call gethostbyaddr(), then avoid calling it, since it can yield odd results and take a long time to do so.  You might decide that socket.getfqdn() offers enough functionality to address your needs.  Good luck!
    1. > (1)  Would you maybe be willing to view hostname mismatches as something that will yield fatal errors for the end user
      > and will therefore motivate folks to advertise and stick to a canonical hostname if they want things to work?  (That is,
      > maybe "un-ask" the feature request?  Sigh, ok, didn't think so.)
      
      I'm personally not against that, but given a diverse group of developers trying to attempt addition of reviewboard to existing
      projects, I see the strict hostname acceptance that reviewboard's CVS adaptor implements as a barrier to acceptance of
      reviewboard as a new addition to the tool chain.
      
      > (2)  Next thing that makes me nervous is that byname() and byaddr() may get stuck for 30 or 120 seconds before timing
      > out on distant nameservers and eventually returning to you.  Probably that's OK, here, but sprinkling debug()s might help
      > end users diagnose such delays.  Delayed return from the oft-troublesome byaddr() could be avoided if the code is able
      > to just use a dotted-quad instead.
      
      The only problem with just using dotted quads is that then your reviewboard configuration isn't resilient in the face of CVS
      server migration.  (Using dotted quads was actually my first thought.)
      
      > The bottom line is that if you don't really really need to call gethostbyaddr(), then avoid calling it, since it can yield odd
      > results and take a long time to do so.  You might decide that socket.getfqdn() offers enough functionality to address your
      > needs.
      
      Will do.
  2. 
      
david
  1. Committed with minor changes as r1522. Thanks!
  2.