Fix keyword expansion in files from SVN repositories

Review Request #276 — Created Feb. 21, 2008 and submitted

Information

Review Board SVN (deprecated)
trunk
362

Reviewers

Fixed issues with SVN keyword expansion. If a file had $Id$ or $Date$ or something and the keyword was defined in that file's svn:keywords property, svn cat would return the file with the keyword expanded. This broke patching for us and essentially meant that any patches modifying anything near a keyword expansion would break.

We now collapse the keyword by turning $Keyword:.*$ into $Keyword$. We only do this for keywords actually defined in that file's svn:keywords, in order to prevent collapsing things that aren't actually defined keywords but look like them.

This also supports known keyword aliases.
Unit tests succeed! Debug output shows we're collapsing correctly.
CU
  1. This is a great start!  I imagine that this alone will fix the issue for most of my users, so let's please get this in.  However, there are a few corner cases I can think of that the current test doesn't expose:
    
    1. Fixed-length substitution:  By using a double-colon, you can tell SVN the exact length of the field.  e.g.,
        $Revision::        $
    becomes
        $Revision:: 42     $
    
    2. Keyword aliases:  There are only 5 keywords that SVN understands, but several have equivalent aliases.  The worst part is that expansion still happens if the file uses a different alias than what's in svn:keywords.
    
    3. Non-keywords:  The property can contain any arbitrary keywords, and any invalid keywords are silently ignored.  Your code will happily collapse anything.  Admittedly, it would be a quite malformed repo for someone to run into this.
    1. That regex should handle case #1, shouldn't it? As long as "$Keyword:" exists, anything after that and up until a "$" should be condensed. We should test for this in the unit tests, though.
      
      #2 is a problem. I'll have to cover that at some point. Hopefully it's less common?
      
      #3 I'm less concerned about, as it's sort of user error.
    2. This looks pretty good.  I agree with you that the regex covers #1.  According to the svnbook, there's actually a small number of aliases:
      
      Date -> LastChangedDate
      Revision -> LastChangedRevision, Rev
      Author -> LastChangedBy
      HeadURL -> URL
      
      It should be easy enough to just create a small table and iterate over a few regexes.
    3. The difference with the double-colon (#1) is that the unexpanded version in the diff will still have the colons and spaces.  Then in expansion it doesn't touch the width at all; it just replaces the spaces with field contents.
      
      I think we can handle both cases in re.sub() with a function instead of a replacement string, i.e.:
      
      def repl(m):
          if m.group(2):
              return "$%s::%s$" % (m.group(1), " "*len(m.group(3)))
          return "$%s$" % m.group(1)
      data = re.sub(r"\$(%s):(:?)([^\$]+)\$" % keyword_names, repl, data)
    4. Excellent, worked like a charm.
      
      Going to commit this and leave aliases to another day.
  2. /trunk/reviewboard/scmtools/svn.py (Diff revision 1)
     
     
    The docs on svn:keywords specify that they are delimited by whitespace, not commas.
  3. I sure wish we didn't have to re-implement all of SVN's crazy keyword rules to make this robust.  The most painful part is that SVN checkouts do have an unexpanded copy on the client side in .svn/text-base/foobar.svn-base!  I can't find any way to capture that using pysvn though, unless you actually create a temporary working directory. (yuck)
CU
  1. This is a great start!  I imagine that this alone will fix the issue for most of my users, so let's please get this in.  However, there are a few corner cases I can think of that the current test doesn't expose:
    
    1. Fixed-length substitution:  By using a double-colon, you can tell SVN the exact length of the field.  e.g.,
        $Revision::        $
    becomes
        $Revision:: 42     $
    
    2. Keyword aliases:  There are only 5 keywords that SVN understands, but several have equivalent aliases.  The worst part is that expansion still happens if the file uses a different alias than what's in svn:keywords.
    
    3. Non-keywords:  The property can contain any arbitrary keywords, and any invalid keywords are silently ignored.  Your code will happily collapse anything.  Admittedly, it would be a quite malformed repo for someone to run into this.
  2. I sure wish we didn't have to re-implement all of SVN's crazy keyword rules to make this robust.  The most painful part is that SVN checkouts do have an unexpanded copy on the client side in .svn/text-base/foobar.svn-base!  I can't find any way to capture that using pysvn though, unless you actually create a temporary working directory. (yuck)
CU
  1. 
      
  2. /trunk/reviewboard/scmtools/svn.py (Diff revision 2)
     
     
     
     
     
     
     
    This table needs to have the reverse mappings as well.
  3. /trunk/reviewboard/scmtools/svn.py (Diff revision 2)
     
     
     
     
     
     
     
    If you make keyword_aliases include self-references (e.g., "Id": ["Id"], "URL": ["URL", "HeadURL"], etc.), and then build the final list purely from the alias table, instead of appending, then unknown keywords will be ignored (corner case #3).
    
    You'll also prevent any weird values from going into '|'.join(keywords) which might break the regular expression.
    
    I know these cases are unlikely to crop up in real repositories, but it's pretty easy to be protective here.
  4. /trunk/reviewboard/scmtools/tests.py (Diff revision 2)
     
     
    Debug statement...
  5. With those tweaks, this will cover all the cases that I can think of.  Thanks!
david
  1. 
      
  2. 
      
CU
  1. 
      
  2. /trunk/reviewboard/scmtools/svn.py (Diff revision 4)
     
     
    Need "Id" and "Rev" in here also.
  3. Looks good - please commit!