Properly display SVN diffs with changes in $Keyword$ formatting

Review Request #3105 — Created May 11, 2012 and submitted

Information

Review Board

Reviewers

We encountered this issue when a developer accidentally committed several files with inappropriate keyword format, namely $Header:$ instead of $Header$. I am not sure how it happened at that time and I cannot reproduce adding such file with a recent Subversion by just adding a file. The same end result can be reproduced, though, using the Subversion's svnmucc utility:

$ cd /tmp
$ svnadmin create foo
$ echo '$Header:$' > file
$ svnmucc put file file:///tmp/foo/file 
r1 committed by aneyman at 2012-05-12T04:22:21.507824Z
$ svnmucc mv file:///tmp/foo/file file:///tmp/foo/badfile propset svn:keywords Header file:///tmp/foo/badfile
r2 committed by aneyman at 2012-05-12T04:22:48.440614Z
$ svn co file:///tmp/foo foo.wc
A    foo.wc/badfile
Checked out revision 2.
$ cd foo.wc/
$ cat badfile 
$Header: file:///tmp/foo/badfile 2 2012-05-12 04:22:48Z aneyman $
$ echo hi >> badfile 
$ svn di badfile 
Index: badfile
===================================================================
--- badfile     (revision 2)
+++ badfile     (working copy)
@@ -1 +1,2 @@
-$Header:$
+$Header$
+hi

As seen above, whenever one of these files is modified, 'svn diff' generates (and then post-review uploads) a diff which contains the following lines:

- ... $Header:$
+ ... $Header$

However, when RB tries to apply this diff, it first contracts the keywords in the file it gets from the repository. Thus, a file with contracted keyword has $Header$ and the patch above fails to apply cleanly.

The solution proposed is to apply the same keyword contraction technique to the patch as it is applied to the sources from the repository. This makes the change in $Header:$ not be displayed, but the rest of the changes are displayed fine. The regexp matching header contraction is modified to account for the $Keyword:$ form (i.e. with no characters following the colon).
./reviewboard/manage.py test
Description From Last Updated

No need to default this to None. We'll just say that a revision is required. Can you add a docstring …

chipx86chipx86

I know we have the double underscore in other functions, but we should really be changing them. Python will mangle …

chipx86chipx86

Can you remove the extra space before the "=" while you're here?

chipx86chipx86

Blank line both before and after this line.

chipx86chipx86

Might as well call this get_file_keywords. I'd prefer not to abbreviate unless we have to.

chipx86chipx86

Blank line between these.

chipx86chipx86

Should just be "if keywords:" Blank line before this.

chipx86chipx86

Blank line between these.

chipx86chipx86
chipx86
  1. I want to make sure I understand this right.
    
    "$Header:$" was being added through a somewhat non-standard workflow, and we take that and turn it into "$Header$". Meanwhile, svn diff makes a change, turning it into "$Header$" as well. This patch fails because we take the "- $Header:$" and turn it into "- $Header$", which is no longer correct.
    
    Am I right so far?
    
    Given that, can we simplify this to just not contract it when there's no value after the ":"? The change here feels like a very heavy-weight solution that adds a fair amount of complexity that may have to be duplicated for other systems.
    1. Not exactly. The file in SVN repository has $Header:$ and this is how it appears in the "original" file in the working copy. For Subversion's keyword expansion, however, this is the same as if it were $Header$. That is, on checkout, the keyword is expanded. When such file is changed, 'svn diff' contracts the keyword back to its "canonical" form, $Header$, before comparing the modified file to the original. Because of that, 'svn diff' output includes the following hunk in the patch:
      
      -$Header:$
      +$Header$
      
      Now, when RB tries to obtain the original through 'svn cat', it receives an expanded keyword, e.g.:
      
      $Header: svn://foo/bar/baz.c 123 2012-01-01 05:30:32Z somebody $
      
      To apply the patch, RB contracts this keyword back to $Header$. But, with the diff hunk above would not apply to $Header$ (as it expects $Header:$ on the removed line). So, your proposed approach won't work, as that diff hunk would also fail to apply to the expanded keyword that 'svn cat' produces.
      
      It would be much easier if 'svn cat' just had an option not to expand keywords.
    2. Okay, so that makes sense. Now today, we process the keywords once in get_file. I see that you've changed this function so that things could be split up such that normalize_patch could also figure out the file and ask for keywords. I don't think I love having to change this, but we'll hold off on that for now.
      
      The normalize_patch function, as far as I can tell, is mostly redundant. That type of work *should* be done during diff parsing, so that we store the proper patch file in the database. This saves on additional fetches.
      
      I think the reason you made normalize_patch is because today, the diff parsing code doesn't check file contents, and it doesn't interact with the SVN repository. Is that correct? If so, instead of fetching the keywords from Subversion and using them for collapse_keywords, can we use our list of known keywords? IIRC, the reason we checked svn:keywords was for any custom keywords that were provided, but I think it'll be very rare to have this problem you've hit *with* custom keywords, so I'm okay for now not supporting that. It will, I believe, greatly simplify the code and not have the performance hit we're going to have with additional keyword fetches.
      
      Thoughts?
    3. > The normalize_patch function, as far as I can tell, is mostly redundant.
      > That type of work *should* be done during diff parsing, so that we store
      > the proper patch file in the database. This saves on additional fetches.
      > I think the reason you made normalize_patch is because today, the diff
      > parsing code doesn't check file contents, and it doesn't interact with
      > the SVN repository. Is that correct?
      
      Yes, but that would require much more extensive changes in the RB code. If I understand you right, you suggest to apply the patch during diff parsing, then diff against the original and store the result. Is it right? If so, it would save some time on calling normalize_patch() on fetches. But to apply the patch, you still have to do the same keyword contraction in the patch that normalize_patch() does - or it wouldn't apply for the same reasons described above.
      
      > If so, instead of fetching the keywords from Subversion and using them
      > for collapse_keywords, can we use our list of known keywords? IIRC,
      > the reason we checked svn:keywords was for any custom keywords that were
      > provided, but I think it'll be very rare to have this problem you've hit
      > *with* custom keywords, so I'm okay for now not supporting that.
      
      Not exactly. Actually, custom keywords are not yet supported by Subversion, they are only available in the FreeBSD-modified port of SVN (IIRC). The reason for fetching svn:keywords is that Subversion only expands those keywords that are present in svn:keywords. In other words, if a file contains $Id: foo.c 1.12.3.1 1917-11-07 21:11:15Z somebody $ expanded keyword and Id is not present in svn:keywords, SVN will treat this line just as plain text - it will display diffs for the changes inside the contents of $Id$, etc. RB has to follow the same model - if RB would contract $Id$ in the above example, the output of 'svn diff' would not apply.
    4. No, I don't want to apply during diff parsing. I don't really want to fetch anything. Diff parsing should not require any access to a network or any expensive operations, ideally.
      
      Okay, that makes sense with svn:keywords. It's been a while since I've really dealt with SVN on that level, and I was misremembering how it worked.
      
      I was hoping to avoid any extra lookups at diff apply time, because they're expensive. I guess though that we're a bit out of luck in this regard. So given that, the general design is fine. I'll make some comments about the code.
  2. 
      
chipx86
  1. Had most of this written but never published. Sorry about that. :/
  2. reviewboard/scmtools/core.py (Diff revision 1)
     
     
    No need to default this to None. We'll just say that a revision is required.
    
    Can you add a docstring covering this function? Docstrings must be in the form of:
    
    """One line summary.
    
    Extra details in paragraph form.
    """
  3. reviewboard/scmtools/svn.py (Diff revision 1)
     
     
    I know we have the double underscore in other functions, but we should really be changing them. Python will mangle the names. So instead, let's call this _do_on_path.
  4. reviewboard/scmtools/svn.py (Diff revision 1)
     
     
    Can you remove the extra space before the "=" while you're here?
  5. reviewboard/scmtools/svn.py (Diff revision 1)
     
     
    Blank line both before and after this line.
  6. reviewboard/scmtools/svn.py (Diff revision 1)
     
     
    Might as well call this get_file_keywords. I'd prefer not to abbreviate unless we have to.
  7. reviewboard/scmtools/svn.py (Diff revision 1)
     
     
     
    Blank line between these.
  8. reviewboard/scmtools/svn.py (Diff revision 1)
     
     
    Should just be "if keywords:"
    
    Blank line before this.
  9. reviewboard/scmtools/svn.py (Diff revision 1)
     
     
     
    Blank line between these.
  10. 
      
ST
ST
  1. Any updates on this review?
  2. 
      
chipx86
  1. Ship It!
  2. 
      
ST
Review request changed

Status: Closed (submitted)

Change Summary:

Pushed to release-1.6.x (0194376)
Loading...