Fix Git diff parsing to make filename lookup more reliable.

Review Request #6911 — Created Feb. 5, 2015 and submitted

Information

Review Board
release-2.0.x
9f49eb1...

Reviewers

Our Git diff parsing was trying to get the original and modified
filenames the hard way by parsing the "diff --git" line. This is
sometimes necessarily, but oftentimes not.

For a standard change, the "---" and "+++" lines are more reliable, and
easier to parse. Unlike most diff formats, these lines don't contain
extra information after the filenames, meaning we don't have to worry
about filenames with spaces. However, these lines only show up when
there are changes to a file's contents.

For diffs without those lines, we often have other lines to rely upon.
For instance, "rename from/to" and "copy from/to". In these cases, we
can easily parse out the filename.

We do have to fall back to the "diff --git" line otherwise. Since this
line isn't well-structured, we can only parse it if one of the following
conditions are true:

1) There are "a/" and "b/" prefixes before the filenames. (Assuming
   files aren't modified in "a/" or "b/" directories.)

2) The filenames are quoted.

3) The filenames do not contain any spaces.

4) The filenames contain spaces, but the name hasn't changed, and is
   just simply repeated.

If none of these conditions apply, we raise a DiffParserError with
helpful information on how to generate a parseable diff.

This should clear up a lot of the parsing problems that people may
encounter.

Added some new unit tests to test these behaviors. The ones that I
expected to fail before the fixes did indeed fail.

Now, all unit tests pass.

reviewbot
  1. Tool: Pyflakes
    Processed Files:
        reviewboard/scmtools/tests.py
        reviewboard/scmtools/git.py
    
    Ignored Files:
        reviewboard/scmtools/testdata/git_newfile.diff
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/scmtools/tests.py
        reviewboard/scmtools/git.py
    
    Ignored Files:
        reviewboard/scmtools/testdata/git_newfile.diff
    
    
  2. 
      
david
  1. Ship It!
  2. 
      
david
  1. Actually, can you add one more test that checks when one but not both of the filenames has quotes around it? You can probably just pull the test from my change.

    1. One of my tests has that, unless I accidentally erased it.

    2. I don't see it. You've got ones where neither has quotes, and ones where both have quotes, but none where only one has quotes.

    3. test_diff_git_line_without_a_b_and_spaces_quotes_changed

    4. Aha, I didn't read beyond the first file there. Carry on.

  2. 
      
chipx86
Review request changed
Status:
Completed
Change Summary:
Pushed to release-2.0.x (e3a3385)