• 
      

    Fix Git diff parsing to make filename lookup more reliable.

    Review Request #6911 — Created Feb. 5, 2015 and submitted

    Information

    Review Board
    release-2.0.x
    9f49eb1...

    Reviewers

    Our Git diff parsing was trying to get the original and modified
    filenames the hard way by parsing the "diff --git" line. This is
    sometimes necessarily, but oftentimes not.
    
    For a standard change, the "---" and "+++" lines are more reliable, and
    easier to parse. Unlike most diff formats, these lines don't contain
    extra information after the filenames, meaning we don't have to worry
    about filenames with spaces. However, these lines only show up when
    there are changes to a file's contents.
    
    For diffs without those lines, we often have other lines to rely upon.
    For instance, "rename from/to" and "copy from/to". In these cases, we
    can easily parse out the filename.
    
    We do have to fall back to the "diff --git" line otherwise. Since this
    line isn't well-structured, we can only parse it if one of the following
    conditions are true:
    
    1) There are "a/" and "b/" prefixes before the filenames. (Assuming
       files aren't modified in "a/" or "b/" directories.)
    
    2) The filenames are quoted.
    
    3) The filenames do not contain any spaces.
    
    4) The filenames contain spaces, but the name hasn't changed, and is
       just simply repeated.
    
    If none of these conditions apply, we raise a DiffParserError with
    helpful information on how to generate a parseable diff.
    
    This should clear up a lot of the parsing problems that people may
    encounter.

    Added some new unit tests to test these behaviors. The ones that I
    expected to fail before the fixes did indeed fail.

    Now, all unit tests pass.

    reviewbot
    1. Tool: Pyflakes
      Processed Files:
          reviewboard/scmtools/tests.py
          reviewboard/scmtools/git.py
      
      Ignored Files:
          reviewboard/scmtools/testdata/git_newfile.diff
      
      
      
      Tool: PEP8 Style Checker
      Processed Files:
          reviewboard/scmtools/tests.py
          reviewboard/scmtools/git.py
      
      Ignored Files:
          reviewboard/scmtools/testdata/git_newfile.diff
      
      
    2. 
        
    david
    1. Ship It!
    2. 
        
    david
    1. Actually, can you add one more test that checks when one but not both of the filenames has quotes around it? You can probably just pull the test from my change.

      1. One of my tests has that, unless I accidentally erased it.

      2. I don't see it. You've got ones where neither has quotes, and ones where both have quotes, but none where only one has quotes.

      3. test_diff_git_line_without_a_b_and_spaces_quotes_changed

      4. Aha, I didn't read beyond the first file there. Carry on.

    2. 
        
    chipx86
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to release-2.0.x (e3a3385)