Fix Git diff parsing to make filename lookup more reliable.

Review Request #6911 — Created Feb. 5, 2015 and submitted — Latest diff uploaded

Information

Review Board
release-2.0.x
9f49eb1...

Our Git diff parsing was trying to get the original and modified
filenames the hard way by parsing the "diff --git" line. This is
sometimes necessary, but often it is not.

For a standard change, the "---" and "+++" lines are more reliable, and
easier to parse. Unlike most diff formats, these lines don't contain
extra information after the filenames, meaning we don't have to worry
about filenames with spaces. However, these lines only show up when
there are changes to a file's contents.
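As a rough illustration of this first step, here is a minimal sketch in
Python. The function name parse_header_filenames is hypothetical and is
not taken from the actual change. It assumes the lines of one file's
diff header are passed in, and that a trailing tab (if any) is not part
of the filename.

```python
def parse_header_filenames(lines):
    """Return (orig, modified) from the '---'/'+++' lines, or None.

    Hypothetical helper for illustration only. Names are returned as
    they appear in the diff, including any "a/" or "b/" prefix.
    """
    orig = modified = None

    for line in lines:
        line = line.rstrip('\r\n')

        if line.startswith('--- ') and orig is None:
            # Everything after the marker is the filename, so spaces
            # in the name need no special handling.
            orig = line[4:].rstrip('\t')
        elif line.startswith('+++ ') and orig is not None:
            modified = line[4:].rstrip('\t')
            break
        elif line.startswith('@@'):
            # Reached the first hunk without finding both lines.
            break

    if orig is None or modified is None:
        return None

    return orig, modified
```

A None result means the file had no content changes (a pure rename or
a mode change, for example), so another source for the filenames is
needed.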

For diffs without those lines, we often have other lines to rely upon.
For instance, "rename from/to" and "copy from/to". In these cases, we
can easily parse out the filename.
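A minimal sketch of that lookup follows. The helper name
parse_rename_or_copy is hypothetical, and the sketch assumes the
filenames on these lines are unquoted.

```python
def parse_rename_or_copy(lines):
    """Return (orig, modified) from rename/copy lines, or None.

    Hypothetical helper for illustration only. It does not handle
    quoted filenames.
    """
    names = {}

    for line in lines:
        line = line.rstrip('\r\n')

        for op in ('rename', 'copy'):
            for direction in ('from', 'to'):
                prefix = '%s %s ' % (op, direction)

                if line.startswith(prefix):
                    # The rest of the line is the filename, spaces
                    # included.
                    names[direction] = line[len(prefix):]

    if 'from' in names and 'to' in names:
        return names['from'], names['to']

    return None
```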

We do have to fall back to the "diff --git" line otherwise. Since this
line isn't well-structured, we can only parse it if one of the following
conditions is true:

1) There are "a/" and "b/" prefixes before the filenames. (Assuming
   files aren't modified in "a/" or "b/" directories.)

2) The filenames are quoted.

3) The filenames do not contain any spaces.

4) The filenames contain spaces, but the name hasn't changed and is
   simply repeated.

If none of these conditions apply, we raise a DiffParserError with
helpful information on how to generate a parseable diff.
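The four conditions above can be sketched roughly as follows. This is
an illustration under stated assumptions, not the code from this
change: parse_diff_git_line is a hypothetical name, the DiffParserError
class here is a local stand-in for the real one, quoted names are
returned without unescaping, and the "a/"/"b/" case is only accepted
when " b/" appears exactly once.

```python
import re


class DiffParserError(Exception):
    """Local stand-in for the real exception class."""


# Two double-quoted names separated by a single space.
QUOTED_RE = re.compile(r'^"((?:[^"\\]|\\.)*)" "((?:[^"\\]|\\.)*)"$')


def parse_diff_git_line(line):
    """Return (orig, modified) parsed from a 'diff --git' line."""
    prefix = 'diff --git '

    if not line.startswith(prefix):
        raise DiffParserError('Not a "diff --git" line: %r' % line)

    rest = line[len(prefix):].rstrip('\r\n')

    # 1) "a/" and "b/" prefixes, with an unambiguous split point.
    if rest.startswith('a/') and rest.count(' b/') == 1:
        orig, modified = rest.split(' b/')
        return orig, 'b/' + modified

    # 2) Both filenames are quoted.
    m = QUOTED_RE.match(rest)

    if m:
        return m.group(1), m.group(2)

    # 3) Neither filename contains a space.
    parts = rest.split(' ')

    if len(parts) == 2:
        return parts[0], parts[1]

    # 4) Spaces in the name, but the same name appears twice.
    mid = len(rest) // 2

    if (len(rest) % 2 == 1 and rest[mid] == ' ' and
            rest[:mid] == rest[mid + 1:]):
        return rest[:mid], rest[mid + 1:]

    raise DiffParserError(
        'Unable to parse the filenames in %r. Regenerate the diff with '
        'a/ and b/ prefixes or with quoted filenames.' % line)
```

For example, 'diff --git my file.txt my file.txt' parses under
condition 4, while 'diff --git old name.txt new name.txt' matches none
of the conditions and raises the error.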

This should clear up a lot of the parsing problems that people may
encounter.

Added new unit tests covering these behaviors. The ones that I
expected to fail before the fixes did indeed fail.

Now, all unit tests pass.
