Fix Git diff parsing to make filename lookup more reliable.
Review Request #6911 — Created Feb. 5, 2015 and submitted
Our Git diff parsing was trying to get the original and modified filenames the hard way, by parsing the "diff --git" line. This is sometimes necessary, but often not.

For a standard change, the "---" and "+++" lines are more reliable and easier to parse. Unlike in most diff formats, these lines don't contain extra information after the filenames, meaning we don't have to worry about filenames with spaces. However, these lines only show up when there are changes to a file's contents.

For diffs without those lines, we often have other lines to rely upon, such as "rename from/to" and "copy from/to". In these cases, we can easily parse out the filename.

Otherwise, we have to fall back to the "diff --git" line. Since this line isn't well-structured, we can only parse it if one of the following conditions is true:

1) There are "a/" and "b/" prefixes before the filenames. (This assumes files aren't modified in "a/" or "b/" directories.)
2) The filenames are quoted.
3) The filenames do not contain any spaces.
4) The filenames contain spaces, but the name hasn't changed and is simply repeated.

If none of these conditions apply, we raise a DiffParserError with helpful information on how to generate a parseable diff.

This should clear up a lot of the parsing problems that people may encounter.
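As a rough illustration of the "diff --git" fallback described above, the four conditions could be checked along these lines. This is only a sketch, not the code in this change: the function name parse_diff_git_line is hypothetical, the DiffParserError class here is a local stand-in for the real one, and escape sequences inside quoted filenames are left undecoded.

```python
import re


class DiffParserError(Exception):
    """Local stand-in for the parser error named in the description."""


def _strip_prefixes(orig, new):
    # Drop the conventional "a/" and "b/" prefixes when both are present.
    if orig.startswith("a/") and new.startswith("b/"):
        return orig[2:], new[2:]
    return orig, new


def parse_diff_git_line(line):
    """Return (orig_filename, new_filename) from a "diff --git" line.

    Hypothetical helper: raises DiffParserError when none of the four
    conditions from the description holds.
    """
    prefix = "diff --git "
    if not line.startswith(prefix):
        raise DiffParserError("Not a 'diff --git' line: %r" % line)
    rest = line[len(prefix):].rstrip("\r\n")

    # Quoted filenames (escape sequences are not decoded in this sketch).
    m = re.match(r'^"((?:[^"\\]|\\.)+)" "((?:[^"\\]|\\.)+)"$', rest)
    if m:
        return _strip_prefixes(m.group(1), m.group(2))

    # "a/" and "b/" prefixes; assumes no filename itself contains " b/".
    m = re.match(r"^a/(.+) b/(.+)$", rest)
    if m:
        return m.group(1), m.group(2)

    # No spaces in either filename: exactly two space-separated parts.
    parts = rest.split(" ")
    if len(parts) == 2 and all(parts):
        return parts[0], parts[1]

    # Spaces in the name, but the same name repeated: "<name> <name>".
    half, remainder = divmod(len(rest), 2)
    if remainder == 1 and rest[half] == " " and rest[:half] == rest[half + 1:]:
        return rest[:half], rest[half + 1:]

    raise DiffParserError(
        "Unable to determine filenames from %r; try generating the diff "
        "with the default a/ and b/ prefixes." % line
    )
```

For example, parse_diff_git_line("diff --git a/foo bar.txt b/foo bar.txt") returns ("foo bar.txt", "foo bar.txt"), while a line such as "diff --git foo bar.txt baz qux.txt" is ambiguous and raises the error.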
Added new unit tests covering these behaviors. The ones that I
expected to fail before the fixes did indeed fail. Now, all unit tests pass.