Fix parsing of git diffs with quoted paths, make unicode use more consistent.

Review Request #6837 — Created Jan. 29, 2015 and discarded — Latest diff uploaded

Information

Review Board
release-2.0.x

Reviewers

In some cases, git diffs put quotes around the a/filename and b/filename
entries. Our old diff parser extracted these filenames successfully, though it
kept the quotes in the result. When we changed it to handle spaces in
filenames properly, quoted paths stopped parsing. This is the cause of the
occasional "list index out of range" error we've seen when parsing git diffs.

This change updates our parser to handle all four cases (no quoting, a quoted,
b quoted, and both quoted). I've added a test for the cases that weren't
previously handled.
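
As a rough illustration of the four cases, here is a minimal sketch of
splitting a "diff --git" header line. It is not the parser code in this
change: HEADER_RE, strip_quotes and split_git_header are hypothetical names,
and the sketch only strips the surrounding quotes without decoding any
backslash escapes inside them.

```python
import re

# Hypothetical pattern for illustration only. Each side of the header is
# either a double-quoted string (allowing backslash escapes) or an unquoted
# path starting with a/ or b/.
QUOTED = r'"(?:[^"\\]|\\.)*"'
HEADER_RE = re.compile(
    r'^diff --git (?P<a>' + QUOTED + r'|a/.+?) (?P<b>' + QUOTED + r'|b/.+)$'
)


def strip_quotes(path):
    """Drop one pair of surrounding double quotes, if present."""
    if len(path) >= 2 and path.startswith('"') and path.endswith('"'):
        return path[1:-1]
    return path


def split_git_header(line):
    """Return the (a, b) filenames from a 'diff --git' header line."""
    m = HEADER_RE.match(line.rstrip('\r\n'))
    if m is None:
        raise ValueError('Unable to parse git diff header: %r' % line)
    return strip_quotes(m.group('a')), strip_quotes(m.group('b'))


# The four cases: no quoting, a quoted, b quoted, both quoted.
for header in (
    'diff --git a/foo.txt b/foo.txt',
    'diff --git "a/foo bar.txt" b/foo.txt',
    'diff --git a/foo.txt "b/foo bar.txt"',
    'diff --git "a/foo bar.txt" "b/foo bar.txt"',
):
    print(split_git_header(header))
```

Raising an explicit ValueError on an unrecognized header is a design choice of
the sketch; it avoids the kind of bare index error described above.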

While I was doing this, I noticed that our parsing was inconsistent: sometimes
it operated on bytes and sometimes on unicode. I've fixed up the callers so
that we always pass unicode diffs to the parser. The results are converted
back to their original encoding after we've split them up per-file.
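
The decode, split, re-encode flow can be sketched roughly as follows. This is
an illustration under assumptions, not the code in this change:
split_diff_per_file is a hypothetical helper, the encoding is assumed to be
known by the caller, and the sketch is written for Python 3, where the
"unicode" type is str.

```python
def split_diff_per_file(data, encoding='utf-8'):
    """Split a raw git diff (bytes) into per-file chunks (bytes).

    Hypothetical helper: all splitting happens on decoded text, and each
    chunk is converted back to the original encoding at the end.
    """
    text = data.decode(encoding)

    chunks = []
    current = []

    # Split on '\n' only, and put it back on every piece but the last, so
    # joining the chunks reproduces the input exactly.
    pieces = text.split('\n')
    lines = [piece + '\n' for piece in pieces[:-1]]
    if pieces[-1]:
        lines.append(pieces[-1])

    for line in lines:
        # A new "diff --git" header starts the next file's chunk.
        if line.startswith('diff --git ') and current:
            chunks.append(''.join(current))
            current = []
        current.append(line)

    if current:
        chunks.append(''.join(current))

    # Convert each per-file chunk back to the original encoding.
    return [chunk.encode(encoding) for chunk in chunks]


raw = (b'diff --git a/x.txt b/x.txt\n'
       b'+caf\xc3\xa9\n'
       b'diff --git a/y.txt b/y.txt\n'
       b'-old\n')
files = split_diff_per_file(raw)
print(len(files))
```

Keeping the parser on a single string type means comparisons such as
startswith never mix bytes and text.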

  • Ran unit tests.
  • Uploaded a diff that used quoting and saw it parse successfully.