Summary

Fix parsing of git diffs with quoted paths, make unicode use more consistent.

Review Request #6837 — Created Jan. 29, 2015 and discarded Feb. 9, 2015, 1:59 p.m.

Information

Owner

david

Repository

Review Board

Branch

release-2.0.x

Bugs

3630

Depends On

Reviewers

Groups

reviewboard

People

Description

In some cases, the filenames in git diffs will include quotes around the
a/filename and b/filename entries. Our old diff parser would parse out the
filenames successfully (but include the quotes), but when we fixed it to try to
properly handle spaces, it broke. This is responsible for the occasional error
message we've seen about "list index out of range" when trying to parse git
diffs.

This change updates our parser to handle all four cases (no quoting, a quoted,
b quoted, and both quoted). I've added a test for the cases that weren't
previously handled.

While I was doing this, I noticed that we were pretty inconsistent about how
sometimes our parsing would be operating on bytes and sometimes on unicode.
I've fixed up the callers to make sure we always are passing in unicode-mode
diffs to parse, which are then converted back to their original encoding after
we've split them up per-file.

Testing Done


Ran unit tests.
Uploaded a diff that used quoting and saw it parse successfully.

Tool: Pyflakes
Processed Files:
    reviewboard/scmtools/clearcase.py
    reviewboard/scmtools/mtn.py
    reviewboard/scmtools/git.py
    reviewboard/scmtools/hg.py
    reviewboard/scmtools/tests.py
    reviewboard/scmtools/plastic.py
    reviewboard/scmtools/cvs.py
    reviewboard/diffviewer/managers.py
    reviewboard/scmtools/svn/__init__.py



Tool: PEP8 Style Checker
Processed Files:
    reviewboard/scmtools/clearcase.py
    reviewboard/scmtools/mtn.py
    reviewboard/scmtools/git.py
    reviewboard/scmtools/hg.py
    reviewboard/scmtools/tests.py
    reviewboard/scmtools/plastic.py
    reviewboard/scmtools/cvs.py
    reviewboard/diffviewer/managers.py
    reviewboard/scmtools/svn/__init__.py

The code looks fine and all, but a heads up: We're going to need to switch all this back to bytestrings for DiffX, since encoding will be handled for each section after it's been parsed.

I kind of think, based on how GNU diff and other tools do it, that we're converting to Unicode far too soon, and instead want to do that after we've parsed, not before. I'm not sure yet what we get out of doing it all in Unicode.

david Feb. 4, 2015, 2:40 p.m.
```
I'll see what I can do.
```

Review Board 7.0 alpha 0 (dev)

Fix parsing of git diffs with quoted paths, make unicode use more consistent.

Information

Reviewers

Status: Discarded