Fix parsing of git diffs with quoted paths, make unicode use more consistent.

Review Request #6837 — Created Jan. 29, 2015 and discarded

Information

Review Board
release-2.0.x

Reviewers

In some cases, the filenames in git diffs will include quotes around the
a/filename and b/filename entries. Our old diff parser would parse out the
filenames successfully (but include the quotes), but when we fixed it to try to
properly handle spaces, it broke. This is responsible for the occasional error
message we've seen about "list index out of range" when trying to parse git
diffs.

This change updates our parser to handle all four cases (no quoting, a quoted,
b quoted, and both quoted). I've added a test for the cases that weren't
previously handled.

While I was doing this, I noticed that we were pretty inconsistent about how
sometimes our parsing would be operating on bytes and sometimes on unicode.
I've fixed up the callers to make sure we always are passing in unicode-mode
diffs to parse, which are then converted back to their original encoding after
we've split them up per-file.

  • Ran unit tests.
  • Uploaded a diff that used quoting and saw it parse successfully.
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        reviewboard/scmtools/clearcase.py
        reviewboard/scmtools/mtn.py
        reviewboard/scmtools/git.py
        reviewboard/scmtools/hg.py
        reviewboard/scmtools/tests.py
        reviewboard/scmtools/plastic.py
        reviewboard/scmtools/cvs.py
        reviewboard/diffviewer/managers.py
        reviewboard/scmtools/svn/__init__.py
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/scmtools/clearcase.py
        reviewboard/scmtools/mtn.py
        reviewboard/scmtools/git.py
        reviewboard/scmtools/hg.py
        reviewboard/scmtools/tests.py
        reviewboard/scmtools/plastic.py
        reviewboard/scmtools/cvs.py
        reviewboard/diffviewer/managers.py
        reviewboard/scmtools/svn/__init__.py
    
    
  2. 
      
chipx86
  1. The code looks fine and all, but a heads up: We're going to need to switch all this back to bytestrings for DiffX, since encoding will be handled for each section after it's been parsed.

    I kind of think, based on how GNU diff and other tools do it, that we're converting to Unicode far too soon, and instead want to do that after we've parsed, not before. I'm not sure yet what we get out of doing it all in Unicode.

    1. I'll see what I can do.

  2. 
      
david
Review request changed
Status:
Discarded