• 
      

    Fix parsing of git diffs with quoted paths, make unicode use more consistent.

    Review Request #6837 — Created Jan. 29, 2015 and discarded

    Information

    Review Board
    release-2.0.x

    Reviewers

    In some cases, the filenames in git diffs will include quotes around the
    a/filename and b/filename entries. Our old diff parser would parse out the
    filenames successfully (but include the quotes), but when we fixed it to try to
    properly handle spaces, it broke. This is responsible for the occasional error
    message we've seen about "list index out of range" when trying to parse git
    diffs.

    This change updates our parser to handle all four cases (no quoting, a quoted,
    b quoted, and both quoted). I've added a test for the cases that weren't
    previously handled.

    While I was doing this, I noticed that we were pretty inconsistent about how
    sometimes our parsing would be operating on bytes and sometimes on unicode.
    I've fixed up the callers to make sure we always are passing in unicode-mode
    diffs to parse, which are then converted back to their original encoding after
    we've split them up per-file.

    • Ran unit tests.
    • Uploaded a diff that used quoting and saw it parse successfully.
    reviewbot
    1. Tool: Pyflakes
      Processed Files:
          reviewboard/scmtools/clearcase.py
          reviewboard/scmtools/mtn.py
          reviewboard/scmtools/git.py
          reviewboard/scmtools/hg.py
          reviewboard/scmtools/tests.py
          reviewboard/scmtools/plastic.py
          reviewboard/scmtools/cvs.py
          reviewboard/diffviewer/managers.py
          reviewboard/scmtools/svn/__init__.py
      
      
      
      Tool: PEP8 Style Checker
      Processed Files:
          reviewboard/scmtools/clearcase.py
          reviewboard/scmtools/mtn.py
          reviewboard/scmtools/git.py
          reviewboard/scmtools/hg.py
          reviewboard/scmtools/tests.py
          reviewboard/scmtools/plastic.py
          reviewboard/scmtools/cvs.py
          reviewboard/diffviewer/managers.py
          reviewboard/scmtools/svn/__init__.py
      
      
    2. 
        
    chipx86
    1. The code looks fine and all, but a heads up: We're going to need to switch all this back to bytestrings for DiffX, since encoding will be handled for each section after it's been parsed.

      I kind of think, based on how GNU diff and other tools do it, that we're converting to Unicode far too soon, and instead want to do that after we've parsed, not before. I'm not sure yet what we get out of doing it all in Unicode.

      1. I'll see what I can do.

    2. 
        
    david
    Review request changed
    Status:
    Discarded