
    Require byte strings for diff chunk generation and use Unicode for differs.

    Review Request #10500 — Created April 2, 2019 and submitted

    Information

    Repository: Review Board
    Branch: release-4.0.x


    The process of generating diffs requires different types of strings at
    different stages. The differ itself can technically work with either
    byte or Unicode strings and doesn't really care, but when bringing the
    "interesting lines" regexes into the process, the string types suddenly
    matter. Our code expects the strings to be normalized to Unicode at this
    stage, so that there's a consistent format to diff (without worrying
    about mismatched encodings). However, we were passing byte strings in
    some unit tests, which wasn't consistent with normal usage and caused
    problems on Python 3. Those have been fixed to be Unicode.
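The snippet below is a minimal illustration of why the string type matters once regexes are involved, assuming a hypothetical "interesting lines" pattern and helper (`INTERESTING_RE` and `find_interesting_lines` are illustrative names, not Review Board's actual code). Python's `difflib` compares bytes and Unicode lines equally well, but on Python 3 a `str` regex pattern raises `TypeError` when applied to bytes.

```python
import difflib
import re

# Hypothetical "interesting lines" pattern (Python class/def headers).
# Review Board's real patterns differ; this is only illustrative.
INTERESTING_RE = re.compile(r'^\s*(def|class)\s+\w+')


def find_interesting_lines(lines):
    """Return the indexes of lines matching the (str) pattern."""
    return [i for i, line in enumerate(lines) if INTERESTING_RE.match(line)]


old = ['class Foo:', '    def bar(self):', '        return "😀"']
new = ['class Foo:', '    def bar(self):', '        return "🎉"']

# The differ itself doesn't care whether it compares bytes or str.
opcodes = difflib.SequenceMatcher(None, old, new).get_opcodes()

# The regex step does care: a str pattern only matches str input.
print(find_interesting_lines(new))  # [0, 1]

try:
    find_interesting_lines([line.encode('utf-8') for line in new])
except TypeError as e:
    # Python 3: "cannot use a string pattern on a bytes-like object"
    print('bytes rejected:', e)
```

Normalizing everything to Unicode before this stage avoids having to keep byte and Unicode variants of every pattern.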

    The diff chunk generator, on the other hand, expects byte strings. It
    takes these, normalizes them, converts to Unicode, and then hands them
    off to the differ. To ensure it's getting the format it requires, it now
    checks the types passed in during construction, so that Unicode strings
    can't accidentally slip in.
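A rough sketch of the bytes-in, Unicode-out flow described above. `ByteChunkGenerator` and its methods are hypothetical names, not Review Board's real chunk generator API, and "normalize" is assumed here to mean converting CRLF/CR line endings to LF and decoding as UTF-8; the actual encoding handling may differ.

```python
import difflib


class ByteChunkGenerator(object):
    """Hypothetical sketch: accepts file contents as byte strings only."""

    def __init__(self, old, new, encoding='utf-8'):
        # Reject Unicode (or any non-bytes) input up front so callers
        # can't accidentally bypass the normalization/decoding step.
        for name, value in (('old', old), ('new', new)):
            if not isinstance(value, bytes):
                raise TypeError('%s must be a byte string, not %s'
                                % (name, type(value).__name__))

        self.old = old
        self.new = new
        self.encoding = encoding

    def _normalize(self, data):
        """Convert line endings to LF, then decode and split into lines."""
        data = data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')
        return data.decode(self.encoding).splitlines()

    def get_opcodes(self):
        """Hand the normalized Unicode lines off to the differ."""
        old_lines = self._normalize(self.old)
        new_lines = self._normalize(self.new)
        matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
        return old_lines, new_lines, matcher.get_opcodes()


# Usage: bytes go in (including non-ASCII content), Unicode lines come out.
gen = ByteChunkGenerator(b'a\r\nb\r\n', 'a\n😀\n'.encode('utf-8'))
old_lines, new_lines, opcodes = gen.get_opcodes()
```

Checking types in the constructor means a misuse fails immediately at the call site rather than later inside the differ or the regex matching.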

    Unit tests pass on Python 2.7 and 3.7 (with other in-progress changes).

    Tested viewing a handful of diffs with Emoji and other non-ASCII content,
    with and without a primed cache.

    Commits

    Summary: Require byte strings for diff chunk generation and use Unicode for differs.
    ID: 0834cdf1414f3ee2b72d0dc9288acabf3f720c38
    Issues

    reviewbot: E501 line too long (80 > 79 characters)
    david: I'd say "GNU patch" and "GNU diff"
    david: understand -> understands
    david: Patch -> patch
    Checks run (1 failed, 1 succeeded)
    flake8 failed.
    JSHint passed.


    chipx86
    david
    1. reviewboard/diffviewer/diffutils.py (Diff revision 2)

       I'd say "GNU patch" and "GNU diff"

    2. reviewboard/diffviewer/diffutils.py (Diff revision 2)

       understand -> understands

    3. reviewboard/diffviewer/diffutils.py (Diff revision 2)

       Patch -> patch
    chipx86
    david
    1. Ship It!
    chipx86
    Review request changed
    Status: Completed
    Change Summary: Pushed to release-4.0.x (b9a2b12)