Require byte strings for diff chunk generation and use Unicode for differs.

Review Request #10500 — Created April 2, 2019 and updated

chipx86
Review Board
release-4.0.x
reviewboard

The process of generating diffs requires different types of strings at
different stages. The differ itself can technically work with either
byte or Unicode strings and doesn't really care, but when bringing the
"interesting lines" regexes into the process, the string types suddenly
matter. Our code expects the strings to be normalized to Unicode at this
stage, so that there's a consistent format to diff (without worrying
about mismatched encodings). However, some unit tests were passing byte
strings, which was inconsistent with normal usage and caused problems on
Python 3. Those tests now pass Unicode strings.
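
To illustrate why the string type matters once regexes are involved, here is a minimal sketch. The pattern and the `find_interesting` helper are hypothetical stand-ins, not Review Board's actual "interesting lines" code. On Python 3, a pattern compiled from a Unicode string cannot be matched against byte strings:

```python
import re

# Hypothetical "interesting line" pattern, compiled from a Unicode string.
# This is an illustration only, not one of Review Board's real regexes.
INTERESTING_RE = re.compile(r'^\s*(def|class)\s+\w+')


def find_interesting(lines):
    """Return the indexes of the lines that match the pattern."""
    return [i for i, line in enumerate(lines) if INTERESTING_RE.match(line)]


unicode_lines = ['class Foo:', '    x = 1', '    def bar(self):']
byte_lines = [line.encode('utf-8') for line in unicode_lines]

# Unicode lines work with the Unicode pattern.
unicode_result = find_interesting(unicode_lines)

# On Python 3, byte lines raise TypeError with the same pattern.
try:
    find_interesting(byte_lines)
    bytes_error = None
except TypeError as e:
    bytes_error = e
```

Python 2 would silently coerce between the two types here, which is likely why mismatches of this kind only surface on Python 3.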

The diff chunk generator, on the other hand, expects byte strings. It
takes these, normalizes them, converts to Unicode, and then hands them
off to the differ. To ensure it's getting the format it requires, it now
checks the types coming in during construction so that no Unicode
strings are accidentally passed in.
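
The flow described above can be sketched roughly as follows. The class name, method names, and default `encoding_list` are assumptions made for illustration and do not reflect the actual generator in `reviewboard/diffviewer`:

```python
class ChunkGeneratorSketch:
    """Hypothetical stand-in for a diff chunk generator."""

    def __init__(self, old, new, encoding_list=('utf-8', 'latin-1')):
        # Reject Unicode strings up front so callers find out immediately.
        for name, value in (('old', old), ('new', new)):
            if not isinstance(value, bytes):
                raise TypeError(
                    '%s must be a byte string, not %s'
                    % (name, type(value).__name__))

        self.old = old
        self.new = new
        self.encoding_list = encoding_list

    def _to_unicode_lines(self, data):
        # Normalize newlines while still in bytes, then decode using the
        # first encoding that works.
        data = data.replace(b'\r\n', b'\n')

        for encoding in self.encoding_list:
            try:
                return data.decode(encoding).split('\n')
            except UnicodeDecodeError:
                continue

        raise ValueError(
            'Unable to decode data using %r' % (self.encoding_list,))

    def get_unicode_lines(self):
        """Return (old_lines, new_lines) as Unicode, ready for a differ."""
        return (self._to_unicode_lines(self.old),
                self._to_unicode_lines(self.new))


gen = ChunkGeneratorSketch('héllo 😀\r\n'.encode('utf-8'), b'bye\n')
old_lines, new_lines = gen.get_unicode_lines()

# Passing a Unicode string fails during construction.
try:
    ChunkGeneratorSketch('not bytes', b'bye\n')
    construction_error = None
except TypeError as e:
    construction_error = e
```

Checking in the constructor, rather than at decode time, keeps the failure close to the caller that passed the wrong type.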

Unit tests pass on Python 2.7 and 3.7 (with other in-progress changes).

Tested viewing a handful of diffs with Emoji and other non-ASCII content,
with and without a primed cache.

Issues (Description, From):

E501 line too long (80 > 79 characters) (reviewbot)
I'd say "GNU patch" and "GNU diff" (david)
understand -> understands (david)
Patch -> patch (david)

Checks run (1 failed, 1 succeeded)
flake8 failed.
JSHint passed.

chipx86
david
  1. reviewboard/diffviewer/diffutils.py (Diff revision 2)

     I'd say "GNU patch" and "GNU diff"

  2. reviewboard/diffviewer/diffutils.py (Diff revision 2)

     understand -> understands

  3. reviewboard/diffviewer/diffutils.py (Diff revision 2)

     Patch -> patch

chipx86
Review request changed

Change Summary:

Fixed up some docstring issues.

Commits:

Summary
-
Require byte strings for diff chunk generation and use Unicode for differs.
+
Require byte strings for diff chunk generation and use Unicode for differs.

Diff:

Revision 3 (+884 -628)

Checks run (2 succeeded)

flake8 passed.
JSHint passed.
david
  1. Ship It!