• 
      

    Fix output of binary diff data in python 3

    Review Request #11279 — Created Nov. 13, 2020 and submitted

    Information

    RBTools
    master

    Reviewers

    Fix output of non-UTF8 diff data in python 3

    In Python 3, "binary" and "unicode" (i.e. string) data are
    distinct types, and only UTF-8 unicode data can be printed
    to standard out. In diff.py currently, we attempt to decode
    diff data as UTF-8, and we fail if this fails.

    Diff data from many SCMs (Git as an example) is not guaranteed
    to be UTF-8 data; though this is often a safe assumption, it
    is not always the case, and Git will happily output binary
    data to the terminal if it does not detect that binary data
    as being binary data.

    As an example, Chinese language data encoded as Big5 or GB2312
    will not be detected as binary data, even though it is also not
    UTF-8 data. This means that changing a file encoded in Big5
    (in our case, Chinese localization data) makes it impossible
    to run rbt diff to visually validate your changes.

    In our use case, we also have a workflow of rbt diff to output
    to a file, process and validate that diff, and then run rbt post
    to submit the diff, so this breaks our workflow in Python 3.

    The documentation for Python 3 sys.stdin/out/err notes that
    if writing binary data to these streams is required, the stream's
    buffer object should be used, so we apply this change when printing
    the diff object.

    This patch was made against master, but the same change applies
    for both the 1.x and 2.x branches.

    How SCMs detect binary data:

    • Git - Look for nul byte in the first 8000 bytes of the buffer.
    • SVN - Ensure the first 1024 bytes is 15% printable ASCII with no nul bytes

    SImple testing:

    1. Add a Bit5-encoded file to a Git repository
    2. git --no-pager diff the repository; this should show that the diff contains binary data
    3. rbt diff the repository; this previously failed, but now succeeds
    Summary ID
    Output binary diff content as binary, rather than trying to decode it
    8542ddcfe8e96bad4637baf685fac6238d6b80b0
    Fix patch when git output contains non-UTF8 characters
    6312d90b7a4ffd7edae1c8c760da0d2c538b5de4
    Fix regression in diff command for Python 3
    9cdec9006e6e1b5e54f7286b5aa0a5d0df93a836
    Update patch --print for non-UTF-8 diff content
    3a6bcf149cf1e44b1fe2b9b970744261290d8e58
    Add newlines to Python 3 'sys.stdout.buffer.write'
    Since we're replacing print(), which adds a newline, we need to also ensure that any output we shift to buffer.write also has a newline appended.
    3a58e1651c414d7d806e8e888f4361530dc95c64
    Fix output of svn diffs in Python 3
    In Python 3, creating an SVN diff would fail due to calling a function (SVNClient.svn_info) which accepts the `str` type, but passing it a `bytes` type (from the parsed svn diff output). In my tests, this patch works fine and produces byte-for-byte identical diffs in Python 2 and Python 3, but I'm not able to test further beyond that. The code seems to imply that it needs to decode to/from the filesystem encoding type, but I'm not sure if that's 100% true, since we're relying on the svn client's output and its handling of filesystem file paths. All of the systems I have access to (Linux and Windows 10) use utf-8 for their filesystem encoding, and everything seems to run as expected there. I'll try to determine a test case for non-UTF-8 filenames to test.
    1dcdb0aca9ba40299575bdab457dfbf264bb2723
    Description From Last Updated

    Worth noting that this only allows to push binary data to ReviewBoard; there are significant bugs preventing rbtools from actually …

    danudeydanudey

    F821 undefined name 'sys'

    reviewbotreviewbot

    One difference being introduced here is that print(...) will always add a trailing newline (even if the content ends with …

    chipx86chipx86

    Can you add a blank line before the if and between else's block and the print(), like: print() if six.PY2: …

    chipx86chipx86

    E501 line too long (86 > 79 characters)

    reviewbotreviewbot

    E261 at least two spaces before inline comment

    reviewbotreviewbot

    E501 line too long (86 > 79 characters)

    reviewbotreviewbot

    E261 at least two spaces before inline comment

    reviewbotreviewbot

    E501 line too long (90 > 79 characters)

    reviewbotreviewbot

    E261 at least two spaces before inline comment

    reviewbotreviewbot

    E501 line too long (83 > 79 characters)

    reviewbotreviewbot
    danudey
    danudey
    1. 
        
      1. This is only the first step towards binary diff support; other aspects of the rbtools workflow (such as patching) do not current work with non-UTF-8 diff files.

    2. Show all issues

      Worth noting that this only allows to push binary data to ReviewBoard; there are significant bugs preventing rbtools from actually fetching and applying those patches at the moment.

    3. 
        
    danudey
    Review request changed
    Change Summary:

    Add binary output handling from git during rbt patch

    Commits:
    Summary ID
    Fix output of binary diff data in python 3
    19bec3472954aa41fb17dd2a9571a6e9b6ceca02
    Output binary diff content as binary, rather than trying to decode it
    c7b32bf6f570a9f3edfc4110e4092f313590b43f
    Fix patch when git output contains non-UTF8 characters
    c88ed549640b1270376ae5e701b3a1a763af2c6d

    Checks run (1 failed, 1 succeeded)

    flake8 failed.
    JSHint passed.

    flake8

    danudey
    danudey
    chipx86
    1. This looks great! And good timing: We're releasing RBTools 2.0 basically this weekend. I have a couple changes that need to be made to ensure we don't regress on some output, but otherwise this looks very solid.

    2. rbtools/commands/diff.py (Diff revision 4)
       
       
       
      Show all issues

      One difference being introduced here is that print(...) will always add a trailing newline (even if the content ends with one), but sys.stdout.buffer.write(...) will only add one if the content ends with one. We probably want to follow this statement up with a print().

      This applies to the other lines that move to sys.stdout.buffer as well.

    3. rbtools/commands/patch.py (Diff revision 4)
       
       
       
       
       
       
       
      Show all issues

      Can you add a blank line before the if and between else's block and the print(), like:

      print()
      
      if six.PY2:
          ...
      else:
          ...
      
      print()
      

      We separate out statements from new blocks, to help visually distinguish them.

    4. 
        
    danudey
    Review request changed
    Change Summary:

    Re-add newlines to sys.stdout.buffer.write instances.

    Commits:
    Summary ID
    Output binary diff content as binary, rather than trying to decode it
    8542ddcfe8e96bad4637baf685fac6238d6b80b0
    Fix patch when git output contains non-UTF8 characters
    6312d90b7a4ffd7edae1c8c760da0d2c538b5de4
    Fix regression in diff command for Python 3
    9cdec9006e6e1b5e54f7286b5aa0a5d0df93a836
    Update patch --print for non-UTF-8 diff content
    3a6bcf149cf1e44b1fe2b9b970744261290d8e58
    Output binary diff content as binary, rather than trying to decode it
    8542ddcfe8e96bad4637baf685fac6238d6b80b0
    Fix patch when git output contains non-UTF8 characters
    6312d90b7a4ffd7edae1c8c760da0d2c538b5de4
    Fix regression in diff command for Python 3
    9cdec9006e6e1b5e54f7286b5aa0a5d0df93a836
    Update patch --print for non-UTF-8 diff content
    3a6bcf149cf1e44b1fe2b9b970744261290d8e58
    Add newlines to Python 3 'sys.stdout.buffer.write'
    Since we're replacing print(), which adds a newline, we need to also ensure that any output we shift to buffer.write also has a newline appended.
    72abe839d0653fbf2ba037bd78472d435cee1418

    Checks run (1 failed, 1 succeeded)

    flake8 failed.
    JSHint passed.

    flake8

    danudey
    danudey
    Review request changed
    Change Summary:

    Fix up svn diff parsing/output for Python 3

    Commits:
    Summary ID
    Output binary diff content as binary, rather than trying to decode it
    8542ddcfe8e96bad4637baf685fac6238d6b80b0
    Fix patch when git output contains non-UTF8 characters
    6312d90b7a4ffd7edae1c8c760da0d2c538b5de4
    Fix regression in diff command for Python 3
    9cdec9006e6e1b5e54f7286b5aa0a5d0df93a836
    Update patch --print for non-UTF-8 diff content
    3a6bcf149cf1e44b1fe2b9b970744261290d8e58
    Add newlines to Python 3 'sys.stdout.buffer.write'
    Since we're replacing print(), which adds a newline, we need to also ensure that any output we shift to buffer.write also has a newline appended.
    3a58e1651c414d7d806e8e888f4361530dc95c64
    Output binary diff content as binary, rather than trying to decode it
    8542ddcfe8e96bad4637baf685fac6238d6b80b0
    Fix patch when git output contains non-UTF8 characters
    6312d90b7a4ffd7edae1c8c760da0d2c538b5de4
    Fix regression in diff command for Python 3
    9cdec9006e6e1b5e54f7286b5aa0a5d0df93a836
    Update patch --print for non-UTF-8 diff content
    3a6bcf149cf1e44b1fe2b9b970744261290d8e58
    Add newlines to Python 3 'sys.stdout.buffer.write'
    Since we're replacing print(), which adds a newline, we need to also ensure that any output we shift to buffer.write also has a newline appended.
    3a58e1651c414d7d806e8e888f4361530dc95c64
    Fix output of svn diffs in Python 3
    In Python 3, creating an SVN diff would fail due to calling a function (SVNClient.svn_info) which accepts the `str` type, but passing it a `bytes` type (from the parsed svn diff output). In my tests, this patch works fine and produces byte-for-byte identical diffs in Python 2 and Python 3, but I'm not able to test further beyond that. The code seems to imply that it needs to decode to/from the filesystem encoding type, but I'm not sure if that's 100% true, since we're relying on the svn client's output and its handling of filesystem file paths. All of the systems I have access to (Linux and Windows 10) use utf-8 for their filesystem encoding, and everything seems to run as expected there.
    f15e987d6031351fb2482dab891a72b5996f1252

    Checks run (1 failed, 1 succeeded)

    flake8 failed.
    JSHint passed.

    flake8

    danudey
    chipx86
    1. Ship It!
    2. 
        
    danudey
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to release-2.0.x (0c1d630)