Always treat diffs in commits as byte strings.

Review Request #9148 — Created Aug. 23, 2017 and submitted

Information

Review Board
release-2.5.x
51ddcbd...

Reviewers

Commit.diff used to be more than happy to accept any Unicode or byte
strings thrown at it, and some code actually expected it to be a
Unicode string, unconditionally attempting to encode the contents as a
UTF-8 byte string. If a hosting service set the diff as a byte string
containing non-ASCII content, this could lead to a crash.
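Review Board 2.5 runs on Python 2. There, calling `.encode('utf-8')` on a byte string first decodes it implicitly with the default ASCII codec, so any non-ASCII byte raises `UnicodeDecodeError` before the UTF-8 encode even starts. The Python 3 sketch below makes that implicit step explicit so the failure can be reproduced. `py2_style_encode` is a hypothetical helper written only to mimic the Python 2 behavior; it is not part of Review Board.

```python
def py2_style_encode(value):
    """Mimic what ``value.encode('utf-8')`` did on Python 2.

    Hypothetical helper for illustration only. On Python 2, calling
    encode() on a byte string first decoded it with the default ASCII
    codec; that implicit step is made explicit here.
    """
    if isinstance(value, bytes):
        value = value.decode('ascii')  # the implicit Python 2 step
    return value.encode('utf-8')


# Pure-ASCII byte strings survive the round trip, so the bug stays hidden...
assert py2_style_encode(b'+ cafe\n') == b'+ cafe\n'

# ...but a UTF-8 diff from a hosting service with non-ASCII content fails.
diff_bytes = u'+ caf\u00e9\n'.encode('utf-8')

try:
    py2_style_encode(diff_bytes)
except UnicodeDecodeError as e:
    print('Crash:', e)
```

This is why the problem only surfaced for diffs that actually contained non-ASCII characters.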

Now, Commit.diff always stores a byte string, handling the encoding
from Unicode if needed. Callers can and should now always treat this as
a byte string.
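A minimal sketch of this kind of normalization, assuming a simplified, hypothetical `Commit` class whose `diff` attribute is a property, and assuming UTF-8 as the encoding for Unicode input. The real class in `reviewboard/scmtools/core.py` may be structured differently. The sketch is written for Python 3, where `str` is the Unicode type (on Python 2 the check would be against `unicode`).

```python
class Commit(object):
    """Hypothetical, simplified stand-in for the real Commit class."""

    def __init__(self, diff=None):
        # Goes through the property setter below.
        self.diff = diff

    @property
    def diff(self):
        """The diff contents, always a byte string (or None)."""
        return self._diff

    @diff.setter
    def diff(self, value):
        if isinstance(value, str):
            # Unicode input is encoded once here, so callers never
            # need to. UTF-8 is an assumption made for this sketch.
            value = value.encode('utf-8')

        self._diff = value


# Both forms end up as the same byte string.
from_text = Commit(diff=u'+ caf\u00e9\n')
from_bytes = Commit(diff=b'+ caf\xc3\xa9\n')
assert from_text.diff == from_bytes.diff == b'+ caf\xc3\xa9\n'
```

Doing the conversion in the setter means the check happens once, at assignment time, instead of being repeated (or forgotten) in every caller that reads the diff.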

This fixes a crash when posting existing commits from Bitbucket with
Unicode content.

Unit tests pass.

Manually tested that the diff content no longer gets improperly
re-encoded or crashes in the customer case.

david
  1. reviewboard/scmtools/core.py (Diff revision 1)

    Can we emit a warning in this case? We really shouldn't be passing in unicode to this function.

    1. We do in GitHub and rbgateway. I want to clean those up separately, and then I'm fine making this a warning.
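The warning suggested here could be sketched with Python's standard `warnings` module. The helper name `normalize_diff`, the `UnicodeWarning` category, and UTF-8 as the fallback encoding are all assumptions for illustration, not what Review Board ships. As above, this is Python 3, where `str` is the Unicode type.

```python
import warnings


def normalize_diff(value):
    """Return the diff as bytes, warning if a Unicode string was passed.

    Hypothetical helper; the warning category and UTF-8 fallback are
    assumptions made for this sketch.
    """
    if isinstance(value, str):
        # stacklevel=2 points the warning at the caller that passed
        # Unicode in, which is the code that should be cleaned up.
        warnings.warn('Commit.diff should be set to a byte string; '
                      'encoding Unicode input as UTF-8.',
                      UnicodeWarning, stacklevel=2)
        value = value.encode('utf-8')

    return value
```

Keeping the encode as a fallback while warning lets existing hosting services (such as the GitHub and rbgateway ones mentioned above) keep working until they are fixed.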

david
  1. Ship It!
chipx86
Review request changed
Status: Completed
Change Summary: Pushed to release-2.5.x (23fbc2f)