Handle Mercurial diffs as byte strings
Review Request #7184 — Created April 9, 2015 and submitted
Mercurial diffs were previously treated as UTF-8 strings, which may fail if the file has another encoding. It may cause utils.process.execute to raise a UnicodeDecodeError, since it decoded with only UTF-8.
I have changed clients.mercurial to supply "results_unicode=False" when running the diff command. This seems to be in line with what is done in clients.git and clients.svn.
- Used rbt post to upload a diff that converts a file (with non-ASCII characters) from UTF-8 to Latin-1, and vice versa, with success.
- Used rbt patch to apply the diffs
- Summary:
-
Handle diffs that contain Latin-1 characters.[WIP] Handle diffs that contain Latin-1 characters.
- Testing Done:
-
I've used rbt post to upload a diff that contains both UTF-8 and Latin-1 characters, with success.
+ + After testing it some more it seems that Reviewboard may have trouble interpreting the diff.
- Summary:
-
[WIP] Handle diffs that contain Latin-1 characters.Handle diffs that contain Latin-1 characters.
- Testing Done:
-
I've used rbt post to upload a diff that contains both UTF-8 and Latin-1 characters, with success.
- - After testing it some more it seems that Reviewboard may have trouble interpreting the diff.
-
Thanks for the change!
Something I want to bring up for discussion to a wider audience is whether we should even be interpreting any of this as utf-8. I believe we should be expecting any diff output to be in any encoding and just read it as bytestrings, uploading to the server as-is.
I'm sure there are things we need to treat as utf-8, but diffs should probably not be one of those things.
- Change Summary:
-
A much simpler and better solution: Just supply results_unicode=False to utils.process.execute.
- Summary:
-
Handle diffs that contain Latin-1 characters.Handle Mercurial diffs as byte strings
- Description:
-
~ If "hg diff" and "git diff" process a file that is (or was) encoded in Latin-1 (and contains non-ASCII characters), their output would cause rbtools.utils.process.execute to raise a UnicodeDecodeError, since it decoded with only UTF-8.
~ Mercurial diffs were previously treated as UTF-8 strings, which may fail if the file has another encoding. It may cause utils.process.execute to raise a UnicodeDecodeError, since it decoded with only UTF-8.
~ I have added handling of output from commands that contain both UTF-8 and/or Latin-1 characters, by decoding the output line by line, trying both UTF-8 and Latin-1. This allows handling of diffs for Latin-1 files, and files that change encoding to/from UTF-8 from/to Latin-1.
~ I have changed clients.mercurial to supply "results_unicode=False" when running the diff command. This seems to be in line with what is done in clients.git and clients.svn.
- More encodings could possibly considered if desired, but I've chosen Latin-1 since it seems most common (apart from UTF-8). - Testing Done:
-
~ I've used rbt post to upload a diff that contains both UTF-8 and Latin-1 characters, with success.
~ - Used rbt post to upload a diff that converts a file (with non-ASCII characters) from UTF-8 to Latin-1, and vice versa, with success.
+ - Used rbt patch to apply the diffs
- Commit:
-
9714db260185f4d5359527d0182b44dbb5a10b2abc8978d648aae43f1d43f7c54b5b83a7cb27a534
- Diff:
-
Revision 2 (+2 -2)