Handle Mercurial diffs as byte strings

Review Request #7184 — Created April 9, 2015 and submitted

Information

RBTools
master
bc8978d...

Reviewers

Mercurial diffs were previously treated as UTF-8 strings, which may fail if the file has another encoding. It may cause utils.process.execute to raise a UnicodeDecodeError, since it decoded with only UTF-8.

I have changed clients.mercurial to supply "results_unicode=False" when running the diff command. This seems to be in line with what is done in clients.git and clients.svn.

  • Used rbt post to upload a diff that converts a file (with non-ASCII characters) from UTF-8 to Latin-1, and vice versa, with success.
  • Used rbt patch to apply the diffs
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        rbtools/utils/process.py
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        rbtools/utils/process.py
    
    
  2. 
      
HA
HA
chipx86
  1. Thanks for the change!

    Something I want to bring up for discussion to a wider audience is whether we should even be interpreting any of this as utf-8. I believe we should be expecting any diff output to be in any encoding and just read it as bytestrings, uploading to the server as-is.

    I'm sure there are things we need to treat as utf-8, but diffs should probably not be one of those things.

    1. I absolutely agree. Preserving the diff as a byte string is probably the best way to ensure that it is "correct", in the sense that you preserve all information in it, regardless of what encoding your file may have. But I imagine this may require quite a few changes both in RBTools and ReviewBoard?

    2. Review Board is doing the correct thing now with handling these as byte strings. RBTools is, iirc, generating byte strings when sending requests to the server. I suspect it should be fine to move back to bytestrings for diff generation. There may be some corner cases we'll have to solve as a result of that (particularly with any diff parsing/generation code that lives in RBTools), but it won't be any worse than today.

    3. As you may have seen, I found a better and much simpler solution: to supply results_unicode=False to utils.process.execute. This seems to be in line with what is done for e.g. git and SVN.

  2. 
      
HA
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        rbtools/clients/mercurial.py
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        rbtools/clients/mercurial.py
    
    
  2. 
      
david
  1. Ship It!
  2. 
      
HA
Review request changed
Status:
Completed
Change Summary:
Pushed to release-0.7.x (2aedcf9)