• 
      

    Handle Mercurial diffs as byte strings

    Review Request #7184 — Created April 9, 2015 and submitted

    Information

    RBTools
    master
    bc8978d...

    Reviewers

    Mercurial diffs were previously treated as UTF-8 strings, which may fail if the file has another encoding. It may cause utils.process.execute to raise a UnicodeDecodeError, since it decoded with only UTF-8.

    I have changed clients.mercurial to supply "results_unicode=False" when running the diff command. This seems to be in line with what is done in clients.git and clients.svn.

    • Used rbt post to upload a diff that converts a file (with non-ASCII characters) from UTF-8 to Latin-1, and vice versa, with success.
    • Used rbt patch to apply the diffs
    reviewbot
    1. Tool: Pyflakes
      Processed Files:
          rbtools/utils/process.py
      
      
      
      Tool: PEP8 Style Checker
      Processed Files:
          rbtools/utils/process.py
      
      
    2. 
        
    HA
    HA
    chipx86
    1. Thanks for the change!

      Something I want to bring up for discussion to a wider audience is whether we should even be interpreting any of this as utf-8. I believe we should be expecting any diff output to be in any encoding and just read it as bytestrings, uploading to the server as-is.

      I'm sure there are things we need to treat as utf-8, but diffs should probably not be one of those things.

      1. I absolutely agree. Preserving the diff as a byte string is probably the best way to ensure that it is "correct", in the sense that you preserve all information in it, regardless of what encoding your file may have. But I imagine this may require quite a few changes both in RBTools and ReviewBoard?

      2. Review Board is doing the correct thing now with handling these as byte strings. RBTools is, iirc, generating byte strings when sending requests to the server. I suspect it should be fine to move back to bytestrings for diff generation. There may be some corner cases we'll have to solve as a result of that (particularly with any diff parsing/generation code that lives in RBTools), but it won't be any worse than today.

      3. As you may have seen, I found a better and much simpler solution: to supply results_unicode=False to utils.process.execute. This seems to be in line with what is done for e.g. git and SVN.

    2. 
        
    HA
    reviewbot
    1. Tool: Pyflakes
      Processed Files:
          rbtools/clients/mercurial.py
      
      
      
      Tool: PEP8 Style Checker
      Processed Files:
          rbtools/clients/mercurial.py
      
      
    2. 
        
    david
    1. Ship It!
    2. 
        
    HA
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to release-0.7.x (2aedcf9)