Patch for issue 3895: UnicodeDecodeError

Review Request #7442 — Created June 22, 2015 and discarded

Information

RBTools

Reviewers

Fix issue 3895, set encoding to prevent potential UnicodeDecodeErrors

After this patch I was able to successfully post a review that contains unicode symbol when having empty files in the patch. I haven't run the test suite (sorry don't have the time to set everything up).

Description From Last Updated

You can just remove results_unicode=False and it will have the same effect as decoding.

brenniebrennie

Col: 80 E501 line too long (95 > 79 characters)

reviewbotreviewbot

results_unicode defaults to True.

brenniebrennie

This isn't a correct fix (it will regress posting changes with non-utf8 files).

daviddavid

Instead, we should change it so that what we're appending here are bytestrings rather than unicode objects.

daviddavid

And here.

daviddavid
reviewbot
  1. Tool: PEP8 Style Checker
    Processed Files:
        rbtools/clients/mercurial.py
    
    
    
    Tool: Pyflakes
    Processed Files:
        rbtools/clients/mercurial.py
    
    
  2. rbtools/clients/mercurial.py (Diff revision 1)
     
     
    Show all issues
    Col: 80
     E501 line too long (95 > 79 characters)
    
    1. After you've fixed errors, please marked them as such. Our infrastructure doesn't allow patches to be landed with open issues.

  3. 
      
brennie
  1. Can you add the bug # to the bug field and fill in the testing done? (e.g., did you run the unit test suite?)

  2. rbtools/clients/mercurial.py (Diff revision 1)
     
     
    Show all issues

    You can just remove results_unicode=False and it will have the same effect as decoding.

  3. 
      
KA
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        rbtools/clients/mercurial.py
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        rbtools/clients/mercurial.py
    
    
  2. 
      
brennie
  1. 
      
  2. rbtools/clients/mercurial.py (Diff revision 2)
     
     
    Show all issues

    results_unicode defaults to True.

  3. 
      
david
  1. 
      
  2. rbtools/clients/mercurial.py (Diff revision 2)
     
     
    Show all issues

    This isn't a correct fix (it will regress posting changes with non-utf8 files).

  3. rbtools/clients/mercurial.py (Diff revision 2)
     
     
     
     
    Show all issues

    Instead, we should change it so that what we're appending here are bytestrings rather than unicode objects.

    1. It should be noted that mercurial.py doesnt use unicode_literals yet.

    2. Oh shucks. We should do that.

  4. rbtools/clients/mercurial.py (Diff revision 2)
     
     
     
     
    Show all issues

    And here.

    1. From my understanding in order to convert the unicode string into a byte array you need to pass in an encoding: unicode_string.decode(encoding). I'm afraid I'm not a Python expert, not sure how to do this properly without breaking non-utf8 files.

    2. At a basic level, we can just prefix each of these strings with a b (i.e. b'diff -r %s -r %s %s\n'). We also need to make sure that the things that get formatted into the string (base, tip, filename, etc.) are all bytestrings too.

    3. It seems that doesn't work. In this case the problem is the diff variable is a unicode string, where as 'diff -r ...' is not. Adding the b prefix would cause the non-unicode string to be converted to a byte array, and then the byte array is being added to a unicode string which ends up causing the same UnicodeDecodeError. diff needs to be converted to a byte array so that it can be manipulated, and then back to a unicode string before the end of _handle_empty_files.

      I tried the following as well as b'diff -r ...':
      diff += (u'diff -r %s -r %s %s\n'
      u'--- %s\t%s\n'
      u'+++ b/%s\t%s\n'
      % (base, tip, filename,
      self.PRE_CREATION, self.PRE_CREATION_DATE,
      filename, tip_date))

      In both cases I still get a UnicodeDecodeError.

    4. Ali,

      After looking at the code, I think there's probably a fair amount of refactoring that will have to happen here, so I'm going to take over the fix for this. Thanks for raising the issue!

    5. Thanks David!

  5. 
      
KA
Review request changed
Status:
Discarded