• 
      

    Fix some unicode-related errors in RBTools SVN parsing.

    Review Request #7044 — Created March 11, 2015 and submitted

    Information

    RBTools
    release-0.7.x
    672ae8c...

    Reviewers

    There are a few issues that have been cropping up with the diff parsing (and
    other interfacing with output from 'svn' commands) since we moved over to
    unicode_literals. This change fixes three of them.

    First of all, the results from 'svn info' can include non-UTF8 characters
    depending on the locales that are available and the configuration of the
    system. Luckily, all of the information we need from this can be parsed out in
    binary mode.

    Second, when doing diffs of files that contain non-UTF8 characters, we were
    attempting to process the diff as a bunch of unicode objects, rather than as
    bytes. This was happening because we'd format some of the lines into unicode,
    and then join everything together. I've fixed things so that all the processing
    happens as bytes, so that we don't need to care about the encoding, and added a
    new unit test for this behavior.

    Finally, Griffin Myers suggested a third test case, with a non-utf8 filename.
    This has also been fixed, along with a new unit test for that case.

    Testing done:
    Ran unit tests.

    Ran unit tests.

    Description From Last Updated

    Leftover from debugging?

    brenniebrennie

    Did you mean to leave these in?

    chipx86chipx86
    reviewbot
    1. Tool: Pyflakes
      Processed Files:
          rbtools/clients/tests.py
          rbtools/clients/svn.py
      
      
      
      Tool: PEP8 Style Checker
      Processed Files:
          rbtools/clients/tests.py
          rbtools/clients/svn.py
      
      
    2. 
        
    brennie
    1. 
        
    2. rbtools/clients/tests.py (Diff revision 1)
       
       
       
      Show all issues

      Leftover from debugging?

    3. 
        
    chipx86
    1. 
        
    2. rbtools/clients/tests.py (Diff revision 1)
       
       
       
      Show all issues

      Did you mean to leave these in?

    3. 
        
    gmyers
    1. I'm way out of my realm when it comes to unicode, locales, etc. -- I don't even have the basic vocabulary down right.  But, your comment about 'svn info' returning non-UTF8 characters made me a little curious about why you didn't need to add results_unicode=False in the call to _run_svn() from svn_info().  I cooked up a test which currently fails, but I don't honestly know if it is a valid test.  It attempts to copy an existing file (foo.txt) to a new filename which contains a non-ascii character:
      
          def test_diff_non_unicode_filename(self):
              """Testing SVNClient diff with a non-utf8 filename"""
              self.options.svn_show_copies_as_adds = 'y'
      
              filename = '\xe2'
              self._run_svn(['copy', 'foo.txt', filename])
              self._run_svn(['propset', 'svn:mime-type', 'text/plain', filename])
      
              revisions = self.client.parse_revision_spec()
              result = self.client.diff(revisions)
              self.assertTrue(isinstance(result, dict))
              self.assertTrue('diff' in result)
              self.assertEqual(md5(result['diff']).hexdigest(),
                               'bfa99e54b8c23b97b1dee23d2763c4fd')
      
      
      I'll let you decide if there is any validity to this test.  My, likely naive, fix was:
      
      diff --git a/rbtools/clients/svn.py b/rbtools/clients/svn.py
      index 55c74aa..23e7d3f 100644
      --- a/rbtools/clients/svn.py
      +++ b/rbtools/clients/svn.py
      @@ -602,7 +602,7 @@ class SVNClient(SCMClient):
                               root = info["Repository Root"]
                               path = unquote(url[len(root):])
       
      -                    line = b'%s %s%s' % (front, path.encode('utf-8'), rest)
      +                    line = b'%s %s%s' % (front, path, rest)
       
                   result.append(line)
       
      @@ -622,7 +622,8 @@ class SVNClient(SCMClient):
               result = self._run_svn(["info", path],
                                      split_lines=True,
                                      ignore_errors=ignore_errors,
      -                               none_on_ignored_error=True)
      +                               none_on_ignored_error=True,
      +                               results_unicode=False)
               if result is None:
                   return None
       
      diff --git a/rbtools/utils/process.py b/rbtools/utils/process.py
      index c6c79a0..a327d93 100644
      --- a/rbtools/utils/process.py
      +++ b/rbtools/utils/process.py
      @@ -35,7 +35,7 @@ def execute(command,
                   results_unicode=True):
           """Utility function to execute a command and return the output."""
           if isinstance(command, list):
      -        logging.debug('Running: ' + subprocess.list2cmdline(command))
      +        logging.debug(b'Running: ' + subprocess.list2cmdline(command))
           else:
               logging.debug('Running: ' + command)
      1. The test that you suggest is indeed a good one, and most of your suggestions are also good. I'll incorporate this (and a little bit more tweaking) into the next revision. Thanks!

    2. 
        
    david
    reviewbot
    1. Tool: Pyflakes
      Processed Files:
          rbtools/clients/tests.py
          rbtools/clients/svn.py
          rbtools/utils/process.py
      
      
      
      Tool: PEP8 Style Checker
      Processed Files:
          rbtools/clients/tests.py
          rbtools/clients/svn.py
          rbtools/utils/process.py
      
      
    2. 
        
    brennie
    1. Ship It!
    2. 
        
    chipx86
    1. Ship It!
    2. 
        
    david
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to release-0.7.x (de0e748)