Fix some unicode-related errors in RBTools SVN parsing.

Review Request #7044 — Created March 11, 2015 and submitted

Information

RBTools
release-0.7.x
672ae8c...

Reviewers

There are a few issues that have been cropping up with the diff parsing (and
other interfacing with output from 'svn' commands) since we moved over to
unicode_literals. This change fixes three of them.

First of all, the results from 'svn info' can include non-UTF8 characters
depending on the locales that are available and the configuration of the
system. Luckily, all of the information we need from this can be parsed out in
binary mode.

Second, when doing diffs of files that contain non-UTF8 characters, we were
attempting to process the diff as a bunch of unicode objects, rather than as
bytes. This was happening because we'd format some of the lines into unicode,
and then join everything together. I've fixed things so that all the processing
happens as bytes, so that we don't need to care about the encoding, and added a
new unit test for this behavior.

Finally, Griffin Myers suggested a third test case, with a non-utf8 filename.
This has also been fixed, along with a new unit test for that case.

Testing done:
Ran unit tests.

Ran unit tests.

Description From Last Updated

Leftover from debugging?

brenniebrennie

Did you mean to leave these in?

chipx86chipx86
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/svn.py
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/svn.py
    
    
  2. 
      
brennie
  1. 
      
  2. rbtools/clients/tests.py (Diff revision 1)
     
     
     
    Show all issues

    Leftover from debugging?

  3. 
      
chipx86
  1. 
      
  2. rbtools/clients/tests.py (Diff revision 1)
     
     
     
    Show all issues

    Did you mean to leave these in?

  3. 
      
gmyers
  1. I'm way out of my realm when it comes to unicode, locales, etc. -- I don't even have the basic vocabulary down right.  But, your comment about 'svn info' returning non-UTF8 characters made me a little curious about why you didn't need to add results_unicode=False in the call to _run_svn() from svn_info().  I cooked up a test which currently fails, but I don't honestly know if it is a valid test.  It attempts to copy an existing file (foo.txt) to a new filename which contains a non-ascii character:
    
        def test_diff_non_unicode_filename(self):
            """Testing SVNClient diff with a non-utf8 filename"""
            self.options.svn_show_copies_as_adds = 'y'
    
            filename = '\xe2'
            self._run_svn(['copy', 'foo.txt', filename])
            self._run_svn(['propset', 'svn:mime-type', 'text/plain', filename])
    
            revisions = self.client.parse_revision_spec()
            result = self.client.diff(revisions)
            self.assertTrue(isinstance(result, dict))
            self.assertTrue('diff' in result)
            self.assertEqual(md5(result['diff']).hexdigest(),
                             'bfa99e54b8c23b97b1dee23d2763c4fd')
    
    
    I'll let you decide if there is any validity to this test.  My, likely naive, fix was:
    
    diff --git a/rbtools/clients/svn.py b/rbtools/clients/svn.py
    index 55c74aa..23e7d3f 100644
    --- a/rbtools/clients/svn.py
    +++ b/rbtools/clients/svn.py
    @@ -602,7 +602,7 @@ class SVNClient(SCMClient):
                             root = info["Repository Root"]
                             path = unquote(url[len(root):])
     
    -                    line = b'%s %s%s' % (front, path.encode('utf-8'), rest)
    +                    line = b'%s %s%s' % (front, path, rest)
     
                 result.append(line)
     
    @@ -622,7 +622,8 @@ class SVNClient(SCMClient):
             result = self._run_svn(["info", path],
                                    split_lines=True,
                                    ignore_errors=ignore_errors,
    -                               none_on_ignored_error=True)
    +                               none_on_ignored_error=True,
    +                               results_unicode=False)
             if result is None:
                 return None
     
    diff --git a/rbtools/utils/process.py b/rbtools/utils/process.py
    index c6c79a0..a327d93 100644
    --- a/rbtools/utils/process.py
    +++ b/rbtools/utils/process.py
    @@ -35,7 +35,7 @@ def execute(command,
                 results_unicode=True):
         """Utility function to execute a command and return the output."""
         if isinstance(command, list):
    -        logging.debug('Running: ' + subprocess.list2cmdline(command))
    +        logging.debug(b'Running: ' + subprocess.list2cmdline(command))
         else:
             logging.debug('Running: ' + command)
    1. The test that you suggest is indeed a good one, and most of your suggestions are also good. I'll incorporate this (and a little bit more tweaking) into the next revision. Thanks!

  2. 
      
david
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/svn.py
        rbtools/utils/process.py
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/svn.py
        rbtools/utils/process.py
    
    
  2. 
      
brennie
  1. Ship It!
  2. 
      
chipx86
  1. Ship It!
  2. 
      
david
Review request changed
Status:
Completed
Change Summary:
Pushed to release-0.7.x (de0e748)