Summary

Fix some unicode-related errors in RBTools SVN parsing.

Review Request #7044 — Created March 11, 2015 and submitted March 12, 2015, 2:13 p.m.

Information

Owner

david

Repository

RBTools

Branch

release-0.7.x

Bugs

Depends On

Commit

672ae8c...

Reviewers

Groups

rbtools

People

Description

There are a few issues that have been cropping up with the diff parsing (and
other interfacing with output from 'svn' commands) since we moved over to
unicode_literals. This change fixes three of them.

First of all, the results from 'svn info' can include non-UTF8 characters
depending on the locales that are available and the configuration of the
system. Luckily, all of the information we need from this can be parsed out in
binary mode.

Second, when doing diffs of files that contain non-UTF8 characters, we were
attempting to process the diff as a bunch of unicode objects, rather than as
bytes. This was happening because we'd format some of the lines into unicode,
and then join everything together. I've fixed things so that all the processing
happens as bytes, so that we don't need to care about the encoding, and added a
new unit test for this behavior.

Finally, Griffin Myers suggested a third test case, with a non-utf8 filename.
This has also been fixed, along with a new unit test for that case.

Testing done:
Ran unit tests.

Testing Done

Ran unit tests.

Issues

Description	From	Last Updated
Leftover from debugging?	brennie	March 11, 2015, 10:33 p.m.
Did you mean to leave these in?	chipx86	March 11, 2015, 10:33 p.m.

Tool: Pyflakes
Processed Files:
    rbtools/clients/tests.py
    rbtools/clients/svn.py



Tool: PEP8 Style Checker
Processed Files:
    rbtools/clients/tests.py
    rbtools/clients/svn.py

rbtools/clients/tests.py (Diff revision 1)
The issue has been resolved. Show all issues
```
Leftover from debugging?
```

Ship it!

rbtools/clients/tests.py (Diff revision 1)
The issue has been resolved. Show all issues
```
Did you mean to leave these in?
```

I'm way out of my realm when it comes to unicode, locales, etc. -- I don't even have the basic vocabulary down right.  But, your comment about 'svn info' returning non-UTF8 characters made me a little curious about why you didn't need to add results_unicode=False in the call to _run_svn() from svn_info().  I cooked up a test which currently fails, but I don't honestly know if it is a valid test.  It attempts to copy an existing file (foo.txt) to a new filename which contains a non-ascii character:

    def test_diff_non_unicode_filename(self):
        """Testing SVNClient diff with a non-utf8 filename"""
        self.options.svn_show_copies_as_adds = 'y'

        filename = '\xe2'
        self._run_svn(['copy', 'foo.txt', filename])
        self._run_svn(['propset', 'svn:mime-type', 'text/plain', filename])

        revisions = self.client.parse_revision_spec()
        result = self.client.diff(revisions)
        self.assertTrue(isinstance(result, dict))
        self.assertTrue('diff' in result)
        self.assertEqual(md5(result['diff']).hexdigest(),
                         'bfa99e54b8c23b97b1dee23d2763c4fd')


I'll let you decide if there is any validity to this test.  My, likely naive, fix was:

diff --git a/rbtools/clients/svn.py b/rbtools/clients/svn.py
index 55c74aa..23e7d3f 100644
--- a/rbtools/clients/svn.py
+++ b/rbtools/clients/svn.py
@@ -602,7 +602,7 @@ class SVNClient(SCMClient):
                         root = info["Repository Root"]
                         path = unquote(url[len(root):])
 
-                    line = b'%s %s%s' % (front, path.encode('utf-8'), rest)
+                    line = b'%s %s%s' % (front, path, rest)
 
             result.append(line)
 
@@ -622,7 +622,8 @@ class SVNClient(SCMClient):
         result = self._run_svn(["info", path],
                                split_lines=True,
                                ignore_errors=ignore_errors,
-                               none_on_ignored_error=True)
+                               none_on_ignored_error=True,
+                               results_unicode=False)
         if result is None:
             return None
 
diff --git a/rbtools/utils/process.py b/rbtools/utils/process.py
index c6c79a0..a327d93 100644
--- a/rbtools/utils/process.py
+++ b/rbtools/utils/process.py
@@ -35,7 +35,7 @@ def execute(command,
             results_unicode=True):
     """Utility function to execute a command and return the output."""
     if isinstance(command, list):
-        logging.debug('Running: ' + subprocess.list2cmdline(command))
+        logging.debug(b'Running: ' + subprocess.list2cmdline(command))
     else:
         logging.debug('Running: ' + command)

david March 11, 2015, 10:56 p.m.

The test that you suggest is indeed a good one, and most of your suggestions are also good. I'll incorporate this (and a little bit more tweaking) into the next revision. Thanks!

Description:

		There are a few issues that have been cropping up with the diff parsing (and
		other interfacing with output from 'svn' commands) since we moved over to
~		unicode_literals. This change fixes two of them, one of which was posted to the
	~	unicode_literals. This change fixes three of them.
-		mailing list, and the other to a bug.

		First of all, the results from 'svn info' can include non-UTF8 characters
		depending on the locales that are available and the configuration of the
		system. Luckily, all of the information we need from this can be parsed out in
		binary mode.

		Second, when doing diffs of files that contain non-UTF8 characters, we were
		attempting to process the diff as a bunch of unicode objects, rather than as
		bytes. This was happening because we'd format some of the lines into unicode,
		and then join everything together. I've fixed things so that all the processing
		happens as bytes, so that we don't need to care about the encoding, and added a
		new unit test for this behavior.
	+
	+	Finally, Griffin Myers suggested a third test case, with a non-utf8 filename.
	+	This has also been fixed, along with a new unit test for that case.
	+
	+	Testing done:
	+	Ran unit tests.

Commit:

-	2ee2646f0c7181d9c8f0c12a22e04a2db994a6da
+	672ae8c39519efdf829cf3d792614f19b94b5d09

Diff:

Revision 2 (+93 -55)

Show changes

	rbtools/clients/svn.py
	rbtools/clients/tests.py
	rbtools/utils/process.py

Tool: Pyflakes
Processed Files:
    rbtools/clients/tests.py
    rbtools/clients/svn.py
    rbtools/utils/process.py



Tool: PEP8 Style Checker
Processed Files:
    rbtools/clients/tests.py
    rbtools/clients/svn.py
    rbtools/utils/process.py

Ship it!

```
Ship It!
```

Ship it!

```
Ship It!
```

Status: Closed (submitted)

Change Summary:

Pushed to release-0.7.x (de0e748)

rbtools/clients/tests.py (Diff revision 1)

rbtools/clients/tests.py (Diff revision 1)