This change has been marked as completed.

Describe the completed change (optional):

Pushed to release-0.7.x (ec1a47d)

Summary

Fix unicode-related errors when parsing XML outputs from SVN

Review Request #7224 — Created April 17, 2015 and submitted 10 years, 2 months ago

Information

Owner

gmyers*

Repository

RBTools

Branch

master

Bugs

3844

Depends On

Commit

ca72d8c...

Reviewers

Groups

rbtools

People

Description*

There are two instances where svn is invoked with the --xml option to produce an XML document:
1. 'svn log' is called to extract a specific revision number
2. 'svn status' is called when attempting to detect a changelist.

For either of these cases, if a non-UTF8 character is present in the resultant XML document then an error ensues. This could be due to a non-UTF8 character in a commit message, or in a filename in a changelist. Case 1 was reported as issue 3844.

The fix here is simple, and that is to add results_unicode=False for both underlying calls to svn.

I've also added unit tests to exercise both of these cases. Both tests failed prior to the fix. To support case 1 it was necessary to introduce a non-UTF8 character in to the svn log. The easiest approach here was to update an existing commit message in the testdata repo to contain such a character, and the message for commit r2 has been updated to support this.

While testing I noticed that the error message when case 1 fails was quite misleading. For instance 'rbt diff 2' (where r2 has a non-UTF8 character in the commit log message) would produce:

CRITICAL: "2" does not appear to be a valid revision or changelist name

This is confusing because 2 is in fact a valid revision, and it does not give any indication that the error is actually due to an XML parsing failure because of a non-UTF8 character. The exception raised by the ElementTree XML parser is of type UnicodeEncodeError, which is derived from ValueError. But, in _convert_symbolic_revision(), where this exception is produced, a ValueError already has a specific purpose of indicating any failure to determine the revision number. Now, a ValueError from ElementTree is explicitly caught and a generic SCMError, with a more informative error message, is raised instead.

Testing Done

Ran unit tests.  Confirmed that new unit tests fail prior to the fix.

Issues

Description	From	Last Updated
Is this still going to end up with CRITICAL: foo does not appear to be a valid revision" ?	brennie	10 years, 2 months ago
There are no open issues

Tool: PEP8 Style Checker
Processed Files:
    rbtools/clients/tests.py
    rbtools/clients/svn.py

Ignored Files:
    rbtools/clients/testdata/svn-repo/db/revprops/0/2

Tool: Pyflakes
Processed Files:
    rbtools/clients/tests.py
    rbtools/clients/svn.py

Ignored Files:
    rbtools/clients/testdata/svn-repo/db/revprops/0/2

rbtools/clients/svn.py (Diff revision 1)

The issue has been resolved. Show all issues

Is this still going to end up with
CRITICAL: foo does not appear to be a valid revision" ?

gmyers 10 years, 2 months ago

So now it will be:

ERROR: Failed to parse svn log - 'ascii' codec can't encode character u'\xe9' in position 184: ordinal not in range(128).
CRITICAL: "2" does not appear to be a valid revision or changelist name

Perhaps it would be more preferable to explicitly raise some other exception here, but I wasn't sure and that's why I went with the logging approach.

brennie 10 years, 2 months ago

I think a different error should be raised, but I'm not exactly sure what. https://reviews.reviewboard.org/r/7153/ adds a generic SCMError, but that hasn't landed yet. You could add that error yourself to errors.py and then use that?

Change Summary:

Implement Barret's suggestion to raise an SCMError when errors parsing the SVN log are encountered.

Description:

		There are two instances where svn is invoked with the `--xml` option to produce an XML document:
		1. 'svn log' is called to extract a specific revision number
		2. 'svn status' is called when attempting to detect a changelist.

		For either of these cases, if a non-UTF8 character is present in the resultant XML document then an error ensues. This could be due to a non-UTF8 character in a commit message, or in a filename in a changelist. Case 1 was reported as issue 3844.

		The fix here is simple, and that is to add `results_unicode=False` for both underlying calls to svn.

		I've also added unit tests to exercise both of these cases. Both tests failed prior to the fix. To support case 1 it was necessary to introduce a non-UTF8 character in to the svn log. The easiest approach here was to update an existing commit message in the testdata repo to contain such a character, and the message for commit r2 has been updated to support this.

		While testing I noticed that the error message when case 1 fails was quite misleading. For instance 'rbt diff 2' (where r2 has a non-UTF8 character in the commit log message) would produce:

		`CRITICAL: "2" does not appear to be a valid revision or changelist name`

~		This is confusing because 2 is in fact a valid revision, and it does not give any indication that the error is actually due to an XML parsing failure because of a non-UTF8 character. The exception raised by the ElementTree XML parser is of type `UnicodeEncodeError`, which is derived from `ValueError`. But, in `_convert_symbolic_revision()`, where this exception is produced, a `ValueError` already has a specific purpose of indicating any failure to determine the revision number. Now, a `ValueError` from ElementTree is explicitly caught and a more informative message is logged.
	~	This is confusing because 2 is in fact a valid revision, and it does not give any indication that the error is actually due to an XML parsing failure because of a non-UTF8 character. The exception raised by the ElementTree XML parser is of type `UnicodeEncodeError`, which is derived from `ValueError`. But, in `_convert_symbolic_revision()`, where this exception is produced, a `ValueError` already has a specific purpose of indicating any failure to determine the revision number. Now, a `ValueError` from ElementTree is explicitly caught and a generic SCMError, with a more informative error message, is raised instead.

Commit:

9d82ed28d4de052378312f8bd20298bb5fc96132

ca72d8c78a3c47610ae2d968147db4a6da6d49ee

Diff:

Revision 2 (+43 -5)

Show changes

	rbtools/clients/errors.py
	rbtools/clients/svn.py
	rbtools/clients/tests.py
	rbtools/clients/testdata/svn-repo/db/revprops/0/2

Tool: Pyflakes
Processed Files:
    rbtools/clients/tests.py
    rbtools/clients/svn.py
    rbtools/clients/errors.py

Ignored Files:
    rbtools/clients/testdata/svn-repo/db/revprops/0/2



Tool: PEP8 Style Checker
Processed Files:
    rbtools/clients/tests.py
    rbtools/clients/svn.py
    rbtools/clients/errors.py

Ignored Files:
    rbtools/clients/testdata/svn-repo/db/revprops/0/2

Ship it!

```
Ship It!
```

Status:: Completed
Change Summary:: Pushed to release-0.7.x (ec1a47d)