Fix unicode-related errors when parsing XML outputs from SVN
Review Request #7224 — Created April 17, 2015 and submitted
There are two instances where svn is invoked with the
--xml
option to produce an XML document:
1. 'svn log' is called to extract a specific revision number
2. 'svn status' is called when attempting to detect a changelist.For either of these cases, if a non-UTF8 character is present in the resultant XML document then an error ensues. This could be due to a non-UTF8 character in a commit message, or in a filename in a changelist. Case 1 was reported as issue 3844.
The fix here is simple, and that is to add
results_unicode=False
for both underlying calls to svn.I've also added unit tests to exercise both of these cases. Both tests failed prior to the fix. To support case 1 it was necessary to introduce a non-UTF8 character in to the svn log. The easiest approach here was to update an existing commit message in the testdata repo to contain such a character, and the message for commit r2 has been updated to support this.
While testing I noticed that the error message when case 1 fails was quite misleading. For instance 'rbt diff 2' (where r2 has a non-UTF8 character in the commit log message) would produce:
CRITICAL: "2" does not appear to be a valid revision or changelist name
This is confusing because 2 is in fact a valid revision, and it does not give any indication that the error is actually due to an XML parsing failure because of a non-UTF8 character. The exception raised by the ElementTree XML parser is of type
UnicodeEncodeError
, which is derived fromValueError
. But, in_convert_symbolic_revision()
, where this exception is produced, aValueError
already has a specific purpose of indicating any failure to determine the revision number. Now, aValueError
from ElementTree is explicitly caught and a generic SCMError, with a more informative error message, is raised instead.
Ran unit tests. Confirmed that new unit tests fail prior to the fix.
Description | From | Last Updated |
---|---|---|
Is this still going to end up with CRITICAL: foo does not appear to be a valid revision" ? |
brennie |
-
Tool: Pyflakes Processed Files: rbtools/clients/tests.py rbtools/clients/svn.py Ignored Files: rbtools/clients/testdata/svn-repo/db/revprops/0/2
- Change Summary:
-
Implement Barret's suggestion to raise an SCMError when errors parsing the SVN log are encountered.
- Description:
-
There are two instances where svn is invoked with the
--xml
option to produce an XML document:1. 'svn log' is called to extract a specific revision number 2. 'svn status' is called when attempting to detect a changelist. For either of these cases, if a non-UTF8 character is present in the resultant XML document then an error ensues. This could be due to a non-UTF8 character in a commit message, or in a filename in a changelist. Case 1 was reported as issue 3844.
The fix here is simple, and that is to add
results_unicode=False
for both underlying calls to svn.I've also added unit tests to exercise both of these cases. Both tests failed prior to the fix. To support case 1 it was necessary to introduce a non-UTF8 character in to the svn log. The easiest approach here was to update an existing commit message in the testdata repo to contain such a character, and the message for commit r2 has been updated to support this.
While testing I noticed that the error message when case 1 fails was quite misleading. For instance 'rbt diff 2' (where r2 has a non-UTF8 character in the commit log message) would produce:
CRITICAL: "2" does not appear to be a valid revision or changelist name
~ This is confusing because 2 is in fact a valid revision, and it does not give any indication that the error is actually due to an XML parsing failure because of a non-UTF8 character. The exception raised by the ElementTree XML parser is of type
UnicodeEncodeError
, which is derived fromValueError
. But, in_convert_symbolic_revision()
, where this exception is produced, aValueError
already has a specific purpose of indicating any failure to determine the revision number. Now, aValueError
from ElementTree is explicitly caught and a more informative message is logged.~ This is confusing because 2 is in fact a valid revision, and it does not give any indication that the error is actually due to an XML parsing failure because of a non-UTF8 character. The exception raised by the ElementTree XML parser is of type
UnicodeEncodeError
, which is derived fromValueError
. But, in_convert_symbolic_revision()
, where this exception is produced, aValueError
already has a specific purpose of indicating any failure to determine the revision number. Now, aValueError
from ElementTree is explicitly caught and a generic SCMError, with a more informative error message, is raised instead. - Commit:
-
9d82ed28d4de052378312f8bd20298bb5fc96132ca72d8c78a3c47610ae2d968147db4a6da6d49ee
-
Tool: Pyflakes Processed Files: rbtools/clients/tests.py rbtools/clients/svn.py rbtools/clients/errors.py Ignored Files: rbtools/clients/testdata/svn-repo/db/revprops/0/2 Tool: PEP8 Style Checker Processed Files: rbtools/clients/tests.py rbtools/clients/svn.py rbtools/clients/errors.py Ignored Files: rbtools/clients/testdata/svn-repo/db/revprops/0/2