Fix unicode-related errors when parsing XML outputs from SVN
Review Request #7224 — Created April 17, 2015 and submitted — Latest diff uploaded
There are two instances where svn is invoked with the
--xml
option to produce an XML document:
1. 'svn log' is called to extract a specific revision number
2. 'svn status' is called when attempting to detect a changelist.For either of these cases, if a non-UTF8 character is present in the resultant XML document then an error ensues. This could be due to a non-UTF8 character in a commit message, or in a filename in a changelist. Case 1 was reported as issue 3844.
The fix here is simple, and that is to add
results_unicode=False
for both underlying calls to svn.I've also added unit tests to exercise both of these cases. Both tests failed prior to the fix. To support case 1 it was necessary to introduce a non-UTF8 character in to the svn log. The easiest approach here was to update an existing commit message in the testdata repo to contain such a character, and the message for commit r2 has been updated to support this.
While testing I noticed that the error message when case 1 fails was quite misleading. For instance 'rbt diff 2' (where r2 has a non-UTF8 character in the commit log message) would produce:
CRITICAL: "2" does not appear to be a valid revision or changelist name
This is confusing because 2 is in fact a valid revision, and it does not give any indication that the error is actually due to an XML parsing failure because of a non-UTF8 character. The exception raised by the ElementTree XML parser is of type
UnicodeEncodeError
, which is derived fromValueError
. But, in_convert_symbolic_revision()
, where this exception is produced, aValueError
already has a specific purpose of indicating any failure to determine the revision number. Now, aValueError
from ElementTree is explicitly caught and a generic SCMError, with a more informative error message, is raised instead.
Ran unit tests. Confirmed that new unit tests fail prior to the fix.