Fix processing of non-UTF-8-encoded files and diffs
Review Request #872 — Created May 20, 2009 and submitted
When files in a repository are encoded with a non-ASCII, non-UTF-8 encoding, a special configuration option, the repository encoding, is required. However, even when this option is provided, the diffviewer still processes such files incorrectly.

convert_to_utf8() correctly returns unicode strings for byte strings that can be decoded as UTF-8 (i.e. ASCII and actual UTF-8), and further processing (e.g. by pygments) assumes unicode strings as parameters. For non-UTF-8 strings, however, the function returned byte strings, which effectively breaks pygments.

The patch:
1. renames convert_to_utf8() to convert_to_unicode() to reflect its real purpose :)
2. returns unicode instead of str for strings in a user-specified encoding
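The behaviour described above can be illustrated with a minimal sketch. It is not the code from the patch: the signature, the `encodings` parameter, and the error raised on failure are assumptions made for illustration, and it is written for Python 3, where `str` plays the role of Python 2's `unicode` and `bytes` the role of Python 2's `str`.

```python
def convert_to_unicode(data, encodings=()):
    """Decode a byte string to text, always returning str (never bytes).

    `encodings` is a hypothetical stand-in for the user-specified
    repository encoding(s); the real function may obtain them differently.
    """
    if isinstance(data, str):
        # Already text, nothing to decode.
        return data

    # ASCII and genuine UTF-8 both decode cleanly as UTF-8.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # Fall back to the configured encodings, in order.
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding name: try the next one.
            continue

    # Assumption: fail loudly rather than hand undecoded bytes downstream.
    raise ValueError("could not decode data with UTF-8 or %r" % (list(encodings),))


if __name__ == "__main__":
    latin1_bytes = "café".encode("latin-1")  # not valid UTF-8
    text = convert_to_unicode(latin1_bytes, encodings=["latin-1"])
    print(type(text).__name__, text)
```

The point of the sketch is that every successful path returns text, so a consumer such as pygments never receives raw bytes in an unknown encoding.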