1359: Diff fails for text file with non-ascii characters
- Fixed
- Review Board
ping****@yaho***** (Google Code)
Oct. 8, 2011
What version are you running?
Review board 1.0 on Python 2.4

What's the URL of the page containing the problem?
http://<omitted>/reviews/r/658/diff/#index_header

What steps will reproduce the problem?
1. Create a text file containing the byte 0xED (in Windows, Alt+0237). This character corresponds to the "latin small letter i with acute" in the Windows Western encoding.
2. Check this into source control
3. Edit the file to replace this character with a lowercase 'i'
4. Post the change to review board
5. Attempt to view the diff

(This actually happened to me today -- I was trying to fix a source file that had this non-ascii, non utf-8 character in a docstring)

What is the expected output? What do you see instead?
I would expect to see a diff, possibly with invalid characters replaced by hexadecimal representations. Instead I get the following traceback:

'ascii' codec can't decode byte 0xc3 in position 26: ordinal not in range(128)

Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/ReviewBoard-1.0-py2.4.egg/reviewboard/diffviewer/views.py", line 152, in view_diff
    interdiffset, highlighting, True)
  File "/usr/lib/python2.4/site-packages/ReviewBoard-1.0-py2.4.egg/reviewboard/diffviewer/diffutils.py", line 623, in get_diff_files
    large_data=True)
  File "/usr/lib/python2.4/site-packages/Djblets-0.5-py2.4.egg/djblets/util/misc.py", line 143, in cache_memoize
    data = lookup_callable()
  File "/usr/lib/python2.4/site-packages/ReviewBoard-1.0-py2.4.egg/reviewboard/diffviewer/diffutils.py", line 622, in <lambda>
    enable_syntax_highlighting),
  File "/usr/lib/python2.4/site-packages/ReviewBoard-1.0-py2.4.egg/reviewboard/diffviewer/diffutils.py", line 434, in get_chunks
    a[i1:i2], b[j1:j2], oldlines, newlines)
  File "/usr/lib/python2.4/site-packages/ReviewBoard-1.0-py2.4.egg/reviewboard/diffviewer/diffutils.py", line 268, in diff_line
    if oldline and newline and oldline != newline:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 26: ordinal not in range(128)

What operating system are you using? What browser?
Firefox 3.5.3 on Windows XP SP3

Please provide any additional information below.
This also occurs with the copyright sign (0xC2 0xA9 in UTF-8) and with all non-ASCII scripts.
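The behavior the report asks for (a diff that still renders, with undecodable bytes shown as hexadecimal escapes) could be sketched roughly as follows. This is only an illustration, not Review Board's actual fix: the helper name `to_text` and its `encodings` parameter are hypothetical, and the code is written for Python 3.5+ (where `backslashreplace` works for decoding) rather than the Python 2.4 used in the report.

```python
def to_text(raw, encodings=("utf-8",)):
    """Decode bytes for display, never raising on undecodable input.

    Hypothetical helper: tries each candidate encoding in turn.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep ASCII as-is and show every other byte as \xNN.
    return raw.decode("ascii", errors="backslashreplace")


old_line = to_text(b"d\xedff")   # 0xED followed by 'f' is not valid UTF-8
new_line = to_text(b"diff")

# Both sides are now text, so the comparison cannot trigger an implicit
# ASCII decode the way the str/unicode comparison in the traceback did.
changed = old_line != new_line
```

Decoding both lines to text before comparing them is the design point here; which encodings to try first would depend on the repository.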
Error handling can be set for the unicode() and encode() functions. See http://www.amk.ca/python/howto/unicode

I had this problem in post-review, and changed:

    value = files[key]['content']

into:

    value = unicode(files[key]['content'], errors='replace')
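As a rough illustration of what `errors='replace'` does to a diff, here is a small Python 3 sketch (the comment above uses Python 2's `unicode()`; the sample bytes are made up for the example). Each undecodable byte is swapped for U+FFFD, so the result no longer matches the original file content byte for byte.

```python
# UTF-8 bytes for "d\u00edff" (i with acute), standing in for diff content.
original = "d\u00edff".encode("utf-8")            # b'd\xc3\xadff'

# Decoding as ASCII with errors='replace' turns each non-ASCII byte
# into the replacement character U+FFFD instead of raising.
replaced = original.decode("ascii", errors="replace")

# Re-encoding does not give the original bytes back: the change is lossy.
round_tripped = replaced.encode("utf-8")
is_lossy = round_tripped != original
```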
Actually, that wasn't such a good idea after all; the diff was accepted but could not be applied :P Adding this to sitecustomize.py was better:

    import sys
    sys.setdefaultencoding('utf-8')

Now the diff gets accepted all right. It shows up double-encoded though: every multibyte char becomes two chars on the "review diff" page, and the page is sent as UTF-8. But that's not this issue :)
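The double encoding described here (one multibyte character turning into two) is what typically happens when UTF-8 bytes are misread as a single-byte encoding and then encoded as UTF-8 again. A minimal Python 3 sketch, assuming Latin-1 as the misread encoding (the thread does not say which one was actually involved):

```python
char = "\u00ed"                          # i with acute, one character
utf8_bytes = char.encode("utf-8")        # b'\xc3\xad', two bytes

# Misreading the two UTF-8 bytes as Latin-1 yields two characters.
misread = utf8_bytes.decode("latin-1")

# Encoding those two characters as UTF-8 again gives four bytes.
double_encoded = misread.encode("utf-8")

# The damage is reversible as long as no bytes were dropped on the way.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
```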
I have a similar issue, but with post-review:

Traceback (most recent call last):
  File "/u/laurentn/p4/code-review-feb22/rbtools/postreview.py", line 3137, in <module>
    main()
  File "/u/laurentn/p4/code-review-feb22/rbtools/postreview.py", line 3117, in main
    submit_as=options.submit_as)
  File "/u/laurentn/p4/code-review-feb22/rbtools/postreview.py", line 2747, in tempt_fate
    parent_diff_content)
  File "/u/laurentn/p4/code-review-feb22/rbtools/postreview.py", line 571, in upload_diff
    review_request['id'], fields, files)
  File "/u/laurentn/p4/code-review-feb22/rbtools/postreview.py", line 697, in api_post
    return self.process_json(self.http_post(path, fields, files))
  File "/u/laurentn/p4/code-review-feb22/rbtools/postreview.py", line 672, in http_post
    content_type, body = self._encode_multipart_formdata(fields, files)
  File "/u/laurentn/p4/code-review-feb22/rbtools/postreview.py", line 731, in _encode_multipart_formdata
    return content_type, content.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 54556: ordinal not in range(128)

Here the diffs contain Loïc (i-trema or diaeresis). I'm not sure if we should group all these issues into a single one or track them separately. Note that I could upload the diffs in Review Board, using upload diffs, but the name is translated to Loïc.
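It can look odd that a call to `encode('utf-8')` raises a UnicodeDecodeError. In Python 2, calling `.encode()` on a byte string first decodes it implicitly with the default codec (ASCII unless changed), and that hidden step is the one that fails. The following Python 3 sketch mimics that behavior; `py2_style_encode` is a hypothetical stand-in, and whether `content` in post-review was a plain byte string at that point is an assumption based on the traceback.

```python
def py2_style_encode(byte_string, default_encoding="ascii"):
    """Mimic Python 2's str.encode('utf-8'): implicit decode, then encode."""
    return byte_string.decode(default_encoding).encode("utf-8")


diff_bytes = "Lo\u00efc".encode("utf-8")   # b'Lo\xc3\xafc'

try:
    py2_style_encode(diff_bytes)
    error = None
except UnicodeDecodeError as exc:
    error = exc   # the hidden decode step fails on byte 0xc3

# With a UTF-8 default (what sys.setdefaultencoding('utf-8') arranged in
# Python 2), the same call round-trips the bytes unchanged.
same = py2_style_encode(diff_bytes, default_encoding="utf-8")
```

This also suggests why the sitecustomize.py workaround mentioned earlier made the upload succeed.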
I'm having this issue too, with the post-review script. It happens with any diff file that contains Chinese characters. Here's the dump with --debug turned on:

>>> repository info: Path: https://svn.mysite.com/repos/myproject, Base path: /website/trunk, Supports changesets: False
>>> svn diff --diff-cmd=diff app/src/index.php
>>> svn info app/src/index.php
>>> svn info app/src/index.php
>>> svn info app/src/index.php
>>> svn info app/src/index.php
>>> Looking for 'www.mysite.com /reviewboard/' cookie in /home/yining/.post-review-cookies.txt
>>> Loaded valid cookie -- no login required
>>> HTTP GETting /api/json/repositories/
>>> HTTP GETting /api/json/repositories/1/info/
>>> repository info: Path: file:///opt/svn/repos/myproject/website/trunk, Base path: /, Supports changesets: False
>>> Attempting to create review request on file:///opt/svn/repos/myproject/website/trunk for None
>>> HTTP POSTing to http://www.mysite.com/reviewboard/api/json/reviewrequests/new/: {'repository_path': u'file:///opt/svn/repos/myproject/website/trunk'}
>>> Review request created
>>> Uploading diff, size: 277
>>> HTTP POSTing to http://www.mysite.com/reviewboard/api/json/reviewrequests/27/diff/new/: {'basedir': '/'}
Traceback (most recent call last):
  File "/opt/python/2.6.4/bin/post-review", line 8, in <module>
    load_entry_point('RBTools==0.2rc1', 'console_scripts', 'post-review')()
  File "/opt/python/2.6.4/lib/python2.6/site-packages/RBTools-0.2rc1-py2.6.egg/rbtools/postreview.py", line 2774, in main
  File "/opt/python/2.6.4/lib/python2.6/site-packages/RBTools-0.2rc1-py2.6.egg/rbtools/postreview.py", line 2480, in tempt_fate
  File "/opt/python/2.6.4/lib/python2.6/site-packages/RBTools-0.2rc1-py2.6.egg/rbtools/postreview.py", line 503, in upload_diff
  File "/opt/python/2.6.4/lib/python2.6/site-packages/RBTools-0.2rc1-py2.6.egg/rbtools/postreview.py", line 622, in api_post
  File "/opt/python/2.6.4/lib/python2.6/site-packages/RBTools-0.2rc1-py2.6.egg/rbtools/postreview.py", line 597, in http_post
  File "/opt/python/2.6.4/lib/python2.6/site-packages/RBTools-0.2rc1-py2.6.egg/rbtools/postreview.py", line 656, in _encode_multipart_formdata
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 436: ordinal not in range(128)

Python version: 2.6.4
ReviewBoard version: 1.0.5.1
RBTools version: 0.2rc1-py2.6
post-review version: 0.8

It's a show stopper for me if this issue cannot be fixed.
lonico and zhang.yining, your issues are separate from the bug reported, and are fixed in the RBTools nightlies.
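Both post-review tracebacks above end in `_encode_multipart_formdata`. One general way to avoid this class of error is to assemble the multipart body from bytes, encoding only the text parts and passing the diff content through untouched. The sketch below illustrates that idea in Python 3; it is not the change made in the RBTools nightlies, and the function name `encode_multipart`, its parameters, and the sample field names are hypothetical.

```python
import uuid


def encode_multipart(fields, files, boundary=None):
    """Build a multipart/form-data body as bytes.

    fields: dict mapping field name -> text value
    files:  dict mapping field name -> (filename, content as bytes)
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = ("--" + boundary).encode("ascii")
    parts = []
    for name, value in fields.items():
        header = 'Content-Disposition: form-data; name="%s"' % name
        parts += [delimiter, header.encode("utf-8"), b"", value.encode("utf-8")]
    for name, (filename, content) in files.items():
        header = ('Content-Disposition: form-data; name="%s"; filename="%s"'
                  % (name, filename))
        parts += [
            delimiter,
            header.encode("utf-8"),
            b"Content-Type: application/octet-stream",
            b"",
            content,   # raw diff bytes, never decoded or re-encoded
        ]
    parts += [delimiter + b"--", b""]
    content_type = "multipart/form-data; boundary=" + boundary
    return content_type, b"\r\n".join(parts)


# Example: a diff line containing a Chinese character encoded as UTF-8.
content_type, body = encode_multipart(
    {"basedir": "/"},
    {"path": ("diff", b"+\xe4\xb8\xad\n")},
    boundary="XYZ",
)
```

Because the file content stays as bytes from start to finish, its encoding never matters to the upload code.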
Any further action? Appears that post-review is fixed, but reviewboard is not?
I'm pretty sure issue #2155 is a duplicate of this. The attachment there might be useful, though.
Review Board 1.6.1

I tried to upload a diff from a file with a Subversion base directory that contains U+00E4 (ä), "Latin small letter a with diaeresis", and got a similar error. Therefore, I cannot post a review request.

To reproduce:
1. Add a folder "Anwendungsfälle" with a file "Some.txt" to your Subversion repository.
2. Change the file.
3. Try to post a review request for your change.

The problem seems very similar to the one posted here, but I do not understand whether a fix already exists.
Have a look at issue #2155. That one has been fixed, and RBTools 0.4.0 will be released soon. You can check whether the proposed fix for #2155 solves your issue, http://code.google.com/u/@WBJQRFRRBBdCWQR5/