1359: Diff fails for text file with non-ascii characters

ping****@yaho***** (Google Code)
Oct. 8, 2011
What version are you running?

Review board 1.0 on Python 2.4

What's the URL of the page containing the problem?

http://<omitted>/reviews/r/658/diff/#index_header

What steps will reproduce the problem?
1. Create a text file containing the byte 0xED (in Windows, Alt+0237)
   This character corresponds to the "latin small letter i with acute" in
the Windows Western encoding.
2. Check this into source control
3. Edit the file to replace this character with a lowercase 'i'
4. Post the change to review board
5. Attempt to view the diff

(This actually happened to me today -- I was trying to fix a source file
that had this non-ascii, non utf-8 character in a docstring)

What is the expected output? What do you see instead?

I would expect to see a diff, possibly with invalid characters replaced by
hexadecimal representations.
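
A minimal sketch of the behaviour described above, written for Python 3 rather than the Python 2.4 in this report. The helper name `decode_for_display` and the sample line are made up for illustration; this is not Review Board code.

```python
# Hypothetical docstring line containing the lone byte 0xED from step 1.
raw_line = b"# S\xed, a docstring"

def decode_for_display(data, encoding="utf-8"):
    """Decode bytes for display, showing undecodable bytes as \\xNN escapes."""
    return data.decode(encoding, errors="backslashreplace")

shown = decode_for_display(raw_line)
print(shown)
```

Valid UTF-8 input passes through unchanged; only bytes that cannot be decoded are shown in hexadecimal form.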

Instead I get the following traceback:

'ascii' codec can't decode byte 0xc3 in position 26: ordinal not in range(128)

Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/ReviewBoard-1.0-py2.4.egg/reviewboard/diffviewer/views.py", line 152, in view_diff
    interdiffset, highlighting, True)
  File "/usr/lib/python2.4/site-packages/ReviewBoard-1.0-py2.4.egg/reviewboard/diffviewer/diffutils.py", line 623, in get_diff_files
    large_data=True)
  File "/usr/lib/python2.4/site-packages/Djblets-0.5-py2.4.egg/djblets/util/misc.py", line 143, in cache_memoize
    data = lookup_callable()
  File "/usr/lib/python2.4/site-packages/ReviewBoard-1.0-py2.4.egg/reviewboard/diffviewer/diffutils.py", line 622, in <lambda>
    enable_syntax_highlighting),
  File "/usr/lib/python2.4/site-packages/ReviewBoard-1.0-py2.4.egg/reviewboard/diffviewer/diffutils.py", line 434, in get_chunks
    a[i1:i2], b[j1:j2], oldlines, newlines)
  File "/usr/lib/python2.4/site-packages/ReviewBoard-1.0-py2.4.egg/reviewboard/diffviewer/diffutils.py", line 268, in diff_line
    if oldline and newline and oldline != newline:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 26: ordinal not in range(128)
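
A likely reading of this traceback, stated as an assumption: one of `oldline` and `newline` is a byte string and the other is a unicode string. Python 2.4 then decodes the byte string with the default 'ascii' codec before comparing, and raises on the first byte above 0x7F. The sketch below emulates that step explicitly in Python 3; the sample lines and the function name are made up.

```python
# Made-up sample lines: the UTF-8 lead byte 0xc3 lands at index 26.
old_line = b"x" * 26 + "\u00ed".encode("utf-8")   # byte string ending in UTF-8 "í"
new_line = "x" * 26 + "i"                         # text string

def compare_like_python24(byte_line, text_line):
    """Decode with the 'ascii' codec first, then compare, as Python 2.4 does implicitly."""
    return byte_line.decode("ascii") != text_line

message = None
try:
    compare_like_python24(old_line, new_line)
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)
```

Note that byte 0xc3 is the first byte of the UTF-8 encoding of "í" (0xc3 0xad), not the single byte 0xED from step 1.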

What operating system are you using? What browser?

Firefox 3.5.3 on Windows XP SP3

Please provide any additional information below.
#1 ami.c*****@gmai***** (Google Code)
This also occurs with the copyright sign (U+00A9, bytes 0xC2 0xA9 in UTF-8) and with all non-ASCII scripts.
#2 fid***@gmai***** (Google Code)
The unicode() and encode() functions accept an error-handling argument.  See
http://www.amk.ca/python/howto/unicode

I had this problem in post-review, and changed:
            value = files[key]['content']

into:
            value = unicode(files[key]['content'], errors='replace')
#3 fid***@gmai***** (Google Code)
Actually, that wasn't such a good idea after all; the diff was accepted but could not
be applied :P
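
A short illustration of why `errors='replace'` can produce a diff that no longer applies: the replacement is lossy, so the bytes that come out differ from the bytes in the repository. This is a Python 3 sketch with made-up sample content, not the post-review code.

```python
# Made-up diff content: "sí" encoded as UTF-8.
original = "s\u00ed".encode("utf-8")                 # b's\xc3\xad'

# Decoding as ASCII with errors='replace' turns each undecodable byte
# into U+FFFD (the replacement character).
replaced = original.decode("ascii", errors="replace")

# Encoding the result again does not give back the original bytes.
round_tripped = replaced.encode("utf-8")

print(repr(replaced))
print(round_tripped == original)
```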

Adding this to sitecustomize.py was better:
import sys
sys.setdefaultencoding('utf-8')

Now the diff gets accepted all right.  It shows up double-encoded, though: every
multibyte character becomes two characters on the "review diff" page, and the page is
sent as utf-8. But that's not this issue :)
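
The double encoding mentioned here can be reproduced in a few lines. This is a Python 3 sketch under the assumption that UTF-8 bytes are misread as Latin-1 somewhere along the way and then encoded as UTF-8 again; where that happens in Review Board is not established in this thread.

```python
text = "\u00e4"                                # "ä", one character
utf8_bytes = text.encode("utf-8")              # two bytes: 0xc3 0xa4

# Misreading the two UTF-8 bytes as Latin-1 yields two characters.
misread = utf8_bytes.decode("latin-1")

# Encoding the misread text again gives the double-encoded form.
double_encoded = misread.encode("utf-8")

# Reversing the two steps recovers the original text.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")

print(misread, repaired)
```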
#4 lon***@gmai***** (Google Code)
I have a similar issue, but with post-review:

Traceback (most recent call last):
  File "/u/laurentn/p4/code-review-feb22/rbtools/postreview.py", line 3137, in <module>
    main()
  File "/u/laurentn/p4/code-review-feb22/rbtools/postreview.py", line 3117, in main
    submit_as=options.submit_as)
  File "/u/laurentn/p4/code-review-feb22/rbtools/postreview.py", line 2747, in tempt_fate
    parent_diff_content)
  File "/u/laurentn/p4/code-review-feb22/rbtools/postreview.py", line 571, in upload_diff
    review_request['id'], fields, files)
  File "/u/laurentn/p4/code-review-feb22/rbtools/postreview.py", line 697, in api_post
    return self.process_json(self.http_post(path, fields, files))
  File "/u/laurentn/p4/code-review-feb22/rbtools/postreview.py", line 672, in http_post
    content_type, body = self._encode_multipart_formdata(fields, files)
  File "/u/laurentn/p4/code-review-feb22/rbtools/postreview.py", line 731, in _encode_multipart_formdata
    return content_type, content.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 54556: ordinal not in range(128)


Here the diffs contain Loïc (i-trema or diaeresis).

I'm not sure whether we should group all these issues into a single one or track them
separately.  Note that I could upload the diffs to Review Board using "upload diffs",
but the name is then displayed as Loïc.
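
In Python 2, calling `.encode('utf-8')` on a byte string first decodes it with the 'ascii' codec, which is why an encode call ends in a UnicodeDecodeError above. One way to avoid that is to build the multipart body from bytes only and never decode the diff. The following is a Python 3 sketch under that assumption; the function, boundary, and field names are hypothetical, and this is not the fix that went into RBTools.

```python
def encode_multipart_formdata(fields, files, boundary=b"----sketch-boundary"):
    """Build a multipart/form-data body from bytes only (illustrative sketch)."""
    lines = []
    for name, value in fields.items():
        lines += [
            b"--" + boundary,
            b'Content-Disposition: form-data; name="' + name.encode("utf-8") + b'"',
            b"",
            value.encode("utf-8"),          # text fields are encoded exactly once
        ]
    for name, info in files.items():
        lines += [
            b"--" + boundary,
            b'Content-Disposition: form-data; name="' + name.encode("utf-8")
            + b'"; filename="' + info["filename"].encode("utf-8") + b'"',
            b"Content-Type: application/octet-stream",
            b"",
            info["content"],                # raw diff bytes, passed through untouched
        ]
    lines += [b"--" + boundary + b"--", b""]
    content_type = "multipart/form-data; boundary=" + boundary.decode("ascii")
    return content_type, b"\r\n".join(lines)

# Made-up request: one text field and one diff containing UTF-8 bytes.
content_type, body = encode_multipart_formdata(
    {"basedir": "/"},
    {"path": {"filename": "diff", "content": "Lo\u00efc".encode("utf-8")}},
)
print(content_type, len(body))
```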
#5 zhang.******@gmai***** (Google Code)
I'm having this issue too, with the post-review script.

It happens with any diff file that contains Chinese characters. Here's the dump with
--debug turned on:

>>> repository info: Path: https://svn.mysite.com/repos/myproject, Base path: /website/trunk, Supports changesets: False
>>> svn diff --diff-cmd=diff app/src/index.php
>>> svn info app/src/index.php
>>> svn info app/src/index.php
>>> svn info app/src/index.php
>>> svn info app/src/index.php
>>> Looking for 'www.mysite.com /reviewboard/' cookie in /home/yining/.post-review-cookies.txt
>>> Loaded valid cookie -- no login required
>>> HTTP GETting /api/json/repositories/
>>> HTTP GETting /api/json/repositories/1/info/
>>> repository info: Path: file:///opt/svn/repos/myproject/website/trunk, Base path: /, Supports changesets: False
>>> Attempting to create review request on file:///opt/svn/repos/myproject/website/trunk for None
>>> HTTP POSTing to http://www.mysite.com/reviewboard/api/json/reviewrequests/new/: {'repository_path': u'file:///opt/svn/repos/myproject/website/trunk'}
>>> Review request created
>>> Uploading diff, size: 277
>>> HTTP POSTing to http://www.mysite.com/reviewboard/api/json/reviewrequests/27/diff/new/: {'basedir': '/'}
Traceback (most recent call last):
  File "/opt/python/2.6.4/bin/post-review", line 8, in <module>
    load_entry_point('RBTools==0.2rc1', 'console_scripts', 'post-review')()
  File "/opt/python/2.6.4/lib/python2.6/site-packages/RBTools-0.2rc1-py2.6.egg/rbtools/postreview.py", line 2774, in main
  File "/opt/python/2.6.4/lib/python2.6/site-packages/RBTools-0.2rc1-py2.6.egg/rbtools/postreview.py", line 2480, in tempt_fate
  File "/opt/python/2.6.4/lib/python2.6/site-packages/RBTools-0.2rc1-py2.6.egg/rbtools/postreview.py", line 503, in upload_diff
  File "/opt/python/2.6.4/lib/python2.6/site-packages/RBTools-0.2rc1-py2.6.egg/rbtools/postreview.py", line 622, in api_post
  File "/opt/python/2.6.4/lib/python2.6/site-packages/RBTools-0.2rc1-py2.6.egg/rbtools/postreview.py", line 597, in http_post
  File "/opt/python/2.6.4/lib/python2.6/site-packages/RBTools-0.2rc1-py2.6.egg/rbtools/postreview.py", line 656, in _encode_multipart_formdata
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 436: ordinal not in range(128)

Python version: 2.6.4
ReviewBoard version: 1.0.5.1
RBTools version: 0.2rc1-py2.6
post-review version: 0.8

It's a show stopper for me if this issue can not be fixed.
#6 chipx86
lonico and zhang.yining, your issues are separate from the bug reported here and are
fixed in the RBTools nightlies.
#7 chris.c********@gmai***** (Google Code)
Any further action? Appears that post-review is fixed, but reviewboard is not?
#8 david
  • +Component-DiffViewer
#9 jer****@gmai***** (Google Code)
I'm pretty sure issue #2155 is a duplicate of this. The attachment there might be useful, though.
#10 OMey****@gmai***** (Google Code)
Review Board 1.6.1

I tried to upload a diff for a file whose Subversion base directory contains U+00E4 (ä), "Latin small letter a with diaeresis", and got a similar error. As a result, I cannot post a review request.

To Reproduce:
 1. Add a folder "Anwendungsfälle" with a file "Some.txt" to your Subversion repository
 2. Change the file
 3. Try to post a review request for your change.

The problem seems very similar to the one posted here, but I do not understand whether a fix already exists.
  
#11 jer****@gmai***** (Google Code)
Have a look at issue #2155. That one has been fixed, and RBTools 0.4.0 will be released soon. You can check whether the proposed fix for #2155 solves your issue: http://code.google.com/u/@WBJQRFRRBBdCWQR5/
#12 david
Fixed in release-1.6.x (056f27d)
  • +Fixed