Improve our handling of file encodings with diffs.

Review Request #5161 — Created Dec. 24, 2013 and submitted

Information

Repository: Review Board
Branch: master

Our handling of diff and file contents was broken with regard to special
encodings. The unicode transition made this worse. In particular, I noticed
that the diff validation resource would raise an exception if the diff
contained unicode characters. Fixing that inspired me to fix the rest of our
encoding issues.

We now decode strings to unicode before manipulating them, and then encode them
back to bytes afterward using the encoding that we used to decode. This means
that we're no longer splitting lines or doing regex replacements on bytes, only
on unicode objects.
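
The decode, manipulate, re-encode approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this change: the helper names `decode_with_fallback` and `normalize_newlines` are hypothetical, and the encoding list is assumed to be an ordered list of codec names to try.

```python
import re


def decode_with_fallback(data, encodings):
    """Return (encoding, text) for the first encoding that decodes data.

    Hypothetical helper: tries each codec name in order.
    """
    for encoding in encodings:
        try:
            return encoding, data.decode(encoding)
        except UnicodeDecodeError:
            continue

    raise ValueError('Could not decode data using any of %r' % (encodings,))


def normalize_newlines(data, encodings):
    """Decode bytes, run a regex on the text, then re-encode.

    The regex only ever sees unicode text, and the result is encoded
    with the same encoding that was used to decode it.
    """
    encoding, text = decode_with_fallback(data, encodings)
    text = re.sub(r'\r\n?', '\n', text)
    return text.encode(encoding)
```

For example, calling `normalize_newlines` on ISO-8859-15 bytes with the list `['utf-8', 'iso-8859-15']` falls through to the second codec, so the returned bytes are still ISO-8859-15 rather than silently becoming UTF-8.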

  • Ran unit tests
  • Uploaded a diff containing unicode text via the "New Review Request" form.

chipx86
  1. reviewboard/diffviewer/diffutils.py (Diff revision 1)

    Since the comma-separated list of encodings is something that comes from the Repository object, how about using a list here, and an accessor in Repository for getting a list (splitting on the field), like we do with get_bug_list on ReviewRequest?
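
    A rough sketch of what such an accessor might look like. The class below is a plain stand-in rather than the real Django model, and the method name `get_encoding_list` is an assumption chosen to mirror get_bug_list:

```python
class Repository(object):
    """Stand-in for the real model; only the encoding field is sketched."""

    def __init__(self, encoding=''):
        # Comma-separated codec names, e.g. 'utf-8, iso-8859-15'.
        self.encoding = encoding

    def get_encoding_list(self):
        """Return the configured encodings as a list, in order.

        Hypothetical accessor: splits the field on commas and drops
        surrounding whitespace and empty entries.
        """
        return [
            name.strip()
            for name in self.encoding.split(',')
            if name.strip()
        ]
```

    Callers in diffutils.py could then take the list directly instead of splitting the string themselves.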

david
chipx86
  1. One small suggestion. Otherwise, looks good.

  2. reviewboard/scmtools/models.py (Diff revision 2)

    Could be simplified to:

    return self.encoding.split(',') or ['iso-8859-15']
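
    One caveat that may be worth checking before adopting the one-liner: in Python, splitting an empty string on a comma yields `['']`, which is a non-empty list, so the `or` fallback would not be reached when the field is blank. A possible variant that filters empty entries first is sketched below. The function name `split_encodings` is hypothetical, and the default is taken from the suggestion above:

```python
def split_encodings(encoding_field, default='iso-8859-15'):
    """Split a comma-separated encoding field, falling back to a default.

    ''.split(',') returns [''], which is truthy, so empty names are
    filtered out before applying the `or` fallback.
    """
    names = [name.strip() for name in encoding_field.split(',')]
    return [name for name in names if name] or [default]
```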
    
david
Review request changed

Status: Closed (submitted)

Change Summary:

Pushed to master (8317c71).