Summary

Improve our handling of file encodings with diffs.

Review Request #5161 — Created Dec. 24, 2013 and submitted Jan. 3, 2014, 4:28 p.m.

Information

Owner

david

Repository

Review Board

Branch

master

Bugs

Depends On

Reviewers

Groups

reviewboard

People

Description

The way that we dealt with diff and file contents was kind of broken with
regard to special encodings. The unicode transition made this worse: in
particular, I noticed that the diff validation resource would raise an
exception if the diff had unicode characters in it. Fixing that inspired me to
fix the rest of our encoding issues.

We now decode strings to unicode before manipulating them, and then encode them
back to bytes afterward using the encoding that we used to decode. This means
that we're no longer splitting lines or doing regex replacements on bytes, only
on unicode objects.
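
The round trip described above can be sketched roughly as follows. This is a minimal illustration, not the code from this change: `decode_with_fallback` and `strip_trailing_whitespace` are hypothetical names, and the trailing-whitespace regex only stands in for whatever manipulation is being done. It assumes Python 3 semantics, where `bytes.decode` returns `str`.

```python
import re


def decode_with_fallback(data, encodings):
    """Return (text, encoding) for the first encoding that decodes data.

    Hypothetical helper for illustration only.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue

    raise ValueError('Could not decode data with any of: %r' % (encodings,))


def strip_trailing_whitespace(data, encodings):
    """Strip trailing spaces and tabs from each line of a byte string."""
    text, encoding = decode_with_fallback(data, encodings)

    # The regex runs on decoded text, never on the raw bytes.
    text = re.sub(r'[ \t]+$', '', text, flags=re.MULTILINE)

    # Encode with the same encoding that succeeded when decoding.
    return text.encode(encoding)
```

Remembering which encoding succeeded is what allows the result to be written back in the same byte representation the input used.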

Testing Done


Ran unit tests.
Uploaded a diff containing unicode text via the "New Review Request" form.

Issues

Description	From	Last Updated
Could be simplified to: return self.encoding.split(',') or ['iso-8859-15']	chipx86	Jan. 3, 2014, 4:24 p.m.

reviewboard/diffviewer/diffutils.py (Diff revision 1)

Since the comma-separated list of encodings is something that comes from the Repository object, how about using a list here, and an accessor in Repository for getting a list (splitting on the field), like we do with get_bug_list on ReviewRequest?
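
An accessor of the kind suggested here might look something like the sketch below. `RepositorySketch` and `get_encoding_list` are hypothetical names used for illustration; the real model and its method name may differ, and stripping whitespace around each name is an assumption.

```python
class RepositorySketch(object):
    """Hypothetical stand-in for a model with a comma-separated field."""

    def __init__(self, encoding):
        # Comma-separated encoding names, e.g. 'utf-8, iso-8859-15'.
        self.encoding = encoding

    def get_encoding_list(self):
        """Return the encoding field as a list of stripped names."""
        return [name.strip() for name in self.encoding.split(',')]


repository = RepositorySketch('utf-8, iso-8859-15')

# Callers receive a list and no longer need to know the storage format.
encodings = repository.get_encoding_list()
```

Keeping the split in one place means the diff utilities can accept a plain list and stay unaware of how the field is stored.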

Change Summary:

Pass in a list of encodings instead of a comma-separated string.

Diff:

Revision 2 (+84 -50)

	reviewboard/diffviewer/chunk_generator.py
	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/managers.py
	reviewboard/scmtools/models.py
	reviewboard/webapi/resources/original_file.py
	reviewboard/webapi/resources/patched_file.py

Ship it!

One small suggestion. Otherwise, looks good.

reviewboard/scmtools/models.py (Diff revision 2)

The issue has been resolved.

Could be simplified to:
return self.encoding.split(',') or ['iso-8859-15']
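
One caveat that may be worth checking against the final code: in Python, `''.split(',')` returns `['']`, which is truthy, so the `or` fallback in the one-liner would not fire for an empty field. Whether that matters depends on whether the field can actually be empty, which this page does not say. A sketch that filters out empty names first, with `encoding_list` and `DEFAULT_ENCODINGS` as hypothetical names:

```python
DEFAULT_ENCODINGS = ['iso-8859-15']


def encoding_list(encoding):
    """Split a comma-separated encoding string, falling back to a default."""
    # Drop empty entries so that '' and ' , ' both yield an empty list,
    # which lets the `or` fallback take effect.
    names = [name.strip() for name in encoding.split(',') if name.strip()]

    return names or list(DEFAULT_ENCODINGS)
```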

Status: Completed
Change Summary: Pushed to master (8317c71).