Summary

Give FileDiffs control over handling of encodings for source files.

Review Request #10775 — Created Nov. 1, 2019 and submitted Jan. 13, 2020, 3:02 p.m.

Information

Owner

chipx86

Repository

Review Board

Branch

release-4.0.x

Bugs

Depends On

~~10776~~

Commit

78b4aa4...

Reviewers

Groups

reviewboard

People

Description

Historically, Review Board has depended on source files to either use a
UTF-8 or ISO-8859-15 encoding, or for the repository to supply a list of
compatible encodings. If a special encoding were ever used for a file,
and the repository later had its encodings list changed to exclude it,
the file would no longer be able to be viewed in the diff viewer.

This encoding list was computed up-front when generating a diff for a
file. We'd fetch the repository encoding list, pass it to calls like
get_original_file(), which would then pass that on to
convert_to_unicode(), which would try the built-in encodings and the
provided list of encodings. The caller was always responsible for
fetching or computing that encoding list up-front before calling any of
these methods.
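
As a rough illustration of the old flow, the sketch below tries a built-in
encoding and then a caller-supplied list. It is not the real
convert_to_unicode(): the function name, the BUILTIN_ENCODINGS constant, and
the return shape are assumptions made for this example, and only UTF-8 is
treated as built-in here for brevity.

```python
# Hypothetical stand-in for the built-in encodings tried before the
# caller's list. Only UTF-8 is used in this sketch.
BUILTIN_ENCODINGS = ['utf-8']


def convert_to_unicode_sketch(data, encoding_list):
    """Decode bytes, trying built-in encodings and then the given list.

    Returns an (encoding, text) tuple for the first encoding that works.
    """
    candidates = list(BUILTIN_ENCODINGS)

    # Append the caller's encodings, skipping any already queued.
    for encoding in encoding_list:
        if encoding not in candidates:
            candidates.append(encoding)

    for encoding in candidates:
        try:
            return encoding, data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Either the bytes are invalid for this encoding, or the
            # encoding name is unknown. Try the next candidate.
            continue

    raise ValueError('Unable to decode data using: %s'
                     % ', '.join(candidates))
```

In this shape, every caller has to build encoding_list before calling,
which is the burden the change removes.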

This change rewrites all of this to make the process more dependent on
the FileDiff we're working on, and to record a known working encoding
for future use.

FileDiff.extra_data can now contain an 'encoding' key (conveniently
exposed as FileDiff.encoding), which specifies an explicit encoding to
use. This could in theory be set during diff parsing (which requires
some additional work still), but it will be set after we've successfully
converted a source file to Unicode and back.

If not set, we still respect the provided encoding list (though this is
now deprecated in get_original_file()), and then look up from the
repository. This will allow us to later work with FileDiffs backed by
multiple repositories without placing unnecessary burdens on the caller.
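
The lookup order described above could be sketched roughly as follows.
SketchRepository, SketchFileDiff, get_encoding_candidates() and
decode_and_record() are all hypothetical names for this example, not the
classes or functions in the change. The sketch assumes that an explicit
encoding wins, then a caller-provided list, then the repository's list.

```python
class SketchRepository:
    """Hypothetical stand-in for a repository with an encoding list."""

    def __init__(self, encodings):
        self.encodings = encodings

    def get_encoding_list(self):
        return list(self.encodings)


class SketchFileDiff:
    """Hypothetical stand-in for a FileDiff with extra_data."""

    def __init__(self, repository, extra_data=None):
        self.repository = repository
        self.extra_data = extra_data if extra_data is not None else {}

    @property
    def encoding(self):
        # Convenience accessor for the 'encoding' key in extra_data.
        return self.extra_data.get('encoding')


def get_encoding_candidates(filediff, encoding_list=None):
    """Return encodings to try: explicit, then provided, then repository."""
    if filediff.encoding:
        return [filediff.encoding]

    if encoding_list:
        return list(encoding_list)

    return filediff.repository.get_encoding_list()


def decode_and_record(filediff, data, encoding_list=None):
    """Decode data and record the encoding once a round trip succeeds."""
    for encoding in get_encoding_candidates(filediff, encoding_list):
        try:
            text = data.decode(encoding)

            # Only record the encoding if converting back gives the
            # same bytes we started with.
            if text.encode(encoding) == data:
                filediff.extra_data['encoding'] = encoding
                return text
        except (UnicodeError, LookupError):
            continue

    raise ValueError('No usable encoding found for this file')
```

Once an encoding is recorded this way, a later change to the repository's
encoding list no longer affects whether the file can be decoded.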

The machinery for diff chunk generation has been updated as well. Now,
RawDiffChunkGenerator.generate_chunks() can be given an encoding list
for the old and new content. This defaults to the encoding list normally
provided to the chunk generator, but subclasses (DiffChunkGenerator)
can pass in separate encoding lists, allowing for two FileDiffs on an
interdiff to be in separate encodings without breaking chunk generation.
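
A minimal sketch of the per-side encoding idea follows. The class name,
the generate_lines() method, and the old_encoding_list/new_encoding_list
parameter names are assumptions for illustration. The real generators
produce diff chunks rather than plain lists of lines.

```python
def decode_with(data, encoding_list):
    """Decode bytes with the first encoding in the list that works."""
    for encoding in encoding_list:
        try:
            return data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue

    raise ValueError('Unable to decode content')


class SketchRawChunkGenerator:
    """Hypothetical generator taking separate encodings per side."""

    def __init__(self, old, new, encoding_list):
        self.old = old
        self.new = new
        self.encoding_list = encoding_list

    def generate_lines(self, old_encoding_list=None,
                       new_encoding_list=None):
        # Each side falls back to the generator-wide list unless the
        # caller (for example, a subclass) provides its own.
        old_encoding_list = old_encoding_list or self.encoding_list
        new_encoding_list = new_encoding_list or self.encoding_list

        old_lines = decode_with(self.old, old_encoding_list).splitlines()
        new_lines = decode_with(self.new, new_encoding_list).splitlines()

        return old_lines, new_lines
```

With this shape, an interdiff whose two sides use different encodings can
pass one list per side instead of forcing both through a shared list.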

Note that it's still possible for a patch to contain content in another
encoding that doesn't match what's in FileDiff.encoding. The chunk
generator will still try to do the right thing in this case, falling
back to the repository encodings.
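
One way to picture this fallback is to try the recorded encoding first and
then the repository's encodings. The function names below are hypothetical,
and the ordering is an assumption based on the description above rather
than a copy of the actual implementation.

```python
def fallback_candidates(recorded_encoding, repository_encodings):
    """Return the recorded encoding followed by repository encodings."""
    candidates = []

    for encoding in [recorded_encoding] + list(repository_encodings):
        # Skip an unset recorded encoding and avoid duplicates.
        if encoding and encoding not in candidates:
            candidates.append(encoding)

    return candidates


def decode_patch_content(data, recorded_encoding, repository_encodings):
    """Decode patch content, falling back past the recorded encoding."""
    candidates = fallback_candidates(recorded_encoding,
                                     repository_encodings)

    for encoding in candidates:
        try:
            return encoding, data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue

    raise ValueError('No candidate encoding could decode the content')
```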

Testing Done

Unit tests passed.

Made sure my various test diffs and interdiffs were still working as
expected.

Issues

Description	From	Last Updated
F811 redefinition of unused 'test_with_filediff_encoding_set' from line 3548	reviewbot	Nov. 1, 2019, 4:36 p.m.
Can we call this FALLBACK_ENCODING? Our real default is utf-8	david	Dec. 28, 2019, 1:38 a.m.

flake8 failed.

JSHint passed.

flake8

reviewboard/diffviewer/tests/test_diffutils.py (Diff revision 1)
The issue has been resolved.
```
F811 redefinition of unused 'test_with_filediff_encoding_set' from line 3548
```

Change Summary:

Fixed a name conflict for a test function, and a broken check it masked.

Commit:

9216e0cc38e69b039efff62c874a2ba18c32de05

7a5524f1d894c0d747fe94ebf2b2e2f5af09e2de

Diff:

Revision 2 (+733 -94)

	reviewboard/diffviewer/chunk_generator.py
	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/models/filediff.py
	reviewboard/diffviewer/tests/test_diff_chunk_generator.py
	reviewboard/diffviewer/tests/test_diffutils.py
	reviewboard/diffviewer/tests/test_raw_diff_chunk_generator.py
	reviewboard/scmtools/models.py
	reviewboard/testing/scmtool.py
	1 more

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Change Summary:


Based this on top of /r/10776/, which implements FileDiff.get_repository() (instead of including it in this change)
Updated a couple more calls to get_original_file().

Depends On:

~~10776 - Add a function for getting the Repository for a FileDiff.~~

Commit:

7a5524f1d894c0d747fe94ebf2b2e2f5af09e2de

b09f47b5bc34ba33fdfae987121f1fd28826abd9

Diff:

Revision 3 (+731 -104)

	reviewboard/diffviewer/chunk_generator.py
	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/models/filediff.py
	reviewboard/diffviewer/tests/test_diff_chunk_generator.py
	reviewboard/diffviewer/tests/test_diffutils.py
	reviewboard/diffviewer/tests/test_raw_diff_chunk_generator.py
	reviewboard/reviews/views.py
	reviewboard/scmtools/models.py
	4 more

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

reviewboard/scmtools/models.py (Diff revision 3)
The issue has been resolved.
```
Can we call this FALLBACK_ENCODING? Our real default is utf-8
```

Change Summary:

Renamed DEFAULT_ENCODING to FALLBACK_ENCODING.

Commit:

b09f47b5bc34ba33fdfae987121f1fd28826abd9

78b4aa407eb6c2a2549a8408b46d906e79b4f38d

Diff:

Revision 4 (+734 -104)

	reviewboard/diffviewer/chunk_generator.py
	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/models/filediff.py
	reviewboard/diffviewer/tests/test_diff_chunk_generator.py
	reviewboard/diffviewer/tests/test_diffutils.py
	reviewboard/diffviewer/tests/test_raw_diff_chunk_generator.py
	reviewboard/reviews/views.py
	reviewboard/scmtools/models.py
	4 more

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Ship it!

```
Ship It!
```

Status: Completed
Change Summary: Pushed to release-4.0.x (4f98605)