Summary

Require byte strings for diff chunk generation and use Unicode for differs.

Review Request #10500 — Created April 2, 2019 and submitted May 17, 2019, 2:15 a.m.

Information

Owner

chipx86

Repository

Review Board

Branch

release-4.0.x

Bugs

Depends On

Reviewers

Groups

reviewboard

People

Description

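The process of generating diffs requires different types of strings at
different stages. The differ itself can technically work with either byte
or Unicode strings, but once the "interesting lines" regexes are brought
into the process, the string types matter. Our code expects the strings to
be normalized to Unicode at this stage, so that there's a consistent
format to diff (without worrying about mismatched encodings).

However, we were passing byte strings in some unit tests, which wasn't
consistent with normal usage and caused problems on Python 3. Those have
been fixed to be Unicode.

The diff chunk generator, on the other hand, expects byte strings. It
takes these, normalizes them, converts to Unicode, and then hands them
off to the differ. To ensure it's getting the format it requires, it now
checks the types coming in during construction so that there are no
accidental cases of Unicode strings coming in.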
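
As a rough illustration of the flow described above, here is a minimal sketch of a generator that rejects non-byte input at construction, normalizes newlines, decodes to Unicode, and only then diffs. The class name `ByteInputChunkGenerator`, its parameters, and the fallback encoding list are hypothetical and are not Review Board's actual API; `difflib.SequenceMatcher` stands in for the real differ.

```python
import difflib


class ByteInputChunkGenerator:
    """Hypothetical stand-in for a chunk generator (not Review Board's API)."""

    def __init__(self, old, new, encoding_list=("utf-8", "latin-1")):
        # Check types during construction so Unicode strings cannot
        # slip in by accident.
        for name, value in (("old", old), ("new", new)):
            if not isinstance(value, bytes):
                raise TypeError(
                    "%s must be a byte string, not %s"
                    % (name, type(value).__name__)
                )

        self.old = old
        self.new = new
        self.encoding_list = list(encoding_list)

    def _to_unicode_lines(self, data):
        # Normalize newlines while the content is still bytes.
        data = data.replace(b"\r\n", b"\n")

        # Try each candidate encoding in order (an assumed policy).
        for encoding in self.encoding_list:
            try:
                return data.decode(encoding).split("\n")
            except UnicodeDecodeError:
                continue

        raise ValueError("Unable to decode content with %r"
                         % (self.encoding_list,))

    def get_opcodes(self):
        # The differ only ever sees Unicode lines.
        old_lines = self._to_unicode_lines(self.old)
        new_lines = self._to_unicode_lines(self.new)
        matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
        return matcher.get_opcodes()
```

With this shape, `ByteInputChunkGenerator(b"a\r\nb\n", b"a\nc\n").get_opcodes()` diffs normalized Unicode lines, while passing a `str` for either argument fails immediately with `TypeError` instead of surfacing later inside the differ.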

Testing Done

Unit tests pass on Python 2.7 and 3.7 (with other in-progress changes).

Tested viewing a handful of diffs with Emoji and other non-ASCII content,
with and without a primed cache.

Commits

Summary	ID
Require byte strings for diff chunk generation and use Unicode for differs.	0834cdf1414f3ee2b72d0dc9288acabf3f720c38

Summary

Require byte strings for diff chunk generation and use Unicode for differs.

The process of generating diffs requires different types of strings at different stages. The differ itself can technically work with either byte or Unicode strings, but once the "interesting lines" regexes are brought into the process, the string types matter. Our code expects the strings to be normalized to Unicode at this stage, so that there's a consistent format to diff (without worrying about mismatched encodings).

However, we were passing byte strings in some unit tests, which wasn't consistent with normal usage and caused problems on Python 3. Those have been fixed to be Unicode.

The diff chunk generator, on the other hand, expects byte strings. It takes these, normalizes them, converts to Unicode, and then hands them off to the differ. To ensure it's getting the format it requires, it now checks the types coming in during construction so that there are no accidental cases of Unicode strings coming in.

0834cdf1414f3ee2b72d0dc9288acabf3f720c38
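
To show why the string types matter once regexes are involved, here is a small sketch. The `INTERESTING` pattern and the `find_interesting_lines` helper are made up for illustration and are not Review Board's real "interesting lines" definitions. On Python 3, a pattern compiled from a `str` cannot be applied to `bytes`, which is the kind of failure that byte strings in tests would likely have triggered.

```python
import re

# Hypothetical "interesting line" pattern, compiled from a Unicode string.
INTERESTING = re.compile(r"^\s*(def|class)\s+\w+")


def find_interesting_lines(lines):
    """Return the indexes of lines matching the pattern."""
    return [i for i, line in enumerate(lines) if INTERESTING.match(line)]


unicode_lines = ["import os", "def café():", "    pass"]
print(find_interesting_lines(unicode_lines))  # prints [1]

# The same content as byte strings fails on Python 3, because a str
# pattern cannot be used on a bytes-like object.
byte_lines = [line.encode("utf-8") for line in unicode_lines]

try:
    find_interesting_lines(byte_lines)
except TypeError as e:
    print("TypeError:", e)
```

Normalizing everything to Unicode before this stage means a single set of `str` patterns is enough, with no need to maintain parallel byte patterns or guess at encodings.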

Issues

Description	From	Last Updated
E501 line too long (80 > 79 characters)	reviewbot	April 2, 2019, 9:09 p.m.
I'd say "GNU patch" and "GNU diff"	david	April 10, 2019, 4:17 p.m.
understand -> understands	david	April 10, 2019, 4:17 p.m.
Patch -> patch	david	April 10, 2019, 4:18 p.m.

flake8 failed.

JSHint passed.

flake8

reviewboard/diffviewer/chunk_generator.py (Diff revision 1)
The issue has been resolved.
```
E501 line too long (80 > 79 characters)
```

Change Summary:

Fixed a line length issue.

Commits:

	Summary	ID
	Require byte strings for diff chunk generation and use Unicode for differs.	de49a8cecb3047c0ec0a26debe4f94f50e336c2c
	Require byte strings for diff chunk generation and use Unicode for differs.	7533a005ec1514ac0f6ac388bdcd857fcb531a55

Diff:

Revision 2 (+884 -628)

	reviewboard/diffviewer/chunk_generator.py
	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/tests/test_diff_opcode_generator.py
	reviewboard/diffviewer/tests/test_interesting_lines.py
	reviewboard/diffviewer/tests/test_raw_diff_chunk_generator.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

reviewboard/diffviewer/diffutils.py (Diff revision 2)
The issue has been resolved.
```
I'd say "GNU patch" and "GNU diff"
```
reviewboard/diffviewer/diffutils.py (Diff revision 2)
The issue has been resolved.
```
understand -> understands
```
reviewboard/diffviewer/diffutils.py (Diff revision 2)
The issue has been resolved.
```
Patch -> patch
```

Change Summary:

Fixed up some docstring issues.

Commits:

	Summary	ID
	Require byte strings for diff chunk generation and use Unicode for differs.	7533a005ec1514ac0f6ac388bdcd857fcb531a55
	Require byte strings for diff chunk generation and use Unicode for differs.	0834cdf1414f3ee2b72d0dc9288acabf3f720c38

Diff:

Revision 3 (+884 -628)

	reviewboard/diffviewer/chunk_generator.py
	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/tests/test_diff_opcode_generator.py
	reviewboard/diffviewer/tests/test_interesting_lines.py
	reviewboard/diffviewer/tests/test_raw_diff_chunk_generator.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Ship it!

```
Ship It!
```

Status: Completed
Change Summary: Pushed to release-4.0.x (b9a2b12)