Summary

Add improved type safety for diff parsing.

Review Request #10496 — Created April 2, 2019 and submitted April 10, 2019, 8:52 p.m.

Information

Owner

chipx86

Repository

Review Board

Branch

release-4.0.x

Bugs

Depends On

~~10495~~

Reviewers

Groups

reviewboard

People

Description

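Diff parsing was intended to work with byte strings, and generally did
a pretty good job of this. Still, we had no proper enforcement, and we
had some places where byte strings were being compared to Unicode
strings, or temporarily transformed into Unicode strings.

Much, though not all, of the type inconsistency occurred in the
parse_diff_revision() methods on SCMTool subclasses. The rest was
somewhat harmless, setting empty strings or default revision
identifiers in ParsedDiffFile attributes, which then just made their
way ultimately into other function calls or into FileDiff attributes.

Python 2's ability to interchange byte strings and Unicode strings meant
that this all pretty well worked, even when wrong. That changes with
Python 3. Byte strings and Unicode strings are now very different, and
we need to ensure we're working only with byte strings for all parsing.
We also need a migration path for the parsing code that can tell us
when things are wrong.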
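
As a rough illustration of why this matters (not code from this change),
the snippet below shows how Python 3 treats a byte string and a Unicode
string with the same contents. The `header` value is made up for the
example.

```python
# Illustration only: `header` is a made-up diff line, not taken from
# Review Board.
header = b'--- a/README'

# On Python 3, bytes and str with the same characters never compare equal.
# (On Python 2, the equivalent comparison would have been True.)
same = (header == '--- a/README')

# Mixing the two types in an operation raises TypeError instead of
# silently coercing one to the other.
try:
    header + ' (working copy)'
    mixing_failed = False
except TypeError:
    mixing_failed = True

# Converting has to be explicit, with a stated encoding.
text = header.decode('utf-8')
```

The comparison quietly returning False is the dangerous case: nothing
crashes, the parser just stops matching lines it used to match.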

This change goes through our diff parsing code and ensures that it
works exclusively with byte strings, updating string literals and
using io.BytesIO instead of StringIO (which is Unicode on Python 3).
All parse_diff_revision() methods now assert the types coming in, and
the code that calls them now asserts the types on the results.
DiffParser also now requires a byte string, rather than accepting a
Unicode string, and asserts on this.
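
A minimal sketch of the pattern described above, assuming nothing about
the real signatures: `read_diff_lines()` and `check_revision_result()`
are hypothetical helpers written for this example, and the assumption
that both result values are byte strings may not match every SCMTool.

```python
import io


def read_diff_lines(data):
    """Return the lines of a diff payload, which must be a byte string.

    Hypothetical helper for illustration; not part of Review Board.
    """
    # Fail loudly on Unicode input rather than letting it flow through.
    assert isinstance(data, bytes), (
        'Expected bytes for diff data, got %s' % type(data).__name__)

    # io.BytesIO yields bytes lines on both Python 2.7 and Python 3.
    return [line.rstrip(b'\r\n') for line in io.BytesIO(data)]


def check_revision_result(filename, revision):
    """Assert that a parsed (filename, revision) pair is all bytes.

    Assumes, for this sketch only, that both values are byte strings.
    """
    assert isinstance(filename, bytes), 'filename must be bytes'
    assert isinstance(revision, bytes), 'revision must be bytes'
    return filename, revision


lines = read_diff_lines(b'--- a/README\t(revision 1)\n+++ b/README\n')
```

Asserting at both the input and the output means a stray Unicode string
is reported close to where it was introduced, instead of surfacing later
as a failed comparison.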

The other big part of this change is the data going into and coming out
of ParsedDiffFile. This class is very old, and we had old attributes
with old-style names that accepted any value with the assumption that
it'd all get normalized to bytes at some point. Now, we have newer
attributes that replace the old ones, but enforce type safety. The old
attribute names continue to work and will cast to the right type, but
will also emit deprecation warnings, helping us or third parties to
catch any issues.
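
One way such a compatibility layer could look is sketched below. This is
not the ParsedDiffFile implementation: the class `ExampleParsedFile` and
the attribute names `filename` (new) and `fileName` (old) are invented
for the example, it targets Python 3 only, and the choice of TypeError
and UTF-8 are assumptions.

```python
import warnings


class ExampleParsedFile(object):
    """Hypothetical stand-in for a parsed-file record (illustration only)."""

    def __init__(self):
        self._filename = None

    @property
    def filename(self):
        """New-style attribute: only accepts bytes (or None)."""
        return self._filename

    @filename.setter
    def filename(self, value):
        if value is not None and not isinstance(value, bytes):
            raise TypeError('filename must be bytes, not %s'
                            % type(value).__name__)
        self._filename = value

    @property
    def fileName(self):
        """Old-style attribute: still works, but warns on access."""
        warnings.warn('fileName is deprecated; use filename instead.',
                      DeprecationWarning, stacklevel=2)
        return self._filename

    @fileName.setter
    def fileName(self, value):
        warnings.warn('fileName is deprecated; use filename instead.',
                      DeprecationWarning, stacklevel=2)
        # Cast Unicode to bytes (UTF-8 is an assumption of this sketch),
        # then store through the strict new-style setter.
        if isinstance(value, str):
            value = value.encode('utf-8')
        self.filename = value
```

Python hides most DeprecationWarning output by default, so running with
`python -W default` (or `-W error` to turn warnings into failures) helps
when hunting for remaining users of the old names.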

The diff viewer code all uses the new attributes, but there's still a
bunch of SCMTool code and tests that use the old attributes, causing
warnings to appear. Those will be tackled separately.

Testing Done

Unit tests pass on Python 2.7 and 3.7 (with other in-progress changes).

Tested posting changes for review.

Commits

Summary	ID
Add improved type safety for diff parsing.	268a70462a26b34a037ed6e2998f1a124ea10768

Summary

Add improved type safety for diff parsing.

Diff parsing was intended to work with byte strings, and generally did a pretty good job of this. Still, we had no proper enforcement, and we had some places where byte strings were being compared to Unicode strings, or temporarily transformed into Unicode strings.

Much, though not all, of the type inconsistency occurred in the `parse_diff_revision()` methods on SCMTool subclasses. The rest was somewhat harmless, setting empty strings or default revision identifiers in `ParsedDiffFile` attributes, which then just made their way ultimately into other function calls or into `FileDiff` attributes.

Python 2's ability to interchange byte strings and Unicode strings meant that this all pretty well worked, even when wrong. That changes with Python 3. Byte strings and Unicode strings are now very different, and we need to ensure we're working only with byte strings for all parsing. We also need a migration path for the parsing code that can tell us when things are wrong.

This change goes through our diff parsing code and ensures that it works exclusively with byte strings, updating string literals and using `io.BytesIO` instead of `StringIO` (which is Unicode on Python 3). All `parse_diff_revision()` methods now assert the types coming in, and the code that calls them now asserts the types on the results. `DiffParser` also now requires a byte string, rather than accepting a Unicode string, and asserts on this.

The other big part of this change is the data going into and coming out of `ParsedDiffFile`. This class is very old, and we had old attributes with old-style names that accepted any value with the assumption that it'd all get normalized to bytes at some point. Now, we have newer attributes that replace the old ones, but enforce type safety. The old attribute names continue to work and will cast to the right type, but will also emit deprecation warnings, helping us or third parties to catch any issues.

The diff viewer code all uses the new attributes, but there's still a bunch of SCMTool code and tests that use the old attributes, causing warnings to appear. Those will be tackled separately.

268a70462a26b34a037ed6e2998f1a124ea10768

Issues

Description	From	Last Updated
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:24 a.m.
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:24 a.m.
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:25 a.m.
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:25 a.m.
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:25 a.m.
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:25 a.m.
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:25 a.m.
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:25 a.m.
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:26 a.m.
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:26 a.m.
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:26 a.m.
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:26 a.m.
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:26 a.m.
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:26 a.m.
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:26 a.m.
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:26 a.m.
F401 'django.utils.six' imported but unused	reviewbot	April 2, 2019, 4:26 a.m.
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:27 a.m.
E127 continuation line over-indented for visual indent	reviewbot	April 2, 2019, 4:27 a.m.
F401 'django.utils.encoding.force_text' imported but unused	reviewbot	April 2, 2019, 4:27 a.m.

flake8 failed.

JSHint passed.

flake8

reviewboard/diffviewer/filediff_creator.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/diffviewer/filediff_creator.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/scmtools/clearcase.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/scmtools/clearcase.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/scmtools/cvs.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/scmtools/cvs.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/scmtools/git.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/scmtools/git.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/scmtools/hg.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/scmtools/hg.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/scmtools/perforce.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/scmtools/perforce.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/scmtools/plastic.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/scmtools/plastic.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/scmtools/svn/__init__.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/scmtools/svn/__init__.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/scmtools/svn/base.py (Diff revision 1)
The issue has been resolved.
```
F401 'django.utils.six' imported but unused
```
reviewboard/scmtools/svn/base.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/scmtools/svn/base.py (Diff revision 1)
The issue has been resolved.
```
E127 continuation line over-indented for visual indent
```
reviewboard/scmtools/svn/subvertpy.py (Diff revision 1)
The issue has been resolved.
```
F401 'django.utils.encoding.force_text' imported but unused
```

Change Summary:

Hopefully fixed Review Bot complaints.

Testing Done:

	+	Unit tests pass on Python 2.7 and 3.7 (with other in-progress changes).
	+
	+	Tested posting changes for review.

Commits:

	Summary	ID
	Add improved type safety for diff parsing.	ee49967573ea68d1f4449d2cb2e88bcccab65535
	Add improved type safety for diff parsing.	268a70462a26b34a037ed6e2998f1a124ea10768

Diff:

Revision 2 (+2058 -1250)

	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/filediff_creator.py
	reviewboard/diffviewer/parser.py
	reviewboard/diffviewer/tests/test_diff_parser.py
	reviewboard/diffviewer/tests/test_forms.py
	reviewboard/reviews/views.py
	reviewboard/scmtools/clearcase.py
	reviewboard/scmtools/core.py
	17 more

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Ship it!

```
Ship It!
```

Status: Completed
Change Summary: Pushed to release-4.0.x (9a15b1b)
