This change has been marked as completed.

Describe the completed change (optional):

Pushed to master (802810e)

Summary

Update the types for diff chunks and lines.

Review Request #14399 — Created April 14, 2025 and submitted 2 days, 21 hours ago

Information

Owner

david*

Repository

Review Board

Branch

master

Bugs

Depends On

Reviewers

Groups

reviewboard

People

Description*

This change defines a TypedDict for diff chunks, and a tuple for
diff lines, with definitions and documentation for each element.

Before, these were just an opaque dictionary, and a heterogenous list.
The diff lines were particularly annoying because they were variable
length (if there was no metadata, that element was just left off the
end), and types were annoying (for "real" line numbers, we'd have an
int if the line was present on that side of the diff, but an empty
string if not). This change technically breaks compatibility in those
two cases.

We now always include the metadata field, set to None if there's no
metadata for the line. This increases the cached size of a very large
diff with thousands of lines by less that 0.1% (much better than many
years ago with older pickle protocols--the tuple is probably also
helping here).

In the case of real line numbers which are not present on one side of
the diff (for example, the line number of an added line for the
left-hand side of the diff), we set the data for that to None. In
order to maintain compatibility with current versions of Review Bot, we
change None line numbers back to an empty string for the filediff
resource's diff_data mode.

This change only touches code related to chunks and lines. Other diff
viewer cleanup is coming in later changes.

Testing Done


Tested a ton of different diffs, both in the diff viewer as well as

  the text and markdown Review UIs. Verified that all types of edits

  (added/edited/deleted/moved) lines all worked as expected.
Ran unit tests.

Commits

Summary	ID
Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `tuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but an empty string if not). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. This increases the cached size of a very large diff with thousands of lines by less that 0.1% (much better than many years ago with older pickle protocols--the tuple is probably also helping here). In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. In order to maintain compatibility with current versions of Review Bot, we change `None` line numbers back to an empty string for the filediff resource's `diff_data` mode. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	9daae1bba689c7145420b91cfb69c0aee56928ea

Summary

Update the types for diff chunks and lines.

This change defines a `TypedDict` for diff chunks, and a `tuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but an empty string if not). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. This increases the cached size of a very large diff with thousands of lines by less that 0.1% (much better than many years ago with older pickle protocols--the tuple is probably also helping here). In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. In order to maintain compatibility with current versions of Review Bot, we change `None` line numbers back to an empty string for the filediff resource's `diff_data` mode. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.

9daae1bba689c7145420b91cfb69c0aee56928ea

Issues

Description	From	Last Updated
closing bracket does not match indentation of opening bracket's line Column: 17 Error code: E123	reviewbot	2 months, 4 weeks ago
'reviewboard.diffviewer.chunk_generator.DiffLine' imported but unused Column: 5 Error code: F401	reviewbot	2 months, 4 weeks ago
too many blank lines (2) Column: 5 Error code: E303	reviewbot	2 months, 4 weeks ago
undefined name 'none' Column: 42 Error code: F821	reviewbot	2 months, 4 weeks ago
Let's move this and other non-runtime types to a TYPE_CHECKING block.	chipx86	1 month, 3 weeks ago
Can we order the keys in here alphabetically?	chipx86	1 month, 3 weeks ago
I'm trying to understand this set of changes. Bear with me. We were conditionally normalizing, and now we always do. …	chipx86	1 month, 3 weeks ago
All diff operations use i1:i2, j1:j2, and tag as the variables. I'd like to keep these as they were wherever …	chipx86	1 month, 3 weeks ago
Do we have to cast to this? It'd be nice to find away around having to do this for every …	chipx86	1 month, 3 weeks ago
While here, can we pull self.filediff out into a local variable? We access it a lot.	chipx86	1 month, 3 weeks ago
This needs to be a : TypeAlias. Can we define it near the top of the module where we normally …	chipx86	1 month, 3 weeks ago
Let's keep all this as None. It's an explicit "There are no ranges" response, and avoids creating a new object …	chipx86	1 month, 3 weeks ago
The old code was this way to avoid needing to perform comparisons when we know we have our answer. Any …	chipx86	1 month, 3 weeks ago
These can be linenum1 or '' and linenum2 or ''. But I'm not sure this matters, because in the cases …	chipx86	1 month, 3 weeks ago
'typing.Union' imported but unused Column: 1 Error code: F401	reviewbot	1 month, 3 weeks ago
Can we move this into the TYPE_CHECKING?	chipx86	1 month, 2 weeks ago
convert_to_unicode should be before the get_*.	chipx86	1 month, 2 weeks ago
This sentence ("This compatibility ...") is missing a word, I think.	chipx86	1 month, 2 weeks ago
TypeAlias can live in TYPE_CHECKING.	chipx86	1 month, 2 weeks ago
Should use :py:meth: for the reference.	chipx86	2 days, 21 hours ago
There are no open issues

flake8 failed.

JSHint passed.

flake8

reviewboard/diffviewer/chunk_generator.py (Diff revision 1)

The issue has been dropped. Show all issues

closing bracket does not match indentation of opening bracket's line

Column: 17
Error code: E123

reviewboard/diffviewer/templatetags/difftags.py (Diff revision 1)
The issue has been dropped. Show all issues
```
'reviewboard.diffviewer.chunk_generator.DiffLine' imported but unused

Column: 5
Error code: F401
```
reviewboard/diffviewer/tests/test_raw_diff_chunk_generator.py (Diff revision 1)
The issue has been dropped. Show all issues
```
too many blank lines (2)

Column: 5
Error code: E303
```
reviewboard/webapi/tests/test_filediff.py (Diff revision 1)
The issue has been dropped. Show all issues
```
undefined name 'none'

Column: 42
Error code: F821
```

Commits:

	Summary	ID
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `NamedTuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but was set to an empty string if it wasn't). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. The `diff_lines` template tag and other consumers now use names instead of indexing magically into the structure, making it much clearer what's going on. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	c07cac59fee7c3f20506e9999d70f54eb99148c6
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `NamedTuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but was set to an empty string if it wasn't). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. The `diff_lines` template tag and other consumers now use names instead of indexing magically into the structure, making it much clearer what's going on. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	541e055e3978c23ea3bba8f9ee0b67c26fc4134f

Diff:

Revision 2 (+1054 -462)

Show changes

	reviewboard/diffviewer/chunk_generator.py
	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/templatetags/difftags.py
	reviewboard/diffviewer/tests/test_diff_chunk_generator.py
	reviewboard/diffviewer/tests/test_raw_diff_chunk_generator.py
	reviewboard/reviews/tests/test_markdown_review_ui.py
	reviewboard/reviews/tests/test_text_based_review_ui.py
	reviewboard/webapi/tests/test_filediff.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Commits:

	Summary	ID
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `NamedTuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but was set to an empty string if it wasn't). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. The `diff_lines` template tag and other consumers now use names instead of indexing magically into the structure, making it much clearer what's going on. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	541e055e3978c23ea3bba8f9ee0b67c26fc4134f
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `NamedTuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but was set to an empty string if it wasn't). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. The `diff_lines` template tag and other consumers now use names instead of indexing magically into the structure, making it much clearer what's going on. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	b65d93070af3cf1bc40238f661da52ad4eeacd2d

Diff:

Revision 3 (+1044 -452)

Show changes

	reviewboard/diffviewer/chunk_generator.py
	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/templatetags/difftags.py
	reviewboard/diffviewer/tests/test_diff_chunk_generator.py
	reviewboard/diffviewer/tests/test_raw_diff_chunk_generator.py
	reviewboard/reviews/tests/test_markdown_review_ui.py
	reviewboard/reviews/tests/test_text_based_review_ui.py
	reviewboard/webapi/tests/test_filediff.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Commits:

	Summary	ID
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `NamedTuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but was set to an empty string if it wasn't). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. The `diff_lines` template tag and other consumers now use names instead of indexing magically into the structure, making it much clearer what's going on. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	b65d93070af3cf1bc40238f661da52ad4eeacd2d
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `NamedTuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but was set to an empty string if it wasn't). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. The `diff_lines` template tag and other consumers now use names instead of indexing magically into the structure, making it much clearer what's going on. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	e2861454c701dc0aa04163b7aa4a1982233c2cb2

Diff:

Revision 4 (+1044 -452)

Show changes

	reviewboard/diffviewer/chunk_generator.py
	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/templatetags/difftags.py
	reviewboard/diffviewer/tests/test_diff_chunk_generator.py
	reviewboard/diffviewer/tests/test_raw_diff_chunk_generator.py
	reviewboard/reviews/tests/test_markdown_review_ui.py
	reviewboard/reviews/tests/test_text_based_review_ui.py
	reviewboard/webapi/tests/test_filediff.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Commits:

	Summary	ID
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `NamedTuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but was set to an empty string if it wasn't). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. The `diff_lines` template tag and other consumers now use names instead of indexing magically into the structure, making it much clearer what's going on. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	e2861454c701dc0aa04163b7aa4a1982233c2cb2
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `NamedTuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but was set to an empty string if it wasn't). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. The `diff_lines` template tag and other consumers now use names instead of indexing magically into the structure, making it much clearer what's going on. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	111341255393a8d9dc42826a2f5e7e443ddb1ed0

Diff:

Revision 5 (+1044 -452)

Show changes

	reviewboard/diffviewer/chunk_generator.py
	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/templatetags/difftags.py
	reviewboard/diffviewer/tests/test_diff_chunk_generator.py
	reviewboard/diffviewer/tests/test_raw_diff_chunk_generator.py
	reviewboard/reviews/tests/test_markdown_review_ui.py
	reviewboard/reviews/tests/test_text_based_review_ui.py
	reviewboard/webapi/tests/test_filediff.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Commits:

	Summary	ID
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `NamedTuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but was set to an empty string if it wasn't). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. The `diff_lines` template tag and other consumers now use names instead of indexing magically into the structure, making it much clearer what's going on. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	111341255393a8d9dc42826a2f5e7e443ddb1ed0
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `NamedTuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but was set to an empty string if it wasn't). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. The `diff_lines` template tag and other consumers now use names instead of indexing magically into the structure, making it much clearer what's going on. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	c53a6fb50e23eaecc39f7239a5998d6553c13cee

Diff:

Revision 6 (+1044 -452)

Show changes

	reviewboard/diffviewer/chunk_generator.py
	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/templatetags/difftags.py
	reviewboard/diffviewer/tests/test_diff_chunk_generator.py
	reviewboard/diffviewer/tests/test_raw_diff_chunk_generator.py
	reviewboard/reviews/tests/test_markdown_review_ui.py
	reviewboard/reviews/tests/test_text_based_review_ui.py
	reviewboard/webapi/tests/test_filediff.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Commits:

	Summary	ID
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `NamedTuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but was set to an empty string if it wasn't). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. The `diff_lines` template tag and other consumers now use names instead of indexing magically into the structure, making it much clearer what's going on. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	c53a6fb50e23eaecc39f7239a5998d6553c13cee
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `NamedTuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but was set to an empty string if it wasn't). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. The `diff_lines` template tag and other consumers now use names instead of indexing magically into the structure, making it much clearer what's going on. This also updates the regions field to always be a list. In some cases we had empty lists, and in others we had `None`. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	43f82cea94288ecb0ca769be4b38c080b27fd028

Diff:

Revision 7 (+1454 -658)

Show changes

	reviewboard/diffviewer/chunk_generator.py
	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/templatetags/difftags.py
	reviewboard/diffviewer/tests/test_diff_chunk_generator.py
	reviewboard/diffviewer/tests/test_diffutils.py
	reviewboard/diffviewer/tests/test_raw_diff_chunk_generator.py
	reviewboard/reviews/tests/test_markdown_review_ui.py
	reviewboard/reviews/tests/test_text_based_review_ui.py
	1 more

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Description:: This change defines a TypedDict for diff chunks, and a NamedTuple
for diff lines, with definitions and documentation for each element.

Before, these were just an opaque dictionary, and a heterogenous list.
The diff lines were particularly annoying because they were variable
length (if there was no metadata, that element was just left off the
end), and types were annoying (for "real" line numbers, we'd have an
int if the line was present on that side of the diff, but was set to
an empty string if it wasn't). This change technically breaks
compatibility in those two cases. We now always include the metadata
field, set to None if there's no metadata for the line. In the case of
real line numbers which are not present on one side of the diff (for
example, the line number of an added line for the left-hand side of the
diff), we set the data for that to None.

The diff_lines template tag and other consumers now use names instead
of indexing magically into the structure, making it much clearer what's
going on.

+
This also updates the regions field to always be a list. In some cases
+ we had empty lists, and in others we had None.
+

This change only touches code related to chunks and lines. Other diff
viewer cleanup is coming in later changes.

There are a couple things I'm going to be really mindful of when reviewing anything touching diffs:

Terminology. Diff operations use "tags", "i1:i2" ranges, and "j1:j2" ranges, which don't necessarily mean "lines". That's deliberate. I want to keep those i/j naming wherever possible for diffs when the code is generic. I have some old work I intend to dig up some day to add a higher-level process for diffing some kinds of non-text data.
Performance. Diff code is some of the most expensive code in the codebase. As such, I want to keep things tight where we can, avoid unnecessary work (keep checks optimized, avoid expense if it's just for typing-related benefits). Much of this code is called per-line across potentially tens of thousands of lines or more across a page of diffs, so it matters.
Regressions/changes in behavior. This code's been around a long time and there are a lot of edge cases covered. Plus we have public APIs around this. I want to avoid changing data types where it's not needed.

For reasons 2 and 3, I'm also pretty hesistant to move to a NamedTuple. Nice as it is, I ultimately don't think we should. It's easier to read and follow, but it carries a performance cost for no reason but code maintenance, and while that's not a bad reason, I think the impact on performance in diff code paths make it less of a gain. NamedTuples cost more to construct and cost more to access. Even if it's just a little, we do it a lot.

reviewboard/diffviewer/chunk_generator.py (Diff revision 7)
The issue has been resolved. Show all issues
```
Let's move this and other non-runtime types to a TYPE_CHECKING block.
```
reviewboard/diffviewer/chunk_generator.py (Diff revision 7)
The issue has been resolved. Show all issues
```
Can we order the keys in here alphabetically?
```

reviewboard/diffviewer/chunk_generator.py (Diff revision 7)

The issue has been resolved. Show all issues

I'm trying to understand this set of changes. Bear with me.

We were conditionally normalizing, and now we always do. Looks like before, we weren't using {old,new}_encoding_list if there wasn't a class-level encoding list set, causing us to ignore what was passed in. That's fine.

However, it changes things when the case where neither is provided, forcing encoding to take place to utf-8. I'm not sure if this is right or wrong, but it's not how we had it before. But what we had before may have been broken. Trying to walk through the scenarios.

And then we're joining to strings. I don't think any of this is necessary. We certainly never did it before, and looking through this code, we never touch old, new, old_str, or new_str in the is_lists case. We only care about old_lines and new_lines. So I think we can just delete that code outright, which is good because it's an expense we don't want.

As a tidbit of historical information, at one point I was trying to keep some of this free of assuming byte strings so that we could pass higher-level objects that a Differ could understand, and then generate string results from all that. I have an ancient branch to help get some of this in shape for that, but it's stale. Not too worried about it now, but this is in part the reason we call things a and b and not old_lines and new_lines.

david 1 month, 3 weeks ago

In the constructor, we have:

self.encoding_list = encoding_list or ['iso-8859-15']

So there's always a class-level list, and we wouldn't have skipped it in the past. In the case where theoretically would have skipped the normalize calls, old and new would still have been set to bytes, which would break things later on. I'm not sure iso-8859-15 is the right thing in the modern era, but that's a change I'd want to make much more cautiously.

Joining to strings is something I was doing to make it so old_str, old_bytes, new_str, and new_bytes were always bound. Given that we don't apply syntax highlighting for the is_lists case, I'm just going to rearrange things so this isn't an issue. If we wanted to make it so highlighting could apply in the is_lists case as well, we'd need to create the bytes and str versions of the files. From what I can tell, the lists version is only used for the rendered version of text-based review UIs, where we don't want to apply highlighting.

reviewboard/diffviewer/chunk_generator.py (Diff revision 7)

The issue has been resolved. Show all issues

All diff operations use i1:i2, j1:j2, and tag as the variables. I'd like to keep these as they were wherever we can. They're technically ranges, and not specifically lines (even if we are using them as lines in this use case).

reviewboard/diffviewer/chunk_generator.py (Diff revision 7)

The issue has been resolved. Show all issues

Do we have to cast to this? It'd be nice to find away around having to do this for every range of opcodes.

If we have to, we shouldn't redefine the type for every line. We should have a type alias to refer to. But it'd be nice to figure out how to avoid this entirely.

reviewboard/diffviewer/chunk_generator.py (Diff revision 7)

The issue has been resolved. Show all issues

While here, can we pull self.filediff out into a local variable? We access it a lot.

reviewboard/diffviewer/diffutils.py (Diff revision 7)

The issue has been resolved. Show all issues

This needs to be a : TypeAlias.

Can we define it near the top of the module where we normally define these?

reviewboard/diffviewer/diffutils.py (Diff revision 7)

The issue has been resolved. Show all issues

Let's keep all this as None. It's an explicit "There are no ranges" response, and avoids creating a new object for each time (ideal since this is all called quite a lot).

reviewboard/diffviewer/templatetags/difftags.py (Diff revision 7)

The issue has been resolved. Show all issues

The old code was this way to avoid needing to perform comparisons when we know we have our answer. Any optimizations we can get out of diff code is beneficial, since this is an expensive path.

reviewboard/diffviewer/templatetags/difftags.py (Diff revision 7)

The issue has been resolved. Show all issues

These can be linenum1 or '' and linenum2 or ''.

But I'm not sure this matters, because in the cases where these are None, we don't make use of them in the template. So the None value is fine.

Change Summary:


Switch back to a regular tuple for lines.
Make other requested changes.

Description:

~		This change defines a `TypedDict` for diff chunks, and a `NamedTuple`
~		for diff lines, with definitions and documentation for each element.
	~	This change defines a `TypedDict` for diff chunks, and a `tuple` for
	~	diff lines, with definitions and documentation for each element.

		Before, these were just an opaque dictionary, and a heterogenous list.
		The diff lines were particularly annoying because they were variable
		length (if there was no metadata, that element was just left off the
		end), and types were annoying (for "real" line numbers, we'd have an
~		int if the line was present on that side of the diff, but was set to
~		an empty string if it wasn't). This change technically breaks
~		compatibility in those two cases. We now always include the metadata
	~	int if the line was present on that side of the diff, but an empty
	~	string if not). This change technically breaks compatibility in those
	~	two cases.
-		field, set to `None` if there's no metadata for the line. In the case of
-		real line numbers which are not present on one side of the diff (for
-		example, the line number of an added line for the left-hand side of the
-		diff), we set the data for that to `None`.

~		The `diff_lines` template tag and other consumers now use names instead
~		of indexing magically into the structure, making it much clearer what's
~		going on.
	~	We now always include the metadata field, set to `None` if there's no
	~	metadata for the line. This increases the cached size of a very large
	~	diff with thousands of lines by less that 0.1% (much better than many
	+	years ago with older pickle protocols--the tuple is probably also
	+	helping here).

~		This also updates the regions field to always be a list. In some cases
~		we had empty lists, and in others we had `None`.
	~	In the case of real line numbers which are not present on one side of
	~	the diff (for example, the line number of an added line for the
	+	left-hand side of the diff), we set the data for that to `None`. In
	+	order to maintain compatibility with current versions of Review Bot, we
	+	change `None` line numbers back to an empty string for the filediff
	+	resource's `diff_data` mode.

		This change only touches code related to chunks and lines. Other diff
		viewer cleanup is coming in later changes.

Commits:

	Summary	ID
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `NamedTuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but was set to an empty string if it wasn't). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. The `diff_lines` template tag and other consumers now use names instead of indexing magically into the structure, making it much clearer what's going on. This also updates the regions field to always be a list. In some cases we had empty lists, and in others we had `None`. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	43f82cea94288ecb0ca769be4b38c080b27fd028
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `tuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but an empty string if not). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. This increases the cached size of a very large diff with thousands of lines by less that 0.1% (much better than many years ago with older pickle protocols--the tuple is probably also helping here). In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. In order to maintain compatibility with current versions of Review Bot, we change `None` line numbers back to an empty string for the filediff resource's `diff_data` mode. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	c5d90d8cfabbec35fda0b34155bf69a2f4bf300c

Diff:

Revision 8 (+1498 -616)

Show changes

	reviewboard/diffviewer/chunk_generator.py
	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/templatetags/difftags.py
	reviewboard/diffviewer/tests/test_diff_chunk_generator.py
	reviewboard/diffviewer/tests/test_raw_diff_chunk_generator.py
	reviewboard/reviews/tests/test_markdown_review_ui.py
	reviewboard/reviews/tests/test_text_based_review_ui.py
	reviewboard/webapi/resources/filediff.py
	1 more

Checks run (1 failed, 1 succeeded)

flake8 failed.

JSHint passed.

flake8

reviewboard/diffviewer/chunk_generator.py (Diff revision 8)
The issue has been dropped. Show all issues
```
'typing.Union' imported but unused

Column: 1
Error code: F401
```

Commits:

	Summary	ID
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `tuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but an empty string if not). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. This increases the cached size of a very large diff with thousands of lines by less that 0.1% (much better than many years ago with older pickle protocols--the tuple is probably also helping here). In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. In order to maintain compatibility with current versions of Review Bot, we change `None` line numbers back to an empty string for the filediff resource's `diff_data` mode. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	c5d90d8cfabbec35fda0b34155bf69a2f4bf300c
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `tuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but an empty string if not). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. This increases the cached size of a very large diff with thousands of lines by less that 0.1% (much better than many years ago with older pickle protocols--the tuple is probably also helping here). In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. In order to maintain compatibility with current versions of Review Bot, we change `None` line numbers back to an empty string for the filediff resource's `diff_data` mode. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	0f3328ecb62eb5bcd98f8721e4dca0bebe733e84

Diff:

Revision 9 (+1498 -616)

Show changes

	reviewboard/diffviewer/chunk_generator.py
	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/templatetags/difftags.py
	reviewboard/diffviewer/tests/test_diff_chunk_generator.py
	reviewboard/diffviewer/tests/test_raw_diff_chunk_generator.py
	reviewboard/reviews/tests/test_markdown_review_ui.py
	reviewboard/reviews/tests/test_text_based_review_ui.py
	reviewboard/webapi/resources/filediff.py
	1 more

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

reviewboard/diffviewer/chunk_generator.py (Diff revision 9)
The issue has been resolved. Show all issues
```
Can we move this into the TYPE_CHECKING?
```
reviewboard/diffviewer/chunk_generator.py (Diff revision 9)
The issue has been resolved. Show all issues
```
convert_to_unicode should be before the get_*.
```
reviewboard/diffviewer/chunk_generator.py (Diff revision 9)
The issue has been resolved. Show all issues
```
This sentence ("This compatibility ...") is missing a word, I think.
```
reviewboard/diffviewer/diffutils.py (Diff revision 9)
The issue has been resolved. Show all issues
```
TypeAlias can live in TYPE_CHECKING.
```

Commits:

	Summary	ID
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `tuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but an empty string if not). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. This increases the cached size of a very large diff with thousands of lines by less that 0.1% (much better than many years ago with older pickle protocols--the tuple is probably also helping here). In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. In order to maintain compatibility with current versions of Review Bot, we change `None` line numbers back to an empty string for the filediff resource's `diff_data` mode. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	0f3328ecb62eb5bcd98f8721e4dca0bebe733e84
	Update the types for diff chunks and lines. This change defines a `TypedDict` for diff chunks, and a `tuple` for diff lines, with definitions and documentation for each element. Before, these were just an opaque dictionary, and a heterogenous list. The diff lines were particularly annoying because they were variable length (if there was no metadata, that element was just left off the end), and types were annoying (for "real" line numbers, we'd have an int if the line was present on that side of the diff, but an empty string if not). This change technically breaks compatibility in those two cases. We now always include the metadata field, set to `None` if there's no metadata for the line. This increases the cached size of a very large diff with thousands of lines by less that 0.1% (much better than many years ago with older pickle protocols--the tuple is probably also helping here). In the case of real line numbers which are not present on one side of the diff (for example, the line number of an added line for the left-hand side of the diff), we set the data for that to `None`. In order to maintain compatibility with current versions of Review Bot, we change `None` line numbers back to an empty string for the filediff resource's `diff_data` mode. This change only touches code related to chunks and lines. Other diff viewer cleanup is coming in later changes. Testing Done: - Tested a ton of different diffs, both in the diff viewer as well as the text and markdown Review UIs. Verified that all types of edits (added/edited/deleted/moved) lines all worked as expected. - Ran unit tests.	9daae1bba689c7145420b91cfb69c0aee56928ea

Diff:

Revision 10 (+1502 -616)

Show changes

	reviewboard/diffviewer/chunk_generator.py
	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/templatetags/difftags.py
	reviewboard/diffviewer/tests/test_diff_chunk_generator.py
	reviewboard/diffviewer/tests/test_raw_diff_chunk_generator.py
	reviewboard/reviews/tests/test_markdown_review_ui.py
	reviewboard/reviews/tests/test_text_based_review_ui.py
	reviewboard/webapi/resources/filediff.py
	1 more

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Ship it!

reviewboard/diffviewer/chunk_generator.py (Diff revision 10)
The issue has been resolved. Show all issues
```
Should use :py:meth: for the reference.
```

Status:: Completed
Change Summary:: Pushed to master (802810e)