Summary

Make SVN revision parsing a little more complete and reliable.

Review Request #13580 — Created Feb. 27, 2024 and submitted April 1, 2024, 10:35 p.m.

Information

Owner

david

Repository

Review Board

Branch

release-7.x

Bugs

Depends On

Reviewers

Groups

reviewboard

People

Description

The way we deal with parsing SVN revisions had two major problems.

First, SVN diffs traditionally didn't have any revision information for
binary files. I'm making a change in RBTools to add the --force
argument to svn diff, which will end up including revision information
in the "Binary files ... differ" line, but support for that needed to be
added to the parser.

The more general problem is that we didn't have any parsed revision
information for the modified version of the file. This meant that
ParsedDiffFile.modified_file_details / FileDiff.dest_detail would
just be whatever the raw string was in the diff, whether it was a real
revision, "(nonexistent)", or "(working copy)".
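
As a rough illustration of what parsed revision information could look like for these labels, here is a minimal sketch. It is not the Review Board implementation: `parse_svn_revision_label` and the `HEAD` / `PRE_CREATION` sentinels are hypothetical stand-ins, and only the three label forms named above are handled.

```python
import re
from typing import Union

# Hypothetical sentinels standing in for "latest version" and
# "file did not exist". The real constants may differ.
HEAD = 'HEAD'
PRE_CREATION = 'PRE-CREATION'

# Matches labels such as "(revision 123)".
_REVISION_RE = re.compile(r'^\(revision (\d+)\)$')


def parse_svn_revision_label(label: str) -> Union[int, str]:
    """Turn a raw SVN diff label into a revision number or a sentinel."""
    label = label.strip()

    if label == '(working copy)':
        return HEAD

    if label == '(nonexistent)':
        return PRE_CREATION

    m = _REVISION_RE.match(label)

    if m:
        return int(m.group(1))

    # Anything else is outside the scope of this sketch.
    raise ValueError(f'Unrecognized SVN revision label: {label!r}')
```

With something like this applied to both sides of the header, the modified side would carry a parsed value rather than the raw string.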

In the current design, the SCMTool's DiffParser is responsible for most
of the actual parsing. The create_filediffs method would then call the
tool's parse_diff_revision method to do further parsing on the filename
and revision, but only for the original file/revision, not the modified
one.

This change moves things around so that all parsing happens within the
SVNDiffParser class. We do the same revision parsing as before, but
it's now applied to both the original and the modified versions. We can
also then make some better determinations about how to handle things
like IntelliJ's awkward revisionless revision labels, since we can
process that when we have the full context of the file header instead of
acting on a filename/revision in isolation without knowing whether it's
for the original or modified file.
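
To make the "full context" point concrete, the following is a minimal sketch of parsing both header lines together. It is not the actual `SVNDiffParser` code: the function names, the sentinels, and the exact spelling of the revisionless label (assumed here to be `(revision )`) are all assumptions for illustration, as is the choice to read that label as a new file on the original side and as the latest version on the modified side.

```python
import re
from typing import NamedTuple

# Hypothetical sentinels; the real constants may differ.
HEAD = 'HEAD'
PRE_CREATION = 'PRE-CREATION'
UNKNOWN = 'UNKNOWN'

# SVN file headers separate the path and the label with a tab.
_ORIG_RE = re.compile(r'^--- (?P<path>[^\t]+)\t(?P<label>.*)$')
_MODIFIED_RE = re.compile(r'^\+\+\+ (?P<path>[^\t]+)\t(?P<label>.*)$')
_REVISION_RE = re.compile(r'^\(revision (\d+)\)$')


class FileRevisions(NamedTuple):
    path: str
    orig_revision: str
    modified_revision: str


def _parse_label(label: str, is_modified: bool) -> str:
    """Parse one label, using which side of the header it came from."""
    label = label.strip()

    if label == '(working copy)':
        return HEAD

    if label == '(nonexistent)':
        return PRE_CREATION

    m = _REVISION_RE.match(label)

    if m:
        return m.group(1)

    if label == '(revision )':
        # Assumed revisionless label. Its meaning depends on the side,
        # which is only known when the whole header is available.
        return HEAD if is_modified else PRE_CREATION

    return UNKNOWN


def parse_file_header(orig_line: str, modified_line: str) -> FileRevisions:
    """Parse the "---" and "+++" lines of a file header as a pair."""
    orig = _ORIG_RE.match(orig_line)
    modified = _MODIFIED_RE.match(modified_line)

    if not orig or not modified:
        raise ValueError('Not an SVN file header')

    return FileRevisions(
        path=orig.group('path'),
        orig_revision=_parse_label(orig.group('label'), is_modified=False),
        modified_revision=_parse_label(modified.group('label'),
                                       is_modified=True),
    )
```

A function that only sees a single filename/revision pair cannot make the side-dependent decision in `_parse_label`, which is the limitation described above.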

My thinking is that this should become the general pattern. We should
have a goal of basically deprecating SCMTool.parse_diff_revision
entirely, and move all revision parsing into the diff parsers.

Testing Done


- Ran unit tests.
- Posted a variety of changes, including some with binary files in both
  the working copy and in revision ranges. Saw everything get parsed and
  displayed correctly.

Commits

Summary	ID
Make SVN revision parsing a little more complete and reliable.
Reviewed at https://reviews.reviewboard.org/r/13580/	fa21f08fa7e6b9c303941a38d5e938d0a9be374b

Issues

Description	From	Last Updated
'typing.cast' imported but unused Column: 1 Error code: F401	reviewbot	Feb. 27, 2024, 9:13 a.m.
'reviewboard.scmtools.models.Repository' imported but unused Column: 1 Error code: F401	reviewbot	Feb. 27, 2024, 9:14 a.m.
I think we want AnyStr, as that'll ensure that the type (str or bytes) being used in data will be …	chipx86	March 4, 2024, 7:02 p.m.
Missing a trailing period.	chipx86	March 4, 2024, 7:02 p.m.
While adding this, can you also add types for base_commit_id, new_commit_id, parsed_diff, and parsed_diff_change?	chipx86	March 4, 2024, 7:04 p.m.
Since we're making changes to all this, can we cache the results of these compiles somewhere? It'd be nice to …	chipx86	March 4, 2024, 7:19 p.m.
This feels overly verbose. It's a pretty simple regex.	chipx86	March 4, 2024, 7:14 p.m.
Can these be combined to one line now?	chipx86	March 4, 2024, 7:33 p.m.
Missing a trailing period.	chipx86	March 4, 2024, 7:33 p.m.
'typing.Union' imported but unused Column: 1 Error code: F401	reviewbot	March 4, 2024, 7:34 p.m.
I think the type checkers all infer the type when they're initiated with a cast.	chipx86	March 26, 2024, 10:06 p.m.
These should be indented one more level, I think. Are the comments aligned? Or is that just an issue in …	chipx86	March 26, 2024, 10:07 p.m.
Docs are missing a Raises for SCMError.	chipx86	March 26, 2024, 10:08 p.m.
The tuple contents need to be indented one more level.	chipx86	March 26, 2024, 10:08 p.m.
Alignment issue with the parameters.	chipx86	March 26, 2024, 10:08 p.m.

flake8 failed.

JSHint passed.

flake8

reviewboard/diffviewer/parser.py (Diff revision 1)
The issue has been resolved.
```
'typing.cast' imported but unused

Column: 1
Error code: F401
```
reviewboard/scmtools/tests/test_svn.py (Diff revision 1)
The issue has been resolved.
```
'reviewboard.scmtools.models.Repository' imported but unused

Column: 1
Error code: F401
```

Commits:

	Summary	ID
	Make SVN revision parsing a little more complete and reliable.	9bdca9b922c074523dab967e9640da6dbe6e4f38
	Make SVN revision parsing a little more complete and reliable.	80048677a803b302935d064e018445bf0684e62a

Diff:

Revision 2 (+870 -406)

	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/parser.py
	reviewboard/diffviewer/tests/test_filediff_creator.py
	reviewboard/scmtools/svn/__init__.py
	reviewboard/scmtools/tests/test_svn.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Ship it!

```
Ship It!
```

reviewboard/diffviewer/diffutils.py (Diff revision 2)

The issue has been resolved.

I think we want AnyStr, as that'll ensure that the type (str or bytes) being used in data will be the same type as that in the resulting list.
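
A minimal sketch of the suggestion, using a hypothetical `split_lines` helper rather than the function actually under review:

```python
from typing import AnyStr, List


def split_lines(data: AnyStr) -> List[AnyStr]:
    """Split data into lines, keeping the input's string type.

    Because AnyStr is a constrained type variable, a type checker knows
    that str input yields List[str] and bytes input yields List[bytes].
    """
    return data.splitlines()
```

With a `Union[str, bytes]` annotation instead, callers would get back a list of "str or bytes" and would need to narrow the type themselves.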

reviewboard/diffviewer/parser.py (Diff revision 2)
The issue has been resolved.
```
Missing a trailing period.
```

reviewboard/diffviewer/parser.py (Diff revision 2)

The issue has been resolved.

While adding this, can you also add types for base_commit_id, new_commit_id, parsed_diff, and parsed_diff_change?

reviewboard/scmtools/svn/__init__.py (Diff revision 2)

The issue has been resolved.

Since we're making changes to all this, can we cache the results of these compiles somewhere? It'd be nice to not have to do this every time we instantiate (but also not unconditionally do it in the module).
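
One way to get that behavior is sketched below, assuming a hypothetical `get_revision_re` accessor and an illustrative pattern. The diff under review may have used a different mechanism.

```python
import re
from functools import lru_cache
from typing import Pattern


@lru_cache(maxsize=None)
def get_revision_re() -> Pattern[str]:
    """Compile the pattern on first use and reuse it afterward.

    Nothing is compiled at import time, and repeated calls return the
    same cached pattern object.
    """
    return re.compile(r'^\(revision (\d+)\)$')


class ExampleParser:
    """Hypothetical parser that needs the pattern on each instance."""

    def __init__(self) -> None:
        # Cheap after the first instantiation: this is a cache lookup.
        self.revision_re = get_revision_re()
```

The cost of compiling is paid once, by whichever caller needs the pattern first, rather than on every instantiation or unconditionally when the module loads.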

reviewboard/scmtools/svn/__init__.py (Diff revision 2)

The issue has been resolved.

This feels overly verbose. It's a pretty simple regex.

reviewboard/scmtools/svn/__init__.py (Diff revision 2)
The issue has been resolved.
```
Can these be combined to one line now?
```
reviewboard/scmtools/svn/__init__.py (Diff revision 2)
The issue has been resolved.
```
Missing a trailing period.
```

Commits:

	Summary	ID
	Make SVN revision parsing a little more complete and reliable.	80048677a803b302935d064e018445bf0684e62a
	Make SVN revision parsing a little more complete and reliable.	853aaf5ecffe5fb3c6d21857ea0f833b4d307056

Diff:

Revision 3 (+904 -408)

	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/parser.py
	reviewboard/diffviewer/tests/test_filediff_creator.py
	reviewboard/scmtools/svn/__init__.py
	reviewboard/scmtools/tests/test_svn.py

Checks run (1 failed, 1 succeeded)

flake8 failed.

JSHint passed.

flake8

reviewboard/diffviewer/diffutils.py (Diff revision 3)
The issue has been dropped.
```
'typing.Union' imported but unused

Column: 1
Error code: F401
```

Commits:

	Summary	ID
	Make SVN revision parsing a little more complete and reliable.	853aaf5ecffe5fb3c6d21857ea0f833b4d307056
	Make SVN revision parsing a little more complete and reliable.	bedfee3d660e57fb767095b43692063d23393997

Diff:

Revision 4 (+904 -408)

	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/parser.py
	reviewboard/diffviewer/tests/test_filediff_creator.py
	reviewboard/scmtools/svn/__init__.py
	reviewboard/scmtools/tests/test_svn.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

reviewboard/scmtools/svn/__init__.py (Diff revision 4)
The issue has been resolved.
```
I think the type checkers all infer the type when they're initiated with a cast.
```
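
A small sketch of the point, with illustrative variable names that are not taken from the diff:

```python
from typing import Any, Dict, cast

raw: Any = {'revision': 123}

# Redundant: the annotation repeats what cast() already states.
info_verbose: Dict[str, int] = cast(Dict[str, int], raw)

# Equivalent for type checkers: the variable's type comes from cast().
info = cast(Dict[str, int], raw)
```

At runtime `cast()` returns its argument unchanged, so both names refer to the same dictionary.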

reviewboard/scmtools/svn/__init__.py (Diff revision 4)

The issue has been resolved.

These should be indented one more level, I think.

Are the comments aligned? Or is that just an issue in the web UI?

david March 26, 2024, 10:08 p.m.

They are aligned properly in a terminal.

reviewboard/scmtools/svn/__init__.py (Diff revision 4)
The issue has been resolved.
```
Docs are missing a Raises for SCMError.
```

reviewboard/scmtools/svn/__init__.py (Diff revision 4)

The issue has been resolved.

The tuple contents need to be indented one more level.

reviewboard/scmtools/tests/test_svn.py (Diff revision 4)
The issue has been resolved.
```
Alignment issue with the parameters.
```

Commits:

	Summary	ID
	Make SVN revision parsing a little more complete and reliable.	bedfee3d660e57fb767095b43692063d23393997
	Make SVN revision parsing a little more complete and reliable.
Reviewed at https://reviews.reviewboard.org/r/13580/	fa21f08fa7e6b9c303941a38d5e938d0a9be374b

Diff:

Revision 5 (+910 -408)

	reviewboard/diffviewer/diffutils.py
	reviewboard/diffviewer/parser.py
	reviewboard/diffviewer/tests/test_filediff_creator.py
	reviewboard/scmtools/svn/__init__.py
	reviewboard/scmtools/tests/test_svn.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Ship it!

```
Ship It!
```

Status: Completed
Change Summary: Pushed to release-7.x (5fa72d6)