Make SVN revision parsing a little more complete and reliable.

Review Request #13580 — Created Feb. 27, 2024 and submitted

Information

Review Board
release-7.x

Reviewers

The way we deal with parsing SVN revisions had two major problems.

First, SVN diffs traditionally didn't have any revision information for
binary files. I'm making a change in RBTools to add the --force
argument to svn diff, which will end up including that in the "Binary
files ... differ" line, but that needed to be added to the parser.

The more general problem is that we didn't have any parsed revision
information for the modified version of the file. This meant that
ParsedDiffFile.modified_file_details / FileDiff.dest_detail would
just be whatever the raw string was in the diff, whether it was a real
revision, "(nonexistent)", or "(working copy)".

Our current design of the way diffs parse is that the SCMTool's
DiffParser is responsible for most of the actual parsing. The
create_filediffs method would then call the tool's
parse_diff_revision method to do further parsing on the filename and
revision, but only for the original file/revision, not the modified.

This change moves things around so that all parsing happens within the
SVNDiffParser class. We do the same revision parsing as before, but
it's now applied to both the original and the modified versions. We can
also then make some better determinations about how to handle things
like IntelliJ's awkward revisionless revision labels, since we can
process that when we have the full context of the file header instead of
acting on a filename/revision in isolation without knowing whether it's
for the original or modified file.

My thinking is that this should become the general pattern. We should
have a goal of basically deprecating SCMTool.parse_diff_revision
entirely, and move all revision parsing into the diff parsers.

  • Ran unit tests.
  • Posted a variety of changes, including some with binary files in both
    the working copy and in revision ranges. Saw everything get parsed and
    displayed correctly.
Summary ID
Make SVN revision parsing a little more complete and reliable.
The way we deal with parsing SVN revisions had two major problems. First, SVN diffs traditionally didn't have any revision information for binary files. I'm making a change in RBTools to add the `--force` argument to `svn diff`, which will end up including that in the "Binary files ... differ" line, but that needed to be added to the parser. The more general problem is that we didn't have any parsed revision information for the modified version of the file. This meant that `ParsedDiffFile.modified_file_details` / `FileDiff.dest_detail` would just be whatever the raw string was in the diff, whether it was a real revision, "(nonexistent)", or "(working copy)". Our current design of the way diffs parse is that the SCMTool's DiffParser is responsible for most of the actual parsing. The `create_filediffs` method would then call the tool's `parse_diff_revision` method to do further parsing on the filename and revision, but only for the original file/revision, not the modified. This change moves things around so that all parsing happens within the `SVNDiffParser` class. We do the same revision parsing as before, but it's now applied to both the original and the modified versions. We can also then make some better determinations about how to handle things like IntelliJ's awkward revisionless revision labels, since we can process that when we have the full context of the file header instead of acting on a filename/revision in isolation without knowing whether it's for the original or modified file. My thinking is that this should become the general pattern. We should have a goal of basically deprecating `SCMTool.parse_diff_revision` entirely, and move all revision parsing into the diff parsers. Testing Done: - Ran unit tests. - Posted a variety of changes, including some with binary files in both the working copy and in revision ranges. Saw everything get parsed and displayed correctly. Reviewed at https://reviews.reviewboard.org/r/13580/
fa21f08fa7e6b9c303941a38d5e938d0a9be374b
Description From Last Updated

'typing.cast' imported but unused Column: 1 Error code: F401

reviewbotreviewbot

'reviewboard.scmtools.models.Repository' imported but unused Column: 1 Error code: F401

reviewbotreviewbot

I think we want AnyStr, as that'll ensure that the type (str or bytes) being used in data will be …

chipx86chipx86

Missing a trailing period.

chipx86chipx86

While adding this, can you also add types for base_commit_id, new_commit_id, parsed_diff, and parsed_diff_change?

chipx86chipx86

Since we're making changes to all this, can we cache the results of these compiles somewhere? It'd be nice to …

chipx86chipx86

This feels overly verbose. It's a pretty simple regex.

chipx86chipx86

Can these be combined to one line now?

chipx86chipx86

Missing a trailing period.

chipx86chipx86

'typing.Union' imported but unused Column: 1 Error code: F401

reviewbotreviewbot

I think the type checkers all infer the type when they're initiated with a cast.

chipx86chipx86

These should be indented one more level, I think. Are the comments aligned? Or is that just an issue in …

chipx86chipx86

Docs are missing a Raises for SCMError.

chipx86chipx86

The tuple contents need to be indented one more level.

chipx86chipx86

Alignment issue with the parameters.

chipx86chipx86
Checks run (1 failed, 1 succeeded)
flake8 failed.
JSHint passed.

flake8

david
maubin
  1. Ship It!
  2. 
      
chipx86
  1. 
      
  2. reviewboard/diffviewer/diffutils.py (Diff revision 2)
     
     
     
     
    Show all issues

    I think we want AnyStr, as that'll ensure that the type (str or bytes) being used in data will be the same type as the that in the resulting list.

  3. reviewboard/diffviewer/parser.py (Diff revision 2)
     
     
    Show all issues

    Missing a trailing period.

  4. reviewboard/diffviewer/parser.py (Diff revision 2)
     
     
     
    Show all issues

    While adding this, can you also add types for base_commit_id, new_commit_id, parsed_diff, and parsed_diff_change?

  5. reviewboard/scmtools/svn/__init__.py (Diff revision 2)
     
     
    Show all issues

    Since we're making changes to all this, can we cache the results of these compiles somewhere? It'd be nice to not have to do this every time we instantiate (but also not unconditionally do it in the module).

  6. reviewboard/scmtools/svn/__init__.py (Diff revision 2)
     
     
     
     
     
     
     
     
     
     
     
    Show all issues

    This feels overly verbose. It's a pretty simple regex.

  7. reviewboard/scmtools/svn/__init__.py (Diff revision 2)
     
     
     
    Show all issues

    Can these be combined to one line now?

  8. reviewboard/scmtools/svn/__init__.py (Diff revision 2)
     
     
    Show all issues

    Missing a trailing period.

  9. 
      
david
Review request changed
Commits:
Summary ID
Make SVN revision parsing a little more complete and reliable.
The way we deal with parsing SVN revisions had two major problems. First, SVN diffs traditionally didn't have any revision information for binary files. I'm making a change in RBTools to add the `--force` argument to `svn diff`, which will end up including that in the "Binary files ... differ" line, but that needed to be added to the parser. The more general problem is that we didn't have any parsed revision information for the modified version of the file. This meant that `ParsedDiffFile.modified_file_details` / `FileDiff.dest_detail` would just be whatever the raw string was in the diff, whether it was a real revision, "(nonexistent)", or "(working copy)". Our current design of the way diffs parse is that the SCMTool's DiffParser is responsible for most of the actual parsing. The `create_filediffs` method would then call the tool's `parse_diff_revision` method to do further parsing on the filename and revision, but only for the original file/revision, not the modified. This change moves things around so that all parsing happens within the `SVNDiffParser` class. We do the same revision parsing as before, but it's now applied to both the original and the modified versions. We can also then make some better determinations about how to handle things like IntelliJ's awkward revisionless revision labels, since we can process that when we have the full context of the file header instead of acting on a filename/revision in isolation without knowing whether it's for the original or modified file. My thinking is that this should become the general pattern. We should have a goal of basically deprecating `SCMTool.parse_diff_revision` entirely, and move all revision parsing into the diff parsers. Testing Done: - Ran unit tests. - Posted a variety of changes, including some with binary files in both the working copy and in revision ranges. Saw everything get parsed and displayed correctly.
80048677a803b302935d064e018445bf0684e62a
Make SVN revision parsing a little more complete and reliable.
The way we deal with parsing SVN revisions had two major problems. First, SVN diffs traditionally didn't have any revision information for binary files. I'm making a change in RBTools to add the `--force` argument to `svn diff`, which will end up including that in the "Binary files ... differ" line, but that needed to be added to the parser. The more general problem is that we didn't have any parsed revision information for the modified version of the file. This meant that `ParsedDiffFile.modified_file_details` / `FileDiff.dest_detail` would just be whatever the raw string was in the diff, whether it was a real revision, "(nonexistent)", or "(working copy)". Our current design of the way diffs parse is that the SCMTool's DiffParser is responsible for most of the actual parsing. The `create_filediffs` method would then call the tool's `parse_diff_revision` method to do further parsing on the filename and revision, but only for the original file/revision, not the modified. This change moves things around so that all parsing happens within the `SVNDiffParser` class. We do the same revision parsing as before, but it's now applied to both the original and the modified versions. We can also then make some better determinations about how to handle things like IntelliJ's awkward revisionless revision labels, since we can process that when we have the full context of the file header instead of acting on a filename/revision in isolation without knowing whether it's for the original or modified file. My thinking is that this should become the general pattern. We should have a goal of basically deprecating `SCMTool.parse_diff_revision` entirely, and move all revision parsing into the diff parsers. Testing Done: - Ran unit tests. - Posted a variety of changes, including some with binary files in both the working copy and in revision ranges. Saw everything get parsed and displayed correctly.
853aaf5ecffe5fb3c6d21857ea0f833b4d307056

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

david
chipx86
  1. 
      
  2. reviewboard/scmtools/svn/__init__.py (Diff revision 4)
     
     
    Show all issues

    I think the type checkers all infer the type when they're initiated with a cast.

  3. reviewboard/scmtools/svn/__init__.py (Diff revision 4)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
    Show all issues

    These should be indented one more level, I think.

    Are the comments aligned? Or is that just an issue in the web UI?

    1. They are aligned properly in a terminal.

  4. reviewboard/scmtools/svn/__init__.py (Diff revision 4)
     
     
    Show all issues

    Docs are missing a Raises for SCMError.

  5. reviewboard/scmtools/svn/__init__.py (Diff revision 4)
     
     
     
     
     
     
     
    Show all issues

    The tuple contents need to be indented one more level.

  6. reviewboard/scmtools/tests/test_svn.py (Diff revision 4)
     
     
     
    Show all issues

    Alignment issue with the parameters.

  7. 
      
david
chipx86
  1. Ship It!
  2. 
      
david
Review request changed
Status:
Completed
Change Summary:
Pushed to release-7.x (5fa72d6)