• 
      

    Make SVN revision parsing a little more complete and reliable.

    Review Request #13580 — Created Feb. 27, 2024 and submitted

    Information

    Review Board
    release-7.x

    Reviewers

    The way we deal with parsing SVN revisions had two major problems.

    First, SVN diffs traditionally didn't have any revision information for
    binary files. I'm making a change in RBTools to add the --force
    argument to svn diff, which will end up including that in the "Binary
    files ... differ" line, but that needed to be added to the parser.

    The more general problem is that we didn't have any parsed revision
    information for the modified version of the file. This meant that
    ParsedDiffFile.modified_file_details / FileDiff.dest_detail would
    just be whatever the raw string was in the diff, whether it was a real
    revision, "(nonexistent)", or "(working copy)".

    Our current design of the way diffs parse is that the SCMTool's
    DiffParser is responsible for most of the actual parsing. The
    create_filediffs method would then call the tool's
    parse_diff_revision method to do further parsing on the filename and
    revision, but only for the original file/revision, not the modified.

    This change moves things around so that all parsing happens within the
    SVNDiffParser class. We do the same revision parsing as before, but
    it's now applied to both the original and the modified versions. We can
    also then make some better determinations about how to handle things
    like IntelliJ's awkward revisionless revision labels, since we can
    process that when we have the full context of the file header instead of
    acting on a filename/revision in isolation without knowing whether it's
    for the original or modified file.

    My thinking is that this should become the general pattern. We should
    have a goal of basically deprecating SCMTool.parse_diff_revision
    entirely, and move all revision parsing into the diff parsers.

    • Ran unit tests.
    • Posted a variety of changes, including some with binary files in both
      the working copy and in revision ranges. Saw everything get parsed and
      displayed correctly.
    Summary ID
    Make SVN revision parsing a little more complete and reliable.
    The way we deal with parsing SVN revisions had two major problems. First, SVN diffs traditionally didn't have any revision information for binary files. I'm making a change in RBTools to add the `--force` argument to `svn diff`, which will end up including that in the "Binary files ... differ" line, but that needed to be added to the parser. The more general problem is that we didn't have any parsed revision information for the modified version of the file. This meant that `ParsedDiffFile.modified_file_details` / `FileDiff.dest_detail` would just be whatever the raw string was in the diff, whether it was a real revision, "(nonexistent)", or "(working copy)". Our current design of the way diffs parse is that the SCMTool's DiffParser is responsible for most of the actual parsing. The `create_filediffs` method would then call the tool's `parse_diff_revision` method to do further parsing on the filename and revision, but only for the original file/revision, not the modified. This change moves things around so that all parsing happens within the `SVNDiffParser` class. We do the same revision parsing as before, but it's now applied to both the original and the modified versions. We can also then make some better determinations about how to handle things like IntelliJ's awkward revisionless revision labels, since we can process that when we have the full context of the file header instead of acting on a filename/revision in isolation without knowing whether it's for the original or modified file. My thinking is that this should become the general pattern. We should have a goal of basically deprecating `SCMTool.parse_diff_revision` entirely, and move all revision parsing into the diff parsers. Testing Done: - Ran unit tests. - Posted a variety of changes, including some with binary files in both the working copy and in revision ranges. Saw everything get parsed and displayed correctly. Reviewed at https://reviews.reviewboard.org/r/13580/
    fa21f08fa7e6b9c303941a38d5e938d0a9be374b
    Description From Last Updated

    'typing.cast' imported but unused Column: 1 Error code: F401

    reviewbotreviewbot

    'reviewboard.scmtools.models.Repository' imported but unused Column: 1 Error code: F401

    reviewbotreviewbot

    I think we want AnyStr, as that'll ensure that the type (str or bytes) being used in data will be …

    chipx86chipx86

    Missing a trailing period.

    chipx86chipx86

    While adding this, can you also add types for base_commit_id, new_commit_id, parsed_diff, and parsed_diff_change?

    chipx86chipx86

    Since we're making changes to all this, can we cache the results of these compiles somewhere? It'd be nice to …

    chipx86chipx86

    This feels overly verbose. It's a pretty simple regex.

    chipx86chipx86

    Can these be combined to one line now?

    chipx86chipx86

    Missing a trailing period.

    chipx86chipx86

    'typing.Union' imported but unused Column: 1 Error code: F401

    reviewbotreviewbot

    I think the type checkers all infer the type when they're initiated with a cast.

    chipx86chipx86

    These should be indented one more level, I think. Are the comments aligned? Or is that just an issue in …

    chipx86chipx86

    Docs are missing a Raises for SCMError.

    chipx86chipx86

    The tuple contents need to be indented one more level.

    chipx86chipx86

    Alignment issue with the parameters.

    chipx86chipx86
    Checks run (1 failed, 1 succeeded)
    flake8 failed.
    JSHint passed.

    flake8

    david
    maubin
    1. Ship It!
    2. 
        
    chipx86
    1. 
        
    2. reviewboard/diffviewer/diffutils.py (Diff revision 2)
       
       
       
       
      Show all issues

      I think we want AnyStr, as that'll ensure that the type (str or bytes) being used in data will be the same type as the that in the resulting list.

    3. reviewboard/diffviewer/parser.py (Diff revision 2)
       
       
      Show all issues

      Missing a trailing period.

    4. reviewboard/diffviewer/parser.py (Diff revision 2)
       
       
       
      Show all issues

      While adding this, can you also add types for base_commit_id, new_commit_id, parsed_diff, and parsed_diff_change?

    5. reviewboard/scmtools/svn/__init__.py (Diff revision 2)
       
       
      Show all issues

      Since we're making changes to all this, can we cache the results of these compiles somewhere? It'd be nice to not have to do this every time we instantiate (but also not unconditionally do it in the module).

    6. reviewboard/scmtools/svn/__init__.py (Diff revision 2)
       
       
       
       
       
       
       
       
       
       
       
      Show all issues

      This feels overly verbose. It's a pretty simple regex.

    7. reviewboard/scmtools/svn/__init__.py (Diff revision 2)
       
       
       
      Show all issues

      Can these be combined to one line now?

    8. reviewboard/scmtools/svn/__init__.py (Diff revision 2)
       
       
      Show all issues

      Missing a trailing period.

    9. 
        
    david
    Review request changed
    Commits:
    Summary ID
    Make SVN revision parsing a little more complete and reliable.
    The way we deal with parsing SVN revisions had two major problems. First, SVN diffs traditionally didn't have any revision information for binary files. I'm making a change in RBTools to add the `--force` argument to `svn diff`, which will end up including that in the "Binary files ... differ" line, but that needed to be added to the parser. The more general problem is that we didn't have any parsed revision information for the modified version of the file. This meant that `ParsedDiffFile.modified_file_details` / `FileDiff.dest_detail` would just be whatever the raw string was in the diff, whether it was a real revision, "(nonexistent)", or "(working copy)". Our current design of the way diffs parse is that the SCMTool's DiffParser is responsible for most of the actual parsing. The `create_filediffs` method would then call the tool's `parse_diff_revision` method to do further parsing on the filename and revision, but only for the original file/revision, not the modified. This change moves things around so that all parsing happens within the `SVNDiffParser` class. We do the same revision parsing as before, but it's now applied to both the original and the modified versions. We can also then make some better determinations about how to handle things like IntelliJ's awkward revisionless revision labels, since we can process that when we have the full context of the file header instead of acting on a filename/revision in isolation without knowing whether it's for the original or modified file. My thinking is that this should become the general pattern. We should have a goal of basically deprecating `SCMTool.parse_diff_revision` entirely, and move all revision parsing into the diff parsers. Testing Done: - Ran unit tests. - Posted a variety of changes, including some with binary files in both the working copy and in revision ranges. Saw everything get parsed and displayed correctly.
    80048677a803b302935d064e018445bf0684e62a
    Make SVN revision parsing a little more complete and reliable.
    The way we deal with parsing SVN revisions had two major problems. First, SVN diffs traditionally didn't have any revision information for binary files. I'm making a change in RBTools to add the `--force` argument to `svn diff`, which will end up including that in the "Binary files ... differ" line, but that needed to be added to the parser. The more general problem is that we didn't have any parsed revision information for the modified version of the file. This meant that `ParsedDiffFile.modified_file_details` / `FileDiff.dest_detail` would just be whatever the raw string was in the diff, whether it was a real revision, "(nonexistent)", or "(working copy)". Our current design of the way diffs parse is that the SCMTool's DiffParser is responsible for most of the actual parsing. The `create_filediffs` method would then call the tool's `parse_diff_revision` method to do further parsing on the filename and revision, but only for the original file/revision, not the modified. This change moves things around so that all parsing happens within the `SVNDiffParser` class. We do the same revision parsing as before, but it's now applied to both the original and the modified versions. We can also then make some better determinations about how to handle things like IntelliJ's awkward revisionless revision labels, since we can process that when we have the full context of the file header instead of acting on a filename/revision in isolation without knowing whether it's for the original or modified file. My thinking is that this should become the general pattern. We should have a goal of basically deprecating `SCMTool.parse_diff_revision` entirely, and move all revision parsing into the diff parsers. Testing Done: - Ran unit tests. - Posted a variety of changes, including some with binary files in both the working copy and in revision ranges. Saw everything get parsed and displayed correctly.
    853aaf5ecffe5fb3c6d21857ea0f833b4d307056

    Checks run (1 failed, 1 succeeded)

    flake8 failed.
    JSHint passed.

    flake8

    david
    chipx86
    1. 
        
    2. reviewboard/scmtools/svn/__init__.py (Diff revision 4)
       
       
      Show all issues

      I think the type checkers all infer the type when they're initiated with a cast.

    3. reviewboard/scmtools/svn/__init__.py (Diff revision 4)
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
      Show all issues

      These should be indented one more level, I think.

      Are the comments aligned? Or is that just an issue in the web UI?

      1. They are aligned properly in a terminal.

    4. reviewboard/scmtools/svn/__init__.py (Diff revision 4)
       
       
      Show all issues

      Docs are missing a Raises for SCMError.

    5. reviewboard/scmtools/svn/__init__.py (Diff revision 4)
       
       
       
       
       
       
       
      Show all issues

      The tuple contents need to be indented one more level.

    6. reviewboard/scmtools/tests/test_svn.py (Diff revision 4)
       
       
       
      Show all issues

      Alignment issue with the parameters.

    7. 
        
    david
    chipx86
    1. Ship It!
    2. 
        
    david
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to release-7.x (5fa72d6)