• 
      

    Introduce loose commit validation rules for Mercurial diffs.

    Review Request #14034 — Created July 12, 2024 and submitted

    Information

    Review Board
    release-7.x

    Reviewers

    Mercurial diffs (both standard and Git-like) lack any information
    connecting a file to the last revision or commit that modified or
    introduced that file. This poses problems for the way we do diff
    validation, since we can't determine the true parent for any given file,
    and the end result today is that files that were introduced in a parent
    commit would fail to validate in a later commit.

    To address this, we need a second, looser validation mode where we only
    compare filenames, along with storage of the most-likely revision.

    Normally, when performing a validation, we walk through the parent ID
    sequence in the commit validation graph and try to find a file entry
    that matches both the path and the revision. If we can't find a match,
    we check the repository. If we still can't find it, the file doesn't
    exist.

    In a loose validation mode, we still walk through the graph but we only
    check the file paths, not the revisions. We assume if we find any file
    path match, then that commit is the correct parent for the file we're
    looking for. We can then store that commit ID and store it as the new
    source revision for the file, so that we can look it up later when
    walking ancestors of a diff during rendering of the diff viewer.

    This isn't ideal, as the strict validation is intended to catch
    malformed comit chains, merge commits, or just bad uploads from tools.
    We now can't know for sure if commits are in the right order or if
    anything is missing. But unfortunately, without assistance from a
    Mercurial diff, this is the best we can do.

    This change implements the loose validation mode through a new
    has_per_file_revisions flag stored in a ParsedDiff and therefore
    FileLookupContext's diff-wide extra_data. This is considered True
    by default, but can be changed on BaseDiffParser subclasses by
    setting has_per_file_revisions = False.

    During loose validation, get_file_exists_in_history() will look for a
    probable-matching file and, if found, record the commit ID in
    FileLookupContext.file_extra_Data['__validated_parent_id']. This is
    then extracted and removed when creating the FileDiffs and set as the
    new source_revision.

    All of this logic is restricted to diff parsers that opt out of per-file
    revisions, so just Mercurial. If we were to add DiffX support in the
    future, we'd be able to return to strict validation for Mercurial.

    All unit tests passed.

    Tested this with some Mercurial repro cases where diff validation
    would fail. The repro case consists of 3 commits. The first two
    commits each introduce a new file, and the third commit modifies
    one or both of them. This previously failed diff validation, and
    now succeeds.

    Tested the resulting review requests, making sure I could view
    every combination of commits.

    Tested these with Git, making sure no behavioral changes were
    introduced.

    Summary ID
    Introduce loose commit validation rules for Mercurial diffs.
    Mercurial diffs (both standard and Git-like) lack any information connecting a file to the last revision or commit that modified or introduced that file. This poses problems for the way we do diff validation, since we can't determine the true parent for any given file, and the end result today is that files that were introduced in a parent commit would fail to validate in a later commit. To address this, we need a second, looser validation mode where we only compare filenames, along with storage of the most-likely revision. Normally, when performing a validation, we walk through the parent ID sequence in the commit validation graph and try to find a file entry that matches both the path and the revision. If we can't find a match, we check the repository. If we still can't find it, the file doesn't exist. In a loose validation mode, we still walk through the graph but we only check the file paths, not the revisions. We assume if we find any file path match, then that commit is the correct parent for the file we're looking for. We can then store that commit ID and store it as the new source revision for the file, so that we can look it up later when walking ancestors of a diff during rendering of the diff viewer. This isn't ideal, as the strict validation is intended to catch malformed comit chains, merge commits, or just bad uploads from tools. We now can't know for sure if commits are in the right order or if anything is missing. But unfortunately, without assistance from a Mercurial diff, this is the best we can do. This change implements the loose validation mode through a new `has_per_file_revisions` flag stored in a `ParsedDiff` and therefore `FileLookupContext`'s diff-wide `extra_data`. This is considered `True` by default, but can be changed on `BaseDiffParser` subclasses by setting `has_per_file_revisions = False`. During loose validation, `get_file_exists_in_history()` will look for a probable-matching file and, if found, record the commit ID in `FileLookupContext.file_extra_Data['__validated_parent_id']`. This is then extracted and removed when creating the `FileDiff`s and set as the new `source_revision`. All of this logic is restricted to diff parsers that opt out of per-file revisions, so just Mercurial. If we were to add DiffX support in the future, we'd be able to return to strict validation for Mercurial.
    37e7dcb4306c48278b32ba840249d465eb4595c3
    Description From Last Updated

    'typing.Dict' imported but unused Column: 1 Error code: F401

    reviewbot reviewbot

    'typing.Sequence' imported but unused Column: 1 Error code: F401

    reviewbot reviewbot

    block comment should start with '# ' Column: 9 Error code: E265

    reviewbot reviewbot

    This is wrong, has_per_file_revisions has to be set to False in order to opt in to loose validation.

    maubin maubin

    Typo: il-formed -> ill-formed

    maubin maubin
    Checks run (1 failed, 1 succeeded)
    flake8 failed.
    JSHint passed.

    flake8

    chipx86
    david
    1. Ship It!
    2. 
        
    maubin
    1. 
        
    2. reviewboard/diffviewer/commit_utils.py (Diff revision 2)
       
       
       
       
       
       
       
       
      Show all issues

      This is wrong, has_per_file_revisions has to be set to False in order to opt in to loose validation.

    3. reviewboard/scmtools/hg.py (Diff revision 2)
       
       
      Show all issues

      Typo: il-formed -> ill-formed

    4. 
        
    chipx86
    maubin
    1. Ship It!
    2. 
        
    chipx86
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to release-7.x (c896678)