• 
      

    Introduce loose commit validation rules for Mercurial diffs.

    Review Request #14034 — Created July 12, 2024 and submitted — Latest diff uploaded

    Information

    Review Board
    release-7.x

    Reviewers

    Mercurial diffs (both standard and Git-like) lack any information
    connecting a file to the last revision or commit that modified or
    introduced that file. This poses problems for the way we do diff
    validation, since we can't determine the true parent for any given file,
    and the end result today is that files that were introduced in a parent
    commit would fail to validate in a later commit.

    To address this, we need a second, looser validation mode where we only
    compare filenames, along with storage of the most-likely revision.

    Normally, when performing a validation, we walk through the parent ID
    sequence in the commit validation graph and try to find a file entry
    that matches both the path and the revision. If we can't find a match,
    we check the repository. If we still can't find it, the file doesn't
    exist.

    In a loose validation mode, we still walk through the graph but we only
    check the file paths, not the revisions. We assume if we find any file
    path match, then that commit is the correct parent for the file we're
    looking for. We can then store that commit ID and store it as the new
    source revision for the file, so that we can look it up later when
    walking ancestors of a diff during rendering of the diff viewer.

    This isn't ideal, as the strict validation is intended to catch
    malformed comit chains, merge commits, or just bad uploads from tools.
    We now can't know for sure if commits are in the right order or if
    anything is missing. But unfortunately, without assistance from a
    Mercurial diff, this is the best we can do.

    This change implements the loose validation mode through a new
    has_per_file_revisions flag stored in a ParsedDiff and therefore
    FileLookupContext's diff-wide extra_data. This is considered True
    by default, but can be changed on BaseDiffParser subclasses by
    setting has_per_file_revisions = False.

    During loose validation, get_file_exists_in_history() will look for a
    probable-matching file and, if found, record the commit ID in
    FileLookupContext.file_extra_Data['__validated_parent_id']. This is
    then extracted and removed when creating the FileDiffs and set as the
    new source_revision.

    All of this logic is restricted to diff parsers that opt out of per-file
    revisions, so just Mercurial. If we were to add DiffX support in the
    future, we'd be able to return to strict validation for Mercurial.

    All unit tests passed.

    Tested this with some Mercurial repro cases where diff validation
    would fail. The repro case consists of 3 commits. The first two
    commits each introduce a new file, and the third commit modifies
    one or both of them. This previously failed diff validation, and
    now succeeds.

    Tested the resulting review requests, making sure I could view
    every combination of commits.

    Tested these with Git, making sure no behavioral changes were
    introduced.

    Commits

    Files