Introduce loose commit validation rules for Mercurial diffs.

Review Request #14034 — Created July 13, 2024 and submitted — Latest diff uploaded

Information

Review Board
release-7.x

Reviewers

Mercurial diffs (both standard and Git-like) lack any information
connecting a file to the last revision or commit that modified or
introduced that file. This poses problems for the way we do diff
validation, since we can't determine the true parent for any given file,
and the end result today is that files that were introduced in a parent
commit would fail to validate in a later commit.

To address this, we need a second, looser validation mode where we only
compare filenames, along with storage of the most-likely revision.

Normally, when performing a validation, we walk through the parent ID
sequence in the commit validation graph and try to find a file entry
that matches both the path and the revision. If we can't find a match,
we check the repository. If we still can't find it, the file doesn't
exist.

In a loose validation mode, we still walk through the graph but we only
check the file paths, not the revisions. We assume if we find any file
path match, then that commit is the correct parent for the file we're
looking for. We can then store that commit ID and store it as the new
source revision for the file, so that we can look it up later when
walking ancestors of a diff during rendering of the diff viewer.

This isn't ideal, as the strict validation is intended to catch
malformed comit chains, merge commits, or just bad uploads from tools.
We now can't know for sure if commits are in the right order or if
anything is missing. But unfortunately, without assistance from a
Mercurial diff, this is the best we can do.

This change implements the loose validation mode through a new
has_per_file_revisions flag stored in a ParsedDiff and therefore
FileLookupContext's diff-wide extra_data. This is considered True
by default, but can be changed on BaseDiffParser subclasses by
setting has_per_file_revisions = False.

During loose validation, get_file_exists_in_history() will look for a
probable-matching file and, if found, record the commit ID in
FileLookupContext.file_extra_Data['__validated_parent_id']. This is
then extracted and removed when creating the FileDiffs and set as the
new source_revision.

All of this logic is restricted to diff parsers that opt out of per-file
revisions, so just Mercurial. If we were to add DiffX support in the
future, we'd be able to return to strict validation for Mercurial.

All unit tests passed.

Tested this with some Mercurial repro cases where diff validation
would fail. The repro case consists of 3 commits. The first two
commits each introduce a new file, and the third commit modifies
one or both of them. This previously failed diff validation, and
now succeeds.

Tested the resulting review requests, making sure I could view
every combination of commits.

Tested these with Git, making sure no behavioral changes were
introduced.

Commits

Files