Introduce loose commit validation rules for Mercurial diffs.
Review Request #14034 — Created July 12, 2024 and submitted
Mercurial diffs (both standard and Git-like) lack any information
connecting a file to the last revision or commit that modified or
introduced that file. This poses problems for the way we do diff
validation, since we can't determine the true parent for any given file,
and the end result today is that files that were introduced in a parent
commit would fail to validate in a later commit.To address this, we need a second, looser validation mode where we only
compare filenames, along with storage of the most-likely revision.Normally, when performing a validation, we walk through the parent ID
sequence in the commit validation graph and try to find a file entry
that matches both the path and the revision. If we can't find a match,
we check the repository. If we still can't find it, the file doesn't
exist.In a loose validation mode, we still walk through the graph but we only
check the file paths, not the revisions. We assume if we find any file
path match, then that commit is the correct parent for the file we're
looking for. We can then store that commit ID and store it as the new
source revision for the file, so that we can look it up later when
walking ancestors of a diff during rendering of the diff viewer.This isn't ideal, as the strict validation is intended to catch
malformed comit chains, merge commits, or just bad uploads from tools.
We now can't know for sure if commits are in the right order or if
anything is missing. But unfortunately, without assistance from a
Mercurial diff, this is the best we can do.This change implements the loose validation mode through a new
has_per_file_revisions
flag stored in aParsedDiff
and therefore
FileLookupContext
's diff-wideextra_data
. This is consideredTrue
by default, but can be changed onBaseDiffParser
subclasses by
settinghas_per_file_revisions = False
.During loose validation,
get_file_exists_in_history()
will look for a
probable-matching file and, if found, record the commit ID in
FileLookupContext.file_extra_Data['__validated_parent_id']
. This is
then extracted and removed when creating theFileDiff
s and set as the
newsource_revision
.All of this logic is restricted to diff parsers that opt out of per-file
revisions, so just Mercurial. If we were to add DiffX support in the
future, we'd be able to return to strict validation for Mercurial.
All unit tests passed.
Tested this with some Mercurial repro cases where diff validation
would fail. The repro case consists of 3 commits. The first two
commits each introduce a new file, and the third commit modifies
one or both of them. This previously failed diff validation, and
now succeeds.Tested the resulting review requests, making sure I could view
every combination of commits.Tested these with Git, making sure no behavioral changes were
introduced.
Summary | ID |
---|---|
37e7dcb4306c48278b32ba840249d465eb4595c3 |
Description | From | Last Updated |
---|---|---|
'typing.Dict' imported but unused Column: 1 Error code: F401 |
reviewbot | |
'typing.Sequence' imported but unused Column: 1 Error code: F401 |
reviewbot | |
block comment should start with '# ' Column: 9 Error code: E265 |
reviewbot | |
This is wrong, has_per_file_revisions has to be set to False in order to opt in to loose validation. |
maubin | |
Typo: il-formed -> ill-formed |
maubin |
- Change Summary:
-
- Removed unused imports.
- Removed some leftover commented-out code.
- Commits:
-
Summary ID 493902803b1bfbd4cc971669f1b6b22962d1d204 a0d5d56bb4d3aa5334970d3d77924d6fe5867ec3