Introduce loose commit validation rules for Mercurial diffs.
Review Request #14034 — Created July 12, 2024 and submitted — Latest diff uploaded
Mercurial diffs (both standard and Git-like) lack any information
connecting a file to the last revision or commit that modified or
introduced that file. This poses problems for the way we do diff
validation, since we can't determine the true parent for any given file,
and the end result today is that files that were introduced in a parent
commit would fail to validate in a later commit.To address this, we need a second, looser validation mode where we only
compare filenames, along with storage of the most-likely revision.Normally, when performing a validation, we walk through the parent ID
sequence in the commit validation graph and try to find a file entry
that matches both the path and the revision. If we can't find a match,
we check the repository. If we still can't find it, the file doesn't
exist.In a loose validation mode, we still walk through the graph but we only
check the file paths, not the revisions. We assume if we find any file
path match, then that commit is the correct parent for the file we're
looking for. We can then store that commit ID and store it as the new
source revision for the file, so that we can look it up later when
walking ancestors of a diff during rendering of the diff viewer.This isn't ideal, as the strict validation is intended to catch
malformed comit chains, merge commits, or just bad uploads from tools.
We now can't know for sure if commits are in the right order or if
anything is missing. But unfortunately, without assistance from a
Mercurial diff, this is the best we can do.This change implements the loose validation mode through a new
has_per_file_revisions
flag stored in aParsedDiff
and therefore
FileLookupContext
's diff-wideextra_data
. This is consideredTrue
by default, but can be changed onBaseDiffParser
subclasses by
settinghas_per_file_revisions = False
.During loose validation,
get_file_exists_in_history()
will look for a
probable-matching file and, if found, record the commit ID in
FileLookupContext.file_extra_Data['__validated_parent_id']
. This is
then extracted and removed when creating theFileDiff
s and set as the
newsource_revision
.All of this logic is restricted to diff parsers that opt out of per-file
revisions, so just Mercurial. If we were to add DiffX support in the
future, we'd be able to return to strict validation for Mercurial.
All unit tests passed.
Tested this with some Mercurial repro cases where diff validation
would fail. The repro case consists of 3 commits. The first two
commits each introduce a new file, and the third commit modifies
one or both of them. This previously failed diff validation, and
now succeeds.Tested the resulting review requests, making sure I could view
every combination of commits.Tested these with Git, making sure no behavioral changes were
introduced.
Diff Revision 1
This is not the most recent revision of the diff. The latest diff is revision 3. See what's changed.
Commits
reviewboard/diffviewer/commit_utils.py | |||
---|---|---|---|
Revision 29d89e73285a70dd9aaf2169d0d05af95daa4e8f | New Change | ||
1 |
"""Utilities for dealing with DiffCommits."""
|
1 |
"""Utilities for dealing with DiffCommits."""
|
2 | 2 | ||
3 |
from __future__ import annotations |
3 |
from __future__ import annotations |
4 | 4 | ||
5 |
import base64 |
5 |
import base64 |
6 |
import json |
6 |
import json |
7 |
from itertools import chain |
7 |
from itertools import chain |
2 | 8 |
from typing import Any, Dict, Optional, Sequence, TYPE_CHECKING |
|
8 | 9 | ||
9 |
from django.utils.encoding import force_bytes |
10 |
from django.utils.encoding import force_bytes |
10 |
from typing_extensions import NotRequired, TypedDict |
11 |
from typing_extensions import NotRequired, TypedDict |
11 | 12 | ||
12 |
from reviewboard.scmtools.core import PRE_CREATION, UNKNOWN |
13 |
from reviewboard.scmtools.core import PRE_CREATION, UNKNOWN |
13 | 14 | ||
15 |
if TYPE_CHECKING: |
||
16 |
from reviewboard.scmtools.core import FileLookupContext |
||
17 |
from reviewboard.scmtools.models import Repository |
||
14 | 18 | ||
15 |
def get_file_exists_in_history(validation_info, repository, parent_id, path, |
19 | |
16 |
revision, **kwargs): |
20 |
def get_file_exists_in_history( |
21 |
validation_info: dict[str, Any], |
||
22 |
repository: Repository, |
||
23 |
parent_id: str, |
||
24 |
path: str, |
||
25 |
revision: str, |
||
26 |
base_commit_id: Optional[str] = None, |
||
27 |
*, |
||
28 |
context: Optional[FileLookupContext] = None, |
||
29 |
**kwargs, |
||
30 |
) -> bool: |
||
17 |
"""Return whether or not the file exists, given the validation information. |
31 |
"""Return whether or not the file exists, given the validation information. |
18 | 32 | ||
33 |
This will walk through the commit chain in ``validation_info``, starting
|
||
34 |
with ``parent_id``, looking for a file entry that matches the given
|
||
35 |
``path`` and (by default) ``revision``. If found, the file is considered
|
||
36 |
to exist. If not found, it will fall back to checking the repository.
|
||
37 | |||
38 |
This can also operate in a loose validation mode. In this mode, only
|
||
39 |
file paths are compared, not revisions. This is required for diffs that
|
||
40 |
don't provide per-file revision information (such as Mercurial's plain
|
||
41 |
and Git-style diffs). If a match is found, the associated commit ID is
|
||
42 |
stored as ``context.file_extra_data['__validated_parent_id']``, allowing
|
||
43 |
for post-processing of the source revision to occur.
|
||
44 | |||
45 |
Strict validation mode is the default. Loose validation must be opted into
|
||
46 |
by setting ``context.diff_extra_data['has_per_file_revisions'] = True``.
|
||
47 |
This is done automatically when a
|
||
48 |
:py:class:`~reviewboard.diffviewer.parser.BaseDiffParser` subclass sets
|
||
49 |
:py:attr:`has_per_file_revisions
|
||
50 |
<reviewboard.diffviewer.parser.BaseDiffParser.has_per_file_revisions>`
|
||
51 |
to ``True``.
|
||
52 | |||
53 |
Version Changed:
|
||
54 |
7.0.2:
|
||
55 |
* Added the ``context`` argument.
|
||
56 |
* Added the loose validation mode.
|
||
57 | |||
19 |
Version Changed:
|
58 |
Version Changed:
|
20 |
4.0.5:
|
59 |
4.0.5:
|
21 |
Removed explicit arguments for ``base_commit_id`` and ``request``, and
|
60 |
Removed explicit arguments for ``base_commit_id`` and ``request``, and
|
22 |
added ``**kwargs``.
|
61 |
added ``**kwargs``.
|
23 | 62 | ||
4 lines | |||
def get_file_exists_in_history(validation_info, repository, parent_id, path,
|
|||
28 |
ValidateDiffCommitResource`.
|
67 |
ValidateDiffCommitResource`.
|
29 | 68 | ||
30 |
repository (reviewboard.scmtools.models.Repository):
|
69 |
repository (reviewboard.scmtools.models.Repository):
|
31 |
The repository.
|
70 |
The repository.
|
32 | 71 | ||
33 |
parent_id (unicode):
|
72 |
parent_id (str):
|
34 |
The parent commit ID of the commit currently being processed.
|
73 |
The parent commit ID of the commit currently being processed.
|
35 | 74 | ||
36 |
path (unicode):
|
75 |
path (str):
|
37 |
The file path.
|
76 |
The file path.
|
38 | 77 | ||
39 |
revision (unicode):
|
78 |
revision (str):
|
40 |
The revision of the file to retrieve.
|
79 |
The revision of the file to retrieve.
|
41 | 80 | ||
81 |
context (reviewboard.scmtools.core.FileLookupContext, optional):
|
||
82 |
The file lookup context used to validate this repository.
|
||
83 | |||
84 |
Version Added:
|
||
85 |
7.0.2
|
||
86 | |||
42 |
**kwargs (dict):
|
87 |
**kwargs (dict):
|
43 |
Additional keyword arguments normally expected by
|
88 |
Additional keyword arguments normally expected by
|
44 |
:py:meth:`Repository.get_file_exists
|
89 |
:py:meth:`Repository.get_file_exists
|
45 |
<reviewboard.scmtools.models.Repository.get_file_exists>`.
|
90 |
<reviewboard.scmtools.models.Repository.get_file_exists>`.
|
46 | 91 | ||
47 |
Version Added:
|
92 |
Version Added:
|
48 |
4.0.5
|
93 |
4.0.5
|
49 | 94 | ||
50 |
Returns:
|
95 |
Returns:
|
51 |
bool:
|
96 |
bool:
|
52 |
Whether or not the file exists.
|
97 |
Whether or not the file exists.
|
53 |
"""
|
98 |
"""
|
99 |
match_parent_revisions = ( |
||
100 |
context is None or |
||
101 |
context.diff_extra_data.get('has_per_file_revisions', True)) |
||
102 | |||
54 |
while parent_id in validation_info: |
103 |
while parent_id in validation_info: |
55 |
entry = validation_info[parent_id] |
104 |
entry = validation_info[parent_id] |
56 |
tree = entry['tree'] |
105 |
tree = entry['tree'] |
57 | 106 | ||
58 |
if revision == UNKNOWN: |
107 |
if revision == UNKNOWN: |
59 |
for removed_info in tree['removed']: |
108 |
for removed_info in tree['removed']: |
60 |
if removed_info['filename'] == path: |
109 |
if removed_info['filename'] == path: |
61 |
return False |
110 |
return False |
62 | 111 | ||
63 |
for added_info in chain(tree['added'], tree['modified']): |
112 |
for change_info in chain(tree['added'], tree['modified']): |
64 |
if (added_info['filename'] == path and |
113 |
if change_info['filename'] == path: |
65 |
(revision == UNKNOWN or |
114 |
if revision == UNKNOWN: |
66 |
added_info['revision'] == revision)): |
115 |
return True |
116 | |||
117 |
if match_parent_revisions: |
||
118 |
# In the standard case, we have per-file revisions, and
|
||
119 |
# can use that to get a specific match within the
|
||
120 |
# validation history.
|
||
121 |
if change_info['revision'] == revision: |
||
122 |
return True |
||
123 |
else: |
||
124 |
# In a more limited case, we may not know the per-file
|
||
125 |
# revision, and instead have to limit our scan to
|
||
126 |
# the nearest filename. This is the case with Mercurial
|
||
127 |
# diffs.
|
||
128 |
#
|
||
129 |
# We'll be recording what we found, temporarily. This
|
||
130 |
# will be used to update the source filename of a
|
||
131 |
# generated FileDiff.
|
||
132 |
assert context is not None |
||
133 | |||
134 |
context.file_extra_data['__validated_parent_id'] = \ |
||
135 |
parent_id
|
||
136 | |||
67 |
return True |
137 |
>>>> return True |
68 | 138 | ||
69 |
parent_id = entry['parent_id'] |
139 |
parent_id = entry['parent_id'] |
70 | 140 | ||
71 |
# We did not find an entry in our validation info, so we need to fall back
|
141 |
# We did not find an entry in our validation info, so we need to fall back
|
72 |
# to checking the repository.
|
142 |
# to checking the repository.
|
73 |
return repository.get_file_exists(path=path, |
143 |
return repository.get_file_exists(path=path, |
74 |
revision=revision, |
144 |
revision=revision, |
145 |
base_commit_id=base_commit_id, |
||
146 |
context=context, |
||
75 |
**kwargs) |
147 |
**kwargs) |
76 | 148 | ||
77 | 149 | ||
78 |
def exclude_ancestor_filediffs(to_filter, all_filediffs=None): |
150 |
def exclude_ancestor_filediffs(to_filter, all_filediffs=None): |
79 |
"""Exclude all ancestor FileDiffs from the given list and return the rest. |
151 |
"""Exclude all ancestor FileDiffs from the given list and return the rest. |
420 lines |
reviewboard/diffviewer/filediff_creator.py |
---|
reviewboard/diffviewer/parser.py |
---|
reviewboard/diffviewer/tests/test_commit_utils.py |
---|
reviewboard/diffviewer/tests/test_filediff_creator.py |
---|
reviewboard/scmtools/hg.py |
---|