Deprecate and replace DiffParser.get_orig_commit_id().

Review Request #11757 — Created July 29, 2021 and submitted

Information

Review Board
release-4.0.x

Reviewers

DiffParser.get_orig_commit_id() was an old hack originally intended
for solving a complex issue with parent diffs for Mercurial. To help
understand it, we need to dive into the history of how parent diffs,
source revisions, and file lookups worked.

Originally, when parsing a diff + parent diff, we'd set the source
revision for a FileDiff to that of the source of the file for the
parent diff, enabling us to look up the right soure file. We'd then
apply the parent diff and then the diff on top of it. If a file didn't
exist in a parent diff, we could just use the source revision in the
main diff, since we could assume it would be present in the repository.

The problem at that point was that, if the SCM used a commit revision to
look up files, and the file wasn't present in the parent diff, we
wouldn't have a suitable revision to use. To address this,
DiffParser.get_orig_commit_id() was introduced. This would allow the
diff parser to capture the parent revision, and then we could use that
captured revision from the parent diff as the source revision for files.

More recently, we stopped assigning a parent diff's source revision to
the FileDiff, and instead preserved the main diff's source revisions.
We'd then store parent diff revisions and filenames in extra_data.
This was part of an effort to fix some file lookup issues and file
matching involving parent diffs. However, we still needed
get_orig_commit_id() to get those parent commit IDs, and we were still
storing them as the FileDiff's source revision.

That brings us to this change.

Now that we have the new parsed diff objects (ParsedDiff and
ParsedDiffChange), we have a new avenue for storing useful information
during parsing. This change adds commit_id and parent_commit_id to
ParsedDiffChange, which can be optionally set by the parser. It also
adds a capability flag, uses_commit_ids_as_revisions, to ParsedDiff,
which can also be set.

If this flag is set, and commit ID information is recorded, then it will
be used as the parent source revision when building the FileDiffs.
This works just like get_orig_commit_id(), except we're no longer
storing the value in extra_data like other parent source revisions.

get_orig_commit_id() is now considered deprecated, scheduled to be
removed in Review Board 5.0. DiffParser will check for a value after
parsing and, if found, will set it and the
uses_commit_ids_as_revisions flag, emitting a deprecation warning in
the process.

This is a big cleanup that addresses one of the worst remnants of our
older diff parsing code, and opens the door to allowing a DiffX parser
to provide the same parent revision behavior.

Unit tests passed.

Tested posting a Mercurial diff with a parent diff. Verified that the
result was correct and contained the right revision metadata. This test
also instrumented the compatibility for get_orig_commit_id(), since
the Mercurial implementation has not been updated.

Summary ID
Deprecate and replace DiffParser.get_orig_commit_id().
`DiffParser.get_orig_commit_id()` was an old hack originally intended for solving a complex issue with parent diffs for Mercurial. To help understand it, we need to dive into the history of how parent diffs, source revisions, and file lookups worked. Originally, when parsing a diff + parent diff, we'd set the source revision for a `FileDiff` to that of the source of the file for the parent diff, enabling us to look up the right soure file. We'd then apply the parent diff and then the diff on top of it. If a file didn't exist in a parent diff, we could just use the source revision in the main diff, since we could assume it would be present in the repository. The problem at that point was that, if the SCM used a commit revision to look up files, and the file wasn't present in the parent diff, we wouldn't have a suitable revision to use. To address this, `DiffParser.get_orig_commit_id()` was introduced. This would allow the diff parser to capture the parent revision, and then we could use that captured revision from the parent diff as the source revision for files. More recently, we stopped assigning a parent diff's source revision to the `FileDiff`, and instead preserved the main diff's source revisions. We'd then store parent diff revisions and filenames in `extra_data`. This was part of an effort to fix some file lookup issues and file matching involving parent diffs. However, we still needed `get_orig_commit_id()` to get those parent commit IDs, and we were still storing them as the `FileDiff`'s source revision. That brings us to this change. Now that we have the new parsed diff objects (`ParsedDiff` and `ParsedDiffChange`), we have a new avenue for storing useful information during parsing. This change adds `commit_id` and `parent_commit_id` to `ParsedDiffChange`, which can be optionally set by the parser. It also adds a capability flag, `uses_commit_ids_as_revisions`, to `ParsedDiff`, which can also be set. If this flag is set, and commit ID information is recorded, then it will be used as the parent source revision when building the `FileDiff`s. This works just like `get_orig_commit_id()`, except we're no longer storing the value in `extra_data` like other parent source revisions. `get_orig_commit_id()` is now considered deprecated, scheduled to be removed in Review Board 5.0. `DiffParser` will check for a value after parsing and, if found, will set it and the `uses_commit_ids_as_revisions` flag, emitting a deprecation warning in the process. This is a big cleanup that addresses one of the worst remnants of our older diff parsing code, and opens the door to allowing a DiffX parser to provide the same parent revision behavior.
96cc4d7358660ade08e5bbc139bea22ba0369ba6
chipx86
david
  1. Ship It!
  2. 
      
chipx86
Review request changed
Status:
Completed
Change Summary:
Pushed to release-4.0.x (b972f20)