Summary

Correctly store base revision for Mercurial diffs with parent diffs (fixes issue 2971)

Review Request #4121 — Created May 8, 2013 and submitted Aug. 5, 2013, 4:56 p.m.

Information

Owner

ccaughie

Repository

Review Board

Branch

release-1.7.x

Bugs

2971

Depends On

Reviewers

Groups

reviewboard

People

Description

Mercurial diffs with parent diffs do not work at all in RB 1.7.7.1 if there is no overlap between the files in the diff and those in the parent diff.

This change fixes the issue and improves the method of dealing with Mercurial diffs that was introduced in changeset 6b1f537b4cf9. The old method was somewhat fragile and was broken by changeset 562a8b4a21a9.

Testing Done

Tested using a local repo on which the problem was originally seen.
Added a unit test that tests this scenario.

Issues

Description	From	Last Updated
Col: 13 E128 continuation line under-indented for visual indent	reviewbot	May 8, 2013, 11 p.m.
I'm concerned with this wording. Historically, RB's mentions of "changesets" were very centered around Perforce's model (server-side knowledge of changesets, …	chipx86	July 10, 2013, 9:22 p.m.
This needs a docstring, in the format of: """One-line summary. Multi-line description. """ Also, "base revision" is too generic. Need …	chipx86	July 10, 2013, 9:22 p.m.
No blank line.	chipx86	July 10, 2013, 9:22 p.m.
This seems rather expensive. Can we not compute this as we're parsing initially?	chipx86	July 10, 2013, 9:22 p.m.
You can do: if line.startswith(('# Parent', 'diff -r')):	chipx86	July 10, 2013, 9:22 p.m.
Blank line before this.	chipx86	July 10, 2013, 9:23 p.m.

This is a review from Review Bot.
  Tool: PEP8 Style Checker
  Processed Files:
    reviewboard/diffviewer/forms.py
    reviewboard/diffviewer/parser.py
    reviewboard/scmtools/hg.py
    reviewboard/scmtools/core.py
    reviewboard/diffviewer/tests.py
  Ignored Files:

reviewboard/diffviewer/forms.py (Diff revision 1)
The issue has been resolved.
```
Col: 13
 E128 continuation line under-indented for visual indent
```

Change Summary:

Fix formatting issue

Diff:

Revision 2 (+80 -13)

	reviewboard/diffviewer/forms.py
	reviewboard/diffviewer/parser.py
	reviewboard/diffviewer/tests.py
	reviewboard/scmtools/core.py
	reviewboard/scmtools/hg.py

This is a review from Review Bot.
  Tool: PEP8 Style Checker
  Processed Files:
    reviewboard/diffviewer/forms.py
    reviewboard/diffviewer/parser.py
    reviewboard/scmtools/hg.py
    reviewboard/scmtools/core.py
    reviewboard/diffviewer/tests.py
  Ignored Files:

reviewboard/diffviewer/forms.py (Diff revision 2)

The issue has been resolved.

I'm concerned with this wording. Historically, RB's mentions of "changesets" were very centered around Perforce's model (server-side knowledge of changesets, mapped to the changenum field). Can we clarify in the comment more what we're referring to?

(I know we already had some mention of this here, but I'd like to get this cleaned up.)

ccaughie

May 27, 2013, 4:41 p.m.

I'm not familiar with Perforce; is it similar to Subversion's revision IDs?

Mercurial's changeset IDs are the same concept as Git's commit IDs (see http://mercurial.selenic.com/wiki/ChangeSetID); each one identifies a revision of the entire repository. The big difference between the two is that Git also uses file revision IDs which are independent of the commit IDs; it is these file revision IDs that appear in Git's diffs. In contrast, Mercurial diffs only include the changeset IDs.

How about this wording:

"This will return a non-None value only for tools that use whole repo revision IDs (e.g. Mercurial's changeset IDs) to identify file versions, instead of individual file revision IDs (e.g. Git)."

I'm happy to use either the term "repo revision ID" or "changeset ID" to refer to a repository revision (in Mercurial, Subversion etc. they are the same, but I guess they might not be in all SCM tools). Or any other term you choose - just let me know.

chipx86

June 22, 2013, 12:40 a.m.

Perforce's changesets are pretty different. In Perforce, the server is aware of all pending changes and affected files. You create a changeset with a description and list of files and such, and the server knows about it. Review Board uses that change number to pull the description and testing done from that server-side changeset.

David recently pushed a change for a "commit ID," which is probably what you want. That means it will require RB 1.8. I'd pull David into this discussion and see if that's the right thing to use (I think it is).

Your description seems fine. Would "commit ID" be suitable terminology for this?

ccaughie

June 27, 2013, 7:19 a.m.

Yes, commit ID sounds fine. I looked at David's change and I don't think there's any interdependency between his and mine - he's added a "commit ID" field to the review request itself but I'm only dealing with diffs. So I don't think there's any need to wait for 1.8 to get this in. I'll definitely use his terminology though.
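
The distinction discussed in this thread can be sketched in a few lines. This is a minimal illustration, not Review Board's parser: the function name `revision_kind` and the sample IDs are hypothetical, and it assumes the usual `diff -r <changeset>` header in Mercurial diffs and the `index <old>..<new>` header in Git diffs.

```python
import re

# Mercurial diff headers name a whole-repository changeset (commit) ID.
HG_DIFF_RE = re.compile(r'^diff -r ([0-9a-f]+)')

# Git diff headers name per-file (blob) revision IDs.
GIT_INDEX_RE = re.compile(r'^index ([0-9a-f]+)\.\.([0-9a-f]+)')


def revision_kind(line):
    """Classify the revision ID found on a diff header line.

    Returns ('commit', id) for a Mercurial-style header, ('file', id)
    for a Git-style header, or None if the line carries no revision.
    """
    m = HG_DIFF_RE.match(line)
    if m:
        return ('commit', m.group(1))

    m = GIT_INDEX_RE.match(line)
    if m:
        return ('file', m.group(1))

    return None
```

A tool that only ever sees `'file'` results has no commit ID to report, which is the case where the method under discussion would return None.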

reviewboard/diffviewer/parser.py (Diff revision 2)

The issue has been resolved.

This needs a docstring, in the format of:

"""One-line summary.

Multi-line description.
"""

Also, "base revision" is too generic. Need something more specific to the case here.
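
For illustration, a method following that docstring format might look like the sketch below. The class name `ExampleDiffParser` is hypothetical, and `get_commit_id` is only one of the names floated in this thread, not necessarily the one that was committed.

```python
class ExampleDiffParser(object):
    """Hypothetical stand-in for a diff parser, for illustration only."""

    def __init__(self, commit_id=None):
        self._commit_id = commit_id

    def get_commit_id(self):
        """Return the commit ID the diff is based on.

        This returns a non-None value only for tools that use
        whole-repository revision IDs (e.g. Mercurial's changeset IDs)
        to identify file versions, instead of individual file revision
        IDs (e.g. Git).
        """
        return self._commit_id
```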

ccaughie

May 27, 2013, 4:41 p.m.

How about get_base_repo_revision_id()? or get_base_changeset_id()? I'm open to other suggestions, whatever is most consistent with the terminology used elsewhere in Review Board. In Mercurial a changeset ID is basically the same concept as a repository revision ID, but I guess that may not be the case with all tools.

chipx86

June 22, 2013, 12:40 a.m.

Maybe get_commit_id? Again, at this point, we should drag David into this. If he doesn't see this, I'll point it out to him.

reviewboard/diffviewer/tests.py (Diff revision 2)
The issue has been resolved.
```
No blank line.
```

reviewboard/scmtools/hg.py (Diff revision 2)

The issue has been resolved.

This seems rather expensive. Can we not compute this as we're parsing initially?

ccaughie

May 27, 2013, 4:41 p.m.

In practice it isn't, since the information we're searching for is always found in the first two or three lines.

But looking at this again it appears I've been somewhat boneheaded about this - the base revision ID is already found during the original parse so I should just be able to return that. Will change unless I find a reason why it won't work.
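
The idea of remembering the base revision during the original parse, rather than rescanning the diff later, could be sketched roughly as follows. The class `SketchHgParser` and its attributes are hypothetical and much simpler than the real parser; in particular, the sketch assumes file names without spaces.

```python
class SketchHgParser(object):
    """Toy parser that records the base commit ID while parsing."""

    def __init__(self, data):
        self.lines = data.splitlines()
        self.base_commit_id = None

    def parse(self):
        """Return the file names in the diff, noting the base commit ID."""
        files = []

        for line in self.lines:
            if line.startswith('# Parent'):
                # Header written by "hg export": "# Parent  <changeset>"
                if self.base_commit_id is None:
                    self.base_commit_id = line.split()[-1]
            elif line.startswith('diff -r '):
                parts = line.split()

                # First "-r" argument is the revision being diffed from.
                if self.base_commit_id is None:
                    self.base_commit_id = parts[2]

                files.append(parts[-1])

        return files

    def get_commit_id(self):
        # No second pass over the diff: reuse what parse() already found.
        return self.base_commit_id
```

Because the ID is captured as a side effect of `parse()`, the later lookup is a plain attribute read.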

reviewboard/scmtools/hg.py (Diff revision 2)

The issue has been dropped.

You can do:

if line.startswith(('# Parent', 'diff -r')):

ccaughie

May 27, 2013, 4:41 p.m.

Nice, I learn something new about Python every day. :)

ccaughie

July 10, 2013, 9:23 p.m.

Dropped issue because this code is no longer present.
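
For reference, the tuple form of `str.startswith` suggested in this thread works as sketched here. The helper name `is_hg_header` is hypothetical. Note that the prefixes must be passed together as a single tuple; a second positional argument is treated as the start index, not as another prefix.

```python
def is_hg_header(line):
    """Return True if the line starts with either Mercurial header prefix."""
    # One call checks every prefix in the tuple.
    return line.startswith(('# Parent', 'diff -r'))
```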

reviewboard/scmtools/hg.py (Diff revision 2)
The issue has been dropped.
```
Blank line before this.
```

ccaughie

July 10, 2013, 9:23 p.m.

Dropped issue because this code is no longer present.

Diff:

Revision 3 (+76 -14)

	reviewboard/diffviewer/forms.py
	reviewboard/diffviewer/parser.py
	reviewboard/diffviewer/tests.py
	reviewboard/scmtools/core.py
	reviewboard/scmtools/hg.py

This is a review from Review Bot.
  Tool: PEP8 Style Checker
  Processed Files:
    reviewboard/diffviewer/forms.py
    reviewboard/diffviewer/parser.py
    reviewboard/scmtools/hg.py
    reviewboard/scmtools/core.py
    reviewboard/diffviewer/tests.py
  Ignored Files:

This is a review from Review Bot.
  Tool: Pyflakes
  Processed Files:
    reviewboard/diffviewer/forms.py
    reviewboard/diffviewer/parser.py
    reviewboard/scmtools/hg.py
    reviewboard/scmtools/core.py
    reviewboard/diffviewer/tests.py
  Ignored Files:

Hey Colin,

Thanks for all the hard work on this. I know we've gone back and forth a lot, but this looks good.

I wanted to run something by you before committing this and get your thoughts on it.

I'm putting together a change that overlaps with this one in some ways. We're going to start storing commit IDs even for repository types where we don't natively fetch files by commit ID.

This is for Git + BitBucket support. BitBucket's API only allows fetching files using a commit SHA1, rather than a blob SHA1 (which is otherwise used in most Git services). Unlike Mercurial diffs, Git diffs don't contain this information natively, so we'll be allowing RBTools to provide it via our API when uploading the diff. We can then use the commit SHA1 when fetching from BitBucket, and the blob SHA1 everywhere else.

So that seems pretty related to your change. The difference is that you're adding support to always use a commit SHA1 (if provided by the DiffParser) as the file revision.

So I'm trying to decide how to merge these concepts together well. We may want to augment other DiffParsers to pull out the commit SHA1 but *not* use it as the file's revision, since we may look up the file differently depending on service.

I think, then, that the only thing that would need to be modified with your change is to bring back an SCMTool capability field for saying whether to use commit IDs as revisions when parsing diffs. We had diff_uses_changeset_ids, and I think we should keep using that (but rename to diff_use_commit_id_as_revision). We can then safely return and store a commit ID from the DiffParser without necessarily having to look up via that ID.

My upcoming change would then add some additional logic to take that commit ID the DiffParser returns and store it in the DiffSet, in a new field I'll be introducing.

Does that seem sane?
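
A rough sketch of how such a capability flag might gate the choice of revision is shown below. Only the flag name `diff_use_commit_id_as_revision` comes from this comment; the classes and the helper `choose_file_revision` are hypothetical and are not Review Board's actual API.

```python
class SketchSCMTool(object):
    # By default the commit ID is stored but not used for file lookups.
    diff_use_commit_id_as_revision = False


class SketchHgTool(SketchSCMTool):
    # Mercurial-style tools look files up by commit ID.
    diff_use_commit_id_as_revision = True


def choose_file_revision(tool, file_revision, commit_id):
    """Pick the revision used to fetch a file for the given tool.

    The commit ID replaces the per-file revision only when the tool
    opts in and the parser actually found a commit ID.
    """
    if tool.diff_use_commit_id_as_revision and commit_id is not None:
        return commit_id

    return file_revision
```

With this split, a parser can always return a commit ID for storage, while each tool decides whether it is also the revision used to fetch files.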

chipx86

July 11, 2013, 11:58 a.m.

By the way, you don't have to do any of this. I just want to pick your brain.

Ship it!

Status: Completed
Change Summary: Pushed to release-1.7.x (45faed4). Thanks!