Correctly store base revision for Mercurial diffs with parent diffs (fixes issue 2971)

Review Request #4121 — Created May 8, 2013 and submitted

Information

Review Board
release-1.7.x

Reviewers

Mercurial diffs with parent diffs do not work at all in RB 1.7.7.1 if there is no overlap between the files in the diff and those in the parent diff.

This change fixes the issue and improves the method of dealing with Mercurial diffs that was introduced changeset 6b1f537b4cf9. The old method was somewhat fragile and was broken by changeset 562a8b4a21a9.
Tested using a local repo on which the problem was originally seen.
Added a unit test that tests this scenario.
Description From Last Updated

Col: 13 E128 continuation line under-indented for visual indent

reviewbotreviewbot

I'm concerned with this wording. Historically, RB's mentions of "changesets" were very centered around Perforce's model (server-side knowledge of changesets, …

chipx86chipx86

This needs a docstring, in the format of: """One-line summary. Multi-line description. """ Also, "base revision" is too generic. Need …

chipx86chipx86

No blank line.

chipx86chipx86

This seems rather expensive. Can we not compute this as we're parsing initially?

chipx86chipx86

You can do: if line.startswith(('# Parent'), ('diff -r')):

chipx86chipx86

Blank line before this.

chipx86chipx86
reviewbot
  1. This is a review from Review Bot.
      Tool: PEP8 Style Checker
      Processed Files:
        reviewboard/diffviewer/forms.py
        reviewboard/diffviewer/parser.py
        reviewboard/scmtools/hg.py
        reviewboard/scmtools/core.py
        reviewboard/diffviewer/tests.py
      Ignored Files:
    
    
  2. reviewboard/diffviewer/forms.py (Diff revision 1)
     
     
    Show all issues
    Col: 13
     E128 continuation line under-indented for visual indent
    
  3. 
      
CC
reviewbot
  1. This is a review from Review Bot.
      Tool: PEP8 Style Checker
      Processed Files:
        reviewboard/diffviewer/forms.py
        reviewboard/diffviewer/parser.py
        reviewboard/scmtools/hg.py
        reviewboard/scmtools/core.py
        reviewboard/diffviewer/tests.py
      Ignored Files:
    
    
  2. 
      
chipx86
  1. 
      
  2. reviewboard/diffviewer/forms.py (Diff revision 2)
     
     
     
     
    Show all issues
    I'm concerned with this wording. Historically, RB's mentions of "changesets" were very centered around Perforce's model (server-side knowledge of changesets, mapped to the changenum field). Can we clarify in the comment more what we're referring to?
    
    (I know we already had some mention of this here, but I'd like to get this cleaned up.)
    1. I'm not familiar with Perforce; is it similar to Subversion's revision IDs?
      
      Mercurial's changeset IDs are the same concept as Git's commit IDs (see http://mercurial.selenic.com/wiki/ChangeSetID); each one identifies a revision of the entire repository. The big difference between the two is that Git also uses file revision IDs which are independent of the commit IDs; it is these file revision IDs that appear in Git's diffs. In contrast, Mercurial diffs only include the changeset IDs.
      
      How about this wording:
      
      "This will return a non-None value only for tools that use whole repo revision IDs (e.g. Mercurial's changeset IDs) to identify file versions, instead of individual file revision IDs (e.g. Git)."
      
      I'm happy to use either the term "repo revision ID" or "changeset ID" to refer to a repository revision (in Mercurial, Subversion etc. they are the same, but I guess they might not be in all SCM tools). Or any other term you choose - just let me know.
      
    2. Perforce's changesets are pretty different. In Perforce, the server is aware of all pending changes and affected files. You create a changeset with a description and list of files and such, and the server knows about it. Review Board uses that change number to pull the description and testing done from that server-side changeset.
      
      David recently pushed a change for a "commit ID," which is probably what you want. That means it will require RB 1.8. I'd pull David into this discussion and see if that's the right thing to use (I think it is).
      
      Your description seems fine. Would "commit ID" be suitable terminology for this?
    3. Yes, commit ID sounds fine. I looked at David's change and I don't think there's any interdependency between his and mine - he's added a "commit ID" field to the review request itself but I'm only dealing with diffs. So I don't think there's any need to wait for 1.8 to get this in. I'll definitely use his terminology though.
  3. reviewboard/diffviewer/parser.py (Diff revision 2)
     
     
     
     
     
    Show all issues
    This needs a docstring, in the format of:
    
    """One-line summary.
    
    Multi-line description.
    """
    
    Also, "base revision" is too generic. Need something more specific to the case here.
    1. How about get_base_repo_revision_id()? or get_base_changeset_id()? I'm open to other suggestions, whatever is most consistent with the terminology used elsewhere in Review Board. In Mercurial a changeset ID is basically the same concept as a repository revision ID, but I guess that may not be the case with all tools.
    2. Maybe get_commit_id? Again, at this point, we should drag David into this. If he doesn't see this, I'll point it out to him.
  4. reviewboard/diffviewer/tests.py (Diff revision 2)
     
     
     
     
    Show all issues
    No blank line.
  5. reviewboard/scmtools/hg.py (Diff revision 2)
     
     
    Show all issues
    This seems rather expensive. Can we not compute this as we're parsing initially?
    1. In practice it isn't, since the information we're searching for is always found in the first two or three lines.
      
      But looking at this again it appears I've been somewhat boneheaded about this - the base revision ID is already found during the original parse so I should just be able to return that. Will change unless I find a reason why it won't work.
  6. reviewboard/scmtools/hg.py (Diff revision 2)
     
     
    Show all issues
    You can do:
    
    if line.startswith(('# Parent'), ('diff -r')):
    1. Nice, I learn something new about Python every day. :)
    2. Dropped issue because this code is no longer present.
  7. reviewboard/scmtools/hg.py (Diff revision 2)
     
     
    Show all issues
    Blank line before this.
    1. Dropped issue because this code is no longer present.
  8. 
      
CC
reviewbot
  1. This is a review from Review Bot.
      Tool: PEP8 Style Checker
      Processed Files:
        reviewboard/diffviewer/forms.py
        reviewboard/diffviewer/parser.py
        reviewboard/scmtools/hg.py
        reviewboard/scmtools/core.py
        reviewboard/diffviewer/tests.py
      Ignored Files:
    
    
  2. 
      
reviewbot
  1. This is a review from Review Bot.
      Tool: Pyflakes
      Processed Files:
        reviewboard/diffviewer/forms.py
        reviewboard/diffviewer/parser.py
        reviewboard/scmtools/hg.py
        reviewboard/scmtools/core.py
        reviewboard/diffviewer/tests.py
      Ignored Files:
    
    
  2. 
      
chipx86
  1. Hey Colin,
    
    Thanks for all the hard work on this. I know we've gone back and forth a lot, but this looks good.
    
    I wanted to run something by you before committing this and get your thoughts on it.
    
    I'm putting together a change that in some way overlaps. Basically, we're going to start storing the commit IDs even in cases where we don't natively fetch via commit ID on that type of repository.
    
    Basically, this is for Git + BitBucket support. BitBucket's API only allows fetching of files using a commit SHA1, rather than a blob SHA1 (which is otherwise used in most Git services). Unlike Mercurial, Git diffs don't contain this information natively in the diff, so we'll be allowing RBTools to provide it via our API when uploading the diff. We can then use the commit SHA1 when fetching from BitBucket, and the blob SHA1 everywhere else.
    
    So that seems pretty related to your change. The difference being that you're adding support to always use a commit SHA1 (if provided by the DiffParser) as the file revision.
    
    So I'm trying to decide how to merge these concepts together well. We may want to augment other DiffParsers to pull out the commit SHA1 but *not* use it as the file's revision, since we may look up the file differently depending on service.
    
    I think, then, that the only thing that would need to be modified with your change is to bring back an SCMTool capability field for saying whether to use commit IDs as revisions when parsing diffs. We had diff_uses_changeset_ids, and I think we should keep using that (but rename to diff_use_commit_id_as_revision). We can then safely return and store a commit ID from the DiffParser without necessarily having to look up via that ID.
    
    My upcoming change would then add some additional logic to take that commit ID the DiffParser returns and store it in the DiffSet, in a new field I'll be introducing.
    
    Does that seem sane?
    1. By the way, you don't have to do any of this. I just want to pick your brain.
  2. 
      
david
  1. Ship It!
  2. 
      
CC
Review request changed
Status:
Completed
Change Summary:
Pushed to release-1.7.x (45faed4). Thanks!