Adding in a smarter parent-base finding algorithm

Review Request #6797 — Created Jan. 17, 2015 and discarded

Information

RBTools
master

Reviewers

Main idea

Given a tree like this one:

A - * - * - * - * origin/master
  \ * - B - * - * origin/feature
         \- C - D topic

when we rbt post, it would like to have three things,
'base': A revision to use as the base of the resulting diff.
'tip': A revision to use as the tip of the resulting diff.
'parent_base': (optional) The revision to use as the base of a
parent diff.
Now, parent_base has to be in the remote repository, in other words, a ancestor of a commit of any branch origin/*. We would also like parent_base to be as young an ancestor of tip as possible, given the first constraint.

I've worked out a git-rev-list query to find this.

git rev-list $TIP --not --remotes=origin | tail -1

should give the oldest ancestor that is not reachable from a public branch, so it's parent will be the best parent base.

Now with tests!

test_diff_finding_parent is a new test, and I had to modify test_parse_revision_spec_no_args_parent to pass, since the parent base is no longer the same after my modifications.

Description From Last Updated

Col: 80 E501 line too long (98 > 79 characters)

reviewbotreviewbot

Col: 17 E265 block comment should start with '# '

reviewbotreviewbot

Col: 17 E113 unexpected indentation

reviewbotreviewbot

I need to fix this to listen to the upstream_branch if it's explicitly set.

TB tbelaire

Col: 50 E261 at least two spaces before inline comment

reviewbotreviewbot

Col: 80 E501 line too long (91 > 79 characters)

reviewbotreviewbot

Going to remove this hardcoded thing later.

TB tbelaire

local variable 'ri' is assigned to but never used

reviewbotreviewbot

Col: 80 E501 line too long (84 > 79 characters)

reviewbotreviewbot

Col: 17 E265 block comment should start with '# '

reviewbotreviewbot

Col: 17 E113 unexpected indentation

reviewbotreviewbot

Col: 50 E261 at least two spaces before inline comment

reviewbotreviewbot

Col: 80 E501 line too long (91 > 79 characters)

reviewbotreviewbot

local variable 'ri' is assigned to but never used

reviewbotreviewbot

Col: 80 E501 line too long (84 > 79 characters)

reviewbotreviewbot

local variable 'ri' is assigned to but never used

reviewbotreviewbot

local variable '_ri' is assigned to but never used

reviewbotreviewbot

Can we change the terminology here a little bit? I'd like to use: online -> remote offline -> local feature …

daviddavid

We should put blank lines around this block, since that's the way Christian likes it.

daviddavid

local variable '_' is assigned to but never used

reviewbotreviewbot

Col: 36 E127 continuation line over-indented for visual indent

reviewbotreviewbot

_ is often used for gettext. Best to call this unused, or just fetch get_origin()[0].

chipx86chipx86

Can you add a docstring covering exactly what this does and how it works?

chipx86chipx86

This can pretty easily spit out a large number of commits, particularly for large repositories. I'd give this a try …

chipx86chipx86

Should just be: if not local_commits:

chipx86chipx86

"a commit" "that's also" "available" Should end with a period.

chipx86chipx86

Can use self.assertIsInstance

VT VTL-Developer

Can use self.assertIn

VT VTL-Developer
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        rbtools/clients/git.py
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        rbtools/clients/git.py
    
    
  2. rbtools/clients/git.py (Diff revision 1)
     
     
    Show all issues
    Col: 80
     E501 line too long (98 > 79 characters)
    
  3. 
      
TB
TB
CH
  1. Ship It!
  2. 
      
TB
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/git.py
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/git.py
    
    
  2. rbtools/clients/git.py (Diff revision 2)
     
     
    Show all issues
    Col: 17
     E265 block comment should start with '# '
    
  3. rbtools/clients/git.py (Diff revision 2)
     
     
    Show all issues
    Col: 17
     E113 unexpected indentation
    
  4. rbtools/clients/tests.py (Diff revision 2)
     
     
    Show all issues
    Col: 50
     E261 at least two spaces before inline comment
    
  5. rbtools/clients/tests.py (Diff revision 2)
     
     
    Show all issues
    Col: 80
     E501 line too long (91 > 79 characters)
    
  6. rbtools/clients/tests.py (Diff revision 2)
     
     
    Show all issues
     local variable 'ri' is assigned to but never used
    
  7. rbtools/clients/tests.py (Diff revision 2)
     
     
    Show all issues
    Col: 80
     E501 line too long (84 > 79 characters)
    
  8. 
      
TB
  1. 
      
  2. rbtools/clients/git.py (Diff revision 2)
     
     
    Show all issues

    I need to fix this to listen to the upstream_branch if it's explicitly set.

  3. rbtools/clients/tests.py (Diff revision 2)
     
     
    Show all issues

    Going to remove this hardcoded thing later.

    1. I'm not sure if I should just delete this line. It was a useful sanity check when setting up the test, but it makes it rather brittle to changes later on.

      Eh, I'll just delete it.

  4. 
      
david
  1. I haven't looked at the code yet, but you can add bug 3510 to this.

    1. https://code.google.com/p/reviewboard/issues/detail?can=2&q=3510
      Hmm, I'm not trying to fix that explicitly.

      All this does it use the normal calculation for base, and try and find the most optimal parent_base.

    2. Ah yes, I misread the bug.

  2. 
      
mike_conley
  1. What's happened to this patch? We haven't seen a code drop in a while. Theo, can you please update this today?

    1. Sorry, I had forgottan to publish the draft after I rbt pushed it.

  2. 
      
TB
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/git.py
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/git.py
    
    
  2. rbtools/clients/git.py (Diff revision 3)
     
     
    Show all issues
    Col: 17
     E265 block comment should start with '# '
    
  3. rbtools/clients/git.py (Diff revision 3)
     
     
    Show all issues
    Col: 17
     E113 unexpected indentation
    
  4. rbtools/clients/tests.py (Diff revision 3)
     
     
    Show all issues
    Col: 50
     E261 at least two spaces before inline comment
    
  5. rbtools/clients/tests.py (Diff revision 3)
     
     
    Show all issues
    Col: 80
     E501 line too long (91 > 79 characters)
    
  6. rbtools/clients/tests.py (Diff revision 3)
     
     
    Show all issues
     local variable 'ri' is assigned to but never used
    
  7. rbtools/clients/tests.py (Diff revision 3)
     
     
    Show all issues
    Col: 80
     E501 line too long (84 > 79 characters)
    
  8. 
      
TB
  1. 
      
  2. rbtools/clients/tests.py (Diff revisions 2 - 3)
     
     

    This is because we are now selecting a better parent_base. I wouldn't mind someone doublechecking that this is a correct change to the test.

  3. 
      
TB
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/git.py
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/git.py
    
    
  2. rbtools/clients/tests.py (Diff revision 4)
     
     
    Show all issues
     local variable 'ri' is assigned to but never used
    
  3. rbtools/clients/tests.py (Diff revision 4)
     
     
    Show all issues
     local variable '_ri' is assigned to but never used
    
  4. 
      
TB
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/git.py
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/git.py
    
    
  2. rbtools/clients/tests.py (Diff revision 5)
     
     
    Show all issues
     local variable '_' is assigned to but never used
    
    1. I'm pretty sure that's a bug, _ is totally allowed to be assigned to and not used.

    2. Yeah, but it's probably more appropriate to just not save the result of the function to any variable (just have self.client.get_repository_info()).

  3. 
      
david
  1. 
      
  2. rbtools/clients/git.py (Diff revision 5)
     
     
    Show all issues

    Can we change the terminology here a little bit?

    I'd like to use:
    online -> remote
    offline -> local
    feature -> local_branch
    origin -> remote

  3. rbtools/clients/git.py (Diff revision 5)
     
     
     
     
     
     
    Show all issues

    We should put blank lines around this block, since that's the way Christian likes it.

  4. 
      
TB
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/git.py
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/git.py
    
    
  2. rbtools/clients/git.py (Diff revision 6)
     
     
    Show all issues
    Col: 36
     E127 continuation line over-indented for visual indent
    
  3. 
      
TB
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/git.py
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/git.py
    
    
  2. 
      
TB
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/git.py
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        rbtools/clients/tests.py
        rbtools/clients/git.py
    
    
  2. 
      
chipx86
  1. Good work so far!

    Since this is a pretty fundamental, hard-to-get-right change, I want to see a lot more than one test. I can think of several more that need to be added:

    1. Simple case: Posting a feature branch off a tracking branch aligned with the remote.

    This is the basic case of:

    o [feature-branch]
    |
    o [origin/remote-branch] [remote-branch]
    |
    |  o [origin/master] [master]
    |  |
    

    It should detect origin/remote-branch.

    2. Posting a feature branch off a tracking branch with changes since the remote.

    o [feature-branch]
    |
    o [remote-branch]
    |
    o [origin/remote-branch]
    |
    |  o [origin/master] [master]
    |  |
    

    It should still detect origin/remote-branch.

    3. Posting a feature branch off a branch not properly tracking the remote.

    Same as #1 above, but with remote-branch not being set to track origin/remote-branch (meaning it's a regular branch and not set to track in the git config).

    It should still detect origin/remote-branch.

    4. Posting a feature branch with a merged tracking branch off another tracking branch.

    o [feature-branch]
    |\
    | \
    |  o [origin/merged-branch]
    |  |
    | ...
    |
    o [origin/remote-branch] [remote-branch]
    |
    

    This is harder, because it could go either way. However, I think the user would expect that it would show the merged changes, meaning that origin/remote-branch would be the right branch to detect.

    5. Posting a feature branch off a tracking branch that merged another tracking branch.

    o [feature-branch]
    |
    o [origin/remote-branch] [remote-branch]
    |\
    | \
    |  o [origin/merged-branch]
    |  |
    

    Should be fairly straight-forward, but good to have for testing. Should be the same as #1 above, really.

    6. Posting a feature branch off a remote branch without any tracking branches.

    o [feature-branch]
    |
    o [origin/remote-branch]
    |
    

    The tracking branch shouldn't matter so much. It should still return origin/remote-branch.

    7. Posting a feature branch off a remote branch aligned to the same commit as another remote branch.

    o [feature-branch]
    |
    o [origin/remote-branch1] [origin/remote-branch2]
    |
    

    Obviously either one of these is fine, but it should reliably choose one of these in a predictable way.

    8. Posting a feature branch not up-to-date with the remote branch.

    o [feature-branch]
    |
    |   o [origin/remote-branch] [remote-branch]
    |  /
    | /
    o
    |
    o [origin/master]
    |
    

    I wish there was a good way to have this detect origin/remote-branch, but that's unlikely to be possible. It should return origin/master if auto-detecting.

    However, if the user has hard-coded TRACKING_BRANCH = 'origin/remote-branch', then it should be smart and use the common commit they share, to avoid a lot of the problems people hit today where the diff gets all screwed up with things reversed from origin/remote-branch.

    9. Posting a feature branch with branches from different remotes in the path.

    o [feature-branch]
    |
    o [other-remote/remote-branch]
    |
    o [origin/remote-branch]
    |
    

    The other remotes should be ignored. It should only fetch origin/remote-branch.

    Others?

    There may be other situations that come up, but this is a good set of cases that all need to be handled properly in order to not break things.

    We also need to ensure that performance is fine with large (multi-gigabyte) repositories.

  2. rbtools/clients/git.py (Diff revision 8)
     
     
    Show all issues

    _ is often used for gettext. Best to call this unused, or just fetch get_origin()[0].

    1. Sorry about that. I was used to languages that enforce _ as a throwaway like /dev/null.

  3. rbtools/clients/git.py (Diff revision 8)
     
     
    Show all issues

    Can you add a docstring covering exactly what this does and how it works?

  4. rbtools/clients/git.py (Diff revision 8)
     
     
     
     
    Show all issues

    This can pretty easily spit out a large number of commits, particularly for large repositories. I'd give this a try against some known large repositories, like the KDE or Linux kernel repositories, and see how it handles.

    When run against the Review Board repository with a commit on master, I get over 6,300 commits back.

    1. I downloaded a copy of the linux source code, but I am unable to reproduce this.
      It should only return the comments that exist locally, but not in the remote.

      $ cd linux
      $ time git rev-list master --not --remotes=origin
      
      real    0m0.088s
      user    0m0.007s
      sys     0m0.081s
      
  5. rbtools/clients/git.py (Diff revision 8)
     
     
    Show all issues

    Should just be:

    if not local_commits:
    
  6. rbtools/clients/git.py (Diff revision 8)
     
     
    Show all issues

    "a commit"

    "that's also"

    "available"

    Should end with a period.

  7. 
      
VT
  1. 
      
  2. rbtools/clients/tests.py (Diff revision 8)
     
     
    Show all issues

    Can use self.assertIsInstance

    1. Actually, we can't use self.assertIsInstance; it was added in Python 2.7 and we have to support Python 2.6.

  3. rbtools/clients/tests.py (Diff revision 8)
     
     
     
     
    Show all issues

    Can use self.assertIn

    1. I'm going to keep with the style used in the rest of the tests for now, though maybe I'll come back and change the whole file to use that.

  4. 
      
TB
Review request changed
Status:
Discarded