CVS scmtool

Review Request #50 — Created June 4, 2007 and submitted

Information

Review Board SVN (deprecated)
trunk

Reviewers

Update:
Added 100% unittests
Removed extrapath dependency

====
Update:
Back from vacation, fixed the remaining code format issues.
Interface now lists new files with correct name
Open problem: sort out and remove dependency of extrapath and add tests.

====
Update:
Uses the new diffparser, code gets a lot better now.
Open problem: interface lists new files as "/dev/null" (file.origName), not the real name found in file.newName, is this changes recently?
Also, removes dependency on external lib for CVS, needs to think a smart way to use other methods than pserver.

====
A first draft of the CVS scmtool. Whats needs to be done is to figure out how to store and display the complete filename when viewing the diff and change to a better cvs lib.
100% unit-tested (included)
david
  1. Got some comments for you.
  2. trunk/reviewboard/scmtools/cvs.py (Diff revision 1)
     
     
    I'm not sure about hardcoding the pserver method here.  Perhaps it'll do for now.
  3. trunk/reviewboard/scmtools/cvs.py (Diff revision 1)
     
     
    This seems really fragile, given that there's a datestring in there.  Can you convert this to a regular expression, or at least split on whitespace?
    1. Ok, new diff uploaded which fixes all current issues. Added compiled regex for getting full rcs file and also added a (tested) regex for getting the revision from revision_str.
  4. trunk/reviewboard/scmtools/cvs.py (Diff revision 1)
     
     
    Instead of the (truly ugly) isfloat method below, can you just do this?
    
    if not re.match('\d\\.\d', revision):
        raise SCMException('Unable to parse diff revision header "%s"' % \
                           revision_str)
  5. trunk/reviewboard/scmtools/cvs.py (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
    This is a lot more complicated than it needs to be.  Here's what I recommend:
    
    for line in file.splitlines():
        m = re.match('^RCS file: %s/(.*),v$' % self.repopath, line) or \
            re.match('^RCS file: (.+)$', line)
        if m:
            self.extrapath.append(m.group(1))
            return
        if line[:10] == 'RCS file: ':
            raise SCMException('Unable to parse RCS line "%s"' % line)
    
    I'm not sure those last two lines are even necessary.  Will
    cvs diff -u ever do that?
    1. or even better:
      
      if line.startswith('RCS file: '):
      
      Might be nice to compile those two regexs first, but probably doesn't matter as we should hit those lines quickly in all correct diffs.
    2. David: We don't want to return there, do we? We're parsing an entire diff file, not just one filediff. We'd need to store all the RCS File paths in extrapath, not just the first one.
      
      Making sure I didn't misunderstand this.
    3. Nope it shouldn't return, that was dumb of me.
      
      Compiling the regexes once in the tool constructor would be great.
  6. As far as .cvspass is concerned, maybe this could read the file in CVSTool.__init__ and add the necessary line yourself?
chipx86
  1. Some more comments.
  2. trunk/reviewboard/scmtools/cvs.py (Diff revision 1)
     
     
    Might be time for us to come up with some sort of authmethod field in the database.
    1. Maybe.  I wonder if we couldn't invent some kind of magical format
      string for the repository path (so you could do something like
      ':pserver:%u@cvs.foo.com').
      
      Of course, any of this can wait until later.  All of the CVS servers
      I care about have at least an anonymous read-only version using
      pserver.
  3. trunk/reviewboard/scmtools/cvs.py (Diff revision 1)
     
     
    Excess Whitespace. I'm only going to mark this one, but it'd be nice to go through the diff and remove the others.
  4. trunk/reviewboard/scmtools/cvs.py (Diff revision 1)
     
     
     
     
    I can't remember.. Is there a case where we actually pass in a path that's None?
    1. I just used the same signature as the core scmtool uses. As it seems from CVS it's probably a valid revision or PRE-CREATION I guess.
  5. trunk/reviewboard/scmtools/cvs.py (Diff revision 1)
     
     
    Is there any exception we can catch that would give us a FileNotFound?
    1. Probably a lot, but as I stated I want to change cvs module. When that is done I add the appropriate exceptions.
  6. trunk/reviewboard/scmtools/cvs.py (Diff revision 1)
     
     
     
    You can condense this down to:
    
    return file_str, PRE_CREATION
  7. trunk/reviewboard/scmtools/cvs.py (Diff revision 1)
     
     
     
    This can be condensed to:
    
    return self.extrapath.pop(0), revision
    1. Yes, I was going to add extra safety to check that the filenames actually matched.
  8. 
      
chipx86
  1. 
      
  2. trunk/reviewboard/diffviewer/forms.py (Diff revision 1)
     
     
    I've been thinking about this parse_extra_info_headers thing a bit more. Passing the entire diff and parsing it a second time is kind of excessive.
    
    Perhaps we should go a slightly different route. Parse the file diff, store everything we don't recognize, and pass that to the SCMTool? Of course, this possibly opens up additional problems where we have to suddenly understand every kind of diff. This is food for thought, not something that would have to be done for this change :)
  3. 
      
david
  1. This is looking pretty good!  Just a couple comments on the code:
  2. trunk/reviewboard/scmtools/cvs.py (Diff revision 3)
     
     
     
     
    OK, very minor style comment.  Can you write these lines like this:
    
    import re
    
    import pycvs
    
    from reviewboard.scmtools.core import SCMException, FileNotFoundException, \
                                          SCMTool, HEAD, PRE_CREATION
    
    
    This is how PEP-8 recommends formatting imports.
  3. trunk/reviewboard/scmtools/cvs.py (Diff revision 3)
     
     
    Don't you need \d+ here?  This regex won't match "1.15" or "10.2"
  4. What more needs to be done with this before it can go into SVN?
CU
  1. 
      
  2. trunk/reviewboard/scmtools/cvs.py (Diff revision 3)
     
     
    As David mentioned, this won't match multiple digits.  It also needs to match branch revisions (1.2.3.4), and you should make sure that the .* doesn't get carried away.
    
    You can use r'' to avoid the extra escapes.
    
    My suggestion: r'^.*?(\d+(\.\d+)+)$'
    1. This looks great.
    2. Didn't think of branches, thanks!
  3. trunk/reviewboard/scmtools/cvs.py (Diff revision 3)
     
     
    I don't like this general FIFO mechanism you're using, because it assumes that parse_diff_revision will always be called in the exact same order of the diff file.  This may be true now, but that's a dangerous assumption.
    
    Can you at least throw in an assertion that the popped value .endswith(file_str)?
    1. I don't like it either, but at the moment the cvstool isn't able to get this extra information from the diff parser. I've added an extra check to be sure.
  4. 
      
JA
  1. 
      
  2. trunk/reviewboard/scmtools/cvs.py (Diff revision 4)
     
     
     
    This should be:
    
      raise SCMException("File '%s' doesn't match the parsed file '%s'" %\
      (file_str, fullfile))
  3. 
      
JA
  1. 
      
  2. trunk/reviewboard/scmtools/cvs.py (Diff revision 4)
     
     
    The CVS server may be accessed using something other than pserver (ext, for example). There should be a way to configure that, I suppose.
  3. trunk/reviewboard/scmtools/cvs.py (Diff revision 4)
     
     
    This regex doesn't work with diffs created on a Windows machine. I managed to fix that with the following:
    
      self.regex_rev   = re.compile(r'^.*?(\d+(\.\d+)+)\r?$')
  4. 
      
david
  1. Got some more comments for you on this.
  2. trunk/reviewboard/scmtools/cvs.py (Diff revision 5)
     
     
    Add a blank line after this as per PEP-8
  3. trunk/reviewboard/scmtools/cvs.py (Diff revision 5)
     
     
     
     
     
    1 space between each token of variable = X.  Aligning them up like this is useful in some cases but for disparate data like this, just makes it hard to maintain.  Notice that you've already got one that ended up misaligned during your revisions ;)
  4. trunk/reviewboard/scmtools/cvs.py (Diff revision 5)
     
     
    We don't want to compile this regex each time an instance is initialized, since SCMTools are short-lived.  Make this class data like this:
    
    class CVSTool(SCMTool):
        regex_rev = re.compile(...)
    
        def __init__(self, repository):
            ...
    
    That way the regexs are compiled once for the entire program.
  5. trunk/reviewboard/scmtools/cvs.py (Diff revision 5)
     
     
    Two blank lines between classes.
  6. trunk/reviewboard/scmtools/cvs.py (Diff revision 5)
     
     
     
    Same deal as above with class data.
    1. regex_full depends on a bit of instance data (repo), so it can't be moved to class data as-is.
    2. Oh, right.  OK.  Does repo need to be escaped at all for this?
    3. Good point -- yes, repo should go through re.escape().
  7. trunk/reviewboard/scmtools/cvs.py (Diff revision 5)
     
     
    One way you could fix the problem with /dev/null is to have this rewrite origFile, origInfo as something like (file.newFile, 'PRE-CREATION') if origFile == '/dev/null'.  Then above in parse_diff_revision, check that the info string is 'PRE-CREATION'
  8. trunk/reviewboard/scmtools/cvs.py (Diff revision 5)
     
     
    Add another blank line here.
  9. trunk/reviewboard/scmtools/cvs.py (Diff revision 5)
     
     
     
    Please wrap these to 80 columns.
  10. 
      
CU
  1. This is looking very good -- I'm already using it for our internal site.  I have a few more comments, but I hope to see it committed soon.
  2. trunk/reviewboard/scmtools/cvs.py (Diff revision 5)
     
     
    This member variable is never used.
    1. It's used internally by DiffParser
    2. Ok, I see -- but I think you should just call DiffParser.__init__(self, data) to let it setup self.data and self.lines.
  3. trunk/reviewboard/scmtools/cvs.py (Diff revision 5)
     
     
     
     
     
     
     
    It's a bit painful to scan every line for RCS info.  I think extrapath is not needed anymore -- see below.
  4. trunk/reviewboard/scmtools/cvs.py (Diff revision 5)
     
     
    Rather than relying on extrapath, which assumes things happen in an exact order, I think you can now just parse for the RCS line directly.  Just scan for a few lines before linenum for the matching RCS.  My patches always seem to have RCS at (linenum-3), but you can add a little slop to that.
    1. Yes, it should be at a fixed linenum, 2 or 3 depending on whether it's a new or an existing file, I'll see if I've got time tomorrow
  5. trunk/reviewboard/scmtools/cvs.py (Diff revision 5)
     
     
    Rather than format-printing '-d%s', just pass it as two separate arguments: '-d', self.repository.  Same for '-r%s'.
  6. 
      
david
  1. This looks great.  Is it ready to be committed?
    1. Well, it works for me(tm) :-). I'm not yet running this in production so there might be cornercases, etc. Also, support for other methods than pserver, but that's up to someone else as I'm only using pserver at the moment.
    2. I agree, this is good.  Let's get it in -- unseen cornercases can be worked out as more people try it.
    3. I just committed the patch to SVN, and will close out this review.  Thanks for your hard work!
  2.