Summary

CVS scmtool

Review Request #50 — Created June 4, 2007 and submitted

Information

Owner

lobbin

Repository

Review Board SVN (deprecated)

Branch

trunk

Bugs

Depends On

Reviewers

Groups

reviewboard

People

Description

Update:
Added 100% unittests
Removed extrapath dependency

====
Update:
Back from vacation, fixed the remaining code format issues.
Interface now lists new files with correct name
Open problem: sort out and remove the dependency on extrapath, and add tests.

====
Update:
Uses the new diffparser; the code is a lot better now.
Open problem: the interface lists new files as "/dev/null" (file.origName) rather than the real name found in file.newName. Has this changed recently?
Also removes the dependency on an external lib for CVS. I still need to think of a smart way to support methods other than pserver.

====
A first draft of the CVS scmtool. What still needs to be done is to figure out how to store and display the complete filename when viewing the diff, and to change to a better cvs lib.

Testing Done

100% unit-tested (included)

```
Got some comments for you.
```

trunk/reviewboard/scmtools/cvs.py (Diff revision 1)

I'm not sure about hardcoding the pserver method here.  Perhaps it'll do for now.

trunk/reviewboard/scmtools/cvs.py (Diff revision 1)

This seems really fragile, given that there's a datestring in there.  Can you convert this to a regular expression, or at least split on whitespace?

lobbin June 8, 2007, 7:42 a.m.

Ok, new diff uploaded which fixes all current issues. Added compiled regex for getting full rcs file and also added a (tested) regex for getting the revision from revision_str.

trunk/reviewboard/scmtools/cvs.py (Diff revision 1)

Instead of the (truly ugly) isfloat method below, can you just do this?

if not re.match('\d\\.\d', revision):
    raise SCMException('Unable to parse diff revision header "%s"' % \
                       revision_str)

trunk/reviewboard/scmtools/cvs.py (Diff revision 1)

This is a lot more complicated than it needs to be.  Here's what I recommend:

for line in file.splitlines():
    m = re.match('^RCS file: %s/(.*),v$' % self.repopath, line) or \
        re.match('^RCS file: (.+)$', line)
    if m:
        self.extrapath.append(m.group(1))
        return
    if line[:10] == 'RCS file: ':
        raise SCMException('Unable to parse RCS line "%s"' % line)

I'm not sure those last two lines are even necessary.  Will
cvs diff -u ever do that?

chipx86 June 4, 2007, 9:57 a.m.

or even better:

if line.startswith('RCS file: '):

Might be nice to compile those two regexs first, but probably doesn't matter as we should hit those lines quickly in all correct diffs.

chipx86 June 4, 2007, 11:05 a.m.

David: We don't want to return there, do we? We're parsing an entire diff file, not just one filediff. We'd need to store all the RCS File paths in extrapath, not just the first one.

Making sure I didn't misunderstand this.

david June 4, 2007, 11:55 a.m.

Nope it shouldn't return, that was dumb of me.

Compiling the regexes once in the tool constructor would be great.

As far as .cvspass is concerned, maybe this could read the file in CVSTool.__init__ and add the necessary line itself?

cuviper June 6, 2007, 5:35 a.m.

.cvspass doesn't contain the raw password, but rather a hashed/encrypted variant.

Also, you can include the password in CVSROOT, so the .cvspass isn't strictly necessary.

  :pserver:username:password@server:/path

```
Some more comments.
```

trunk/reviewboard/scmtools/cvs.py (Diff revision 1)

Might be time for us to come up with some sort of authmethod field in the database.

david June 4, 2007, 11:59 a.m.

Maybe.  I wonder if we couldn't invent some kind of magical format
string for the repository path (so you could do something like
':pserver:%u@cvs.foo.com').

Of course, any of this can wait until later.  All of the CVS servers
I care about have at least an anonymous read-only version using
pserver.

trunk/reviewboard/scmtools/cvs.py (Diff revision 1)

Excess Whitespace. I'm only going to mark this one, but it'd be nice to go through the diff and remove the others.

trunk/reviewboard/scmtools/cvs.py (Diff revision 1)

I can't remember.. Is there a case where we actually pass in a path that's None?

lobbin June 4, 2007, 4:30 p.m.

I just used the same signature as the core scmtool uses. Judging from CVS, it's probably always a valid revision or PRE-CREATION, I guess.

trunk/reviewboard/scmtools/cvs.py (Diff revision 1)

Is there any exception we can catch that would give us a FileNotFound?

lobbin June 4, 2007, 4:31 p.m.

Probably a lot, but as I stated, I want to change the cvs module. When that is done I'll add the appropriate exceptions.

trunk/reviewboard/scmtools/cvs.py (Diff revision 1)

You can condense this down to:

return file_str, PRE_CREATION

trunk/reviewboard/scmtools/cvs.py (Diff revision 1)

This can be condensed to:

return self.extrapath.pop(0), revision

lobbin June 4, 2007, 4:31 p.m.

Yes, I was going to add extra safety to check that the filenames actually matched.

trunk/reviewboard/diffviewer/forms.py (Diff revision 1)

I've been thinking about this parse_extra_info_headers thing a bit more. Passing the entire diff and parsing it a second time is kind of excessive.

Perhaps we should go a slightly different route. Parse the file diff, store everything we don't recognize, and pass that to the SCMTool? Of course, this possibly opens up additional problems where we have to suddenly understand every kind of diff. This is food for thought, not something that would have to be done for this change :)

This is looking pretty good!  Just a couple comments on the code:

trunk/reviewboard/scmtools/cvs.py (Diff revision 3)

OK, very minor style comment.  Can you write these lines like this:

import re

import pycvs

from reviewboard.scmtools.core import SCMException, FileNotFoundException, \
                                      SCMTool, HEAD, PRE_CREATION


This is how PEP-8 recommends formatting imports.

trunk/reviewboard/scmtools/cvs.py (Diff revision 3)

Don't you need \d+ here?  This regex won't match "1.15" or "10.2"

What more needs to be done with this before it can go into SVN?

lobbin June 13, 2007, 7:47 p.m.

How does it look? All I want now is to change the cvs module, otherwise it seems good.

david June 17, 2007, 8:19 a.m.

Sounds good to me.  Some unit tests would be great, but
not necessary to start with.

trunk/reviewboard/scmtools/cvs.py (Diff revision 3)

As David mentioned, this won't match multiple digits.  It also needs to match branch revisions (1.2.3.4), and you should make sure that the .* doesn't get carried away.

You can use r'' to avoid the extra escapes.

My suggestion: r'^.*?(\d+(\.\d+)+)$'

david June 8, 2007, 11:34 a.m.
```
This looks great.
```

lobbin June 8, 2007, 4:14 p.m.
```
Didn't think of branches, thanks!
```

trunk/reviewboard/scmtools/cvs.py (Diff revision 3)

I don't like this general FIFO mechanism you're using, because it assumes that parse_diff_revision will always be called in the exact same order of the diff file.  This may be true now, but that's a dangerous assumption.

Can you at least throw in an assertion that the popped value .endswith(file_str)?

lobbin June 8, 2007, 4:16 p.m.

I don't like it either, but at the moment the cvstool isn't able to get this extra information from the diff parser. I've added an extra check to be sure.
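
A minimal sketch of the extra check discussed here, assuming extrapath is a plain list of full RCS paths in diff order. The helper name pop_full_path is hypothetical, and SCMException is a local stand-in for the class in reviewboard.scmtools.core:

```python
class SCMException(Exception):
    """Local stand-in for reviewboard.scmtools.core.SCMException."""


def pop_full_path(extrapath, file_str):
    """Pop the next stored path and check that it corresponds to file_str.

    Hypothetical helper: it still relies on FIFO order, but fails loudly
    if the popped path does not end with the file name being parsed.
    """
    fullfile = extrapath.pop(0)
    if not fullfile.endswith(file_str):
        raise SCMException("File '%s' doesn't match the parsed file '%s'"
                           % (file_str, fullfile))
    return fullfile
```

The check does not remove the ordering assumption; it only turns a silent mismatch into an error.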

trunk/reviewboard/scmtools/cvs.py (Diff revision 4)

This should be:

  raise SCMException("File '%s' doesn't match the parsed file '%s'" %
                     (file_str, fullfile))

trunk/reviewboard/scmtools/cvs.py (Diff revision 4)

The CVS server may be accessed using something other than pserver (ext, for example). There should be a way to configure that, I suppose.

trunk/reviewboard/scmtools/cvs.py (Diff revision 4)

This regex doesn't work with diffs created on a Windows machine. I managed to fix that with the following:

  self.regex_rev   = re.compile(r'^.*?(\d+(\.\d+)+)\r?$')
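
As a rough illustration of how this pattern behaves, the sketch below applies it to a few sample strings. The function name extract_revision and the sample header strings are assumptions made for the example; only the pattern itself comes from the review:

```python
import re

# Pattern quoted from the review; the optional \r tolerates CRLF line endings.
REGEX_REV = re.compile(r'^.*?(\d+(\.\d+)+)\r?$')


def extract_revision(revision_str):
    """Return the dotted revision at the end of revision_str, or None."""
    m = REGEX_REV.match(revision_str)
    return m.group(1) if m else None


# Illustrative inputs: multi-digit, branch, and Windows-style revisions.
print(extract_revision("4 Jun 2007 09:57:00 -0000\t1.15"))
print(extract_revision("4 Jun 2007 09:57:00 -0000\t1.2.3.4\r"))
```

Because the leading `.*?` is lazy and the group is anchored to the end of the string, the date portion is skipped and only the trailing dotted number is captured.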

```
Got some more comments for you on this.
```
trunk/reviewboard/scmtools/cvs.py (Diff revision 5)
```
Add a blank line after this as per PEP-8
```

trunk/reviewboard/scmtools/cvs.py (Diff revision 5)

1 space between each token of variable = X.  Aligning them like this is useful in some cases, but for disparate data like this it just makes the code hard to maintain.  Notice that you've already got one that ended up misaligned during your revisions ;)

trunk/reviewboard/scmtools/cvs.py (Diff revision 5)

We don't want to compile this regex each time an instance is initialized, since SCMTools are short-lived.  Make this class data like this:

class CVSTool(SCMTool):
    regex_rev = re.compile(...)

    def __init__(self, repository):
        ...

That way the regexs are compiled once for the entire program.

trunk/reviewboard/scmtools/cvs.py (Diff revision 5)
```
Two blank lines between classes.
```

trunk/reviewboard/scmtools/cvs.py (Diff revision 5)

Same deal as above with class data.

cuviper July 5, 2007, 3:58 a.m.

regex_full depends on a bit of instance data (repo), so it can't be moved to class data as-is.

david July 5, 2007, 4:06 a.m.

Oh, right.  OK.  Does repo need to be escaped at all for this?

cuviper July 5, 2007, 4:31 a.m.

Good point -- yes, repo should go through re.escape().
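
A minimal sketch of where this thread ends up, assuming the repository path is passed to the constructor as a plain string. The class name CVSToolSketch and the attribute repopath are illustrative, not the committed code:

```python
import re


class CVSToolSketch:
    # Compiled once, when the class is defined, and shared by all instances.
    regex_rev = re.compile(r'^.*?(\d+(\.\d+)+)\r?$')

    def __init__(self, repopath):
        self.repopath = repopath
        # This pattern depends on instance data, so it is built per instance.
        # re.escape keeps characters such as '.' or '+' in the path literal.
        self.regex_full = re.compile(
            r'^RCS file: %s/(.*),v$' % re.escape(repopath))
```

Without re.escape, a repository path containing regex metacharacters could match unrelated paths or fail to compile.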

trunk/reviewboard/scmtools/cvs.py (Diff revision 5)

One way you could fix the problem with /dev/null is to have this rewrite origFile, origInfo as something like (file.newFile, 'PRE-CREATION') if origFile == '/dev/null'.  Then above in parse_diff_revision, check that the info string is 'PRE-CREATION'
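
One possible shape for this rewrite, as a sketch only. The function name normalize_new_file is hypothetical, and PRE_CREATION is defined locally here rather than imported from reviewboard.scmtools.core:

```python
# Local stand-in for the marker used in the suggestion above.
PRE_CREATION = 'PRE-CREATION'


def normalize_new_file(orig_file, orig_info, new_file):
    """Replace a '/dev/null' original with the new file's name.

    Returns an (orig_file, orig_info) pair; for newly added files the
    info string becomes the PRE-CREATION marker.
    """
    if orig_file == '/dev/null':
        return new_file, PRE_CREATION
    return orig_file, orig_info
```

parse_diff_revision could then test the info string against the marker instead of inspecting the file name.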

trunk/reviewboard/scmtools/cvs.py (Diff revision 5)
```
Add another blank line here.
```
trunk/reviewboard/scmtools/cvs.py (Diff revision 5)
```
Please wrap these to 80 columns.
```

This is looking very good -- I'm already using it for our internal site.  I have a few more comments, but I hope to see it committed soon.

trunk/reviewboard/scmtools/cvs.py (Diff revision 5)

This member variable is never used.

lobbin July 18, 2007, 8:07 a.m.
```
It's used internally by DiffParser
```

cuviper July 18, 2007, 9:34 a.m.

Ok, I see -- but I think you should just call DiffParser.__init__(self, data) to let it setup self.data and self.lines.

trunk/reviewboard/scmtools/cvs.py (Diff revision 5)

It's a bit painful to scan every line for RCS info.  I think extrapath is not needed anymore -- see below.

trunk/reviewboard/scmtools/cvs.py (Diff revision 5)

Rather than relying on extrapath, which assumes things happen in an exact order, I think you can now just parse for the RCS line directly.  Just scan for a few lines before linenum for the matching RCS.  My patches always seem to have RCS at (linenum-3), but you can add a little slop to that.

lobbin July 18, 2007, 8:08 a.m.

Yes, it should be at a fixed linenum, 2 or 3 depending on whether it's a new or an existing file, I'll see if I've got time tomorrow
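
The backward scan could look roughly like the sketch below. It assumes lines is the diff split into a list and linenum is the index of the file's "---" header line; the function name find_rcs_line and the slop default are made up for the example:

```python
RCS_PREFIX = 'RCS file: '


def find_rcs_line(lines, linenum, slop=5):
    """Search up to `slop` lines before linenum for an RCS header.

    Returns the text after 'RCS file: ', or None if no such line is
    found within the window.
    """
    start = max(0, linenum - slop)
    for i in range(linenum - 1, start - 1, -1):
        if lines[i].startswith(RCS_PREFIX):
            return lines[i][len(RCS_PREFIX):]
    return None
```

Searching backward from the header means the nearest RCS line wins, so a stray match from an earlier file in the same diff is not picked up as long as the window stays small.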

trunk/reviewboard/scmtools/cvs.py (Diff revision 5)

Rather than format-printing '-d%s', just pass it as two separate arguments: '-d', self.repository.  Same for '-r%s'.
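
A small sketch of the argument-list form, under the assumption that the tool shells out to `cvs checkout -p` to fetch a file. The thread does not say which cvs command is used, and build_checkout_args is a hypothetical helper:

```python
def build_checkout_args(repository, path, revision):
    """Build an argv list with each option and its value as separate items.

    Assumes a `cvs checkout -p` invocation, which is not confirmed by
    the review itself.
    """
    return ['cvs', '-d', repository, 'checkout', '-r', revision, '-p', path]


args = build_checkout_args(':pserver:anonymous@cvs.example.com:/cvsroot',
                           'module/file.py', '1.15')
```

A list like this can be handed to subprocess without any string formatting or shell quoting.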

Ship it!

This looks great.  Is it ready to be committed?

lobbin July 29, 2007, 12:39 a.m.

Well, it works for me(tm) :-). I'm not yet running this in production, so there might be corner cases, etc. Support for methods other than pserver is also missing, but that's up to someone else, as I'm only using pserver at the moment.

cuviper July 29, 2007, 3:06 a.m.

I agree, this is good.  Let's get it in -- unseen corner cases can be worked out as more people try it.

david July 29, 2007, 6:11 a.m.

I just committed the patch to SVN, and will close out this review.  Thanks for your hard work!