Git SCMTool
Review Request #80 — Created June 18, 2007 and submitted
Update: Style changes as per David's review. I also switched SCMException to SCMError ====================================================================== Update: Include changes to the diff parsing test :) ====================================================================== Update: Only change is storing the whole diff for a file, including the 'diff --git' line ====================================================================== Update: This change now relies on the DiffParser class in http://reviews.review-board.org/r/99/ You can still only talk to a local git repository. - Refactored diff parsing to better support Git diff format - Added Unittests for diff parsing ====================================================================== Update: New patch coming without DiffParser changes, better git diff parsing, tests. ====================================================================== Update: - made new DiffParser class - updated unittests - minimized code duplication in diff parsing I have made a DiffParser object now, as it seemed to be the easiest way to override specific behaviour without duplicating chunks of code. For other SCM systems, it may pay to split out the DiffParser class even more, so you only need to override discreet operations, but it works for parsing diffs generated with Git, and there is no real code duplication. ====================================================================== Update: - changed form.py call the parseDiff method on the SCMTool - made SCMTool.parseDiff use the default parser - update GitTool to reflect the above changes, but with custom per file parser I have not looked into abstracting out the per file parser to remove code duplication ====================================================================== Update: This diff now abstracts out the diff parsing in reviewboard a little bit: - an SCMTool subclass can optionally provide a whole new diff parsing method - an SCMTool subclass can now optionally provide a per-file diff parsing method (this was needed for Git) - the GitTool has been updated to provide its own per-file diff parsing method The CVS tool could be updated to override the main diff parsing method, extract the information needed and then proceed by calling the standard parse function. This would abstract away the need for all SCMTools to have a parse_extra_info_headers method. ====================================================================== This is the start of my effort to get Git support for Review Board. I don't think it is really ready, but I would like to see if I am heading in the right direction. At the moment it only handles talking to a Git repo on the local filesystem. This can be expanded when needed (works fine for us). I feel like the whole usage of a revision is a bit clunky, as Git uses objects with unique identifiers. I have pretty much just used an object id as a revision, just so i could get it working.
Update: Unittest passes again :) ====================================================================== Update: Current tests still pass, and I have added a unittest for testing the parsing of Git patches. ====================================================================== Update: I updated the unittests for using the DiffParser class, and it works fine. Again, I don't have/use Perforce, so it may pay off to try that with the new class. ====================================================================== Update: All tests still pass ====================================================================== Update: All current unit tests for diff parsing work. The Perforce tests are skipped, as I do not use this SCM or have the python modules installed. ====================================================================== We have it talking to a Git repository, applying patches and general review functionality is working. No accompanying unittests yet.
-
-
I think what we should do instead of these checks is to always call SCMTool.parseDiff. The default implementation can just turn around and call diffparser.parse. This allows this whole block of code to simply be: files = tool.parseDiff(file["content"]) And in the SCMTool: def parseDiff(self, data): return diffparser.parse(data)
-
We need to find out what we can abstract or make available in some form so that all this logic is never duplicated.
-
Thanks for your hard work on this. I have some additional changes I'd like to see before this goes in.
-
No need for the \ on the first line here or the second line. Python doesn't need this when the statements are in the parens. This applies to the other tests as well. It might be best to have a utility function in that class that takes the base filename ("git_simple.diff"), opens it, and returns the result of parse()[0].
-
-
-
-
Can you describe what the problem is in more detail here? It'll help if other people are looking to contribute.
-
-
I'm hoping we can reduce this in the future. I assume the main problem is that lsdiff doesn't return the correct things for git diffs? I'll be writing a replacement for lsdiff that will likely be part of DiffParser. It would be nice if we could then condense this further so that the parsing of this diff would be handled once in the base class and then you could just override parseFile and such.
-
-
-
Adding a blank line between all these comments will help for readability so that the code is less clumped together. Also before try: for:, etc.
-
-
Are we guaranteed that paths will start with a/ and b/? If so, it'd be nice to document this, but that seems strange to me.
-
It'd be nice to have some small comment showing how exactly the resulting fields will look so that the filenames[0][3:] makes more sense.
-
-
-
-
-
-
-
There's a new diff parser in SVN (r852) that will change how this all works. Now you should have no need to overload parse(). Instead, you want to provide new parse_diff_header() and parse_special_header() methods. General rule is that if the header you care about is a replacement for the --- and +++ lines, you want to use parse_diff_header(). Otherwise, if it comes before that, use parse_special_header(). You can see how this works in the other SCMTools.