Summary

Detect and display moved parts in a file.

Review Request #926 — Created July 21, 2009 and submitted

Information

Owner

edufelipe

Repository

Review Board SVN (deprecated)

Branch

Bugs

Depends On

Reviewers

Groups

reviewboard

People

Description

This diff builds up on my previous whitespace diff, so it also contains those changes, as mercurial could not keep them apart.

Further improving the diff viewer is the detection of whole blocks move inside a file. A move is characterized by a delete in one place and the insert of the same text on another location. The main problem for this effect is the complexity of the analysis required for this detection. So for now, I opted to do whole blocks only, but the interface and internal representation supports an arbitrary technique, so it is possible to build upon them.

Once a move is detected it is shown according to the screenshots below. Each moved line gets a handle containing information as from where that line came from, or were it went. When the handle is clicked the page jumps to the surrounding chunk, that is highlighted, and the corresponding line is highlighted for 2 seconds, with a bright yellow.

Testing Done

Tested all sorts of moves. File with only moves, mixed moves and edit and files where there is several instances of the destination (i.e. when the user cuts a chunk and pastes several times). In the case of multiple destinations, the last one found is the one considered target, while the others are ignored.

The effects it self were tested on Safari 4.0, Chrome 3.0 preview, Firefox 3.5 and Firefox 3.0

Screenshots

Visualization of a move

Highlight effect

chipx86 July 22, 2009, 5:02 a.m.

Many of these comments have to do with the whitespace change, so I'll just move them over to there.

/trunk/reviewboard/diffviewer/diffutils.py (Diff revision 1)

This doesn't really describe the function. Might be better as a comment.

Also, "Access"

edufelipe July 24, 2009, 5:24 a.m.
```
Fixed on whitespace patch.
```

/trunk/reviewboard/diffviewer/diffutils.py (Diff revision 1)
```
Trailing whitespace.
```
/trunk/reviewboard/diffviewer/diffutils.py (Diff revision 1)
```
Should be removed before committing.
```
/trunk/reviewboard/diffviewer/diffutils.py (Diff revision 1)
```
Same here.
```
/trunk/reviewboard/diffviewer/diffutils.py (Diff revision 1)
```
Should also be removed.
```

/trunk/reviewboard/diffviewer/diffutils.py (Diff revision 1)

Trailing whitespace.

I don't know that the comment is really necessary.

edufelipe July 24, 2009, 5:25 a.m.
```
Fixed on whitespace patch
```

/trunk/reviewboard/diffviewer/diffutils.py (Diff revision 1)
```
"WS" -> "whitespace"
```

/trunk/reviewboard/diffviewer/diffutils.py (Diff revision 1)

Space after the :

Also, on the comments, there should be a space after #, and the comment should end with a period.

/trunk/reviewboard/diffviewer/diffutils.py (Diff revision 1)

Would you mind adding some comments explaining this code? It's sufficiently complex.

/trunk/reviewboard/diffviewer/diffutils.py (Diff revision 1)
```
Space on both sides of +
```

/trunk/reviewboard/diffviewer/diffutils.py (Diff revision 1)

When does r ever change?

edufelipe July 24, 2009, 5:33 a.m.

Major infinite loop bug, that wasn't triggered in my tests because of the break inside the 'if' statement. Fixed.

/trunk/reviewboard/diffviewer/diffutils.py (Diff revision 1)

Since we do each range twice, we might as well store this in variables and reuse it.

/trunk/reviewboard/htdocs/media/rb/css/diffviewer.css (Diff revision 1)
```
Space before "left"
```
/trunk/reviewboard/htdocs/media/rb/css/diffviewer.css (Diff revision 1)
```
Can be: 0px 15px;
```
/trunk/reviewboard/htdocs/media/rb/css/diffviewer.css (Diff revision 1)
```
Space before "right"
```

/trunk/reviewboard/htdocs/media/rb/js/diffviewer.js (Diff revision 1)

Please add a documentation comment above this describing the function.

/trunk/reviewboard/htdocs/media/rb/js/diffviewer.js (Diff revision 1)

Since we reference table.sidebyside instances many times, we should store the result of $("table.sidebyside") in a variable and reference that variable for all other uses.

/trunk/reviewboard/htdocs/media/rb/js/diffviewer.js (Diff revision 1)

Can you put comments on hteir own lines? With a blank line preceding them.

/trunk/reviewboard/htdocs/media/rb/js/diffviewer.js (Diff revision 1)
```
Blank line before this.
```
/trunk/reviewboard/htdocs/media/rb/js/diffviewer.js (Diff revision 1)
```
Two blank lines here.
```
/trunk/reviewboard/htdocs/media/rb/js/diffviewer.js (Diff revision 1)
```
No space after parent().
```
/trunk/reviewboard/templates/diffviewer/diff_file_fragment.html (Diff revision 1)
```
Should have translation for the "Moved from ..."
```
/trunk/reviewboard/templates/diffviewer/diff_file_fragment.html (Diff revision 1)
```
Translation here too.
```

/trunk/reviewboard/templates/diffviewer/view_diff.html (Diff revision 1)

This is probably part of the other change, but rather than having onclick, I'd prefer that diffviewer.js register these actions for these links. You can give them IDs to reference them.

Change Summary:

Made all the changes requested by Christian.

Diff:

Revision 2 (+212 -15)

Show changes

	/trunk/reviewboard/diffviewer/diffutils.py
	/trunk/reviewboard/htdocs/media/rb/css/diffviewer.css
	/trunk/reviewboard/htdocs/media/rb/js/diffviewer.js
	/trunk/reviewboard/templates/diffviewer/diff_file_fragment.html
	/trunk/reviewboard/templates/diffviewer/view_diff.html

Hey Eduardo,

Can you update this diff now that the whitespace change is in?

I'm trying to think of ways to optimize the algorithm so we don't loop so much at the end. I think we can do this a few different ways.

While doing the initial iteration and deciding whether to dump the range into an inserts or deletes bucket, we could generate some sort of a hash of the data and store it in a dictionary. The dictionary would map the hash to a list of groups. This may only be needed for the removes bucket.

Then, after putting them into buckets, we can go through the inserts list and see if its generated hash exists in the removes bucket. If it does, loop through the list associated with that hash and do the comparison.

This has the benefit of only introducing one mandatory loop, the one for the inserts list. Instead of looping through all the remaining removes on each insert, we only need to do one cheap lookup, and then iterate through a small list (probably 1 or 2) of items for comparison.

The hash tag could be something as simple as "%d-%d-%s-%s" % (i2 - i1, j2 - j1, differ.a[:10], differ.a[-10:]), which would potentially have more matches to search through, but would be simple and quick to generate. Probably better than going through all the data and generating a checksum, which would cost more than a couple extra small loops.

So, something like:


    for ... in differ.get_opcodes():
        ...
        if tag == 'delete':
            key = "%d-%d-%s-%s" % (j2 - j1, i2 - i1, differ.b[10:], differ.b[-10:])

            if not key in removes:
                removes[key] = []

            removes[key].append(groups[-1])
        else:
            key = "%d-%d-%s-%s" % (i2 - i1, j2 - j1, differ.a[:10], differ.a[-10:])

            if not key in inserts:
                inserts[key] = []

            inserts[key].append(groups[-1])

    for insert_key, groups in inserts.iteritems():
        if insert_key in removes:
            # Do comparisons here


This is just a rough idea at 1 in the morning, so it's probably buggy at best, but try out a version of this and see if you can make it work. I want to be sure this algorithm is as optimal as possible.

Thanks for your hard work on this so far!

chipx86 July 29, 2009, 6:10 p.m.

Er, that differ.a[:10] should be differ.a[10:]

edufelipe July 30, 2009, 3:14 a.m.

I'll look into it and post an update later, for now I'll just post without whitespace.

/trunk/reviewboard/diffviewer/diffutils.py (Diff revision 2)

For this, you can use:

for r in xrange(0, len(removes)):

/trunk/reviewboard/diffviewer/diffutils.py (Diff revision 2)

As an optimization, I think we can check to see if the length of each range is the same before doing any data comparison.

edufelipe July 30, 2009, 2:33 a.m.
```
Ok. Done.
```

/trunk/reviewboard/diffviewer/diffutils.py (Diff revision 2)

This pretty much has the same effect as:

return groups

edufelipe July 30, 2009, 2:22 a.m.
```
And I miss the obvious thing. Thanks.
```

Change Summary:

This is an updated version, that does not include tagging, but it no longer have whitespace changes.

Diff:

Revision 3 (+98 -11)

Show changes

	/trunk/reviewboard/diffviewer/diffutils.py
	/trunk/reviewboard/htdocs/media/rb/css/diffviewer.css
	/trunk/reviewboard/htdocs/media/rb/js/diffviewer.js
	/trunk/reviewboard/templates/diffviewer/diff_file_fragment.html