Summary

Re-enable an important optimization in the Myers diff algorithm.

Review Request #5228 — Created Jan. 9, 2014 and submitted 11 years, 6 months ago

Information

Owner

chipx86*

Repository

Review Board

Branch

release-1.7.x

Bugs

Depends On

Reviewers

Groups

reviewboard

People

Description*

Re-enable an important optimization in the Myers diff algorithm.

Many years ago we turned off an optimization in MyersDiffer. Without it,
diff times for very large files are excessively long. This was the source
of some huge CPU spikes and memory usage, especially if users dogpiled on
the review request.

We turned this off in the early days of Review Board, and the commit
message was vague about why. It seems that it resulted in a breakage
with some diffs. I have not been able to reproduce this, and this part
of the algorithm matches GNU diff's perfectly. I know we've made other
fixes to the differ since then, so most likely, the breakage was due to
one of those.

To ensure compatibility with older diffs, I've bumped the diff compat
version. If it turns out that this does break things, we can easily
revert it, and test manually with any diffs by setting the stored compat
version in the database per-diffset.
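
A minimal sketch of the compat-version idea described above, assuming a differ that takes a `compat_version` argument. The class name, constant names, and numeric values are hypothetical and are not taken from the Review Board code:

```python
# Hypothetical sketch: none of these names or values come from Review Board.
LEGACY_COMPAT_VERSION = 1      # assumed: diffs generated before the optimization
OPTIMIZED_COMPAT_VERSION = 2   # assumed: diffs generated with it re-enabled


class SketchDiffer(object):
    """Toy differ that gates an optimized code path on a compat version."""

    def __init__(self, a, b, compat_version=OPTIMIZED_COMPAT_VERSION):
        self.a = a
        self.b = b
        # In practice this value would be read from the stored diffset.
        self.compat_version = compat_version

    def uses_optimization(self):
        # Older diffsets keep the old behaviour, so their results stay stable.
        return self.compat_version >= OPTIMIZED_COMPAT_VERSION


old = SketchDiffer(["a"], ["b"], compat_version=LEGACY_COMPAT_VERSION)
new = SketchDiffer(["a"], ["b"])
```

With this shape, lowering the stored version on a single diffset sends it back down the unoptimized path, which is the manual test described above.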

This gets some large, more insane diffs from 1-2 minutes down to around
10 seconds.

Testing Done

Unit tests pass.

Tested with some large diffs that went down the optimized code path. Didn't
see any problems.

Issues

Description	From	Last Updated
Just to improve forward-compatibility for the next time we change this code, can you do this instead? if compat_version in …	david	11 years, 6 months ago
Instead of using xrange here, you should add this to the top of the file: from djblets.util.compat.six.moves import range	david	11 years, 6 months ago
Actually, how about we define symbolic COMPAT_VERSION_*s in differ.py next to the DEFAULT one, and then import it here?	david	11 years, 6 months ago

Change Summary: Targeting 1.7.x for this.
Branch: changed from master to release-1.7.x

reviewboard/diffviewer/differ.py (Diff revision 1)

The issue has been resolved.

Just to improve forward-compatibility for the next time we change this code, can you do this instead?

if compat_version in (1, 2):

You've also got a tyop in your description: "rever"

Change Summary:


Changed the capability version check to be more specific.
Fixde a tyop.

Description:

-	rever it, and test manually with any diffs by setting the stored compat
+	revert it, and test manually with any diffs by setting the stored compat

Diff:

Revision 2 (+17 -9)

	reviewboard/diffviewer/differ.py
	reviewboard/diffviewer/myersdiff.py

reviewboard/diffviewer/myersdiff.py (Diff revisions 1 - 2)

The issue has been dropped.

Instead of using xrange here, you should add this to the top of the file:

from djblets.util.compat.six.moves import range

chipx86 11 years, 6 months ago

The interdiff is being tricky. I moved this back to release-1.7.x, and that line was close enough to get caught up in the valid interdiff ranges. I actually didn't touch that line in either change (and six wouldn't be appropriate for 1.7.x).

reviewboard/diffviewer/myersdiff.py (Diff revision 2)

The issue has been resolved.

Actually, how about we define symbolic COMPAT_VERSION_*s in differ.py next to the DEFAULT one, and then import it here?

Change Summary:


Added a DiffCompatVersion containing all the versions, the default, and a utility list for all supported Myers versions.
Updated MyersDiffer to use DiffCompatVersion instead of keeping its own constant.
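
For readers without the diff at hand, a class of this kind might look roughly like the sketch below. Only the name `DiffCompatVersion` comes from the change summary. The member names, the numeric values, and the `is_myers` helper are assumptions made for illustration:

```python
class DiffCompatVersion(object):
    """Sketch of a holder for symbolic diff compat versions."""

    # Assumed member names and values, not copied from the real change.
    LEGACY = 0            # assumed: a pre-Myers differ
    MYERS = 1             # assumed: Myers differ with the optimization off
    MYERS_OPTIMIZED = 2   # assumed: Myers differ with the optimization on

    # The version given to newly generated diffs.
    DEFAULT = MYERS_OPTIMIZED

    # Utility list of every version handled by the Myers differ.
    MYERS_VERSIONS = (MYERS, MYERS_OPTIMIZED)


def is_myers(compat_version):
    # Hypothetical helper: an explicit membership test means an unknown
    # future version is never silently treated as a Myers diff.
    return compat_version in DiffCompatVersion.MYERS_VERSIONS
```

The membership test follows the `if compat_version in (1, 2):` form suggested earlier in the review, with symbolic names in place of the literals.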

Diff:

Revision 3 (+29 -14)

	reviewboard/diffviewer/differ.py
	reviewboard/diffviewer/managers.py
	reviewboard/diffviewer/myersdiff.py

Ship It!

Status: Completed