Improve move detection

Review Request #1683 — Created June 26, 2010 and submitted


Review Board


This change greatly improves move detection by finding moved ranges within insert/delete chunks. Previously, move detection only worked when entire insert/delete chunks matched, but now if only a few lines within a chunk appear to have moved, it'll detect that.
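
As a rough illustration of finding moved ranges inside a chunk (not the actual diffviewer code), the sketch below takes the lines of one delete chunk and one insert chunk and reports runs of consecutive lines that appear in both. The name `find_moved_ranges`, the 0-based offsets it returns, and the choice to compare lines with surrounding whitespace stripped are all assumptions made for this example.

```python
from collections import defaultdict


def find_moved_ranges(deleted, inserted):
    """Return (deleted_offset, inserted_offset, length) for each matching run.

    Hypothetical helper: `deleted` and `inserted` are lists of strings, one
    per line of a delete chunk and an insert chunk. Offsets are 0-based
    positions within those lists.
    """
    # Index every deleted line by its stripped text (stripping is an
    # assumption of this sketch, not something the description states).
    positions = defaultdict(list)
    for offset, text in enumerate(deleted):
        positions[text.strip()].append(offset)

    ranges = []
    i = 0
    while i < len(inserted):
        best_len, best_offset = 0, None
        # Try every deleted line that matches the current inserted line and
        # keep the candidate that extends into the longest run.
        for d in positions.get(inserted[i].strip(), []):
            n = 0
            while (i + n < len(inserted) and d + n < len(deleted)
                   and inserted[i + n].strip() == deleted[d + n].strip()):
                n += 1
            if n > best_len:
                best_len, best_offset = n, d
        if best_len:
            ranges.append((best_offset, i, best_len))
            i += best_len  # Skip past the run that was just matched.
        else:
            i += 1
    return ranges


deleted = ["int a;", "def foo():", "    return 1", "", "x = 2"]
inserted = ["y = 3", "def foo():", "    return 1", "z = 4"]
print(find_moved_ranges(deleted, inserted))  # [(1, 1, 2)]
```

Only the two-line function is reported as moved here, even though neither chunk matches the other as a whole.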

It also filters out some things that used to appear as moves but were pretty useless: mainly blank or whitespace-only lines, and lines containing only non-alphanumeric data (such as "}", "/*", " *", "*/", and other such things). These are filtered only if the move range consists *only* of such lines. If some real content exists within the move range, the entire range is preserved.
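
The filtering rule can be sketched roughly as follows. This is only an illustration under assumptions: `is_meaningful_line` and `is_interesting_move` are hypothetical names, and "real content" is taken to mean "contains at least one alphanumeric character", which is one reading of the description above.

```python
def is_meaningful_line(line):
    """Return True if the line contains at least one alphanumeric character."""
    return any(ch.isalnum() for ch in line)


def is_interesting_move(lines):
    """Return True if the moved range should be kept.

    A range is dropped only when *every* line in it is blank,
    whitespace-only, or purely non-alphanumeric. One meaningful line is
    enough to preserve the whole range.
    """
    return any(is_meaningful_line(line) for line in lines)


# An empty comment block is filtered out...
print(is_interesting_move(["/*", " *", " */"]))  # False
# ...but the same block with some text in it is preserved in full.
print(is_interesting_move(["/*", " * Frees the buffer.", " */"]))  # True
```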

We also filter out moves that have more than one destination. If a particular block was deleted from one section and then appears to be inserted in two or more places, then it hasn't really moved, and it's likely something that is not particularly interesting. We may decide later to list all the destination areas instead, but for now it should be fine.
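
A minimal sketch of the multiple-destination filter, assuming each candidate move is represented as a hypothetical `(source_start, dest_start, length)` tuple; the real data structures in the diff viewer may well differ.

```python
from collections import defaultdict


def drop_ambiguous_moves(moves):
    """Drop moves whose source range maps to more than one destination.

    `moves` is a list of (source_start, dest_start, length) tuples, a
    representation assumed for this example.
    """
    # Collect the set of destinations seen for each source range.
    destinations = defaultdict(set)
    for source_start, dest_start, length in moves:
        destinations[(source_start, length)].add(dest_start)

    # Keep only the moves whose source range has exactly one destination.
    return [
        move for move in moves
        if len(destinations[(move[0], move[2])]) == 1
    ]


moves = [(10, 50, 3), (20, 60, 2), (20, 90, 2)]
print(drop_ambiguous_moves(moves))  # [(10, 50, 3)]
```

The block starting at line 20 appears at both 60 and 90, so neither of those candidates is reported as a move.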

The new algorithm is a little more complicated, but it's been fully documented. Due to the more complex checking, it is in theory a little slower, but only in the worst-case scenario of many identical insert/delete ranges.

Tested with several diffs in my local branch. Also made some sample files and a unit test for checking the move ranges. The sample files were tested first in my browser, and the unit tests were written against the data viewed there, in order to be sure they worked right.

The sample files test both "real" moves (in this case, a whole function) and filtered-out moves (using an empty comment block without any text).
  2. reviewboard/diffviewer/ (Diff revision 1)
    Left over?
  3. reviewboard/diffviewer/ (Diff revision 1)
    Left over?