• 
      

    Fix interdiff changes with line reorders.

    Review Request #15055 — Created May 13, 2026 and submitted — Latest diff uploaded

    Information

    Review Board
    release-8.x

    Reviewers

    The new interdiff algorithm had an edge case where changes could get
    lost when reordering a sequence of lines. There were a few ways this
    could be triggered.

    An easy example would be to have a change in revision 1 that looks like:

    99   +...
    100   aaaa
    101  +bbbb
    102  +...
    

    And in revision 2:

    99   +...
    100  +bbbb
    101   aaaa
    102
    103  +...
    

    (Assume 0-based indexes above, instead of 1-based line numbers, to help
    the following comparisons make more sense.)

    The resulting unfiltered interdiff could look like:

      99   ...   |   99  ...
    - 100  aaaa  |
      101  bbbb  |  100  bbbb
    +            |  101  aaaa
      102  ...   |  102  ...
    

    In this case, the whole set of changes would be filtered out. The reason
    is that the computed ranges for side A would be [99,100) and side B
    would be [99,101). This wouldn't match the diff opcodes, which are:

    ("delete", 100, 101,  99, 99),
    ("equal",  101, 102, 100, 101),
    ("insert", 102, 102, 101, 102),
    

    That delete of [100,101) falls out of the range of [99,100), and
    the insert of [101,102) falls out of the range of [99,101). In the
    existing implementation, this would result in these being filtered out.

    In the cases above, note how the op starts at the exclusive end of
    the change ranges (which I refer to as the "boundary"). This is due to
    an insert becoming a delete or vice-versa.

    Issues can also occur for deletes in-between ranges related to the
    above cases, where, for example, aaaa would be surrounded by equal
    lines. The resulting interdiff could continue to move aaaa as above,
    but in a way where the intermediate delete would be lost.

    This has been addressed with some new heuristics that check the
    boundaries for shifted changes, and to check gaps between boundaries
    for a delete (which should be from the legitimate interdiff and not
    upstream changes). Guards were added that make sure we only do this if
    the previous change range (which may be the remnants of a split range)
    don't overlap with an opcode (indicating it may have shifted).

    This is done separately in Phase 1 and Phase 2 of the algorithm, and by
    keeping track of previous ranges for both sides of the diff and which
    ranges have already been entered once (a change range in a revision's
    diff that hasn't been used in the interdiff is a good candidate for
    something that shifted to/below the boundary).

    If starting on a boundary in Phase 1 (pre-range split), a split will be
    considered if the previous range hasn't yet been entered and there's a
    current range for processing. If the normally-filtered part starts
    right on the boundary, it will be emitted. This will catch the cases
    above. It may result in upstream changes not being filtered out, but it
    does a good job of catching these reorder issues.

    In Phase 2 (starts-in-range validity + cap), a check is then made if the
    op exists either at the boundary of a previously unused range, or
    sometime after it but with a change on the other side of the diff also
    in a gap between ranges. If so, it's emitted as a valid op, capped at
    the start of the next range, and with the remainder queued for
    re-processing.

    This should catch the cases most likely to trigger shifted lines from
    nearby line reorders without the risk of loss or bringing in much of the
    upstream changes (though this is always possible with these heuristics).

    All interdiff unit tests (using real datasets) pass.

    The original repro case is now fixed.

    Built a bunch of hand-crafted and tested diffs that exhibited variations
    on this issue, including them in my testing tool and in the unit tests.
    Verified that they're all showing the desired output.

    Haven't seen any regression with upstream changes, but we'll need more
    real-world testing.

    Commits

    Files