Optimize and improve diff line-related operations, and get_lines().

Review Request #11574 — Created April 5, 2021 and submitted

Information

ReviewBot
release-3.0.x

Reviewers

We've had a long-standing TODO item for optimizing how line matching
works. We previously had to do a full row scan of all chunks of a diff
in order to find a line, for each line being commented on. This was
slow, and it was often done several times per diff.

The logic for the scanning was also repeated in various ways a few
times, and was about to be repeated again.

Part of this change redoes this logic, giving us a single function for
iterating through lines, with an optional start line. That start line is
found through a binary search of chunks, and a relative offset into the
matching chunk's lines.

The other parts introduce a new method and a fix to an existing method.

The new get_lines() can be used to fetch a range of lines from the
original or modified file. This will be useful for the shellcheck tool
in an upcoming change.

The _is_modified() function has been revised to consider a range of
lines to be modified if any lines within it are modified. Previously,
the entire range had to be considered modified, and this could lead to
tools choosing not to comment on a line because some part of the range
wasn't modified.

All unit tests pass on Python 2.7 and 3.x.

Tested this code along with some new logic coming to shellcheck.

Summary ID
Optimize and improve diff line-related operations, and get_lines().
We've had a long-standing TODO item for optimizing how line matching works. We previously had to do a full row scan of all chunks of a diff in order to find a line, for each line being commented on. This was slow, and it was often done several times per diff. The logic for the scanning was also repeated in various ways a few times, and was about to be repeated again. Part of this change redoes this logic, giving us a single function for iterating through lines, with an optional start line. That start line is found through a binary search of chunks, and a relative offset into the matching chunk's lines. The other parts introduce a new method and a fix to an existing method. The new `get_lines()` can be used to fetch a range of lines from the original or modified file. This will be useful for the shellcheck tool in an upcoming change. The `_is_modified()` function has been revised to consider a range of lines to be modified if any lines within it are modified. Previously, the entire range had to be considered modified, and this could lead to tools choosing not to comment on a line because some part of the range wasn't modified.
fb90c8b886eb8d8b37a4f050b3178d1a774d100a
Description From Last Updated

F401 'bisect' imported but unused

reviewbotreviewbot
Checks run (1 failed, 1 succeeded)
flake8 failed.
JSHint passed.

flake8

david
  1. Ship It!
  2. 
      
chipx86
Review request changed
Status:
Completed
Change Summary:
Pushed to release-3.0.x (fb3a33d)