Optimize and improve diff line-related operations, and add get_lines().
Review Request #11574 — Created April 5, 2021 and submitted — Latest diff uploaded
We've had a long-standing TODO item for optimizing how line matching
works. We previously had to do a full row scan of all chunks of a diff
in order to find a line, for each line being commented on. This was
slow, and it was often done several times per diff.

The logic for the scanning was also repeated in various ways a few
times, and was about to be repeated again.

Part of this change redoes this logic, giving us a single function for
iterating through lines, with an optional start line. That start line is
found through a binary search of chunks, and a relative offset into the
matching chunk's lines.

The other parts introduce a new method and a fix to an existing method.
The new `get_lines()` can be used to fetch a range of lines from the
original or modified file. This will be useful for the shellcheck tool
in an upcoming change.

The `_is_modified()` function has been revised to consider a range of
lines to be modified if any line within it is modified. Previously,
every line in the range had to be modified for the range to count as
modified, and this could lead to tools choosing not to comment on a
line because some part of the range wasn't modified.
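
For reviewers, here is a rough sketch of the approach described above. It is not the code in this change. The chunk layout, the `CHUNKS` sample data, and the module-level names `iter_lines`, `get_lines`, and `is_modified` are all simplified assumptions made for illustration, and the sketch assumes line numbers within a chunk are contiguous.

```python
from bisect import bisect_right
from itertools import islice

# Hypothetical, simplified chunk layout (not the real diff data format):
# each chunk has a change type and a list of (line_number, text) pairs,
# and chunks are sorted by their first line number.
CHUNKS = [
    {'change': 'equal', 'lines': [(1, 'import os'), (2, '')]},
    {'change': 'replace', 'lines': [(3, 'x = 1'), (4, 'y = 2')]},
    {'change': 'equal', 'lines': [(5, 'print(x)'), (6, 'print(y)')]},
]


def iter_lines(chunks, start=1):
    """Yield (change, line_number, text) tuples from ``start`` onward."""
    # Built on each call for brevity; a real implementation would avoid
    # rebuilding this list for every lookup.
    first_lines = [chunk['lines'][0][0] for chunk in chunks]

    # Binary search for the last chunk whose first line is <= start.
    chunk_i = max(bisect_right(first_lines, start) - 1, 0)

    # Relative offset of the start line within the matching chunk
    # (assumes contiguous line numbers inside a chunk).
    offset = max(start - first_lines[chunk_i], 0)

    for chunk in chunks[chunk_i:]:
        for line_number, text in chunk['lines'][offset:]:
            yield chunk['change'], line_number, text

        # Only the first chunk visited is entered part-way through.
        offset = 0


def get_lines(chunks, first_line, num_lines=1):
    """Return the text of ``num_lines`` lines starting at ``first_line``."""
    lines = islice(iter_lines(chunks, first_line), num_lines)
    return [text for _, _, text in lines]


def is_modified(chunks, first_line, num_lines=1):
    """Return True if any line in the range belongs to a changed chunk."""
    lines = islice(iter_lines(chunks, first_line), num_lines)
    return any(change != 'equal' for change, _, _ in lines)
```

With this sample data, a range covering lines 2 and 3 counts as modified because line 3 sits in a `replace` chunk, even though line 2 is unchanged. Under the old all-lines rule that same range would have been treated as unmodified.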
All unit tests pass on Python 2.7 and 3.x.
Tested this code along with some new logic coming to shellcheck.