Summary

Optimize and improve diff line-related operations, and get_lines().

Review Request #11574 — Created April 5, 2021 and submitted June 2, 2021, 6:22 p.m.

Information

Owner: chipx86
Repository: ReviewBot
Branch: release-3.0.x
Bugs: (none listed)
Depends On: (none listed)

Reviewers

Groups: reviewbot
People: (none listed)

Description

We've had a long-standing TODO item for optimizing how line matching
works. We previously had to do a full row scan of all chunks of a diff
in order to find a line, for each line being commented on. This was
slow, and it was often done several times per diff.

The logic for the scanning was also repeated in various ways a few
times, and was about to be repeated again.

Part of this change redoes this logic, giving us a single function for
iterating through lines, with an optional start line. That start line is
found through a binary search of chunks, and a relative offset into the
matching chunk's lines.

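
The lookup described above can be sketched roughly as follows. This is only an illustration: the chunk layout (a dict with a `start` line number and a `lines` list), the `iter_lines()` name, and the use of the standard `bisect` module are all assumptions, and the actual change may structure the search differently.

```python
from bisect import bisect_right


def iter_lines(chunks, start_line=1):
    """Yield (line_number, text) pairs beginning at start_line.

    ``chunks`` is a hypothetical, simplified structure: a list of dicts
    sorted by 'start', each holding the first line number it covers and
    the text of its contiguous lines.
    """
    starts = [chunk['start'] for chunk in chunks]

    # Binary search for the last chunk starting at or before start_line.
    index = max(bisect_right(starts, start_line) - 1, 0)

    for chunk in chunks[index:]:
        # Relative offset into the chunk. This is only non-zero for the
        # chunk that contains start_line.
        offset = max(start_line - chunk['start'], 0)

        for i, text in enumerate(chunk['lines'][offset:], offset):
            yield chunk['start'] + i, text


# Example data (made up for illustration).
chunks = [
    {'start': 1, 'lines': ['a', 'b', 'c']},
    {'start': 4, 'lines': ['d', 'e']},
    {'start': 6, 'lines': ['f', 'g', 'h']},
]
```

With this data, `list(iter_lines(chunks, 5))` skips straight to the second chunk and starts at its second line, rather than walking every row from the top of the diff.
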
The other parts introduce a new method and a fix to an existing method.

The new get_lines() can be used to fetch a range of lines from the
original or modified file. This will be useful for the shellcheck tool
in an upcoming change.

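
As a rough idea of what fetching a range from either side of a diff could look like, here is a standalone sketch. The row layout (a tuple of original line number, original text, modified line number, modified text, with `None` where a line does not exist on that side) and the parameter names are assumptions made for this example, not the signature of the real method. It uses a plain linear scan for brevity.

```python
def get_lines(rows, first_line, num_lines=1, original=False):
    """Return the text of num_lines lines starting at first_line.

    ``rows`` is a hypothetical list of tuples in the form
    (orig_line_num, orig_text, mod_line_num, mod_text).
    """
    # Pick which side of the diff to read from.
    num_index, text_index = (0, 1) if original else (2, 3)
    last_line = first_line + num_lines - 1
    result = []

    for row in rows:
        line_num = row[num_index]

        if line_num is None:
            # The line doesn't exist on the requested side.
            continue

        if line_num > last_line:
            break

        if line_num >= first_line:
            result.append(row[text_index])

    return result


# Example data (made up for illustration).
rows = [
    (1, 'import os', 1, 'import os'),
    (2, 'x = 1', None, None),      # Deleted line
    (None, None, 2, 'x = 2'),      # Inserted line
    (3, 'print(x)', 3, 'print(x)'),
]
```

Here `get_lines(rows, 2, 2)` reads lines 2 and 3 of the modified file, while passing `original=True` reads the same line numbers from the original file.
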
The _is_modified() function has been revised to consider a range of
lines to be modified if any lines within it are modified. Previously,
the entire range had to be considered modified, and this could lead to
tools choosing not to comment on a line because some part of the range
wasn't modified.
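
The difference between the old and new behavior comes down to "all" versus "any". A minimal sketch, assuming the modified line numbers are available as a set (the function names and the set-based representation are hypothetical, not the real `_is_modified()` code):

```python
def is_range_modified(modified_lines, first_line, num_lines=1):
    """New behavior: the range counts if ANY line in it is modified."""
    line_range = range(first_line, first_line + num_lines)

    return any(line in modified_lines for line in line_range)


def is_range_fully_modified(modified_lines, first_line, num_lines=1):
    """Old behavior: EVERY line in the range had to be modified."""
    line_range = range(first_line, first_line + num_lines)

    return all(line in modified_lines for line in line_range)


# Example data (made up for illustration): lines 10 and 11 were changed.
modified_lines = {10, 11}
```

For a comment spanning lines 9 through 12, the old check rejects the range because lines 9 and 12 are untouched, while the new check accepts it because lines 10 and 11 are modified.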

Testing Done

All unit tests pass on Python 2.7 and 3.x.

Tested this code along with some new logic coming to shellcheck.

Commits

Summary	ID
Optimize and improve diff line-related operations, and get_lines().	fb90c8b886eb8d8b37a4f050b3178d1a774d100a

Issues

Description	From	Last Updated
F401 'bisect' imported but unused	reviewbot	April 5, 2021, 8:58 p.m.

flake8 failed.

JSHint passed.

flake8

bot/reviewbot/processing/review.py (Diff revision 1)
The issue has been resolved.
```
F401 'bisect' imported but unused
```

Ship it!


Status: Completed
Change Summary: Pushed to release-3.0.x (fb3a33d)