Summary

Greatly improve performance of the diff parser.

Review Request #8880 — Created April 6, 2017 and submitted April 17, 2017, 6:41 p.m.

Information

Owner

chipx86

Repository

Review Board

Branch

release-2.5.x

Bugs

Depends On

Commit

d8c19be...

Reviewers

Groups

reviewboard

People

Description

The diff parser had some major performance problems that didn't really
affect standard diffs, but were absolutely noticeable for multi-megabyte
diffs.

The primary performance problems came from how the diff data strings
were built. We were concatenating strings, which is neither fast nor
memory-efficient. Solving this required a breaking change to the diff
parser API (one that is unlikely to affect many people out there). We no
longer work with the data string directly, but instead call methods to
prepend or append data to the diff's ParsedDiffFile object (renamed from
File, while we're breaking stuff anyway -- further changes will be
coming in for parsing/attribute/doc improvements). These methods in turn
work on a cStringIO, which is a much faster way of performing the string
building.
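
As a rough illustration of the approach described above (not the actual Review Board code), the sketch below accumulates diff data in an in-memory buffer instead of repeatedly concatenating strings. The class name `ParsedFileSketch` and the methods `append_data`, `prepend_data`, and `get_data` are hypothetical. `io.BytesIO` stands in for the Python 2 cStringIO so the example runs on Python 3, and handling prepends with a separate list of chunks is an assumption as well.

```python
import io


class ParsedFileSketch:
    """Hypothetical stand-in for a parsed file entry.

    This is not Review Board's ParsedDiffFile; it only illustrates
    building data through a buffer instead of string concatenation.
    """

    def __init__(self):
        # Appended data goes into one growing in-memory buffer.
        self._buffer = io.BytesIO()
        # Prepended chunks are kept separately (an assumption made for
        # this sketch) and joined only when the data is requested.
        self._prepended = []

    def append_data(self, data):
        """Add bytes to the end of the accumulated data."""
        self._buffer.write(data)

    def prepend_data(self, data):
        """Add bytes to the start of the accumulated data."""
        self._prepended.insert(0, data)

    def get_data(self):
        """Return everything accumulated so far as a single bytes object."""
        return b''.join(self._prepended) + self._buffer.getvalue()


parsed_file = ParsedFileSketch()
parsed_file.append_data(b'@@ -1 +1 @@\n')
parsed_file.append_data(b'-old\n+new\n')
parsed_file.prepend_data(b'--- a/README\n+++ b/README\n')
```

The final bytes object is assembled once, in `get_data`, rather than being rebuilt on every appended line.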

Along with this, there were many places where we did conditional lookups
or range checks to determine whether to process one or more lines of
content. These checks prevented us from unintentionally indexing past
the end of the diff. Since in each case it's more likely than not that
the current line will be somewhere before the end of the diff, the
parser now attempts the index unconditionally, and falls back to the
alternative logic if an IndexError is raised.
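
The pattern can be sketched as follows. This is a minimal illustration, assuming a parser that walks a list of byte-string lines; the function name `is_file_header` and the header check itself are hypothetical and not taken from the parser.

```python
def is_file_header(lines, linenum):
    """Return whether lines[linenum] starts a '---'/'+++' header pair.

    Instead of first checking that linenum + 1 < len(lines), index
    directly and treat running off the end as "no header here".

    linenum is assumed to be non-negative; a negative index would wrap
    around in Python rather than raise IndexError.
    """
    try:
        return (lines[linenum].startswith(b'--- ') and
                lines[linenum + 1].startswith(b'+++ '))
    except IndexError:
        # Reached only when the lookup goes past the end of the diff,
        # which is the uncommon case.
        return False
```

In the common case no bounds check runs at all, and the cost of the exception is paid only near the end of the input.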

Prior to these speedups, a 13MB diff took so long to parse that I gave
up trying to time it. Afterward, it took a few seconds. This should
certainly help lighten the load on busy Review Board servers, and may
even make general uploading of diffs faster in many real-world cases.

Testing Done

All unit tests passed.

Uploaded a very large diff. Saw that it parsed correctly in seconds.

Issues

Description	From	Last Updated
'django.utils.six.moves.range' imported but unused	reviewbot	April 6, 2017, 3:12 a.m.
Change to single quotes?	david	April 11, 2017, 4:29 p.m.
While we're in here can we stop shadowing the builtin file?	david	April 11, 2017, 4:30 p.m.
Change to single quotes?	david	April 11, 2017, 4:30 p.m.
Let's change this variable name too.	david	April 11, 2017, 4:30 p.m.

JSHint passed.

PEP8 Style Checker passed.

Pyflakes failed.

Pyflakes

reviewboard/diffviewer/parser.py (Diff revision 1)
The issue has been resolved.
```
 'django.utils.six.moves.range' imported but unused
```

Change Summary:

Fixed a Pyflakes warning.

Commit:

78d3a9e8647938e9d2be2ceb73f853814487e328

cc2cca72645eef3dca64cabfa1edeb97805810e6

Diff:

Revision 2 (+202 -70)

	reviewboard/diffviewer/parser.py
	reviewboard/scmtools/git.py
	reviewboard/scmtools/svn/__init__.py

Checks run (3 succeeded)

JSHint passed.

PEP8 Style Checker passed.

Pyflakes passed.

```
Just a few formatting tweaks.
```

reviewboard/diffviewer/parser.py (Diff revision 2)

The issue has been dropped.

Change to single quotes?

chipx86 April 11, 2017, 4:31 p.m.

I have another change that will be altering this line anyway. I'm going to drop this one for now to minimize the damage in this change.

reviewboard/diffviewer/parser.py (Diff revision 2)
The issue has been resolved.
```
While we're in here can we stop shadowing the builtin file?
```
reviewboard/diffviewer/parser.py (Diff revision 2)
The issue has been dropped.
```
Change to single quotes?
```
chipx86 April 11, 2017, 4:31 p.m.
Same as above.
reviewboard/diffviewer/parser.py (Diff revision 2)
The issue has been resolved.
```
Let's change this variable name too.
```

Change Summary:

Renamed the file variables to parsed_file.

Commit:

cc2cca72645eef3dca64cabfa1edeb97805810e6

d8c19bec0a2f31af756b4d2319a5ce996f225781

Diff:

Revision 3 (+226 -94)

	reviewboard/diffviewer/parser.py
	reviewboard/scmtools/git.py
	reviewboard/scmtools/svn/__init__.py

Checks run (3 succeeded)

JSHint passed.

PEP8 Style Checker passed.

Pyflakes passed.

Ship it!

```
Ship It!
```

Status: Completed
Change Summary: Pushed to release-2.5.x (de3b797)