Summary

Optimize diff processing for Subversion.

Review Request #12640 — Created Sept. 25, 2022 and submitted Sept. 30, 2022, 6:17 p.m.

Information

Owner

chipx86

Repository

RBTools

Branch

release-4.x

Bugs

Depends On

Reviewers

Groups

rbtools

People

Description

When generating Subversion diffs, RBTools does a lot of diff processing
to:

1. Ensure that added/deleted empty files are present
2. Ensure renamed files have the right information
3. Convert relative paths to absolute paths
4. Filter for any excluded files

Each step of this would iterate over the previous diff and then generate
a new one. If the diff had, say, 10,000 lines, we'd parse and then
re-build those 10,000 lines in almost every one of those stages, before
finally returning the result. This was slow.

This change converts each processing stage to a generator, allowing one
pass through the diff. Each stage will iterate through and yield lines
for the next stage, the result of which is iteratively joined into a
final byte string.
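
The single-pass design can be illustrated with a minimal sketch. The stage
names (`convert_paths`, `filter_excluded`), the `Index: ` handling, and the
`/trunk/` base path are hypothetical stand-ins, not the actual RBTools
functions. They only show how generator stages chain so that each line is
visited once before being joined into a byte string.

```python
from typing import Iterable, Iterator, Set


def convert_paths(lines: Iterable[bytes], base: bytes) -> Iterator[bytes]:
    """Hypothetical stage: prefix the path on each ``Index:`` line."""
    prefix = b'Index: '

    for line in lines:
        if line.startswith(prefix):
            yield prefix + base + line[len(prefix):]
        else:
            yield line


def filter_excluded(lines: Iterable[bytes],
                    excluded: Set[bytes]) -> Iterator[bytes]:
    """Hypothetical stage: drop every line of a file that is excluded."""
    prefix = b'Index: '
    skipping = False

    for line in lines:
        if line.startswith(prefix):
            path = line[len(prefix):].rstrip(b'\n')
            skipping = path in excluded

        if not skipping:
            yield line


def process_diff(diff: bytes) -> bytes:
    # Stages are chained lazily. Nothing runs until join() pulls lines
    # through, so each line passes through every stage exactly once.
    lines = diff.splitlines(True)
    lines = convert_paths(lines, b'/trunk/')
    lines = filter_excluded(lines, {b'/trunk/secret.txt'})

    return b''.join(lines)
```

No intermediate diff is rebuilt between stages. Only the final `join()`
materializes the result.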

The stages themselves have been optimized a bit. We used to perform
repeated lookups on SVNClient for the same attributes (compiled
regexes and constants) on each loop. Those are now pulled out of the
loops.
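
As a rough illustration of pulling lookups out of a loop, the sketch below
binds class attributes to local names once before iterating. `ExampleClient`,
`INDEX_RE`, and `INDEX_SEP` are made-up names for this example and are not
taken from `SVNClient`.

```python
import re
from typing import Iterable, Iterator


class ExampleClient:
    """Hypothetical client; the attribute names are illustrative only."""

    INDEX_RE = re.compile(br'^Index: (.+)\n$')
    INDEX_SEP = b'=' * 67

    def iter_paths(self, lines: Iterable[bytes]) -> Iterator[bytes]:
        # Bind the attributes to locals once, rather than resolving
        # self.INDEX_RE and self.INDEX_SEP on every iteration.
        index_re = self.INDEX_RE
        index_sep = self.INDEX_SEP

        for line in lines:
            if line.rstrip(b'\n') == index_sep:
                continue

            m = index_re.match(line)

            if m:
                yield m.group(1)
```

Local variable access avoids an attribute lookup per iteration, which can
add up over a diff with many thousands of lines.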

A potential bug was also found and fixed that could have led to an
infinite loop during diff generation when processing empty files. If a
very specific (unlikely) set of conditions was met (deleted empty file,
revision values of 0 for the diff operation, and no Revision found for
the file in svn info), we'd repeat the loop, but with no advancement.
That'd cause the same condition to be hit over and over. We now advance
the iterator.
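
The class of bug described here can be sketched as follows. This is not the
RBTools code: the function name, the `revisions` mapping, and the skip rule
are assumptions chosen only to show why a `continue` in an index-driven loop
must advance the position first.

```python
from typing import Dict, Iterator, List, Optional


def iter_with_revisions(
    lines: List[bytes],
    revisions: Dict[bytes, Optional[bytes]],
) -> Iterator[bytes]:
    """Hypothetical index-driven loop that always advances."""
    prefix = b'Index: '
    i = 0

    while i < len(lines):
        line = lines[i]

        if line.startswith(prefix):
            path = line[len(prefix):].rstrip(b'\n')

            if revisions.get(path) is None:
                # No revision is known for this file. A bare `continue`
                # here would re-test the same line forever, so advance
                # the position before continuing.
                i += 1
                continue

        yield line
        i += 1
```

Without the `i += 1` before `continue`, the loop condition never changes and
the same line is examined indefinitely.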

Testing Done

Unit tests pass.

Commits

Summary	ID
Optimize diff processing for Subversion.	82872b7df55c81c4524a67a696c3cdd8a620fdd0

Issues

Description	From	Last Updated
'typing.Iterable' imported but unused Column: 1 Error code: F401	reviewbot	Sept. 25, 2022, 2 p.m.
'typing.Iterable' imported but unused Column: 1 Error code: F401	reviewbot	Sept. 25, 2022, 2 p.m.
This needs the type.	david	Sept. 28, 2022, 1 a.m.
This needs the type.	david	Sept. 28, 2022, 1 a.m.

flake8 failed.

JSHint passed.

flake8

rbtools/clients/svn.py (Diff revision 1)
The issue has been dropped.
```
'typing.Iterable' imported but unused

Column: 1
Error code: F401
```

Change Summary:

Fixed an issue that could occur with garbage data in the empty file processor.

Commits:

	Summary	ID
	Optimize diff processing for Subversion.	8a9fe1bc0095c50f45cfd225efd03f30e2982143
	Optimize diff processing for Subversion.	3f8f0c3abce3aa475d4da34feac65f9b427bcb36

Diff:

Revision 2 (+174 -174)

rbtools/clients/svn.py

Checks run (1 failed, 1 succeeded)

flake8 failed.

JSHint passed.

flake8

rbtools/clients/svn.py (Diff revision 2)
The issue has been resolved.
```
'typing.Iterable' imported but unused

Column: 1
Error code: F401
```

rbtools/clients/svn.py (Diff revision 2)
The issue has been resolved.
```
This needs the type.
```
rbtools/clients/svn.py (Diff revision 2)
The issue has been resolved.
```
This needs the type.
```

Change Summary:

Added missing type documentation.

Commits:

	Summary	ID
	Optimize diff processing for Subversion.	3f8f0c3abce3aa475d4da34feac65f9b427bcb36
	Optimize diff processing for Subversion.	82872b7df55c81c4524a67a696c3cdd8a620fdd0

Diff:

Revision 3 (+178 -174)

rbtools/clients/svn.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Ship it!

```
Ship It!
```

Ship it!

```
Ship It!
```

Status: Completed
Change Summary: Pushed to release-4.x (fbf07f4)