Add Patience Differ to diff viewer.

Review Request #11239 — Created Oct. 22, 2020 and updated

Information

Review Board
release-4.0.x

Reviewers

Review Board uses a version of the Meyers Diff algorithm for the diff viewer,
which is pretty standard and widely-used. However, unclear diff might be
reported under some scenarios. For example, when user switched order of a big
chunk of code with each other, meyer diff will see it as a change line by line
instead of being aware that the whole block is switched. Patience differ
tackles issues like these by scanning the whole file and actively seeking out
interesting lines before applying diff while sacrificing some performances.

This is a backend feature and it has not been updated to be the default method
of diffing. To test this feature, you could go to
reviewboard/diffviewer/differ.py, and update line 26 to DEFAULT = PATIENCE.

Unit tests are created in reviewboard/diffviewer/tests/test_patiencediff.py.
You can run the test by
./tests/runtests.py reviewboard.diffviewer.tests.test_patiencediff:PatienceDifferTest.
All unit tests have passed.

Summary ID
Added patience differ as a new option in differ class.
962bbe21c1466137f81987e6e41723ecf3ffdd63
Added patience diff unit tests.
c049933abe62a1cd1b7246fc3a4c20f43bb742a1
Added PatienceDiffer to Diff Viewer.
afb55222047babced031559e8afc462c39be6c20
Fixed issues flagged by flake8
29b81e90a417881d4a89c81ff73fc08addf68024
Description From Last Updated

Can you adjust the Description and Testing Done to be <= 76 columns?

chipx86chipx86

It looks like your changes for the start indexes wound up in here. Can you rebase on top of that …

chipx86chipx86

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

E302 expected 2 blank lines, found 1

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E501 line too long (84 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E501 line too long (107 > 79 characters)

reviewbotreviewbot

F821 undefined name 'get_filenames'

reviewbotreviewbot

F821 undefined name 'read_input'

reviewbotreviewbot

F821 undefined name 'find_match'

reviewbotreviewbot

F821 undefined name 'rm_duplicate'

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E265 block comment should start with '# '

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E305 expected 2 blank lines after class or function definition, found 1

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E303 too many blank lines (3)

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

E302 expected 2 blank lines, found 1

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E501 line too long (84 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E501 line too long (107 > 79 characters)

reviewbotreviewbot

F821 undefined name 'get_filenames'

reviewbotreviewbot

F821 undefined name 'read_input'

reviewbotreviewbot

F821 undefined name 'find_match'

reviewbotreviewbot

F821 undefined name 'rm_duplicate'

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E265 block comment should start with '# '

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E305 expected 2 blank lines after class or function definition, found 1

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E303 too many blank lines (3)

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

E302 expected 2 blank lines, found 1

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E501 line too long (84 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E501 line too long (107 > 79 characters)

reviewbotreviewbot

F821 undefined name 'get_filenames'

reviewbotreviewbot

F821 undefined name 'read_input'

reviewbotreviewbot

F821 undefined name 'find_match'

reviewbotreviewbot

F821 undefined name 'rm_duplicate'

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E265 block comment should start with '# '

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E305 expected 2 blank lines after class or function definition, found 1

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E303 too many blank lines (3)

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

E302 expected 2 blank lines, found 1

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (84 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E501 line too long (107 > 79 characters)

reviewbotreviewbot

F821 undefined name 'get_filenames'

reviewbotreviewbot

F821 undefined name 'read_input'

reviewbotreviewbot

F821 undefined name 'find_match'

reviewbotreviewbot

F821 undefined name 'rm_duplicate'

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E265 block comment should start with '# '

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E305 expected 2 blank lines after class or function definition, found 1

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E303 too many blank lines (3)

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

E302 expected 2 blank lines, found 1

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E501 line too long (84 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E501 line too long (107 > 79 characters)

reviewbotreviewbot

F821 undefined name 'get_filenames'

reviewbotreviewbot

F821 undefined name 'read_input'

reviewbotreviewbot

F821 undefined name 'find_match'

reviewbotreviewbot

F821 undefined name 'rm_duplicate'

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E265 block comment should start with '# '

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E305 expected 2 blank lines after class or function definition, found 1

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E303 too many blank lines (3)

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

F401 'reviewboard.diffviewer.differ.DiffCompatVersion' imported but unused

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E303 too many blank lines (2)

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E128 continuation line under-indented for visual indent

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E128 continuation line under-indented for visual indent

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

Can you add a file-level docstring above this? (Blank line separating it from this import.)

chipx86chipx86

No blank line here.

chipx86chipx86

This needs to subclass object. Also, this should probably be outside of the PatienceDiffer class.

chipx86chipx86

The trailing """ can go on the first line.

chipx86chipx86

Blank line between a class-level docstring and the body.

chipx86chipx86

This is missing a docstring describing the arguments.

chipx86chipx86

These would all go into an Attributes: section in the class's docstring (to deal with parsability issues).

chipx86chipx86

"Return whether ..." Same below.

chipx86chipx86

Type goes on its own line. It also needs to be bool (casing matters). Same comments below.

chipx86chipx86

This can be a single statement, since the conditional evaluates to true or false. Same below.

chipx86chipx86

How about: "Return a string representation of this slice."

chipx86chipx86

The type must be unicode

chipx86chipx86

So normally, this should be formatted like: return ( 'A: %s\n' 'B: %s\n' '%s' % ((self.a_low, self.a_high), (self.b_low, self.b_high), '-' …

chipx86chipx86

If this doesn't need to perform any logic of its own, you can omit it.

chipx86chipx86

Summary docstrings cannot wrap. How about: """Return an opcode generator for the contents of the diff.

chipx86chipx86

This should just be tuple, with the contents described in the description.

chipx86chipx86

For readability, can you use keyword arguments? That'll help convey what's going into this class without having to jump up …

chipx86chipx86

There should be a blank line separating a statement from a block on the same indentation level, but not on …

chipx86chipx86

Let's store as a tuple. It'll take less memory and will be faster to operate on, which is preferred here.

chipx86chipx86

"Differ" -> "Diff"

chipx86chipx86

The description must be on the next line, indented 4. Rather than describe the opcode format, you can just say …

chipx86chipx86

Same as above.

chipx86chipx86

Blank line between these.

chipx86chipx86

So ultimately, what we'd want to do is set up a deeper generator pattern for the Myers work and the …

chipx86chipx86

As mentioned above, slice is a reserved word.

chipx86chipx86

"Return all ..."

chipx86chipx86

Description on the next line.

chipx86chipx86

Description gets its own lines.

chipx86chipx86

Because you use range, you'll need to a six.moves import for range. Grep in the codebase for import range and …

chipx86chipx86

Bcause we use self.a and self.b a lot, let's pull these out into local variables to avoid the attribute lookup …

chipx86chipx86

A couple notes here: When we have an if/else, it's a bit easier mentally work with if the first condition …

chipx86chipx86

Comments need to always be proper sentences, so this should have a period.

chipx86chipx86

It doesn't look like we use this for anything else. Can we just reference i?

chipx86chipx86

No blank line here.

chipx86chipx86

Let's avoid all the lookups and instead iterate using six.iteritems, which will give us keys and values.

chipx86chipx86

"If"

chipx86chipx86

This can be one statement.

chipx86chipx86

This must be a single line.

chipx86chipx86

Same comments as above regarding the formatting.

chipx86chipx86

No blank line here.

chipx86chipx86

Rather than separate elif lines with comments separating them, put the comments inside the nested block. That will keep the …

chipx86chipx86

No blank line at the start of a nested block.

chipx86chipx86

Must use sentence casing.

chipx86chipx86

Rather than have to compare to stacks[-1] each time, let's fetch the value of stacks[-1] once and then compare against …

chipx86chipx86

Rather than single-line code, which is harder to reason about and maintain, can you split this up as: prev = …

chipx86chipx86

while True:

chipx86chipx86

This needs sentence casing.

chipx86chipx86

We can just check if not prev here, which will be faster (as it uses a different operator).

chipx86chipx86

No blank line at the start of a nested block.

chipx86chipx86

Same doc comments as above. I'll let you handle the rest :)

chipx86chipx86

Rather than the shorthand, can you set these individually? It's more clear and maintainable that way.

chipx86chipx86

Blank line between these.

chipx86chipx86

Rather than look up match[1] for each iteration, let's pull it out above.

chipx86chipx86

Rather than looking up matched_slices[-1] each time, let's pull it out into a last_matched_slice variable and access that.

chipx86chipx86

Blank line before a statement opening a new block.

chipx86chipx86

Can you use parens? Like: while (not slice.is_empty() and self.a[slice.a_low] == self.b[slice.b_low]): Also, let's fetch self.a and self.b to avoid …

chipx86chipx86

With this conditional, we're returning slice either way. This code can be cleaned up by doing: if old_a_low != slice.a_low: …

chipx86chipx86

This can be one statement.

chipx86chipx86

Much of my comments in match_head apply here as well.

chipx86chipx86

Can you add a file-level docstring?

chipx86chipx86

"PatienceDiffer"

chipx86chipx86

""" goes on the next line.

chipx86chipx86

Can you add a docstring for this?

chipx86chipx86

E127 continuation line over-indented for visual indent

reviewbotreviewbot

E127 continuation line over-indented for visual indent

reviewbotreviewbot

E127 continuation line over-indented for visual indent

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E127 continuation line over-indented for visual indent

reviewbotreviewbot

E127 continuation line over-indented for visual indent

reviewbotreviewbot

E502 the backslash is redundant between brackets

reviewbotreviewbot

E127 continuation line over-indented for visual indent

reviewbotreviewbot
Checks run (1 failed, 1 succeeded)
flake8 failed.
JSHint passed.

flake8

bnie
Review request changed
Change Summary:

Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.

Commits:
Summary ID
[WIP] Add Patience Differ to diff viewer.
c9e2112bea9ba3e65aa103cfb8f5737831c276a2
Check in work done so far that will not be part of Review Board:
Added patience diff folder with jupyter notebook and some sample files insidefor testing.
971b64d11610de99cc6b4ea4aa291dce93b19487
[WIP] Add Patience Differ to diff viewer.
c9e2112bea9ba3e65aa103cfb8f5737831c276a2
Check in work done so far that will not be part of Review Board:
Added patience diff folder with jupyter notebook and some sample files insidefor testing.
ffeb1e9454cd67c12c71a1ea9449d4d6e180c4b4
Added two more parameter to indicate start index of a and b file, and modified
unit tests accordingly.
eeab960675da6c2ae7eea8f6c84059cfbec09660
Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.
dbef3c32f854dfedf8de19922edaf1b46f5435dc

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

bnie
Review request changed
Commits:
Summary ID
[WIP] Add Patience Differ to diff viewer.
c9e2112bea9ba3e65aa103cfb8f5737831c276a2
Check in work done so far that will not be part of Review Board:
Added patience diff folder with jupyter notebook and some sample files insidefor testing.
ffeb1e9454cd67c12c71a1ea9449d4d6e180c4b4
Added two more parameter to indicate start index of a and b file, and modified
unit tests accordingly.
eeab960675da6c2ae7eea8f6c84059cfbec09660
Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.
dbef3c32f854dfedf8de19922edaf1b46f5435dc
[WIP] Add Patience Differ to diff viewer.
c9e2112bea9ba3e65aa103cfb8f5737831c276a2
Check in work done so far that will not be part of Review Board:
Added patience diff folder with jupyter notebook and some sample files insidefor testing.
ffeb1e9454cd67c12c71a1ea9449d4d6e180c4b4
Added two more parameter to indicate start index of a and b file, and modified
unit tests accordingly.
eeab960675da6c2ae7eea8f6c84059cfbec09660
Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.
dbef3c32f854dfedf8de19922edaf1b46f5435dc
Update patience differ test cases and added patience diff as one new differ option in differ class.
3a8a96d0518478d597cc1d1b76fb6252fa4fa4ba
Removed unecessary tabs and spaces in files
36b33edda96fe5a6e6ff6851dd0fe4c760b71407

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

bnie
Review request changed
Commits:
Summary ID
[WIP] Add Patience Differ to diff viewer.
c9e2112bea9ba3e65aa103cfb8f5737831c276a2
Check in work done so far that will not be part of Review Board:
Added patience diff folder with jupyter notebook and some sample files insidefor testing.
ffeb1e9454cd67c12c71a1ea9449d4d6e180c4b4
Added two more parameter to indicate start index of a and b file, and modified
unit tests accordingly.
eeab960675da6c2ae7eea8f6c84059cfbec09660
Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.
dbef3c32f854dfedf8de19922edaf1b46f5435dc
Update patience differ test cases and added patience diff as one new differ option in differ class.
3a8a96d0518478d597cc1d1b76fb6252fa4fa4ba
Removed unecessary tabs and spaces in files
36b33edda96fe5a6e6ff6851dd0fe4c760b71407
[WIP] Add Patience Differ to diff viewer.
c9e2112bea9ba3e65aa103cfb8f5737831c276a2
Check in work done so far that will not be part of Review Board:
Added patience diff folder with jupyter notebook and some sample files insidefor testing.
ffeb1e9454cd67c12c71a1ea9449d4d6e180c4b4
Added two more parameter to indicate start index of a and b file, and modified
unit tests accordingly.
eeab960675da6c2ae7eea8f6c84059cfbec09660
Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.
dbef3c32f854dfedf8de19922edaf1b46f5435dc
Update patience differ test cases and added patience diff as one new differ option in differ class.
3a8a96d0518478d597cc1d1b76fb6252fa4fa4ba
Removed unecessary tabs and spaces in files
36b33edda96fe5a6e6ff6851dd0fe4c760b71407
Updated PatienceDiffer tests and added new functions to pre-process code blocks
before generating opcodes
b009817dbb7bd7e9b36438eb48adfb9cb2eb8a4f

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

bnie
Review request changed
Commits:
Summary ID
[WIP] Add Patience Differ to diff viewer.
c9e2112bea9ba3e65aa103cfb8f5737831c276a2
Check in work done so far that will not be part of Review Board:
Added patience diff folder with jupyter notebook and some sample files insidefor testing.
ffeb1e9454cd67c12c71a1ea9449d4d6e180c4b4
Added two more parameter to indicate start index of a and b file, and modified
unit tests accordingly.
eeab960675da6c2ae7eea8f6c84059cfbec09660
Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.
dbef3c32f854dfedf8de19922edaf1b46f5435dc
Update patience differ test cases and added patience diff as one new differ option in differ class.
3a8a96d0518478d597cc1d1b76fb6252fa4fa4ba
Removed unecessary tabs and spaces in files
36b33edda96fe5a6e6ff6851dd0fe4c760b71407
Updated PatienceDiffer tests and added new functions to pre-process code blocks
before generating opcodes
b009817dbb7bd7e9b36438eb48adfb9cb2eb8a4f
[WIP] Add Patience Differ to diff viewer.
c9e2112bea9ba3e65aa103cfb8f5737831c276a2
Check in work done so far that will not be part of Review Board:
Added patience diff folder with jupyter notebook and some sample files insidefor testing.
ffeb1e9454cd67c12c71a1ea9449d4d6e180c4b4
Added two more parameter to indicate start index of a and b file, and modified
unit tests accordingly.
eeab960675da6c2ae7eea8f6c84059cfbec09660
Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.
dbef3c32f854dfedf8de19922edaf1b46f5435dc
Update patience differ test cases and added patience diff as one new differ option in differ class.
3a8a96d0518478d597cc1d1b76fb6252fa4fa4ba
Removed unecessary tabs and spaces in files
36b33edda96fe5a6e6ff6851dd0fe4c760b71407
Updated PatienceDiffer tests and added new functions to pre-process code blocks
before generating opcodes
b009817dbb7bd7e9b36438eb48adfb9cb2eb8a4f
Check in final version with all bug fixed. Need to polish comments.
Signed-off-by: Bruce Nie <ynie@ualberta.ca>
877e67a8673197202c367e02a5855b0e587f5906

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

bnie
Review request changed
Change Summary:

Polished PatienceDiffer with proper documentation and formatting, removed WIP tag.

Description:
~  

[WIP] Add Patience Differ to diff viewer.

  ~

Review Board uses a version of the Meyers Diff algorithm for the diff viewer, which is pretty standard and widely-used. However, unclear diff might be reported under some scenarios. For example, when user switched order of a big chunk of code with each other, meyer diff will see it as a change line by line instead of being aware that the whole block is switched. Patience differ tackles issues like these by scanning the whole file and actively seeking out interesting lines before applying diff while sacrificing some performances.

  +
  +

This is a backend feature and it has not been updated to be the default method of diffing. To test this feature, you could go to reviewboard/diffviewer/differ.py, and update line 26 to DEFAULT = PATIENCE.

Testing Done:
~  

Review Board uses a version of the Meyers Diff algorithm for the diff viewer, which is pretty standard and widely-used. However, unclear diff might be reported under some scenarios. For example, when user switched order of a big chunk of code with each other, meyer diff will see it as a change line by line instead of being aware that the whole block is switched. Patience differ tackles issues like these by scanning the whole file and actively seeking out interesting lines before applying diff while sacrificing some performances.

~  
~  

This is a WIP request with all of the work I've done so far checked in, including two jupyter notebooks (patience_diff_v2 is the most updated one), some sample files for diff tests, and a py file containing my attempt to integrate patiencediff into RB.

  ~

Unit tests are created in reviewboard/diffviewer/tests/test_patiencediff.py.

  ~ You can run the test by ./tests/runtests.py reviewboard.diffviewer.tests.test_patiencediff:PatienceDifferTest.
  ~ All unit tests have passed.

Commits:
Summary ID
[WIP] Add Patience Differ to diff viewer.
c9e2112bea9ba3e65aa103cfb8f5737831c276a2
Check in work done so far that will not be part of Review Board:
Added patience diff folder with jupyter notebook and some sample files insidefor testing.
ffeb1e9454cd67c12c71a1ea9449d4d6e180c4b4
Added two more parameter to indicate start index of a and b file, and modified
unit tests accordingly.
eeab960675da6c2ae7eea8f6c84059cfbec09660
Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.
dbef3c32f854dfedf8de19922edaf1b46f5435dc
Update patience differ test cases and added patience diff as one new differ option in differ class.
3a8a96d0518478d597cc1d1b76fb6252fa4fa4ba
Removed unecessary tabs and spaces in files
36b33edda96fe5a6e6ff6851dd0fe4c760b71407
Updated PatienceDiffer tests and added new functions to pre-process code blocks
before generating opcodes
b009817dbb7bd7e9b36438eb48adfb9cb2eb8a4f
Check in final version with all bug fixed. Need to polish comments.
Signed-off-by: Bruce Nie <ynie@ualberta.ca>
877e67a8673197202c367e02a5855b0e587f5906
Added two more parameter to indicate start index of a and b file, and modified
unit tests accordingly.
14cfcd3b1e28ef1093b7248d898df9b25d94845f
Added patience differ as a new option in differ class.
962bbe21c1466137f81987e6e41723ecf3ffdd63
Added patience diff unit tests.
c049933abe62a1cd1b7246fc3a4c20f43bb742a1
Added PatienceDiffer to Diff Viewer.
afb55222047babced031559e8afc462c39be6c20

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

bnie
Review request changed
Summary:
[WIP] Add Patience Differ to diff viewer.
Add Patience Differ to diff viewer.
Commits:
Summary ID
Added two more parameter to indicate start index of a and b file, and modified
unit tests accordingly.
14cfcd3b1e28ef1093b7248d898df9b25d94845f
Added patience differ as a new option in differ class.
962bbe21c1466137f81987e6e41723ecf3ffdd63
Added patience diff unit tests.
c049933abe62a1cd1b7246fc3a4c20f43bb742a1
Added PatienceDiffer to Diff Viewer.
afb55222047babced031559e8afc462c39be6c20
Added two more parameter to indicate start index of a and b file, and modified
unit tests accordingly.
14cfcd3b1e28ef1093b7248d898df9b25d94845f
Added patience differ as a new option in differ class.
962bbe21c1466137f81987e6e41723ecf3ffdd63
Added patience diff unit tests.
c049933abe62a1cd1b7246fc3a4c20f43bb742a1
Added PatienceDiffer to Diff Viewer.
afb55222047babced031559e8afc462c39be6c20
Fixed issues flagged by flake8
2829587371d814581d92dc7a3a4b979b8906a290
Groups:

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

bnie
chipx86
  1. This is exciting! I want to play with it.

    There's a lot of comments. Some are repeats of previous ones. There's really two categories of comments here:

    1. Code/doc guideline inconsistencies
    2. Performance improvements (small things that add up in diff logic, like repeated attribute accesses or lookups by index)
  2. Show all issues

    Can you adjust the Description and Testing Done to be <= 76 columns?

  3. Show all issues

    It looks like your changes for the start indexes wound up in here. Can you rebase on top of that and post the range for these commits only?

  4. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    Can you add a file-level docstring above this? (Blank line separating it from this import.)

  5. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
    Show all issues

    No blank line here.

  6. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    This needs to subclass object.

    Also, this should probably be outside of the PatienceDiffer class.

    1. In this case, I left the Slice class inside because it defines a specific structure patience differ uses to portion code blocks. Should I keep this inside of Patience Differ then? What's a good way of telling if I should leave a class inside or outside of another class?

    2. The patience differ owns this entire file, so there's no need to nest the class in further. Anything in this file is related to the diff algorithm.

      Having the class outside of it helps with maintainability. It's not nested as deep and it reduces the footprint of PatienceDiffer, which are two wins for that. Generally, there's rarely a good reason to nest classes in Python, and once you begin to do that to keep a class more clearly bound to another class, it's probably time to move both into a new module.

  7. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
    Show all issues

    The trailing """ can go on the first line.

  8. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
    Show all issues

    Blank line between a class-level docstring and the body.

  9. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    This is missing a docstring describing the arguments.

  10. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     
     
     
     
     
     
     
     
    Show all issues

    These would all go into an Attributes: section in the class's docstring (to deal with parsability issues).

  11. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    "Return whether ..."

    Same below.

  12. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    Type goes on its own line.

    It also needs to be bool (casing matters).

    Same comments below.

  13. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     
    Show all issues

    This can be a single statement, since the conditional evaluates to true or false.

    Same below.

  14. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    How about: "Return a string representation of this slice."

  15. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    The type must be unicode

  16. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
    Show all issues

    So normally, this should be formatted like:

    return (
      'A: %s\n'
      'B: %s\n'
      '%s'
      % ((self.a_low, self.a_high),
         (self.b_low, self.b_high),
         '-' * 79)
    )
    

    That's faster and separates out the structure of the string from the variables going into it.

    Now that said, this feels really debuggy to me, and not something that ultimately needs to remain in here.

  17. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
    Show all issues

    If this doesn't need to perform any logic of its own, you can omit it.

  18. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
    Show all issues

    Summary docstrings cannot wrap.

    How about:

    """Return an opcode generator for the contents of the diff.
    
  19. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    This should just be tuple, with the contents described in the description.

  20. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    For readability, can you use keyword arguments? That'll help convey what's going into this class without having to jump up to the definition.

  21. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     
    Show all issues

    There should be a blank line separating a statement from a block on the same indentation level, but not on a new nested block. So this should look like:

    opcodes = []
    
    for slice in matched_slices:
        ...
    

    Also of note: slice is a reserved word in Python, so you'll need to rethink the variable.

  22. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    Let's store as a tuple. It'll take less memory and will be faster to operate on, which is preferred here.

    1. I am changing the value of the elements in this list in merge_opcodes():

      current_opcode = opcodes[i]
                      while i + 1 != op_size and opcodes[i][0] == opcodes[i + 1][0]:
                          current_opcode[2] = opcodes[i + 1][2]
                          current_opcode[4] = opcodes[i + 1][4]
                          i += 1
      

      If we change the opcode to a tuple, we have to convert it back here.
      But we only need to convert probably 60% of all the opcodes.
      I am not sure which way will be more efficient:
      1. Append as a tuple, convert back 60% later.
      2. Append as a list.

    2. It's technically faster to keep it a tuple. They're more compact behind the scenes, easy to rebuild, and don't require some of the operator-based invocations needed with setting list indexes.

      As proof:

      $ python3 -m timeit 'a = ["insert", 2, 3, 4, 5]; a[1] = 3; a[4] = 5;'
      10000000 loops, best of 3: 0.152 usec per loop
      
      $ python3 -m timeit 'a = ("insert", 2, 3, 4, 5); a = (a[0], 3, a[2], a[3], 5)'
      10000000 loops, best of 3: 0.0881 usec per loop
      
  23. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    "Differ" -> "Diff"

  24. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
    Show all issues

    The description must be on the next line, indented 4.

    Rather than describe the opcode format, you can just say that each entry is a standard opcode tuple. For diff viewer work, this will be a standard format.

  25. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
    Show all issues

    Same as above.

  26. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
    Show all issues

    Blank line between these.

  27. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
    Show all issues

    So ultimately, what we'd want to do is set up a deeper generator pattern for the Myers work and the merging, so that we don't have to take in opcodes up-front, iterate through, store, and then re-iterate through for merging (and then again for yielding).

    Consider the following a future TODO, since we're at the end of the semester, for either yourself if you're interested or for us.

    The approach I'd use would be to iterate through the opcodes, one-by-one, storing the previous opcode. On each opcode, check if it should be merged into the previous. If yes, merge. If no, yield the previous, and reset merge state for the current one. So something like:

    last_opcode = next(opcodes)
    
    for opcode in opcodes:
        if opcode[0] == last_opcode[0]:
            last_opcode = (tag,
                           last_opcode[1],
                           last_opcode[2] + opcode[2],
                           last_opcode[4] + opcode[4])
        else:
            yield last_opcode
            last_opcode = opcode
    
    yield last_opcode
    

    Untested, but something like that.

    What you can then do is have get_opcodes() call a function that loops through the slices and does the Myers work (yielding results from its own get_opcodes(), rather than storing), and pass that result directly into merge_opcodes.

    That then sets up the fastest route possible for getting any opcodes from Myers work into the merge and to the caller, with no storage (beyond the last_opcode).

  28. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    As mentioned above, slice is a reserved word.

  29. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    "Return all ..."

  30. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    Description on the next line.

  31. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
    Show all issues

    Description gets its own lines.

  32. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    Because you use range, you'll need to a six.moves import for range. Grep in the codebase for import range and you'll find it.

    The reason for this is that Python 2's range builds and returns a list, whereas Python 3 uses a generator. We want to be consistent and fast.

  33. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    Bcause we use self.a and self.b a lot, let's pull these out into local variables to avoid the attribute lookup on each use.

    Similarly, since we use a[i] repeatedly, pull that out as well.

    This helps keep diff logic fast.

    Same goes for b below.

  34. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
    Show all issues

    A couple notes here:

    1. When we have an if/else, it's a bit easier mentally work with if the first condition is the positive. So, if .. in would be better than if .. not in.
    2. We probably don't need position. We can reference i instead.
  35. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    Comments need to always be proper sentences, so this should have a period.

  36. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    It doesn't look like we use this for anything else. Can we just reference i?

  37. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
    Show all issues

    No blank line here.

  38. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     
     
     
    Show all issues

    Let's avoid all the lookups and instead iterate using six.iteritems, which will give us keys and values.

  39. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    "If"

  40. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
    Show all issues

    This can be one statement.

  41. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
    Show all issues

    This must be a single line.

  42. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     
     
     
     
     
     
    Show all issues

    Same comments as above regarding the formatting.

  43. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
    Show all issues

    No blank line here.

  44. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
    Show all issues

    Rather than separate elif lines with comments separating them, put the comments inside the nested block. That will keep the conditionals clearly together, and provide an opportunity to better describe each of the conditions.

  45. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
    Show all issues

    No blank line at the start of a nested block.

  46. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    Must use sentence casing.

  47. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
    Show all issues

    Rather than have to compare to stacks[-1] each time, let's fetch the value of stacks[-1] once and then compare against that.

    Also, do we need to continue if we find a match?

  48. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    Rather than single-line code, which is harder to reason about and maintain, can you split this up as:

    prev = [
        match
        for match in directed_matches
        if match[0] == last[1][0]
    ]
    

    Also, last[1][0] has to be fetched every single time, so let's pull that out before we do this, and compare against that value.

    Same comments apply below.

    1. The [0] is actually outside of that list comprehension expression:

      prev = [
          match
          for match in directed_matches
          if match[0] == last[1]
      ][0]
      

      What's the best way for formatting if this is the case?

    2. Ah, see, all the more reason to break that into multiple lines :)

      Since we only want the first, we don't actually want to buid up a whole list of items. A better approach would be:

      prev = next(
          match
          for match in directed_matches
          if match[0] == last[1]
      )
      

      By using a generator expression instead of a list comprehension, we'll only execute until the first result.

      Now, if it's at all possible for this to not find a match, it'll be important to try/except this, catching a StopIteration.

  49. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    while True:

  50. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    This needs sentence casing.

  51. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    We can just check if not prev here, which will be faster (as it uses a different operator).

  52. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
    Show all issues

    No blank line at the start of a nested block.

  53. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     
     
     
     
     
     
     
     
    Show all issues

    Same doc comments as above.

    I'll let you handle the rest :)

  54. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    Rather than the shorthand, can you set these individually? It's more clear and maintainable that way.

  55. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
    Show all issues

    Blank line between these.

  56. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    Rather than look up match[1] for each iteration, let's pull it out above.

  57. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     
    Show all issues

    Rather than looking up matched_slices[-1] each time, let's pull it out into a last_matched_slice variable and access that.

  58. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
    Show all issues

    Blank line before a statement opening a new block.

  59. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
    Show all issues

    Can you use parens? Like:

    while (not slice.is_empty() and
           self.a[slice.a_low] == self.b[slice.b_low]):
    

    Also, let's fetch self.a and self.b to avoid the attribute lookups each iteration.

  60. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     
     
     
     
     
    Show all issues

    With this conditional, we're returning slice either way. This code can be cleaned up by doing:

    if old_a_low != slice.a_low:
        matched_slices.append(Slice(...))
    
    return slice
    
  61. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
    Show all issues

    This can be one statement.

  62. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
    Show all issues

    Much of my comments in match_head apply here as well.

  63. Show all issues

    Can you add a file-level docstring?

  64. Show all issues

    "PatienceDiffer"

  65. Show all issues

    """ goes on the next line.

  66. Show all issues

    Can you add a docstring for this?

  67. 
      
bnie
keanweng
  1. 
      
  2. Didn't see any issues other than the ones Christian mentioned. Checked out the patch on my local machine and all 7 unit tests passed.
    I like how you included the shortcut for testing your unit tests, made it alot easier.

  3. 
      
bnie
Review request changed
Change Summary:

Addressed comment from Christian.

Commits:
Summary ID
Added two more parameter to indicate start index of a and b file, and modified
unit tests accordingly.
14cfcd3b1e28ef1093b7248d898df9b25d94845f
Added patience differ as a new option in differ class.
962bbe21c1466137f81987e6e41723ecf3ffdd63
Added patience diff unit tests.
c049933abe62a1cd1b7246fc3a4c20f43bb742a1
Added PatienceDiffer to Diff Viewer.
afb55222047babced031559e8afc462c39be6c20
Fixed issues flagged by flake8
3b4f77a94a6d97245c18d0732dd94829ee6c37a0
Added patience differ as a new option in differ class.
962bbe21c1466137f81987e6e41723ecf3ffdd63
Added patience diff unit tests.
c049933abe62a1cd1b7246fc3a4c20f43bb742a1
Added PatienceDiffer to Diff Viewer.
afb55222047babced031559e8afc462c39be6c20
Fixed issues flagged by flake8
9d97f16613ef51587e9090844c36fab042fa499a

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

bnie
Review request changed
Commits:
Summary ID
Added patience differ as a new option in differ class.
962bbe21c1466137f81987e6e41723ecf3ffdd63
Added patience diff unit tests.
c049933abe62a1cd1b7246fc3a4c20f43bb742a1
Added PatienceDiffer to Diff Viewer.
afb55222047babced031559e8afc462c39be6c20
Fixed issues flagged by flake8
9d97f16613ef51587e9090844c36fab042fa499a
Added patience differ as a new option in differ class.
962bbe21c1466137f81987e6e41723ecf3ffdd63
Added patience diff unit tests.
c049933abe62a1cd1b7246fc3a4c20f43bb742a1
Added PatienceDiffer to Diff Viewer.
afb55222047babced031559e8afc462c39be6c20
Fixed issues flagged by flake8
29b81e90a417881d4a89c81ff73fc08addf68024

Checks run (2 succeeded)

flake8 passed.
JSHint passed.