Add Patience Differ to diff viewer.
Review Request #11239 — Created Oct. 22, 2020 and updated
Review Board uses a version of the Meyers Diff algorithm for the diff viewer,
which is pretty standard and widely-used. However, unclear diff might be
reported under some scenarios. For example, when user switched order of a big
chunk of code with each other, meyer diff will see it as a change line by line
instead of being aware that the whole block is switched. Patience differ
tackles issues like these by scanning the whole file and actively seeking out
interesting lines before applying diff while sacrificing some performances.This is a backend feature and it has not been updated to be the default method
of diffing. To test this feature, you could go to
reviewboard/diffviewer/differ.py
, and update line 26 toDEFAULT = PATIENCE
.
Unit tests are created in reviewboard/diffviewer/tests/test_patiencediff.py.
You can run the test by
./tests/runtests.py reviewboard.diffviewer.tests.test_patiencediff:PatienceDifferTest
.
All unit tests have passed.
Summary | ID |
---|---|
962bbe21c1466137f81987e6e41723ecf3ffdd63 | |
c049933abe62a1cd1b7246fc3a4c20f43bb742a1 | |
afb55222047babced031559e8afc462c39be6c20 | |
29b81e90a417881d4a89c81ff73fc08addf68024 |
Description | From | Last Updated |
---|---|---|
Can you adjust the Description and Testing Done to be <= 76 columns? |
chipx86 | |
It looks like your changes for the start indexes wound up in here. Can you rebase on top of that … |
chipx86 | |
E231 missing whitespace after ',' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
F821 undefined name 'csv' |
reviewbot | |
E302 expected 2 blank lines, found 1 |
reviewbot | |
F821 undefined name 'os' |
reviewbot | |
E261 at least two spaces before inline comment |
reviewbot | |
F821 undefined name 'argparse' |
reviewbot | |
E501 line too long (80 > 79 characters) |
reviewbot | |
E501 line too long (84 > 79 characters) |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E501 line too long (107 > 79 characters) |
reviewbot | |
F821 undefined name 'get_filenames' |
reviewbot | |
F821 undefined name 'read_input' |
reviewbot | |
F821 undefined name 'find_match' |
reviewbot | |
F821 undefined name 'rm_duplicate' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E265 block comment should start with '# ' |
reviewbot | |
W293 blank line contains whitespace |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E305 expected 2 blank lines after class or function definition, found 1 |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
F821 undefined name 'csv' |
reviewbot | |
W293 blank line contains whitespace |
reviewbot | |
E303 too many blank lines (3) |
reviewbot | |
F821 undefined name 'os' |
reviewbot | |
E261 at least two spaces before inline comment |
reviewbot | |
F821 undefined name 'argparse' |
reviewbot | |
E501 line too long (80 > 79 characters) |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
F821 undefined name 'csv' |
reviewbot | |
E302 expected 2 blank lines, found 1 |
reviewbot | |
F821 undefined name 'os' |
reviewbot | |
E261 at least two spaces before inline comment |
reviewbot | |
F821 undefined name 'argparse' |
reviewbot | |
E501 line too long (80 > 79 characters) |
reviewbot | |
E501 line too long (84 > 79 characters) |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E501 line too long (107 > 79 characters) |
reviewbot | |
F821 undefined name 'get_filenames' |
reviewbot | |
F821 undefined name 'read_input' |
reviewbot | |
F821 undefined name 'find_match' |
reviewbot | |
F821 undefined name 'rm_duplicate' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E265 block comment should start with '# ' |
reviewbot | |
W293 blank line contains whitespace |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E305 expected 2 blank lines after class or function definition, found 1 |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
F821 undefined name 'csv' |
reviewbot | |
W293 blank line contains whitespace |
reviewbot | |
E303 too many blank lines (3) |
reviewbot | |
F821 undefined name 'os' |
reviewbot | |
E261 at least two spaces before inline comment |
reviewbot | |
F821 undefined name 'argparse' |
reviewbot | |
E501 line too long (80 > 79 characters) |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
F821 undefined name 'csv' |
reviewbot | |
E302 expected 2 blank lines, found 1 |
reviewbot | |
F821 undefined name 'os' |
reviewbot | |
E261 at least two spaces before inline comment |
reviewbot | |
F821 undefined name 'argparse' |
reviewbot | |
E501 line too long (80 > 79 characters) |
reviewbot | |
E501 line too long (84 > 79 characters) |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E501 line too long (107 > 79 characters) |
reviewbot | |
F821 undefined name 'get_filenames' |
reviewbot | |
F821 undefined name 'read_input' |
reviewbot | |
F821 undefined name 'find_match' |
reviewbot | |
F821 undefined name 'rm_duplicate' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E265 block comment should start with '# ' |
reviewbot | |
W293 blank line contains whitespace |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E305 expected 2 blank lines after class or function definition, found 1 |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
F821 undefined name 'csv' |
reviewbot | |
W293 blank line contains whitespace |
reviewbot | |
E303 too many blank lines (3) |
reviewbot | |
F821 undefined name 'os' |
reviewbot | |
E261 at least two spaces before inline comment |
reviewbot | |
F821 undefined name 'argparse' |
reviewbot | |
E501 line too long (80 > 79 characters) |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
F821 undefined name 'csv' |
reviewbot | |
E302 expected 2 blank lines, found 1 |
reviewbot | |
F821 undefined name 'os' |
reviewbot | |
E261 at least two spaces before inline comment |
reviewbot | |
E501 line too long (80 > 79 characters) |
reviewbot | |
F821 undefined name 'argparse' |
reviewbot | |
E501 line too long (84 > 79 characters) |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E501 line too long (107 > 79 characters) |
reviewbot | |
F821 undefined name 'get_filenames' |
reviewbot | |
F821 undefined name 'read_input' |
reviewbot | |
F821 undefined name 'find_match' |
reviewbot | |
F821 undefined name 'rm_duplicate' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E265 block comment should start with '# ' |
reviewbot | |
W293 blank line contains whitespace |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E305 expected 2 blank lines after class or function definition, found 1 |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
F821 undefined name 'csv' |
reviewbot | |
W293 blank line contains whitespace |
reviewbot | |
E303 too many blank lines (3) |
reviewbot | |
F821 undefined name 'os' |
reviewbot | |
E261 at least two spaces before inline comment |
reviewbot | |
F821 undefined name 'argparse' |
reviewbot | |
E501 line too long (80 > 79 characters) |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
F821 undefined name 'csv' |
reviewbot | |
E302 expected 2 blank lines, found 1 |
reviewbot | |
F821 undefined name 'os' |
reviewbot | |
E261 at least two spaces before inline comment |
reviewbot | |
F821 undefined name 'argparse' |
reviewbot | |
E501 line too long (80 > 79 characters) |
reviewbot | |
E501 line too long (84 > 79 characters) |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E501 line too long (107 > 79 characters) |
reviewbot | |
F821 undefined name 'get_filenames' |
reviewbot | |
F821 undefined name 'read_input' |
reviewbot | |
F821 undefined name 'find_match' |
reviewbot | |
F821 undefined name 'rm_duplicate' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E265 block comment should start with '# ' |
reviewbot | |
W293 blank line contains whitespace |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E305 expected 2 blank lines after class or function definition, found 1 |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
F821 undefined name 'csv' |
reviewbot | |
W293 blank line contains whitespace |
reviewbot | |
E303 too many blank lines (3) |
reviewbot | |
F821 undefined name 'os' |
reviewbot | |
E261 at least two spaces before inline comment |
reviewbot | |
F821 undefined name 'argparse' |
reviewbot | |
E501 line too long (80 > 79 characters) |
reviewbot | |
F401 'reviewboard.diffviewer.differ.DiffCompatVersion' imported but unused |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E303 too many blank lines (2) |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E128 continuation line under-indented for visual indent |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E128 continuation line under-indented for visual indent |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E226 missing whitespace around arithmetic operator |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
Can you add a file-level docstring above this? (Blank line separating it from this import.) |
chipx86 | |
No blank line here. |
chipx86 | |
This needs to subclass object. Also, this should probably be outside of the PatienceDiffer class. |
chipx86 | |
The trailing """ can go on the first line. |
chipx86 | |
Blank line between a class-level docstring and the body. |
chipx86 | |
This is missing a docstring describing the arguments. |
chipx86 | |
These would all go into an Attributes: section in the class's docstring (to deal with parsability issues). |
chipx86 | |
"Return whether ..." Same below. |
chipx86 | |
Type goes on its own line. It also needs to be bool (casing matters). Same comments below. |
chipx86 | |
This can be a single statement, since the conditional evaluates to true or false. Same below. |
chipx86 | |
How about: "Return a string representation of this slice." |
chipx86 | |
The type must be unicode |
chipx86 | |
So normally, this should be formatted like: return ( 'A: %s\n' 'B: %s\n' '%s' % ((self.a_low, self.a_high), (self.b_low, self.b_high), '-' … |
chipx86 | |
If this doesn't need to perform any logic of its own, you can omit it. |
chipx86 | |
Summary docstrings cannot wrap. How about: """Return an opcode generator for the contents of the diff. |
chipx86 | |
This should just be tuple, with the contents described in the description. |
chipx86 | |
For readability, can you use keyword arguments? That'll help convey what's going into this class without having to jump up … |
chipx86 | |
There should be a blank line separating a statement from a block on the same indentation level, but not on … |
chipx86 | |
Let's store as a tuple. It'll take less memory and will be faster to operate on, which is preferred here. |
chipx86 | |
"Differ" -> "Diff" |
chipx86 | |
The description must be on the next line, indented 4. Rather than describe the opcode format, you can just say … |
chipx86 | |
Same as above. |
chipx86 | |
Blank line between these. |
chipx86 | |
So ultimately, what we'd want to do is set up a deeper generator pattern for the Myers work and the … |
chipx86 | |
As mentioned above, slice is a reserved word. |
chipx86 | |
"Return all ..." |
chipx86 | |
Description on the next line. |
chipx86 | |
Description gets its own lines. |
chipx86 | |
Because you use range, you'll need to a six.moves import for range. Grep in the codebase for import range and … |
chipx86 | |
Bcause we use self.a and self.b a lot, let's pull these out into local variables to avoid the attribute lookup … |
chipx86 | |
A couple notes here: When we have an if/else, it's a bit easier mentally work with if the first condition … |
chipx86 | |
Comments need to always be proper sentences, so this should have a period. |
chipx86 | |
It doesn't look like we use this for anything else. Can we just reference i? |
chipx86 | |
No blank line here. |
chipx86 | |
Let's avoid all the lookups and instead iterate using six.iteritems, which will give us keys and values. |
chipx86 | |
"If" |
chipx86 | |
This can be one statement. |
chipx86 | |
This must be a single line. |
chipx86 | |
Same comments as above regarding the formatting. |
chipx86 | |
No blank line here. |
chipx86 | |
Rather than separate elif lines with comments separating them, put the comments inside the nested block. That will keep the … |
chipx86 | |
No blank line at the start of a nested block. |
chipx86 | |
Must use sentence casing. |
chipx86 | |
Rather than have to compare to stacks[-1] each time, let's fetch the value of stacks[-1] once and then compare against … |
chipx86 | |
Rather than single-line code, which is harder to reason about and maintain, can you split this up as: prev = … |
chipx86 | |
while True: |
chipx86 | |
This needs sentence casing. |
chipx86 | |
We can just check if not prev here, which will be faster (as it uses a different operator). |
chipx86 | |
No blank line at the start of a nested block. |
chipx86 | |
Same doc comments as above. I'll let you handle the rest :) |
chipx86 | |
Rather than the shorthand, can you set these individually? It's more clear and maintainable that way. |
chipx86 | |
Blank line between these. |
chipx86 | |
Rather than look up match[1] for each iteration, let's pull it out above. |
chipx86 | |
Rather than looking up matched_slices[-1] each time, let's pull it out into a last_matched_slice variable and access that. |
chipx86 | |
Blank line before a statement opening a new block. |
chipx86 | |
Can you use parens? Like: while (not slice.is_empty() and self.a[slice.a_low] == self.b[slice.b_low]): Also, let's fetch self.a and self.b to avoid … |
chipx86 | |
With this conditional, we're returning slice either way. This code can be cleaned up by doing: if old_a_low != slice.a_low: … |
chipx86 | |
This can be one statement. |
chipx86 | |
Much of my comments in match_head apply here as well. |
chipx86 | |
Can you add a file-level docstring? |
chipx86 | |
"PatienceDiffer" |
chipx86 | |
""" goes on the next line. |
chipx86 | |
Can you add a docstring for this? |
chipx86 | |
E127 continuation line over-indented for visual indent |
reviewbot | |
E127 continuation line over-indented for visual indent |
reviewbot | |
E127 continuation line over-indented for visual indent |
reviewbot | |
E231 missing whitespace after ',' |
reviewbot | |
E127 continuation line over-indented for visual indent |
reviewbot | |
E127 continuation line over-indented for visual indent |
reviewbot | |
E502 the backslash is redundant between brackets |
reviewbot | |
E127 continuation line over-indented for visual indent |
reviewbot |
- Change Summary:
-
Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.
- Commits:
-
Summary ID c9e2112bea9ba3e65aa103cfb8f5737831c276a2 971b64d11610de99cc6b4ea4aa291dce93b19487 c9e2112bea9ba3e65aa103cfb8f5737831c276a2 ffeb1e9454cd67c12c71a1ea9449d4d6e180c4b4 eeab960675da6c2ae7eea8f6c84059cfbec09660 dbef3c32f854dfedf8de19922edaf1b46f5435dc - Diff:
-
Revision 2 (+4690 -46)
Checks run (1 failed, 1 succeeded)
flake8
-
Warning: Showing 30 of 97 failures.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Commits:
-
Summary ID c9e2112bea9ba3e65aa103cfb8f5737831c276a2 ffeb1e9454cd67c12c71a1ea9449d4d6e180c4b4 eeab960675da6c2ae7eea8f6c84059cfbec09660 dbef3c32f854dfedf8de19922edaf1b46f5435dc c9e2112bea9ba3e65aa103cfb8f5737831c276a2 ffeb1e9454cd67c12c71a1ea9449d4d6e180c4b4 eeab960675da6c2ae7eea8f6c84059cfbec09660 dbef3c32f854dfedf8de19922edaf1b46f5435dc 3a8a96d0518478d597cc1d1b76fb6252fa4fa4ba 36b33edda96fe5a6e6ff6851dd0fe4c760b71407 - Diff:
-
Revision 3 (+4761 -83)
Checks run (1 failed, 1 succeeded)
flake8
-
Warning: Showing 30 of 85 failures.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Commits:
-
Summary ID c9e2112bea9ba3e65aa103cfb8f5737831c276a2 ffeb1e9454cd67c12c71a1ea9449d4d6e180c4b4 eeab960675da6c2ae7eea8f6c84059cfbec09660 dbef3c32f854dfedf8de19922edaf1b46f5435dc 3a8a96d0518478d597cc1d1b76fb6252fa4fa4ba 36b33edda96fe5a6e6ff6851dd0fe4c760b71407 c9e2112bea9ba3e65aa103cfb8f5737831c276a2 ffeb1e9454cd67c12c71a1ea9449d4d6e180c4b4 eeab960675da6c2ae7eea8f6c84059cfbec09660 dbef3c32f854dfedf8de19922edaf1b46f5435dc 3a8a96d0518478d597cc1d1b76fb6252fa4fa4ba 36b33edda96fe5a6e6ff6851dd0fe4c760b71407 b009817dbb7bd7e9b36438eb48adfb9cb2eb8a4f - Diff:
-
Revision 4 (+5106 -922)
Checks run (1 failed, 1 succeeded)
flake8
-
Warning: Showing 30 of 91 failures.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Commits:
-
Summary ID c9e2112bea9ba3e65aa103cfb8f5737831c276a2 ffeb1e9454cd67c12c71a1ea9449d4d6e180c4b4 eeab960675da6c2ae7eea8f6c84059cfbec09660 dbef3c32f854dfedf8de19922edaf1b46f5435dc 3a8a96d0518478d597cc1d1b76fb6252fa4fa4ba 36b33edda96fe5a6e6ff6851dd0fe4c760b71407 b009817dbb7bd7e9b36438eb48adfb9cb2eb8a4f c9e2112bea9ba3e65aa103cfb8f5737831c276a2 ffeb1e9454cd67c12c71a1ea9449d4d6e180c4b4 eeab960675da6c2ae7eea8f6c84059cfbec09660 dbef3c32f854dfedf8de19922edaf1b46f5435dc 3a8a96d0518478d597cc1d1b76fb6252fa4fa4ba 36b33edda96fe5a6e6ff6851dd0fe4c760b71407 b009817dbb7bd7e9b36438eb48adfb9cb2eb8a4f 877e67a8673197202c367e02a5855b0e587f5906 - Diff:
-
Revision 5 (+5202 -966)
Checks run (1 failed, 1 succeeded)
flake8
-
Warning: Showing 30 of 98 failures.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Change Summary:
-
Polished PatienceDiffer with proper documentation and formatting, removed WIP tag.
- Description:
-
~ [WIP] Add Patience Differ to diff viewer.
~ Review Board uses a version of the Meyers Diff algorithm for the diff viewer, which is pretty standard and widely-used. However, unclear diff might be reported under some scenarios. For example, when user switched order of a big chunk of code with each other, meyer diff will see it as a change line by line instead of being aware that the whole block is switched. Patience differ tackles issues like these by scanning the whole file and actively seeking out interesting lines before applying diff while sacrificing some performances.
+ + This is a backend feature and it has not been updated to be the default method of diffing. To test this feature, you could go to reviewboard/diffviewer/differ.py, and update line 26 to
DEFAULT = PATIENCE
. - Testing Done:
-
~ Review Board uses a version of the Meyers Diff algorithm for the diff viewer, which is pretty standard and widely-used. However, unclear diff might be reported under some scenarios. For example, when user switched order of a big chunk of code with each other, meyer diff will see it as a change line by line instead of being aware that the whole block is switched. Patience differ tackles issues like these by scanning the whole file and actively seeking out interesting lines before applying diff while sacrificing some performances.
~ ~ This is a WIP request with all of the work I've done so far checked in, including two jupyter notebooks (patience_diff_v2 is the most updated one), some sample files for diff tests, and a py file containing my attempt to integrate patiencediff into RB.
~ Unit tests are created in reviewboard/diffviewer/tests/test_patiencediff.py.
~ You can run the test by ./tests/runtests.py reviewboard.diffviewer.tests.test_patiencediff:PatienceDifferTest
.~ All unit tests have passed. - Commits:
-
Summary ID c9e2112bea9ba3e65aa103cfb8f5737831c276a2 ffeb1e9454cd67c12c71a1ea9449d4d6e180c4b4 eeab960675da6c2ae7eea8f6c84059cfbec09660 dbef3c32f854dfedf8de19922edaf1b46f5435dc 3a8a96d0518478d597cc1d1b76fb6252fa4fa4ba 36b33edda96fe5a6e6ff6851dd0fe4c760b71407 b009817dbb7bd7e9b36438eb48adfb9cb2eb8a4f 877e67a8673197202c367e02a5855b0e587f5906 14cfcd3b1e28ef1093b7248d898df9b25d94845f 962bbe21c1466137f81987e6e41723ecf3ffdd63 c049933abe62a1cd1b7246fc3a4c20f43bb742a1 afb55222047babced031559e8afc462c39be6c20
Checks run (1 failed, 1 succeeded)
flake8
-
Warning: Showing 30 of 38 failures.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Summary:
-
[WIP] Add Patience Differ to diff viewer.Add Patience Differ to diff viewer.
- Commits:
-
Summary ID 14cfcd3b1e28ef1093b7248d898df9b25d94845f 962bbe21c1466137f81987e6e41723ecf3ffdd63 c049933abe62a1cd1b7246fc3a4c20f43bb742a1 afb55222047babced031559e8afc462c39be6c20 14cfcd3b1e28ef1093b7248d898df9b25d94845f 962bbe21c1466137f81987e6e41723ecf3ffdd63 c049933abe62a1cd1b7246fc3a4c20f43bb742a1 afb55222047babced031559e8afc462c39be6c20 2829587371d814581d92dc7a3a4b979b8906a290 - Groups:
-
- Diff:
Revision 7 (+1123 -39)
Checks run (1 failed, 1 succeeded)
flake8 failed.JSHint passed.flake8
- Commits:
-
Summary ID 14cfcd3b1e28ef1093b7248d898df9b25d94845f 962bbe21c1466137f81987e6e41723ecf3ffdd63 c049933abe62a1cd1b7246fc3a4c20f43bb742a1 afb55222047babced031559e8afc462c39be6c20 2829587371d814581d92dc7a3a4b979b8906a290 14cfcd3b1e28ef1093b7248d898df9b25d94845f 962bbe21c1466137f81987e6e41723ecf3ffdd63 c049933abe62a1cd1b7246fc3a4c20f43bb742a1 afb55222047babced031559e8afc462c39be6c20 3b4f77a94a6d97245c18d0732dd94829ee6c37a0
Checks run (2 succeeded)
-
This is exciting! I want to play with it.
There's a lot of comments. Some are repeats of previous ones. There's really two categories of comments here:
- Code/doc guideline inconsistencies
- Performance improvements (small things that add up in diff logic, like repeated attribute accesses or lookups by index)
-
-
It looks like your changes for the start indexes wound up in here. Can you rebase on top of that and post the range for these commits only?
-
-
-
-
-
-
-
These would all go into an
Attributes:
section in the class's docstring (to deal with parsability issues). -
-
-
-
-
-
So normally, this should be formatted like:
return ( 'A: %s\n' 'B: %s\n' '%s' % ((self.a_low, self.a_high), (self.b_low, self.b_high), '-' * 79) )
That's faster and separates out the structure of the string from the variables going into it.
Now that said, this feels really debuggy to me, and not something that ultimately needs to remain in here.
-
-
Summary docstrings cannot wrap.
How about:
"""Return an opcode generator for the contents of the diff.
-
-
For readability, can you use keyword arguments? That'll help convey what's going into this class without having to jump up to the definition.
-
There should be a blank line separating a statement from a block on the same indentation level, but not on a new nested block. So this should look like:
opcodes = [] for slice in matched_slices: ...
Also of note:
slice
is a reserved word in Python, so you'll need to rethink the variable. -
Let's store as a tuple. It'll take less memory and will be faster to operate on, which is preferred here.
-
-
The description must be on the next line, indented 4.
Rather than describe the opcode format, you can just say that each entry is a standard opcode tuple. For diff viewer work, this will be a standard format.
-
-
-
So ultimately, what we'd want to do is set up a deeper generator pattern for the Myers work and the merging, so that we don't have to take in opcodes up-front, iterate through, store, and then re-iterate through for merging (and then again for yielding).
Consider the following a future TODO, since we're at the end of the semester, for either yourself if you're interested or for us.
The approach I'd use would be to iterate through the opcodes, one-by-one, storing the previous opcode. On each opcode, check if it should be merged into the previous. If yes, merge. If no, yield the previous, and reset merge state for the current one. So something like:
last_opcode = next(opcodes) for opcode in opcodes: if opcode[0] == last_opcode[0]: last_opcode = (tag, last_opcode[1], last_opcode[2] + opcode[2], last_opcode[4] + opcode[4]) else: yield last_opcode last_opcode = opcode yield last_opcode
Untested, but something like that.
What you can then do is have
get_opcodes()
call a function that loops through the slices and does the Myers work (yielding results from its ownget_opcodes()
, rather than storing), and pass that result directly intomerge_opcodes
.That then sets up the fastest route possible for getting any opcodes from Myers work into the merge and to the caller, with no storage (beyond the
last_opcode
). -
-
-
-
-
Because you use
range
, you'll need to asix.moves
import forrange
. Grep in the codebase forimport range
and you'll find it.The reason for this is that Python 2's
range
builds and returns a list, whereas Python 3 uses a generator. We want to be consistent and fast. -
Bcause we use
self.a
andself.b
a lot, let's pull these out into local variables to avoid the attribute lookup on each use.Similarly, since we use
a[i]
repeatedly, pull that out as well.This helps keep diff logic fast.
Same goes for
b
below. -
A couple notes here:
- When we have an if/else, it's a bit easier mentally work with if the first condition is the positive. So,
if .. in
would be better thanif .. not in
. - We probably don't need
position
. We can referencei
instead.
- When we have an if/else, it's a bit easier mentally work with if the first condition is the positive. So,
-
-
-
-
Let's avoid all the lookups and instead iterate using
six.iteritems
, which will give us keys and values. -
-
-
-
-
-
Rather than separate
elif
lines with comments separating them, put the comments inside the nested block. That will keep the conditionals clearly together, and provide an opportunity to better describe each of the conditions. -
-
-
Rather than have to compare to
stacks[-1]
each time, let's fetch the value ofstacks[-1]
once and then compare against that.Also, do we need to continue if we find a match?
-
Rather than single-line code, which is harder to reason about and maintain, can you split this up as:
prev = [ match for match in directed_matches if match[0] == last[1][0] ]
Also,
last[1][0]
has to be fetched every single time, so let's pull that out before we do this, and compare against that value.Same comments apply below.
-
-
-
-
-
-
Rather than the shorthand, can you set these individually? It's more clear and maintainable that way.
-
-
-
Rather than looking up
matched_slices[-1]
each time, let's pull it out into alast_matched_slice
variable and access that. -
-
Can you use parens? Like:
while (not slice.is_empty() and self.a[slice.a_low] == self.b[slice.b_low]):
Also, let's fetch
self.a
andself.b
to avoid the attribute lookups each iteration. -
With this conditional, we're returning
slice
either way. This code can be cleaned up by doing:if old_a_low != slice.a_low: matched_slices.append(Slice(...)) return slice
-
-
-
-
-
-
- Description:
-
~ Review Board uses a version of the Meyers Diff algorithm for the diff viewer, which is pretty standard and widely-used. However, unclear diff might be reported under some scenarios. For example, when user switched order of a big chunk of code with each other, meyer diff will see it as a change line by line instead of being aware that the whole block is switched. Patience differ tackles issues like these by scanning the whole file and actively seeking out interesting lines before applying diff while sacrificing some performances.
~ Review Board uses a version of the Meyers Diff algorithm for the diff viewer,
+ which is pretty standard and widely-used. However, unclear diff might be + reported under some scenarios. For example, when user switched order of a big + chunk of code with each other, meyer diff will see it as a change line by line + instead of being aware that the whole block is switched. Patience differ + tackles issues like these by scanning the whole file and actively seeking out + interesting lines before applying diff while sacrificing some performances. ~ This is a backend feature and it has not been updated to be the default method of diffing. To test this feature, you could go to reviewboard/diffviewer/differ.py, and update line 26 to
DEFAULT = PATIENCE
.~ This is a backend feature and it has not been updated to be the default method
+ of diffing. To test this feature, you could go to + reviewboard/diffviewer/differ.py
, and update line 26 toDEFAULT = PATIENCE
. - Testing Done:
-
Unit tests are created in reviewboard/diffviewer/tests/test_patiencediff.py.
~ You can run the test by ./tests/runtests.py reviewboard.diffviewer.tests.test_patiencediff:PatienceDifferTest
.~ You can run the test by + ./tests/runtests.py reviewboard.diffviewer.tests.test_patiencediff:PatienceDifferTest
.All unit tests have passed.
- Change Summary:
-
Addressed comment from Christian.
- Commits:
-
Summary ID 14cfcd3b1e28ef1093b7248d898df9b25d94845f 962bbe21c1466137f81987e6e41723ecf3ffdd63 c049933abe62a1cd1b7246fc3a4c20f43bb742a1 afb55222047babced031559e8afc462c39be6c20 3b4f77a94a6d97245c18d0732dd94829ee6c37a0 962bbe21c1466137f81987e6e41723ecf3ffdd63 c049933abe62a1cd1b7246fc3a4c20f43bb742a1 afb55222047babced031559e8afc462c39be6c20 9d97f16613ef51587e9090844c36fab042fa499a
Checks run (1 failed, 1 succeeded)
flake8
- Commits:
-
Summary ID 962bbe21c1466137f81987e6e41723ecf3ffdd63 c049933abe62a1cd1b7246fc3a4c20f43bb742a1 afb55222047babced031559e8afc462c39be6c20 9d97f16613ef51587e9090844c36fab042fa499a 962bbe21c1466137f81987e6e41723ecf3ffdd63 c049933abe62a1cd1b7246fc3a4c20f43bb742a1 afb55222047babced031559e8afc462c39be6c20 29b81e90a417881d4a89c81ff73fc08addf68024