Add Patience Differ to diff viewer.

Review Request #11239 — Created Oct. 22, 2020 and updated

bnie
Review Board
release-4.0.x
reviewboard, students

Review Board uses a version of the Meyers Diff algorithm for the diff viewer,
which is pretty standard and widely-used. However, unclear diff might be
reported under some scenarios. For example, when user switched order of a big
chunk of code with each other, meyer diff will see it as a change line by line
instead of being aware that the whole block is switched. Patience differ
tackles issues like these by scanning the whole file and actively seeking out
interesting lines before applying diff while sacrificing some performances.

This is a backend feature and it has not been updated to be the default method
of diffing. To test this feature, you could go to
reviewboard/diffviewer/differ.py, and update line 26 to DEFAULT = PATIENCE.

Unit tests are created in reviewboard/diffviewer/tests/test_patiencediff.py.
You can run the test by
./tests/runtests.py reviewboard.diffviewer.tests.test_patiencediff:PatienceDifferTest.
All unit tests have passed.

Summary
Added patience differ as a new option in differ class.
Added patience diff unit tests.
Added PatienceDiffer to Diff Viewer.
Fixed issues flagged by flake8
Description From Last Updated

Can you adjust the Description and Testing Done to be <= 76 columns?

chipx86chipx86

It looks like your changes for the start indexes wound up in here. Can you rebase on top of that ...

chipx86chipx86

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

E302 expected 2 blank lines, found 1

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E501 line too long (84 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E501 line too long (107 > 79 characters)

reviewbotreviewbot

F821 undefined name 'get_filenames'

reviewbotreviewbot

F821 undefined name 'read_input'

reviewbotreviewbot

F821 undefined name 'find_match'

reviewbotreviewbot

F821 undefined name 'rm_duplicate'

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E265 block comment should start with '# '

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E305 expected 2 blank lines after class or function definition, found 1

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E303 too many blank lines (3)

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

E302 expected 2 blank lines, found 1

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E501 line too long (84 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E501 line too long (107 > 79 characters)

reviewbotreviewbot

F821 undefined name 'get_filenames'

reviewbotreviewbot

F821 undefined name 'read_input'

reviewbotreviewbot

F821 undefined name 'find_match'

reviewbotreviewbot

F821 undefined name 'rm_duplicate'

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E265 block comment should start with '# '

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E305 expected 2 blank lines after class or function definition, found 1

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E303 too many blank lines (3)

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

E302 expected 2 blank lines, found 1

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E501 line too long (84 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E501 line too long (107 > 79 characters)

reviewbotreviewbot

F821 undefined name 'get_filenames'

reviewbotreviewbot

F821 undefined name 'read_input'

reviewbotreviewbot

F821 undefined name 'find_match'

reviewbotreviewbot

F821 undefined name 'rm_duplicate'

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E265 block comment should start with '# '

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E305 expected 2 blank lines after class or function definition, found 1

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E303 too many blank lines (3)

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

E302 expected 2 blank lines, found 1

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (84 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E501 line too long (107 > 79 characters)

reviewbotreviewbot

F821 undefined name 'get_filenames'

reviewbotreviewbot

F821 undefined name 'read_input'

reviewbotreviewbot

F821 undefined name 'find_match'

reviewbotreviewbot

F821 undefined name 'rm_duplicate'

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E265 block comment should start with '# '

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E305 expected 2 blank lines after class or function definition, found 1

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E303 too many blank lines (3)

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

E302 expected 2 blank lines, found 1

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E501 line too long (84 > 79 characters)

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E501 line too long (107 > 79 characters)

reviewbotreviewbot

F821 undefined name 'get_filenames'

reviewbotreviewbot

F821 undefined name 'read_input'

reviewbotreviewbot

F821 undefined name 'find_match'

reviewbotreviewbot

F821 undefined name 'rm_duplicate'

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E265 block comment should start with '# '

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E305 expected 2 blank lines after class or function definition, found 1

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

F821 undefined name 'csv'

reviewbotreviewbot

W293 blank line contains whitespace

reviewbotreviewbot

E303 too many blank lines (3)

reviewbotreviewbot

F821 undefined name 'os'

reviewbotreviewbot

E261 at least two spaces before inline comment

reviewbotreviewbot

F821 undefined name 'argparse'

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

F401 'reviewboard.diffviewer.differ.DiffCompatVersion' imported but unused

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E303 too many blank lines (2)

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E128 continuation line under-indented for visual indent

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E128 continuation line under-indented for visual indent

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E226 missing whitespace around arithmetic operator

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

Can you add a file-level docstring above this? (Blank line separating it from this import.)

chipx86chipx86

No blank line here.

chipx86chipx86

This needs to subclass object. Also, this should probably be outside of the PatienceDiffer class.

chipx86chipx86

The trailing """ can go on the first line.

chipx86chipx86

Blank line between a class-level docstring and the body.

chipx86chipx86

This is missing a docstring describing the arguments.

chipx86chipx86

These would all go into an Attributes: section in the class's docstring (to deal with parsability issues).

chipx86chipx86

"Return whether ..." Same below.

chipx86chipx86

Type goes on its own line. It also needs to be bool (casing matters). Same comments below.

chipx86chipx86

This can be a single statement, since the conditional evaluates to true or false. Same below.

chipx86chipx86

How about: "Return a string representation of this slice."

chipx86chipx86

The type must be unicode

chipx86chipx86

So normally, this should be formatted like: return ( 'A: %s\n' 'B: %s\n' '%s' % ((self.a_low, self.a_high), (self.b_low, self.b_high), '-' ...

chipx86chipx86

If this doesn't need to perform any logic of its own, you can omit it.

chipx86chipx86

Summary docstrings cannot wrap. How about: """Return an opcode generator for the contents of the diff.

chipx86chipx86

This should just be tuple, with the contents described in the description.

chipx86chipx86

For readability, can you use keyword arguments? That'll help convey what's going into this class without having to jump up ...

chipx86chipx86

There should be a blank line separating a statement from a block on the same indentation level, but not on ...

chipx86chipx86

Let's store as a tuple. It'll take less memory and will be faster to operate on, which is preferred here.

chipx86chipx86

"Differ" -> "Diff"

chipx86chipx86

The description must be on the next line, indented 4. Rather than describe the opcode format, you can just say ...

chipx86chipx86

Same as above.

chipx86chipx86

Blank line between these.

chipx86chipx86

So ultimately, what we'd want to do is set up a deeper generator pattern for the Myers work and the ...

chipx86chipx86

As mentioned above, slice is a reserved word.

chipx86chipx86

"Return all ..."

chipx86chipx86

Description on the next line.

chipx86chipx86

Description gets its own lines.

chipx86chipx86

Because you use range, you'll need to a six.moves import for range. Grep in the codebase for import range and ...

chipx86chipx86

Bcause we use self.a and self.b a lot, let's pull these out into local variables to avoid the attribute lookup ...

chipx86chipx86

A couple notes here: When we have an if/else, it's a bit easier mentally work with if the first condition ...

chipx86chipx86

Comments need to always be proper sentences, so this should have a period.

chipx86chipx86

It doesn't look like we use this for anything else. Can we just reference i?

chipx86chipx86

No blank line here.

chipx86chipx86

Let's avoid all the lookups and instead iterate using six.iteritems, which will give us keys and values.

chipx86chipx86

"If"

chipx86chipx86

This can be one statement.

chipx86chipx86

This must be a single line.

chipx86chipx86

Same comments as above regarding the formatting.

chipx86chipx86

No blank line here.

chipx86chipx86

Rather than separate elif lines with comments separating them, put the comments inside the nested block. That will keep the ...

chipx86chipx86

No blank line at the start of a nested block.

chipx86chipx86

Must use sentence casing.

chipx86chipx86

Rather than have to compare to stacks[-1] each time, let's fetch the value of stacks[-1] once and then compare against ...

chipx86chipx86

Rather than single-line code, which is harder to reason about and maintain, can you split this up as: prev = ...

chipx86chipx86

while True:

chipx86chipx86

This needs sentence casing.

chipx86chipx86

We can just check if not prev here, which will be faster (as it uses a different operator).

chipx86chipx86

No blank line at the start of a nested block.

chipx86chipx86

Same doc comments as above. I'll let you handle the rest :)

chipx86chipx86

Rather than the shorthand, can you set these individually? It's more clear and maintainable that way.

chipx86chipx86

Blank line between these.

chipx86chipx86

Rather than look up match[1] for each iteration, let's pull it out above.

chipx86chipx86

Rather than looking up matched_slices[-1] each time, let's pull it out into a last_matched_slice variable and access that.

chipx86chipx86

Blank line before a statement opening a new block.

chipx86chipx86

Can you use parens? Like: while (not slice.is_empty() and self.a[slice.a_low] == self.b[slice.b_low]): Also, let's fetch self.a and self.b to avoid ...

chipx86chipx86

With this conditional, we're returning slice either way. This code can be cleaned up by doing: if old_a_low != slice.a_low: ...

chipx86chipx86

This can be one statement.

chipx86chipx86

Much of my comments in match_head apply here as well.

chipx86chipx86

Can you add a file-level docstring?

chipx86chipx86

"PatienceDiffer"

chipx86chipx86

""" goes on the next line.

chipx86chipx86

Can you add a docstring for this?

chipx86chipx86

E127 continuation line over-indented for visual indent

reviewbotreviewbot

E127 continuation line over-indented for visual indent

reviewbotreviewbot

E127 continuation line over-indented for visual indent

reviewbotreviewbot

E231 missing whitespace after ','

reviewbotreviewbot

E127 continuation line over-indented for visual indent

reviewbotreviewbot

E127 continuation line over-indented for visual indent

reviewbotreviewbot

E502 the backslash is redundant between brackets

reviewbotreviewbot

E127 continuation line over-indented for visual indent

reviewbotreviewbot
Checks run (1 failed, 1 succeeded)
flake8 failed.
JSHint passed.

flake8

bnie
Review request changed

Change Summary:

Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.

Commits:

Summary
-
[WIP] Add Patience Differ to diff viewer.
-
Check in work done so far that will not be part of Review Board:
+
[WIP] Add Patience Differ to diff viewer.
+
Check in work done so far that will not be part of Review Board:
+
Added two more parameter to indicate start index of a and b file, and modified
+
Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.

Diff:

Revision 2 (+4690 -46)

Show changes

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

bnie
Review request changed

Commits:

Summary
-
[WIP] Add Patience Differ to diff viewer.
-
Check in work done so far that will not be part of Review Board:
-
Added two more parameter to indicate start index of a and b file, and modified
-
Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.
+
[WIP] Add Patience Differ to diff viewer.
+
Check in work done so far that will not be part of Review Board:
+
Added two more parameter to indicate start index of a and b file, and modified
+
Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.
+
Update patience differ test cases and added patience diff as one new differ option in differ class.
+
Removed unecessary tabs and spaces in files

Diff:

Revision 3 (+4761 -83)

Show changes

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

bnie
Review request changed

Commits:

Summary
-
[WIP] Add Patience Differ to diff viewer.
-
Check in work done so far that will not be part of Review Board:
-
Added two more parameter to indicate start index of a and b file, and modified
-
Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.
-
Update patience differ test cases and added patience diff as one new differ option in differ class.
-
Removed unecessary tabs and spaces in files
+
[WIP] Add Patience Differ to diff viewer.
+
Check in work done so far that will not be part of Review Board:
+
Added two more parameter to indicate start index of a and b file, and modified
+
Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.
+
Update patience differ test cases and added patience diff as one new differ option in differ class.
+
Removed unecessary tabs and spaces in files
+
Updated PatienceDiffer tests and added new functions to pre-process code blocks

Diff:

Revision 4 (+5106 -922)

Show changes

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

bnie
Review request changed

Commits:

Summary
-
[WIP] Add Patience Differ to diff viewer.
-
Check in work done so far that will not be part of Review Board:
-
Added two more parameter to indicate start index of a and b file, and modified
-
Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.
-
Update patience differ test cases and added patience diff as one new differ option in differ class.
-
Removed unecessary tabs and spaces in files
-
Updated PatienceDiffer tests and added new functions to pre-process code blocks
+
[WIP] Add Patience Differ to diff viewer.
+
Check in work done so far that will not be part of Review Board:
+
Added two more parameter to indicate start index of a and b file, and modified
+
Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.
+
Update patience differ test cases and added patience diff as one new differ option in differ class.
+
Removed unecessary tabs and spaces in files
+
Updated PatienceDiffer tests and added new functions to pre-process code blocks
+
Check in final version with all bug fixed. Need to polish comments.

Diff:

Revision 5 (+5202 -966)

Show changes

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

bnie
Review request changed

Change Summary:

Polished PatienceDiffer with proper documentation and formatting, removed WIP tag.

Description:

~  

[WIP] Add Patience Differ to diff viewer.

  ~

Review Board uses a version of the Meyers Diff algorithm for the diff viewer, which is pretty standard and widely-used. However, unclear diff might be reported under some scenarios. For example, when user switched order of a big chunk of code with each other, meyer diff will see it as a change line by line instead of being aware that the whole block is switched. Patience differ tackles issues like these by scanning the whole file and actively seeking out interesting lines before applying diff while sacrificing some performances.

  +
  +

This is a backend feature and it has not been updated to be the default method of diffing. To test this feature, you could go to reviewboard/diffviewer/differ.py, and update line 26 to DEFAULT = PATIENCE.

Testing Done:

~  

Review Board uses a version of the Meyers Diff algorithm for the diff viewer, which is pretty standard and widely-used. However, unclear diff might be reported under some scenarios. For example, when user switched order of a big chunk of code with each other, meyer diff will see it as a change line by line instead of being aware that the whole block is switched. Patience differ tackles issues like these by scanning the whole file and actively seeking out interesting lines before applying diff while sacrificing some performances.

~  
~  

This is a WIP request with all of the work I've done so far checked in, including two jupyter notebooks (patience_diff_v2 is the most updated one), some sample files for diff tests, and a py file containing my attempt to integrate patiencediff into RB.

  ~

Unit tests are created in reviewboard/diffviewer/tests/test_patiencediff.py.

  ~ You can run the test by ./tests/runtests.py reviewboard.diffviewer.tests.test_patiencediff:PatienceDifferTest.
  ~ All unit tests have passed.

Commits:

Summary
-
[WIP] Add Patience Differ to diff viewer.
-
Check in work done so far that will not be part of Review Board:
-
Added two more parameter to indicate start index of a and b file, and modified
-
Updated patiencediff to call myers with no errors, added new test cases to check if the correct result is returned, so far the tests do not pass.
-
Update patience differ test cases and added patience diff as one new differ option in differ class.
-
Removed unecessary tabs and spaces in files
-
Updated PatienceDiffer tests and added new functions to pre-process code blocks
-
Check in final version with all bug fixed. Need to polish comments.
+
Added two more parameter to indicate start index of a and b file, and modified
+
Added patience differ as a new option in differ class.
+
Added patience diff unit tests.
+
Added PatienceDiffer to Diff Viewer.

Diff:

Revision 6 (+1100 -14)

Show changes

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

bnie
Review request changed

Summary:

-[WIP] Add Patience Differ to diff viewer.
+Add Patience Differ to diff viewer.

Commits:

Summary
-
Added two more parameter to indicate start index of a and b file, and modified
-
Added patience differ as a new option in differ class.
-
Added patience diff unit tests.
-
Added PatienceDiffer to Diff Viewer.
+
Added two more parameter to indicate start index of a and b file, and modified
+
Added patience differ as a new option in differ class.
+
Added patience diff unit tests.
+
Added PatienceDiffer to Diff Viewer.
+
Fixed issues flagged by flake8

Groups:

+students

Diff:

Revision 7 (+1123 -39)

Show changes

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

bnie
chipx86
  1. This is exciting! I want to play with it.

    There's a lot of comments. Some are repeats of previous ones. There's really two categories of comments here:

    1. Code/doc guideline inconsistencies
    2. Performance improvements (small things that add up in diff logic, like repeated attribute accesses or lookups by index)
  2. Can you adjust the Description and Testing Done to be <= 76 columns?

  3. It looks like your changes for the start indexes wound up in here. Can you rebase on top of that and post the range for these commits only?

  4. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    Can you add a file-level docstring above this? (Blank line separating it from this import.)

  5. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     

    No blank line here.

  6. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    This needs to subclass object.

    Also, this should probably be outside of the PatienceDiffer class.

    1. In this case, I left the Slice class inside because it defines a specific structure patience differ uses to portion code blocks. Should I keep this inside of Patience Differ then? What's a good way of telling if I should leave a class inside or outside of another class?

    2. The patience differ owns this entire file, so there's no need to nest the class in further. Anything in this file is related to the diff algorithm.

      Having the class outside of it helps with maintainability. It's not nested as deep and it reduces the footprint of PatienceDiffer, which are two wins for that. Generally, there's rarely a good reason to nest classes in Python, and once you begin to do that to keep a class more clearly bound to another class, it's probably time to move both into a new module.

  7. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     

    The trailing """ can go on the first line.

  8. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     

    Blank line between a class-level docstring and the body.

  9. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    This is missing a docstring describing the arguments.

  10. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     
     
     
     
     
     
     
     

    These would all go into an Attributes: section in the class's docstring (to deal with parsability issues).

  11. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    "Return whether ..."

    Same below.

  12. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    Type goes on its own line.

    It also needs to be bool (casing matters).

    Same comments below.

  13. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     

    This can be a single statement, since the conditional evaluates to true or false.

    Same below.

  14. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    How about: "Return a string representation of this slice."

  15. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    The type must be unicode

  16. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     

    So normally, this should be formatted like:

    return (
      'A: %s\n'
      'B: %s\n'
      '%s'
      % ((self.a_low, self.a_high),
         (self.b_low, self.b_high),
         '-' * 79)
    )
    

    That's faster and separates out the structure of the string from the variables going into it.

    Now that said, this feels really debuggy to me, and not something that ultimately needs to remain in here.

  17. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     

    If this doesn't need to perform any logic of its own, you can omit it.

  18. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     

    Summary docstrings cannot wrap.

    How about:

    """Return an opcode generator for the contents of the diff.
    
  19. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    This should just be tuple, with the contents described in the description.

  20. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    For readability, can you use keyword arguments? That'll help convey what's going into this class without having to jump up to the definition.

  21. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     

    There should be a blank line separating a statement from a block on the same indentation level, but not on a new nested block. So this should look like:

    opcodes = []
    
    for slice in matched_slices:
        ...
    

    Also of note: slice is a reserved word in Python, so you'll need to rethink the variable.

  22. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    Let's store as a tuple. It'll take less memory and will be faster to operate on, which is preferred here.

    1. I am changing the value of the elements in this list in merge_opcodes():

      current_opcode = opcodes[i]
                      while i + 1 != op_size and opcodes[i][0] == opcodes[i + 1][0]:
                          current_opcode[2] = opcodes[i + 1][2]
                          current_opcode[4] = opcodes[i + 1][4]
                          i += 1
      

      If we change the opcode to a tuple, we have to convert it back here.
      But we only need to convert probably 60% of all the opcodes.
      I am not sure which way will be more efficient:
      1. Append as a tuple, convert back 60% later.
      2. Append as a list.

    2. It's technically faster to keep it a tuple. They're more compact behind the scenes, easy to rebuild, and don't require some of the operator-based invocations needed with setting list indexes.

      As proof:

      $ python3 -m timeit 'a = ["insert", 2, 3, 4, 5]; a[1] = 3; a[4] = 5;'
      10000000 loops, best of 3: 0.152 usec per loop
      
      $ python3 -m timeit 'a = ("insert", 2, 3, 4, 5); a = (a[0], 3, a[2], a[3], 5)'
      10000000 loops, best of 3: 0.0881 usec per loop
      
  23. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    "Differ" -> "Diff"

  24. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     

    The description must be on the next line, indented 4.

    Rather than describe the opcode format, you can just say that each entry is a standard opcode tuple. For diff viewer work, this will be a standard format.

  25. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     

    Same as above.

  26. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     

    Blank line between these.

  27. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     

    So ultimately, what we'd want to do is set up a deeper generator pattern for the Myers work and the merging, so that we don't have to take in opcodes up-front, iterate through, store, and then re-iterate through for merging (and then again for yielding).

    Consider the following a future TODO, since we're at the end of the semester, for either yourself if you're interested or for us.

    The approach I'd use would be to iterate through the opcodes, one-by-one, storing the previous opcode. On each opcode, check if it should be merged into the previous. If yes, merge. If no, yield the previous, and reset merge state for the current one. So something like:

    last_opcode = next(opcodes)
    
    for opcode in opcodes:
        if opcode[0] == last_opcode[0]:
            last_opcode = (tag,
                           last_opcode[1],
                           last_opcode[2] + opcode[2],
                           last_opcode[4] + opcode[4])
        else:
            yield last_opcode
            last_opcode = opcode
    
    yield last_opcode
    

    Untested, but something like that.

    What you can then do is have get_opcodes() call a function that loops through the slices and does the Myers work (yielding results from its own get_opcodes(), rather than storing), and pass that result directly into merge_opcodes.

    That then sets up the fastest route possible for getting any opcodes from Myers work into the merge and to the caller, with no storage (beyond the last_opcode).

  28. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    As mentioned above, slice is a reserved word.

  29. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    "Return all ..."

  30. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    Description on the next line.

  31. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     

    Description gets its own lines.

  32. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    Because you use range, you'll need to a six.moves import for range. Grep in the codebase for import range and you'll find it.

    The reason for this is that Python 2's range builds and returns a list, whereas Python 3 uses a generator. We want to be consistent and fast.

  33. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    Bcause we use self.a and self.b a lot, let's pull these out into local variables to avoid the attribute lookup on each use.

    Similarly, since we use a[i] repeatedly, pull that out as well.

    This helps keep diff logic fast.

    Same goes for b below.

  34. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     

    A couple notes here:

    1. When we have an if/else, it's a bit easier mentally work with if the first condition is the positive. So, if .. in would be better than if .. not in.
    2. We probably don't need position. We can reference i instead.
  35. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    Comments need to always be proper sentences, so this should have a period.

  36. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    It doesn't look like we use this for anything else. Can we just reference i?

  37. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     

    No blank line here.

  38. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     
     
     

    Let's avoid all the lookups and instead iterate using six.iteritems, which will give us keys and values.

  39. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     

    This can be one statement.

  40. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     

    This must be a single line.

  41. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     
     
     
     
     
     

    Same comments as above regarding the formatting.

  42. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     

    No blank line here.

  43. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     
     
     
     
     
     
     
     
     
     

    Rather than separate elif lines with comments separating them, put the comments inside the nested block. That will keep the conditionals clearly together, and provide an opportunity to better describe each of the conditions.

  44. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     

    No blank line at the start of a nested block.

  45. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    Must use sentence casing.

  46. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     

    Rather than have to compare to stacks[-1] each time, let's fetch the value of stacks[-1] once and then compare against that.

    Also, do we need to continue if we find a match?

  47. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    Rather than single-line code, which is harder to reason about and maintain, can you split this up as:

    prev = [
        match
        for match in directed_matches
        if match[0] == last[1][0]
    ]
    

    Also, last[1][0] has to be fetched every single time, so let's pull that out before we do this, and compare against that value.

    Same comments apply below.

    1. The [0] is actually outside of that list comprehension expression:

      prev = [
          match
          for match in directed_matches
          if match[0] == last[1]
      ][0]
      

      What's the best way for formatting if this is the case?

    2. Ah, see, all the more reason to break that into multiple lines :)

      Since we only want the first, we don't actually want to buid up a whole list of items. A better approach would be:

      prev = next(
          match
          for match in directed_matches
          if match[0] == last[1]
      )
      

      By using a generator expression instead of a list comprehension, we'll only execute until the first result.

      Now, if it's at all possible for this to not find a match, it'll be important to try/except this, catching a StopIteration.

  48. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    while True:

  49. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    This needs sentence casing.

  50. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    We can just check if not prev here, which will be faster (as it uses a different operator).

  51. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     

    No blank line at the start of a nested block.

  52. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     
     
     
     
     
     
     
     

    Same doc comments as above.

    I'll let you handle the rest :)

  53. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    Rather than the shorthand, can you set these individually? It's more clear and maintainable that way.

  54. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     

    Blank line between these.

  55. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    Rather than look up match[1] for each iteration, let's pull it out above.

  56. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     

    Rather than looking up matched_slices[-1] each time, let's pull it out into a last_matched_slice variable and access that.

  57. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     

    Blank line before a statement opening a new block.

  58. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     

    Can you use parens? Like:

    while (not slice.is_empty() and
           self.a[slice.a_low] == self.b[slice.b_low]):
    

    Also, let's fetch self.a and self.b to avoid the attribute lookups each iteration.

  59. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     
     
     
     
     
     

    With this conditional, we're returning slice either way. This code can be cleaned up by doing:

    if old_a_low != slice.a_low:
        matched_slices.append(Slice(...))
    
    return slice
    
  60. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     
     
     

    This can be one statement.

  61. reviewboard/diffviewer/patiencediff.py (Diff revision 8)
     
     

    Much of my comments in match_head apply here as well.

  62. Can you add a file-level docstring?

  63. "PatienceDiffer"

  64. """ goes on the next line.

  65. Can you add a docstring for this?

  66. 
      
bnie
keanweng
  1. 
      
  2. Didn't see any issues other than the ones Christian mentioned. Checked out the patch on my local machine and all 7 unit tests passed.
    I like how you included the shortcut for testing your unit tests, made it alot easier.

  3. 
      
bnie
Review request changed

Change Summary:

Addressed comment from Christian.

Commits:

Summary
-
Added two more parameter to indicate start index of a and b file, and modified
-
Added patience differ as a new option in differ class.
-
Added patience diff unit tests.
-
Added PatienceDiffer to Diff Viewer.
-
Fixed issues flagged by flake8
+
Added patience differ as a new option in differ class.
+
Added patience diff unit tests.
+
Added PatienceDiffer to Diff Viewer.
+
Fixed issues flagged by flake8

Diff:

Revision 9 (+1249 -223)

Show changes

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

bnie
Review request changed

Commits:

Summary
-
Added patience differ as a new option in differ class.
-
Added patience diff unit tests.
-
Added PatienceDiffer to Diff Viewer.
-
Fixed issues flagged by flake8
+
Added patience differ as a new option in differ class.
+
Added patience diff unit tests.
+
Added PatienceDiffer to Diff Viewer.
+
Fixed issues flagged by flake8

Diff:

Revision 10 (+1249 -223)

Show changes

Checks run (2 succeeded)

flake8 passed.
JSHint passed.
Loading...