Summary

Reduce diff storage by hashing diff uploads

Review Request #2618 — Created Sept. 25, 2011 and submitted Nov. 6, 2011, 4:06 a.m.

Information

Owner: ddruska
Repository: Review Board
Branch:
Bugs:
Depends On:

Reviewers

Groups: reviewboard
People:

Description

Reduce diff storage by hashing diff uploads

Diffs that already exist in the database are no longer stored twice. Each diff is now hashed on upload, and the hash keys into a new hash->binary table.
Existing table data is not hashed and remains in place for backwards compatibility. Evolutions create the new table and rename the existing fields so that model logic can override them. Test data has also been updated to use the new field names.
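
The deduplication idea in the description can be sketched as follows. This is only an illustration, not the code under review: it assumes SHA-1 as the hash (the review does not name one) and uses an in-memory dict in place of the new hash->binary table. The names `DiffStore` and `store_diff` are hypothetical.

```python
import hashlib


class DiffStore:
    """In-memory stand-in for a hash -> diff data table (hypothetical)."""

    def __init__(self):
        self._rows = {}

    def store_diff(self, diff):
        """Store the diff bytes once and return (hash, is_new)."""
        # SHA-1 is an assumption; the review does not say which hash is used.
        key = hashlib.sha1(diff).hexdigest()
        is_new = key not in self._rows

        if is_new:
            self._rows[key] = diff

        return key, is_new

    def __len__(self):
        return len(self._rows)


store = DiffStore()
diff = b"--- a/README\n+++ b/README\n"

# Uploading the same diff twice yields the same key and a single stored row.
first_key, first_is_new = store.store_diff(diff)
second_key, second_is_new = store.store_diff(diff)
```

Uploading identical content a second time returns the existing key, so the second upload adds no new row.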

Testing Done

Ran fill-database and all data was hashed accordingly. Manually posted reviews and posted parent diffs. Created a unit test to check if hashes match.

Issues

Description	From	Last Updated
These are both built-in Python modules, so no blank line.	chipx86	Sept. 29, 2011, 9:25 a.m.
I'm not really sure "Binary" is quite the term we want. FileDiffData might be better. Then "hash" and "data" fields.	chipx86	Sept. 29, 2011, 9:25 a.m.
Docstrings are in the format of: """One-line summary Optional multi-line description. """ It should always be a proper sentence, punctuation …	chipx86	Sept. 29, 2011, 9:25 a.m.
Alignment problem. There's no need for \, since the parens take care of it.	chipx86	Sept. 29, 2011, 9:25 a.m.
Same here.	chipx86	Sept. 29, 2011, 9:25 a.m.
Blank line between these. A comment describing the next block of code should be separated. Also, comments should have proper …	chipx86	Sept. 29, 2011, 9:25 a.m.
This can be: self.diff_hash, is_new = \ FileDiffBinary.objects.get_or_create(binary_hash=hashkey, defaults={ 'binary': diff })	chipx86	Sept. 29, 2011, 9:25 a.m.
Same logic can be applied here for the get_or_create.	chipx86	Sept. 29, 2011, 9:25 a.m.
I think that there should be a corresponding py file for that?	ZH zhenli	Oct. 2, 2011, 4:58 a.m.
For consistency's sake, can we call that "kwargs"?	mike_conley	Oct. 12, 2011, 1:27 p.m.
Space after the if block	mike_conley	Oct. 12, 2011, 1:27 p.m.
Managers go in managers.py.	chipx86	Oct. 17, 2011, 9:33 a.m.
It'd be nice to pull defaults out, so you don't repeat yourself (and also, don't assume defaults is set): defaults …	chipx86	Oct. 17, 2011, 9:33 a.m.
Space before \	chipx86	Oct. 17, 2011, 9:34 a.m.
Put the comment inside the else:	chipx86	Oct. 17, 2011, 9:34 a.m.
Two blank lines here	david	Oct. 30, 2011, 7:54 a.m.
Imports should be alphabetized.	david	Oct. 30, 2011, 7:54 a.m.

Testing Done:
+ Ran existing unit tests. These only test the backwards compatibility though, and don't test parent diffs.
+ Ran fill-database and all data was hashed accordingly. Manually posted reviews and posted parent diffs. All tests passed.

Wow, you got that working fast! I have some comments. Mostly style issues, a couple of better ways to use Django in some cases, and a name change. I'm really happy to see this though :)

reviewboard/diffviewer/models.py (Diff revision 1)
The issue has been resolved.
```
These are both built-in Python modules, so no blank line.
```

reviewboard/diffviewer/models.py (Diff revision 1)

The issue has been resolved.

I'm not really sure "Binary" is quite the term we want.

FileDiffData might be better. Then "hash" and "data" fields.

reviewboard/diffviewer/models.py (Diff revision 1)

The issue has been resolved.

Docstrings are in the format of:

"""One-line summary

Optional multi-line description.
"""

It should always be a proper sentence, punctuation and all.

david Sept. 25, 2011, 9:02 a.m.

I think these days, for multi-line ones, the first line can safely go on the line after """. Single-line docstrings should have """ and the text all on the same line, though.
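
The docstring layouts discussed in this thread can be illustrated with a small sketch. The function names are hypothetical, and `summary_line` is a helper added here only to show that both layouts give the same first line once the standard library's `inspect.getdoc` normalizes them.

```python
import inspect


def single_line():
    """Return a fixed greeting."""
    return "hello"


def multi_line_same_line():
    """Return a fixed farewell.

    The summary sits on the same line as the opening quotes.
    """
    return "goodbye"


def multi_line_next_line():
    """
    Return a fixed farewell.

    The summary starts on the line after the opening quotes.
    """
    return "goodbye"


def summary_line(func):
    """Return the first line of a function's cleaned-up docstring."""
    # inspect.getdoc() strips leading blank lines and common indentation.
    return inspect.getdoc(func).splitlines()[0]
```

Each summary is a full sentence ending in a period, as requested in the review.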

reviewboard/diffviewer/models.py (Diff revision 1)
The issue has been resolved.
```
Alignment problem.

There's no need for \, since the parens take care of it.
```
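
The point about `\` can be shown with a short sketch. The code the comment refers to is not visible in this thread, so the `describe` function and its arguments are hypothetical. Inside parentheses Python joins lines implicitly, so the trailing backslashes are redundant.

```python
def describe(filename, revision, status):
    """Format a short status line for a reviewed file."""
    return "%s (revision %s): %s" % (filename, revision, status)


# Works, but the backslashes are unnecessary inside the parentheses.
with_backslash = describe("models.py", \
                          2, \
                          "resolved")

# Preferred: the open parenthesis already continues the statement,
# and the arguments line up under the first one.
without_backslash = describe("models.py",
                             2,
                             "resolved")
```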
reviewboard/diffviewer/models.py (Diff revision 1)
The issue has been resolved.
```
Same here.
```

reviewboard/diffviewer/models.py (Diff revision 1)

The issue has been resolved.

Blank line between these. A comment describing the next block of code should be separated.

Also, comments should have proper punctuation (period at the end).

reviewboard/diffviewer/models.py (Diff revision 1)

The issue has been resolved.

This can be:

self.diff_hash, is_new = \
    FileDiffBinary.objects.get_or_create(binary_hash=hashkey,
                                         defaults={
                                             'binary': diff
                                         })
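
To show why the suggested call collapses a lookup and a create into one step, here is a toy stand-in that mimics the `(object, created)` return shape of Django's `get_or_create()`. It is not Django code: `InMemoryManager` and its dict-based rows are hypothetical and exist only for illustration.

```python
class InMemoryManager:
    """Toy manager mimicking the (object, created) shape of get_or_create()."""

    def __init__(self):
        self._rows = {}

    def get_or_create(self, binary_hash, defaults=None):
        # Existing row: return it untouched; defaults are ignored.
        if binary_hash in self._rows:
            return self._rows[binary_hash], False

        # New row: defaults supply the fields that are not part of the lookup.
        row = {"binary_hash": binary_hash}
        row.update(defaults or {})
        self._rows[binary_hash] = row
        return row, True


objects = InMemoryManager()
row, created = objects.get_or_create(binary_hash="abc",
                                     defaults={"binary": b"diff"})
same_row, created_again = objects.get_or_create(binary_hash="abc",
                                                defaults={"binary": b"other"})
```

The second call finds the existing row, so its `defaults` are never applied and `created_again` is `False`.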

reviewboard/diffviewer/models.py (Diff revision 1)

The issue has been resolved.

Same logic can be applied here for the get_or_create.

reviewboard/diffviewer/evolutions/__init__.py (Diff revision 1)

I think it might be better to call it "add_diff_hash".

reviewboard/reviews/fixtures/test_reviewrequests.json (Diff revision 1)

Looks like, after the first line, the whole block is identical, but still marked by ReviewBoard as different. Is it a bug?

chipx86 Sept. 28, 2011, 10:28 a.m.
```
Nope, it's just one really long line.
```

Change Summary:

Fixed issues pointed out by Christian.

Diff:

Revision 2 (+145 -90)

	reviewboard/diffviewer/models.py
	reviewboard/diffviewer/evolutions/__init__.py
	reviewboard/reviews/fixtures/test_reviewrequests.json

reviewboard/diffviewer/evolutions/__init__.py (Diff revision 2)
The issue has been resolved.
```
I think that there should be a corresponding py file for that?
```
1. ddruska Oct. 2, 2011, 4:59 a.m.
  Good catch, Jacob. I forgot to add the file to my commit.

Change Summary:

Forgot to add the evolution file add_diff_hash.py

Diff:

Revision 3 (+154 -90)

	reviewboard/diffviewer/models.py
	reviewboard/diffviewer/evolutions/__init__.py
	reviewboard/diffviewer/evolutions/add_diff_hash.py
	reviewboard/reviews/fixtures/test_reviewrequests.json

Change Summary:

Removed the AutoKey from FileDiffData and added a custom manager for the Base64Field AutoKey workaround

Diff:

Revision 4 (+361 -134)

	docs/releasenotes/rbtools/0.3.4.txt
	docs/releasenotes/rbtools/index.txt
	reviewboard/manage.py
	reviewboard/admin/tests.py
	reviewboard/cmdline/rbsite.py
	reviewboard/diffviewer/models.py
	reviewboard/diffviewer/evolutions/__init__.py
	reviewboard/htdocs/media/rb/css/common.css
	11 more

Change Summary:

Removed the AutoKey from FileDiffData and added a custom manager for the Base64Field AutoKey workaround
(Disregard revision 4)

Diff:

Revision 5 (+168 -91)

	reviewboard/diffviewer/models.py
	reviewboard/diffviewer/evolutions/__init__.py
	reviewboard/diffviewer/evolutions/add_diff_hash.py
	reviewboard/reviews/fixtures/test_reviewrequests.json

David:

This looks really good!  Just the two tiny nits I found below.  Once those are fixed, I'll happily give this my ship-it.

Thanks,

-Mike

reviewboard/diffviewer/models.py (Diff revision 5)
The issue has been resolved.
```
For consistency's sake, can we call that "kwargs"?
```
reviewboard/diffviewer/models.py (Diff revision 5)
The issue has been resolved.
```
Space after the if block
```
1. mike_conley Oct. 8, 2011, 1:38 p.m.
  And by "space", I of course mean "new line".

Change Summary:

Renamed kwds to kwargs, added spacing and formatting

Diff:

Revision 6 (+171 -91)

	reviewboard/diffviewer/models.py
	reviewboard/diffviewer/evolutions/__init__.py
	reviewboard/diffviewer/evolutions/add_diff_hash.py
	reviewboard/reviews/fixtures/test_reviewrequests.json

Ship it!

```
This looks good to me!
```

Couple small things.

Do unit tests still run?

reviewboard/diffviewer/models.py (Diff revision 6)
The issue has been resolved.
```
Managers go in managers.py.
```

reviewboard/diffviewer/models.py (Diff revision 6)

The issue has been resolved.

It'd be nice to pull defaults out, so you don't repeat yourself (and also, don't assume defaults is set):

defaults = kwargs.get('defaults', {})

if defaults and defaults['binary']:
    defaults['binary'] = ...
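
One caveat with the snippet above: `defaults['binary']` raises `KeyError` when `defaults` is non-empty but has no `'binary'` key. A hedged sketch using `.get()` follows. The `prepare_kwargs` name is hypothetical, and base64 encoding is an assumption standing in for the elided `...` step, chosen only because the change summary mentions a Base64Field workaround.

```python
import base64


def prepare_kwargs(**kwargs):
    """Return kwargs with defaults['binary'] encoded when it is present."""
    # Copy so the caller's dict is not mutated; tolerate a missing or
    # None 'defaults' entry.
    defaults = dict(kwargs.get('defaults') or {})

    # .get() avoids a KeyError when 'binary' is absent.
    if defaults.get('binary'):
        # Assumption: base64 stands in for the real transformation.
        defaults['binary'] = base64.b64encode(defaults['binary'])
        kwargs['defaults'] = defaults

    return kwargs


no_defaults = prepare_kwargs(binary_hash='abc')
other_defaults = prepare_kwargs(binary_hash='abc', defaults={'other': 1})
encoded = prepare_kwargs(binary_hash='abc', defaults={'binary': b'diff'})
```

Pulling `defaults` out once keeps the check and the assignment in one place, which is the repetition the comment asks to remove.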

reviewboard/diffviewer/models.py (Diff revision 6)
The issue has been resolved.
```
Space before \
```
reviewboard/diffviewer/models.py (Diff revision 6)
The issue has been resolved.
```
Put the comment inside the else:
```

Change Summary:

Moved FileDiffDataManager to managers.py, added a unit test for hashes, cleaned up tests.py with pep8.

Diff:

Revision 7 (+201 -95)

	reviewboard/diffviewer/managers.py
	reviewboard/diffviewer/models.py
	reviewboard/diffviewer/tests.py
	reviewboard/diffviewer/evolutions/__init__.py
	reviewboard/diffviewer/evolutions/add_diff_hash.py
	reviewboard/reviews/fixtures/test_reviewrequests.json

Testing Done:: ~
Ran existing unit tests. These only test the backwards compatibility though, and don't test parent diffs.
~
Ran fill-database and all data was hashed accordingly. Manually posted reviews and posted parent diffs. Created a unit test to check if hashes match.
- Ran fill-database and all data was hashed accordingly. Manually posted reviews and posted parent diffs. All tests passed.

Fix it!

```
Two trivial changes left:
```
reviewboard/diffviewer/managers.py (Diff revision 7)
An issue was opened.
```
Two blank lines here
```
reviewboard/diffviewer/models.py (Diff revision 7)
An issue was opened.
```
Imports should be alphabetized.
```

Diff:

Revision 8 (+202 -95)

	reviewboard/diffviewer/managers.py
	reviewboard/diffviewer/models.py
	reviewboard/diffviewer/tests.py
	reviewboard/diffviewer/evolutions/__init__.py
	reviewboard/diffviewer/evolutions/add_diff_hash.py
	reviewboard/reviews/fixtures/test_reviewrequests.json

Ship it!

```
Ship It!
```

Status:: Completed
Change Summary:: Committed to master (8bfea51)
