Reduce diff storage by hashing diff uploads

Review Request #2618 — Created Sept. 25, 2011 and submitted

Information

Review Board

Reviewers

Reduce diff storage by hashing diff uploads

Diff files that already exist in the database will no longer be double-stored. Diffs are now hashed on upload and correspond to a new hash->binary table.
Existing table data is not hashed and will remain for backwards compatibility. Evolutions have been made to create the new table and rename existing fields, so that model-logic can override fields. Test data has also been modified for new field name compatibility.
Ran fill-database and all data was hashed accordingly. Manually posted reviews and posted parent diffs. Created a unit test to check if hashes match.
Description From Last Updated

These are both built-in Python modules, so no blank line.

chipx86chipx86

I'm not really sure "Binary" is quite the term we want. FileDiffData might be better. Then "hash" and "data" fields.

chipx86chipx86

Docstrings are in the format of: """One-line summary Option multi-line description. """ It should always be a proper sentence, punctuation …

chipx86chipx86

Alignment problem. There's no need for \, since the parens take care of it.

chipx86chipx86

Same here.

chipx86chipx86

Blank line between these. A comment describing the next block of code should be separated. Also, comments should have proper …

chipx86chipx86

This can be: self.diff_hash, is_new = \ FileDiffBinary.objects.get_or_create(binary_hash=hashkey, defaults={ 'binary': diff })

chipx86chipx86

Same logic can be applied here for the get_or_create.

chipx86chipx86

I?think that there should a corresponding py file for that?

ZH zhenli

For consistencies sake, can we call that "kwargs"?

mike_conleymike_conley

Space after the if block

mike_conleymike_conley

Managers go in managers.py.

chipx86chipx86

It'd be nice to pull defaults out, so you don't repeat yourself (and also, don't assume defaults is set): defaults …

chipx86chipx86

Space before \

chipx86chipx86

Put the comment inside the else:

chipx86chipx86

Two blank lines here

daviddavid

Imports should be alphabetized.

daviddavid
DD
chipx86
  1. Wow, you got that working fast! I have some comments. Mostly style issues, a couple better ways to use Django in some cases, and a name change. I'm really happy to see this though :)
  2. reviewboard/diffviewer/models.py (Diff revision 1)
     
     
     
     
    Show all issues
    These are both built-in Python modules, so no blank line.
  3. reviewboard/diffviewer/models.py (Diff revision 1)
     
     
     
     
     
     
     
     
    Show all issues
    I'm not really sure "Binary" is quite the term we want.
    
    FileDiffData might be better. Then "hash" and "data" fields.
  4. reviewboard/diffviewer/models.py (Diff revision 1)
     
     
     
     
    Show all issues
    Docstrings are in the format of:
    
    """One-line summary
    
    Option multi-line description.
    """
    
    It should always be a proper sentence, punctuation and all.
    1. I think these days, for multi-line ones, the first line can safely go on the line after """. Single-line docstrings should have """ and the text all on the same line, though.
  5. reviewboard/diffviewer/models.py (Diff revision 1)
     
     
     
    Show all issues
    Alignment problem.
    
    There's no need for \, since the parens take care of it.
  6. reviewboard/diffviewer/models.py (Diff revision 1)
     
     
     
    Show all issues
    Same here.
  7. reviewboard/diffviewer/models.py (Diff revision 1)
     
     
     
    Show all issues
    Blank line between these. A comment describing the next block of code should be separated.
    
    Also, comments should have proper punctuation (period at the end).
  8. reviewboard/diffviewer/models.py (Diff revision 1)
     
     
     
     
     
     
     
    Show all issues
    This can be:
    
    self.diff_hash, is_new = \
        FileDiffBinary.objects.get_or_create(binary_hash=hashkey,
                                             defaults={
                                                 'binary': diff
                                             })
  9. reviewboard/diffviewer/models.py (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
    Show all issues
    Same logic can be applied here for the get_or_create.
  10. 
      
ZH
  1. 
      
  2. I think it might be better to call it “add_diff_hash".
  3. 
      
ZH
  1. 
      
  2. Looks like, after the first line, the who block is identical, but still marked by ReviewBoard as different. Is it a bug?
    1. Nope, it's just one really long line.
  3. 
      
DD
ZH
  1. 
      
  2. Show all issues
    I?think that there should a corresponding py file for that?
    1. Good catch, Jacob. I forgot to add the file to my commit.
  3. 
      
DD
DD
DD
mike_conley
  1. David:
    
    This looks really good!  Just the two tiny nits I found below.  Once those are fixed, I'll happily give this my ship-it.
    
    Thanks,
    
    -Mike
  2. reviewboard/diffviewer/models.py (Diff revision 5)
     
     
    Show all issues
    For consistencies sake, can we call that "kwargs"?
  3. reviewboard/diffviewer/models.py (Diff revision 5)
     
     
    Show all issues
    Space after the if block
    1. And by "space", I of course mean "new line".
  4. 
      
DD
mike_conley
  1. This looks good to me!
  2. 
      
chipx86
  1. Couple small things.
    
    Do unit tests still run?
  2. reviewboard/diffviewer/models.py (Diff revision 6)
     
     
    Show all issues
    Managers go in managers.py.
  3. reviewboard/diffviewer/models.py (Diff revision 6)
     
     
     
     
    Show all issues
    It'd be nice to pull defaults out, so you don't repeat yourself (and also, don't assume defaults is set):
    
    defaults = kwargs.get('defaults', {})
    
    if defaults and defaults['binary']:
        defaults['binary'] = ...
  4. reviewboard/diffviewer/models.py (Diff revision 6)
     
     
    Show all issues
    Space before \
  5. reviewboard/diffviewer/models.py (Diff revision 6)
     
     
     
    Show all issues
    Put the comment inside the else:
  6. 
      
DD
DD
david
  1. Two trivial changes left:
  2. reviewboard/diffviewer/managers.py (Diff revision 7)
     
     
     
     
    Show all issues
    Two blank lines here
  3. reviewboard/diffviewer/models.py (Diff revision 7)
     
     
     
    Show all issues
    Imports should be alphabetized.
  4. 
      
DD
chipx86
  1. Ship It!
  2. 
      
DD
Review request changed
Status:
Completed
Change Summary:
Committed to master (8bfea51)