Reduce diff storage by hashing diff uploads

Review Request #2618 — Created Sept. 25, 2011 and submitted

Information

Review Board

Reviewers

Reduce diff storage by hashing diff uploads

Diff files that already exist in the database will no longer be double-stored. Diffs are now hashed on upload and correspond to a new hash->binary table.
Existing table data is not hashed and will remain for backwards compatibility. Evolutions have been made to create the new table and rename existing fields, so that model-logic can override fields. Test data has also been modified for new field name compatibility.
Ran fill-database and all data was hashed accordingly. Manually posted reviews and posted parent diffs. Created a unit test to check if hashes match.
Description From Last Updated

These are both built-in Python modules, so no blank line.

chipx86chipx86

I'm not really sure "Binary" is quite the term we want. FileDiffData might be better. Then "hash" and "data" fields.

chipx86chipx86

Docstrings are in the format of: """One-line summary Option multi-line description. """ It should always be a proper sentence, punctuation …

chipx86chipx86

Alignment problem. There's no need for \, since the parens take care of it.

chipx86chipx86

Same here.

chipx86chipx86

Blank line between these. A comment describing the next block of code should be separated. Also, comments should have proper …

chipx86chipx86

This can be: self.diff_hash, is_new = \ FileDiffBinary.objects.get_or_create(binary_hash=hashkey, defaults={ 'binary': diff })

chipx86chipx86

Same logic can be applied here for the get_or_create.

chipx86chipx86

I?think that there should a corresponding py file for that?

ZH zhenli

For consistencies sake, can we call that "kwargs"?

mike_conleymike_conley

Space after the if block

mike_conleymike_conley

Managers go in managers.py.

chipx86chipx86

It'd be nice to pull defaults out, so you don't repeat yourself (and also, don't assume defaults is set): defaults …

chipx86chipx86

Space before \

chipx86chipx86

Put the comment inside the else:

chipx86chipx86

Two blank lines here

daviddavid

Imports should be alphabetized.

daviddavid
DD
chipx86
  1. Wow, you got that working fast! I have some comments. Mostly style issues, a couple better ways to use Django in some cases, and a name change. I'm really happy to see this though :)
  2. reviewboard/diffviewer/models.py (Diff revision 1)
     
     
     
     
    These are both built-in Python modules, so no blank line.
  3. reviewboard/diffviewer/models.py (Diff revision 1)
     
     
     
     
     
     
     
     
    I'm not really sure "Binary" is quite the term we want.
    
    FileDiffData might be better. Then "hash" and "data" fields.
  4. reviewboard/diffviewer/models.py (Diff revision 1)
     
     
     
     
    Docstrings are in the format of:
    
    """One-line summary
    
    Option multi-line description.
    """
    
    It should always be a proper sentence, punctuation and all.
    1. I think these days, for multi-line ones, the first line can safely go on the line after """. Single-line docstrings should have """ and the text all on the same line, though.
  5. reviewboard/diffviewer/models.py (Diff revision 1)
     
     
     
    Alignment problem.
    
    There's no need for \, since the parens take care of it.
  6. reviewboard/diffviewer/models.py (Diff revision 1)
     
     
     
    Same here.
  7. reviewboard/diffviewer/models.py (Diff revision 1)
     
     
     
    Blank line between these. A comment describing the next block of code should be separated.
    
    Also, comments should have proper punctuation (period at the end).
  8. reviewboard/diffviewer/models.py (Diff revision 1)
     
     
     
     
     
     
     
    This can be:
    
    self.diff_hash, is_new = \
        FileDiffBinary.objects.get_or_create(binary_hash=hashkey,
                                             defaults={
                                                 'binary': diff
                                             })
  9. reviewboard/diffviewer/models.py (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
    Same logic can be applied here for the get_or_create.
  10. 
      
ZH
  1. 
      
  2. I think it might be better to call it “add_diff_hash".
  3. 
      
ZH
  1. 
      
  2. Looks like, after the first line, the who block is identical, but still marked by ReviewBoard as different. Is it a bug?
    1. Nope, it's just one really long line.
  3. 
      
DD
ZH
  1. 
      
  2. I?think that there should a corresponding py file for that?
    1. Good catch, Jacob. I forgot to add the file to my commit.
  3. 
      
DD
DD
DD
mike_conley
  1. David:
    
    This looks really good!  Just the two tiny nits I found below.  Once those are fixed, I'll happily give this my ship-it.
    
    Thanks,
    
    -Mike
  2. reviewboard/diffviewer/models.py (Diff revision 5)
     
     
    For consistencies sake, can we call that "kwargs"?
  3. reviewboard/diffviewer/models.py (Diff revision 5)
     
     
    Space after the if block
    1. And by "space", I of course mean "new line".
  4. 
      
DD
mike_conley
  1. This looks good to me!
  2. 
      
chipx86
  1. Couple small things.
    
    Do unit tests still run?
  2. reviewboard/diffviewer/models.py (Diff revision 6)
     
     
    Managers go in managers.py.
  3. reviewboard/diffviewer/models.py (Diff revision 6)
     
     
     
     
    It'd be nice to pull defaults out, so you don't repeat yourself (and also, don't assume defaults is set):
    
    defaults = kwargs.get('defaults', {})
    
    if defaults and defaults['binary']:
        defaults['binary'] = ...
  4. reviewboard/diffviewer/models.py (Diff revision 6)
     
     
    Space before \
  5. reviewboard/diffviewer/models.py (Diff revision 6)
     
     
     
    Put the comment inside the else:
  6. 
      
DD
DD
david
  1. Two trivial changes left:
  2. reviewboard/diffviewer/managers.py (Diff revision 7)
     
     
     
     
    Two blank lines here
  3. reviewboard/diffviewer/models.py (Diff revision 7)
     
     
     
    Imports should be alphabetized.
  4. 
      
DD
chipx86
  1. Ship It!
  2. 
      
DD
Review request changed

Status: Closed (submitted)

Change Summary:

Committed to master (8bfea51)
Loading...