• 
      

    Reduce diff storage by hashing diff uploads

    Review Request #2618 — Created Sept. 25, 2011 and submitted

    Information

    Review Board

    Reviewers

    Reduce diff storage by hashing diff uploads
    
    Diff files that already exist in the database will no longer be double-stored. Diffs are now hashed on upload and correspond to a new hash->binary table.
    Existing table data is not hashed and will remain for backwards compatibility. Evolutions have been made to create the new table and rename existing fields, so that model-logic can override fields. Test data has also been modified for new field name compatibility.
    Ran fill-database and all data was hashed accordingly. Manually posted reviews and posted parent diffs. Created a unit test to check if hashes match.
    Description From Last Updated

    These are both built-in Python modules, so no blank line.

    chipx86chipx86

    I'm not really sure "Binary" is quite the term we want. FileDiffData might be better. Then "hash" and "data" fields.

    chipx86chipx86

    Docstrings are in the format of: """One-line summary Option multi-line description. """ It should always be a proper sentence, punctuation …

    chipx86chipx86

    Alignment problem. There's no need for \, since the parens take care of it.

    chipx86chipx86

    Same here.

    chipx86chipx86

    Blank line between these. A comment describing the next block of code should be separated. Also, comments should have proper …

    chipx86chipx86

    This can be: self.diff_hash, is_new = \ FileDiffBinary.objects.get_or_create(binary_hash=hashkey, defaults={ 'binary': diff })

    chipx86chipx86

    Same logic can be applied here for the get_or_create.

    chipx86chipx86

    I?think that there should a corresponding py file for that?

    ZH zhenli

    For consistencies sake, can we call that "kwargs"?

    mike_conleymike_conley

    Space after the if block

    mike_conleymike_conley

    Managers go in managers.py.

    chipx86chipx86

    It'd be nice to pull defaults out, so you don't repeat yourself (and also, don't assume defaults is set): defaults …

    chipx86chipx86

    Space before \

    chipx86chipx86

    Put the comment inside the else:

    chipx86chipx86

    Two blank lines here

    daviddavid

    Imports should be alphabetized.

    daviddavid
    DD
    chipx86
    1. Wow, you got that working fast! I have some comments. Mostly style issues, a couple better ways to use Django in some cases, and a name change. I'm really happy to see this though :)
    2. reviewboard/diffviewer/models.py (Diff revision 1)
       
       
       
       
      Show all issues
      These are both built-in Python modules, so no blank line.
    3. reviewboard/diffviewer/models.py (Diff revision 1)
       
       
       
       
       
       
       
       
      Show all issues
      I'm not really sure "Binary" is quite the term we want.
      
      FileDiffData might be better. Then "hash" and "data" fields.
    4. reviewboard/diffviewer/models.py (Diff revision 1)
       
       
       
       
      Show all issues
      Docstrings are in the format of:
      
      """One-line summary
      
      Option multi-line description.
      """
      
      It should always be a proper sentence, punctuation and all.
      1. I think these days, for multi-line ones, the first line can safely go on the line after """. Single-line docstrings should have """ and the text all on the same line, though.
    5. reviewboard/diffviewer/models.py (Diff revision 1)
       
       
       
      Show all issues
      Alignment problem.
      
      There's no need for \, since the parens take care of it.
    6. reviewboard/diffviewer/models.py (Diff revision 1)
       
       
       
      Show all issues
      Same here.
    7. reviewboard/diffviewer/models.py (Diff revision 1)
       
       
       
      Show all issues
      Blank line between these. A comment describing the next block of code should be separated.
      
      Also, comments should have proper punctuation (period at the end).
    8. reviewboard/diffviewer/models.py (Diff revision 1)
       
       
       
       
       
       
       
      Show all issues
      This can be:
      
      self.diff_hash, is_new = \
          FileDiffBinary.objects.get_or_create(binary_hash=hashkey,
                                               defaults={
                                                   'binary': diff
                                               })
    9. reviewboard/diffviewer/models.py (Diff revision 1)
       
       
       
       
       
       
       
       
       
       
      Show all issues
      Same logic can be applied here for the get_or_create.
    10. 
        
    ZH
    1. 
        
    2. I think it might be better to call it “add_diff_hash".
    3. 
        
    ZH
    1. 
        
    2. Looks like, after the first line, the who block is identical, but still marked by ReviewBoard as different. Is it a bug?
      1. Nope, it's just one really long line.
    3. 
        
    DD
    ZH
    1. 
        
    2. Show all issues
      I?think that there should a corresponding py file for that?
      1. Good catch, Jacob. I forgot to add the file to my commit.
    3. 
        
    DD
    DD
    DD
    mike_conley
    1. David:
      
      This looks really good!  Just the two tiny nits I found below.  Once those are fixed, I'll happily give this my ship-it.
      
      Thanks,
      
      -Mike
    2. reviewboard/diffviewer/models.py (Diff revision 5)
       
       
      Show all issues
      For consistencies sake, can we call that "kwargs"?
    3. reviewboard/diffviewer/models.py (Diff revision 5)
       
       
      Show all issues
      Space after the if block
      1. And by "space", I of course mean "new line".
    4. 
        
    DD
    mike_conley
    1. This looks good to me!
    2. 
        
    chipx86
    1. Couple small things.
      
      Do unit tests still run?
    2. reviewboard/diffviewer/models.py (Diff revision 6)
       
       
      Show all issues
      Managers go in managers.py.
    3. reviewboard/diffviewer/models.py (Diff revision 6)
       
       
       
       
      Show all issues
      It'd be nice to pull defaults out, so you don't repeat yourself (and also, don't assume defaults is set):
      
      defaults = kwargs.get('defaults', {})
      
      if defaults and defaults['binary']:
          defaults['binary'] = ...
    4. reviewboard/diffviewer/models.py (Diff revision 6)
       
       
      Show all issues
      Space before \
    5. reviewboard/diffviewer/models.py (Diff revision 6)
       
       
       
      Show all issues
      Put the comment inside the else:
    6. 
        
    DD
    DD
    david
    1. Two trivial changes left:
    2. reviewboard/diffviewer/managers.py (Diff revision 7)
       
       
       
       
      Show all issues
      Two blank lines here
    3. reviewboard/diffviewer/models.py (Diff revision 7)
       
       
       
      Show all issues
      Imports should be alphabetized.
    4. 
        
    DD
    chipx86
    1. Ship It!
    2. 
        
    DD
    Review request changed
    Status:
    Completed
    Change Summary:
    Committed to master (8bfea51)