• 
      

    Better manage memory usage of condensediffs.

    Review Request #5870 — Created May 24, 2014 and submitted

    Information

    Review Board
    release-2.0.x
    eb69fb2...

    Reviewers

    condensediffs had some memory issues. Even though we were using
    QuerySet.iterator, it seems memory was still increasing far too much.

    To work around this, we're using a couple tricks that we used in
    loaddb and dumpdb. We're only operating on batches of 200 diffs at a
    time, and are resetting queries and garbage-collecting after each batch.

    The query resets shouldn't impact production installs, since DEBUG
    should be False, but it's a precaution. We're also forcing DEBUG to be
    False in the management command as well.

    With these changes, memory still increases over time, to a degree, but
    seems to stabilize. This is with a sample set of over 8200 diffs.

    Added some debug information to check process memory usage before and after.

    Before, memory was steadily rising in usage (even with DEBUG = False).

    After, memory rose for a bit and then stayed pretty steady, without growing
    unexpectedly large.

    Description From Last Updated

    I don't think this does the right thing (it seems to iterate over items 0-199 over and over).

    daviddavid
    reviewbot
    1. This is a review from Review Bot.
        Tool: PEP8 Style Checker
        Processed Files:
          reviewboard/diffviewer/management/commands/condensediffs.py
          reviewboard/diffviewer/managers.py
        Ignored Files:
      
      
    2. 
        
    reviewbot
    1. This is a review from Review Bot.
        Tool: Pyflakes
        Processed Files:
          reviewboard/diffviewer/management/commands/condensediffs.py
          reviewboard/diffviewer/managers.py
        Ignored Files:
      
      
    2. 
        
    david
    1. 
        
    2. reviewboard/diffviewer/managers.py (Diff revision 1)
       
       
      Show all issues

      I don't think this does the right thing (it seems to iterate over items 0-199 over and over).

      1. The queryset covers unmigrated diffs. Every time we do the .all()[:OBJECT_LIMIT], it re-evaluates, getting the first 200 unmigrated diffs. Since the last grouping have all been migrated, the first group of new unmigrated diffs will start at index 0. Running condensediffs a second time says that no diffs remain unmigrated.

        I actually was doing .all()[j:j + OBJECT_LIMIT] at first, and saw that it was leaving things unmigrated and doing queries with 0 results, which is what made me realize what was happening there.

      2. Since we both made the same mistake, I think this deserves a comment. You might also change filediffs to be something like unmigrated_filediffs for clarity.

    3. 
        
    chipx86
    reviewbot
    1. This is a review from Review Bot.
        Tool: PEP8 Style Checker
        Processed Files:
          reviewboard/diffviewer/management/commands/condensediffs.py
          reviewboard/diffviewer/managers.py
        Ignored Files:
      
      
    2. 
        
    reviewbot
    1. This is a review from Review Bot.
        Tool: Pyflakes
        Processed Files:
          reviewboard/diffviewer/management/commands/condensediffs.py
          reviewboard/diffviewer/managers.py
        Ignored Files:
      
      
    2. 
        
    david
    1. Ship It!

    2. 
        
    chipx86
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to release-2.0.x (84d17a3)