Better manage memory usage of condensediffs.

Review Request #5870 — Created May 24, 2014 and submitted

Information

Review Board
release-2.0.x
eb69fb2...

Reviewers

condensediffs had some memory issues. Even though we were using
QuerySet.iterator, it seems memory was still increasing far too much.

To work around this, we're using a couple tricks that we used in
loaddb and dumpdb. We're only operating on batches of 200 diffs at a
time, and are resetting queries and garbage-collecting after each batch.

The query resets shouldn't impact production installs, since DEBUG
should be False, but it's a precaution. We're also forcing DEBUG to be
False in the management command as well.

With these changes, memory still increases over time, to a degree, but
seems to stabilize. This is with a sample set of over 8200 diffs.

Added some debug information to check process memory usage before and after.

Before, memory was steadily rising in usage (even with DEBUG = False).

After, memory rose for a bit and then stayed pretty steady, without growing
unexpectedly large.

Description From Last Updated

I don't think this does the right thing (it seems to iterate over items 0-199 over and over).

daviddavid
reviewbot
  1. This is a review from Review Bot.
      Tool: PEP8 Style Checker
      Processed Files:
        reviewboard/diffviewer/management/commands/condensediffs.py
        reviewboard/diffviewer/managers.py
      Ignored Files:
    
    
  2. 
      
reviewbot
  1. This is a review from Review Bot.
      Tool: Pyflakes
      Processed Files:
        reviewboard/diffviewer/management/commands/condensediffs.py
        reviewboard/diffviewer/managers.py
      Ignored Files:
    
    
  2. 
      
david
  1. 
      
  2. reviewboard/diffviewer/managers.py (Diff revision 1)
     
     
    Show all issues

    I don't think this does the right thing (it seems to iterate over items 0-199 over and over).

    1. The queryset covers unmigrated diffs. Every time we do the .all()[:OBJECT_LIMIT], it re-evaluates, getting the first 200 unmigrated diffs. Since the last grouping have all been migrated, the first group of new unmigrated diffs will start at index 0. Running condensediffs a second time says that no diffs remain unmigrated.

      I actually was doing .all()[j:j + OBJECT_LIMIT] at first, and saw that it was leaving things unmigrated and doing queries with 0 results, which is what made me realize what was happening there.

    2. Since we both made the same mistake, I think this deserves a comment. You might also change filediffs to be something like unmigrated_filediffs for clarity.

  3. 
      
chipx86
reviewbot
  1. This is a review from Review Bot.
      Tool: PEP8 Style Checker
      Processed Files:
        reviewboard/diffviewer/management/commands/condensediffs.py
        reviewboard/diffviewer/managers.py
      Ignored Files:
    
    
  2. 
      
reviewbot
  1. This is a review from Review Bot.
      Tool: Pyflakes
      Processed Files:
        reviewboard/diffviewer/management/commands/condensediffs.py
        reviewboard/diffviewer/managers.py
      Ignored Files:
    
    
  2. 
      
david
  1. Ship It!

  2. 
      
chipx86
Review request changed
Status:
Completed
Change Summary:
Pushed to release-2.0.x (84d17a3)