Summary

Better manage memory usage of condensediffs.

Review Request #5870 — Created May 24, 2014 and submitted May 25, 2014, 7:35 a.m.

Information

Owner

chipx86

Repository

Review Board

Branch

release-2.0.x

Bugs

Depends On

Commit

eb69fb2...

Reviewers

Groups

reviewboard

People

Description

condensediffs had some memory issues. Even though we were using
QuerySet.iterator, it seems memory was still increasing far too much.

To work around this, we're using a couple of tricks that we used in
loaddb and dumpdb. We only operate on batches of 200 diffs at a
time, and we reset queries and garbage-collect after each batch.
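
The batching approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this change: `migrate_in_batches`, `fetch_unmigrated`, and `migrate` are hypothetical names, and the `reset_queries` hook stands in for what would typically be `django.db.reset_queries` in a Django project.

```python
import gc

BATCH_SIZE = 200  # batch size quoted in the description


def migrate_in_batches(fetch_unmigrated, migrate, reset_queries=lambda: None,
                       batch_size=BATCH_SIZE):
    """Migrate items in fixed-size batches, releasing memory between batches.

    fetch_unmigrated(n) is a hypothetical callable returning up to n items
    that still need migrating; migrate(item) is a hypothetical callable that
    marks one item as migrated. In a Django project, reset_queries would
    typically be django.db.reset_queries.
    """
    total = 0
    while True:
        # Materialize only one batch at a time.
        batch = list(fetch_unmigrated(batch_size))
        if not batch:
            break
        for item in batch:
            migrate(item)
        total += len(batch)
        # Drop our reference before collecting so the batch can be freed.
        del batch
        reset_queries()
        gc.collect()
    return total
```

Keeping the reset and the collection at the batch boundary means at most one batch of objects is alive at any point, which is the property the description relies on.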

The query resets shouldn't impact production installs, since DEBUG
should be False there, but they're a precaution. We're also forcing
DEBUG to be False in the management command.

With these changes, memory still increases over time, to a degree, but
seems to stabilize. This is with a sample set of over 8200 diffs.

Testing Done

Added some debug information to check process memory usage before and after.

Before, memory usage rose steadily (even with DEBUG = False).

After, memory rose for a bit and then stayed pretty steady, without growing
unexpectedly large.
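
The debug code used for this check isn't shown in the review. One way to take such a measurement, as a sketch only, is the standard library's `resource` module (Unix only). The helper names `peak_rss` and `run_with_memory_report` are hypothetical.

```python
import resource


def peak_rss():
    """Return this process's peak resident set size as reported by the OS.

    The unit is platform-dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def run_with_memory_report(work, log=print):
    """Call work() and log peak RSS before and after it."""
    before = peak_rss()
    result = work()
    after = peak_rss()
    log("peak RSS before: %d, after: %d, growth: %d"
        % (before, after, after - before))
    return result
```

Note that `ru_maxrss` is a high-water mark, so it shows whether peak usage plateaus but cannot show memory being released.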

Issues

Description	From	Last Updated
I don't think this does the right thing (it seems to iterate over items 0-199 over and over).	david	May 25, 2014, 5:49 a.m.

This is a review from Review Bot.
  Tool: PEP8 Style Checker
  Processed Files:
    reviewboard/diffviewer/management/commands/condensediffs.py
    reviewboard/diffviewer/managers.py
  Ignored Files:

This is a review from Review Bot.
  Tool: Pyflakes
  Processed Files:
    reviewboard/diffviewer/management/commands/condensediffs.py
    reviewboard/diffviewer/managers.py
  Ignored Files:

reviewboard/diffviewer/managers.py (Diff revision 1)

The issue has been resolved.

I don't think this does the right thing (it seems to iterate over items 0-199 over and over).

chipx86

May 25, 2014, 1:33 a.m.

The queryset covers only unmigrated diffs. Every time we do .all()[:OBJECT_LIMIT], it re-evaluates, getting the first 200 diffs that are still unmigrated. Since the previous batch has all been migrated by then, the next group of unmigrated diffs starts at index 0. Running condensediffs a second time reports that no diffs remain unmigrated.

I actually was doing .all()[j:j + OBJECT_LIMIT] at first, and saw that it was leaving things unmigrated and doing queries with 0 results, which is what made me realize what was happening there.
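
The difference between the two slicing strategies can be reproduced without a database. The following is a small simulation under stated assumptions, not the code in managers.py: a plain list stands in for the re-evaluated queryset, and `run` is a hypothetical helper. Only `OBJECT_LIMIT` is a name taken from the discussion.

```python
OBJECT_LIMIT = 200  # name taken from the discussion above


def run(total, use_offset):
    """Simulate batch migration over a shrinking 'unmigrated' result set.

    Returns the number of items left unmigrated.
    """
    unmigrated = list(range(total))  # stand-in for the re-evaluated queryset
    for j in range(0, total, OBJECT_LIMIT):
        # Offset slicing advances j; the fixed slice always starts at 0.
        start = j if use_offset else 0
        batch = unmigrated[start:start + OBJECT_LIMIT]
        # Migrating an item removes it from the unmigrated set.
        migrated = set(batch)
        unmigrated = [i for i in unmigrated if i not in migrated]
    return len(unmigrated)
```

With 1000 items, `run(1000, use_offset=False)` leaves nothing unmigrated, while `run(1000, use_offset=True)` skips items as the set shrinks, ends with empty batches, and leaves 400 behind.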

david

May 25, 2014, 2 a.m.

Since we both made the same mistake, I think this deserves a comment. You might also change filediffs to be something like unmigrated_filediffs for clarity.

Change Summary:


Added some doc comments.
Renamed filediffs to unmigrated_filediffs.
Switched the while loop to a for loop.

Commit:

05a5f75b56ec6b51fd8b114ac60539390b934606

eb69fb2ed2be744dbb8ce33082894f9284055808

Diff:

Revision 2 (+33 -13)

	reviewboard/diffviewer/managers.py
	reviewboard/diffviewer/management/commands/condensediffs.py

This is a review from Review Bot.
  Tool: PEP8 Style Checker
  Processed Files:
    reviewboard/diffviewer/management/commands/condensediffs.py
    reviewboard/diffviewer/managers.py
  Ignored Files:

This is a review from Review Bot.
  Tool: Pyflakes
  Processed Files:
    reviewboard/diffviewer/management/commands/condensediffs.py
    reviewboard/diffviewer/managers.py
  Ignored Files:

Ship it!

Status: Completed
Change Summary: Pushed to release-2.0.x (84d17a3)