This change has been marked as completed.


Pushed to release-2.0.x (84d17a3)

Summary

Better manage memory usage of condensediffs.

Review Request #5870 — Created May 24, 2014 and submitted 11 years, 1 month ago

Information

Owner

chipx86

Repository

Review Board

Branch

release-2.0.x

Commit

eb69fb2...

Reviewers

Groups

reviewboard

Description

condensediffs had some memory issues. Even though we were using
QuerySet.iterator(), it seems memory usage was still growing far too much.

To work around this, we're using a couple of tricks that we used in
loaddb and dumpdb: we only operate on batches of 200 diffs at a time,
and we reset queries and garbage-collect after each batch.

The query resets shouldn't affect production installs, since DEBUG
should be False there (Django only records executed queries when DEBUG
is True), but they're a precaution. We're also forcing DEBUG to False
in the management command.
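A minimal sketch of the batching pattern described above, written in plain Python so it runs without a Review Board install. `process_in_batches` and its `reset` hook are hypothetical names, not the actual condensediffs code; in a Django management command one would presumably pass `django.db.reset_queries` as `reset` (after setting `settings.DEBUG = False`). Only the batch size of 200 comes from the description.

```python
import gc
from itertools import islice

BATCH_SIZE = 200  # Batch size taken from the description.


def process_in_batches(items, process_item, batch_size=BATCH_SIZE, reset=None):
    """Process items in fixed-size batches, freeing memory between batches.

    `process_item` is a hypothetical per-item callback standing in for the
    real per-diff work. `reset` is an optional hook run after each batch
    (for example, django.db.reset_queries). Returns the number of batches.
    """
    iterator = iter(items)
    num_batches = 0

    while True:
        # Pull at most batch_size items from the iterator.
        batch = list(islice(iterator, batch_size))

        if not batch:
            break

        for item in batch:
            process_item(item)

        num_batches += 1

        # Drop per-batch state before moving on: clear recorded queries
        # (only populated when DEBUG is True) and force a GC pass.
        if reset is not None:
            reset()

        del batch
        gc.collect()

    return num_batches
```

Consuming a single iterator with `islice` avoids offset arithmetic entirely, which matters when the underlying result set can change between batches.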

With these changes, memory still grows over time to a degree, but it
seems to stabilize. This was tested with a sample set of over 8,200 diffs.

Testing Done

Added some debug output to check the process's memory usage before and after the changes.

Before, memory was steadily rising in usage (even with DEBUG = False).

After, memory rose for a bit and then stayed pretty steady, without growing
unexpectedly large.
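One way to get before/after numbers like these, sketched with the standard library's `tracemalloc` module. This is an assumption: the request doesn't say what debug output was used, and `tracemalloc` only counts Python-level allocations rather than total process memory (on Unix, `resource.getrusage(resource.RUSAGE_SELF).ru_maxrss` is closer to process memory). `measure_peak_memory` is a hypothetical helper.

```python
import tracemalloc


def measure_peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak_bytes) of traced Python allocations.

    tracemalloc tracks memory allocated through the Python interpreter,
    not total process RSS, so this is only a rough proxy.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()

    return result, peak
```

Comparing an approach that holds every object at once against one that discards each object after use should show a much lower peak for the latter, which is the same effect the batching aims for.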

Issues

Description	From	Last Updated
I don't think this does the right thing (it seems to iterate over items 0-199 over and over).	david	11 years, 1 month ago
There are no open issues

This is a review from Review Bot.
  Tool: PEP8 Style Checker
  Processed Files:
    reviewboard/diffviewer/management/commands/condensediffs.py
    reviewboard/diffviewer/managers.py
  Ignored Files:

This is a review from Review Bot.
  Tool: Pyflakes
  Processed Files:
    reviewboard/diffviewer/management/commands/condensediffs.py
    reviewboard/diffviewer/managers.py
  Ignored Files:

reviewboard/diffviewer/managers.py (Diff revision 1)

The issue has been resolved.

I don't think this does the right thing (it seems to iterate over items 0-199 over and over).

chipx86 11 years, 1 month ago

The queryset covers only unmigrated diffs. Every time we do .all()[:OBJECT_LIMIT], it re-evaluates the query, getting the first 200 unmigrated diffs. Since the previous batch has all been migrated, it no longer matches the filter, so the next batch of unmigrated diffs always starts at index 0. Running condensediffs a second time reports that no diffs remain unmigrated.
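A small simulation of the point above, with plain lists in place of a QuerySet. `FakeDiff`, `unmigrated`, and `migrate_all` are hypothetical names; re-running the list comprehension stands in for the QuerySet being re-evaluated on each slice.

```python
OBJECT_LIMIT = 200  # Batch size mentioned in the description.


class FakeDiff:
    """Hypothetical stand-in for a FileDiff row."""

    def __init__(self, pk):
        self.pk = pk
        self.migrated = False


def unmigrated(diffs):
    """Re-run the 'unmigrated' filter, like re-evaluating a QuerySet."""
    return [d for d in diffs if not d.migrated]


def migrate_all(diffs):
    """Migrate everything in batches, always slicing from index 0."""
    batches = 0

    while True:
        # Everything from the previous batch is now migrated and no longer
        # matches the filter, so the next batch starts at index 0.
        batch = unmigrated(diffs)[:OBJECT_LIMIT]

        if not batch:
            break

        for diff in batch:
            diff.migrated = True

        batches += 1

    return batches
```

With 450 diffs this takes three batches (200, 200, 50) and leaves nothing unmigrated.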

I was actually doing .all()[j:j + OBJECT_LIMIT] at first, and saw that it was leaving things unmigrated and running queries that returned 0 results, because the offset kept advancing while the set of unmigrated diffs kept shrinking. That's what made me realize what was happening there.
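For contrast, a simulation of the offset-based slicing described above, again with lists standing in for the QuerySet. `migrate_with_offsets` is a hypothetical name. With 450 items, the second pass skips 200 items and the third pass finds nothing, leaving items 200 through 399 unmigrated.

```python
OBJECT_LIMIT = 200  # Batch size mentioned in the description.


def migrate_with_offsets(total):
    """Simulate slicing a shrinking 'unmigrated' result set by offset.

    Returns the IDs that are still unmigrated afterwards.
    """
    items = list(range(total))
    migrated = set()

    for j in range(0, total, OBJECT_LIMIT):
        # Re-evaluate the filter each time, as a QuerySet would.
        remaining = [i for i in items if i not in migrated]

        # Bug: the offset j assumes the result set has not shrunk.
        for i in remaining[j:j + OBJECT_LIMIT]:
            migrated.add(i)

    return [i for i in items if i not in migrated]
```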

david 11 years, 1 month ago

Since we both made the same mistake, I think this deserves a comment. You might also change filediffs to be something like unmigrated_filediffs for clarity.

Change Summary:


Added some doc comments.
Renamed filediffs to unmigrated_filediffs.
Switched the while loop to a for loop.
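A hypothetical sketch of what the for-loop form might look like, using the explanatory comment and the `unmigrated_filediffs` name suggested in review; it is not the committed code. One plausible benefit of bounding the loop by the initial count: if some row ever failed to migrate and kept matching the filter, a `while` loop that stops only on an empty batch would never terminate, whereas this loop runs a fixed number of times.

```python
OBJECT_LIMIT = 200  # Batch size mentioned in the description.


def migrate_in_for_loop(unmigrated_filediffs, migrate):
    """For-loop variant that computes the number of batches up front.

    `unmigrated_filediffs` is a hypothetical callable returning the IDs that
    still need migration (re-evaluated on each call, like a QuerySet), and
    `migrate` is a hypothetical callback that migrates one ID.
    """
    total = len(unmigrated_filediffs())

    for _ in range(0, total, OBJECT_LIMIT):
        # Always slice from 0: the previous batch no longer matches the
        # "unmigrated" filter, so the next batch starts at index 0.
        for filediff_id in unmigrated_filediffs()[:OBJECT_LIMIT]:
            migrate(filediff_id)
```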

Commit:

05a5f75b56ec6b51fd8b114ac60539390b934606

eb69fb2ed2be744dbb8ce33082894f9284055808

Diff:

Revision 2 (+33 -13)


	reviewboard/diffviewer/managers.py
	reviewboard/diffviewer/management/commands/condensediffs.py

This is a review from Review Bot.
  Tool: PEP8 Style Checker
  Processed Files:
    reviewboard/diffviewer/management/commands/condensediffs.py
    reviewboard/diffviewer/managers.py
  Ignored Files:

This is a review from Review Bot.
  Tool: Pyflakes
  Processed Files:
    reviewboard/diffviewer/management/commands/condensediffs.py
    reviewboard/diffviewer/managers.py
  Ignored Files:

Ship it!

