Greatly reduce the number of SQL queries in the Dashboard.

Review Request #1339 — Created Jan. 4, 2010 and submitted

Information

Review Board
master

Reviewers

Greatly reduce the number of SQL queries in the Dashboard.

The dashboard is pretty slow, especially with certain column types. This
is mainly due to a large number of SQL queries made per-row. This change
reduces the number of queries by a considerable amount.

There are a few main things done here:

1) Through the new queryset augmentation in Djblets, the My Comments column
   requests the number of total reviews made on the review request by the user,
   the number of private (unpublished) reviews, and the number of reviews
   marked "Ship It." We can use these values to determine what label or text to
   show, instead of grabbing each review on each review request and looping
   through them.

2) The Summary column now includes the associated draft's summary (if any)
   in the review request query, instead of attempting to fetch a draft
   individually for every review request.

3) The Summary column now compares the user ID on the review request (as
   stored in the ReviewRequest object) with the requester's ID, instead of
   fetching the user object and then comparing that object's ID. This saves
   us one query per row.

4) The Review Request Starred column takes the list of review request IDs
   that are to be shown in the datagrid (as determined by the datagrid itself)
   and runs one query to get all the starred review requests associated with
   the user's profile. It then caches these, later setting a flag on the
   review request itself, so that the Starred template tag can render directly
   from there. This saves a query per row.

5) All the counts for the sidebar ("Outgoing Reviews", "Incoming Reviews",
   groups, etc.) are all pre-calculated and stored on the datagrid. The
   template tag for displaying this information no longer performs the
   queries, but rather grabs the stored values. There's actually not a benefit
   for anything but the groups (which I'll describe next), but is all there
   in one place for consistency.

6) We now make one query that returns counts for every group (joined and
   starred) in the dashboard, instead of one per. We do need one query
   before this to get a list of all unique group names that we can feed into
   the second query, as we otherwise end up with duplicates that I can't
   seem to get rid of.

7) Use values() and values_list() in places where we know we only care about
   a single attribute and don't need all data for, say, a Group.

This along with the recent Djblets changes cuts back our SQL queries quite a
bit. As an example, take a dashboard with 20 review requests (with 5 unique
submitters), 6 groups joined (or starred -- doesn't matter), and the
My Comments, Summary, Starred, and Submitter columns turned on.

Previously, you'd get roughly the following number of queries:

Datagrid:    1 query
My Comments: 20 queries (with data loaded into Review objects)
Summary:     20 queries (with data loaded into the ReviewRequestDraft objects)
Starred:     20 queries
Submitter:   20 queries
Sidebar:     11 queries (5 mains, 6 groups)
Total:       92 queries (embarrassing)

Now, you should get:

Datagrid:    1 query
My Comments: 0 queries (data is pre-loaded in main query)
Summary:     0 queries (data is pre-loaded in main query)
Starred:     1 query
Submitter:   5 queries (one per user)
Sidebar:     7 queries (5 mains, 2 total for all group calculations)
Total:       14 queries

MUCH BETTER
Compared my dashboard here before and after. Everything looked identical.

Added a unit test for the various sidebar counts based on data before the change. Everything passed.

Checked the profiling logs with all the SQL statements and saw a huge reduction in the number of queries.

Checked that the main datagrid query that's made with all the augmented queries ran fine against huge datasets. Unlike the really old, complicated queries that led to massive database slowdown way back when (the last time some of this was attempted, poorly), this doesn't seem to have any noticeable speed difference from the more basic query. EXPLAIN SELECT ... showed the same times, ranging from 0.01ms to 0.02ms.
david
  1. Looks like it should be fine.
  2.