Greatly reduce the number of SQL queries in the Dashboard.
Review Request #1339 — Created Jan. 4, 2010 and submitted
Greatly reduce the number of SQL queries in the Dashboard. The dashboard is pretty slow, especially with certain column types. This is mainly due to a large number of SQL queries made per-row. This change reduces the number of queries by a considerable amount. There are a few main things done here: 1) Through the new queryset augmentation in Djblets, the My Comments column requests the number of total reviews made on the review request by the user, the number of private (unpublished) reviews, and the number of reviews marked "Ship It." We can use these values to determine what label or text to show, instead of grabbing each review on each review request and looping through them. 2) The Summary column now includes the associated draft's summary (if any) in the review request query, instead of attempting to fetch a draft individually for every review request. 3) The Summary column now compares the user ID on the review request (as stored in the ReviewRequest object) with the requester's ID, instead of fetching the user object and then comparing that object's ID. This saves us one query per row. 4) The Review Request Starred column takes the list of review request IDs that are to be shown in the datagrid (as determined by the datagrid itself) and runs one query to get all the starred review requests associated with the user's profile. It then caches these, later setting a flag on the review request itself, so that the Starred template tag can render directly from there. This saves a query per row. 5) All the counts for the sidebar ("Outgoing Reviews", "Incoming Reviews", groups, etc.) are all pre-calculated and stored on the datagrid. The template tag for displaying this information no longer performs the queries, but rather grabs the stored values. There's actually not a benefit for anything but the groups (which I'll describe next), but is all there in one place for consistency. 6) We now make one query that returns counts for every group (joined and starred) in the dashboard, instead of one per. We do need one query before this to get a list of all unique group names that we can feed into the second query, as we otherwise end up with duplicates that I can't seem to get rid of. 7) Use values() and values_list() in places where we know we only care about a single attribute and don't need all data for, say, a Group. This along with the recent Djblets changes cuts back our SQL queries quite a bit. As an example, take a dashboard with 20 review requests (with 5 unique submitters), 6 groups joined (or starred -- doesn't matter), and the My Comments, Summary, Starred, and Submitter columns turned on. Previously, you'd get roughly the following number of queries: Datagrid: 1 query My Comments: 20 queries (with data loaded into Review objects) Summary: 20 queries (with data loaded into the ReviewRequestDraft objects) Starred: 20 queries Submitter: 20 queries Sidebar: 11 queries (5 mains, 6 groups) Total: 92 queries (embarrassing) Now, you should get: Datagrid: 1 query My Comments: 0 queries (data is pre-loaded in main query) Summary: 0 queries (data is pre-loaded in main query) Starred: 1 query Submitter: 5 queries (one per user) Sidebar: 7 queries (5 mains, 2 total for all group calculations) Total: 14 queries MUCH BETTER
Compared my dashboard here before and after. Everything looked identical. Added a unit test for the various sidebar counts based on data before the change. Everything passed. Checked the profiling logs with all the SQL statements and saw a huge reduction in the number of queries. Checked that the main datagrid query that's made with all the augmented queries ran fine against huge datasets. Unlike the really old, complicated queries that led to massive database slowdown way back when (the last time some of this was attempted, poorly), this doesn't seem to have any noticeable speed difference from the more basic query. EXPLAIN SELECT ... showed the same times, ranging from 0.01ms to 0.02ms.