• 
      

    Greatly reduce the number of SQL queries in the Dashboard.

    Review Request #1339 — Created Jan. 4, 2010 and submitted — Latest diff uploaded

    Information

    Review Board
    master

    Reviewers

    Greatly reduce the number of SQL queries in the Dashboard.
    
    The dashboard is pretty slow, especially with certain column types. This
    is mainly due to a large number of SQL queries made per-row. This change
    reduces the number of queries by a considerable amount.
    
    There are a few main things done here:
    
    1) Through the new queryset augmentation in Djblets, the My Comments column
       requests the number of total reviews made on the review request by the user,
       the number of private (unpublished) reviews, and the number of reviews
       marked "Ship It." We can use these values to determine what label or text to
       show, instead of grabbing each review on each review request and looping
       through them.
    
    2) The Summary column now includes the associated draft's summary (if any)
       in the review request query, instead of attempting to fetch a draft
       individually for every review request.
    
    3) The Summary column now compares the user ID on the review request (as
       stored in the ReviewRequest object) with the requester's ID, instead of
       fetching the user object and then comparing that object's ID. This saves
       us one query per row.
    
    4) The Review Request Starred column takes the list of review request IDs
       that are to be shown in the datagrid (as determined by the datagrid itself)
       and runs one query to get all the starred review requests associated with
       the user's profile. It then caches these, later setting a flag on the
       review request itself, so that the Starred template tag can render directly
       from there. This saves a query per row.
    
    5) All the counts for the sidebar ("Outgoing Reviews", "Incoming Reviews",
       groups, etc.) are all pre-calculated and stored on the datagrid. The
       template tag for displaying this information no longer performs the
       queries, but rather grabs the stored values. There's actually not a benefit
       for anything but the groups (which I'll describe next), but is all there
       in one place for consistency.
    
    6) We now make one query that returns counts for every group (joined and
       starred) in the dashboard, instead of one per. We do need one query
       before this to get a list of all unique group names that we can feed into
       the second query, as we otherwise end up with duplicates that I can't
       seem to get rid of.
    
    7) Use values() and values_list() in places where we know we only care about
       a single attribute and don't need all data for, say, a Group.
    
    This along with the recent Djblets changes cuts back our SQL queries quite a
    bit. As an example, take a dashboard with 20 review requests (with 5 unique
    submitters), 6 groups joined (or starred -- doesn't matter), and the
    My Comments, Summary, Starred, and Submitter columns turned on.
    
    Previously, you'd get roughly the following number of queries:
    
    Datagrid:    1 query
    My Comments: 20 queries (with data loaded into Review objects)
    Summary:     20 queries (with data loaded into the ReviewRequestDraft objects)
    Starred:     20 queries
    Submitter:   20 queries
    Sidebar:     11 queries (5 mains, 6 groups)
    Total:       92 queries (embarrassing)
    
    Now, you should get:
    
    Datagrid:    1 query
    My Comments: 0 queries (data is pre-loaded in main query)
    Summary:     0 queries (data is pre-loaded in main query)
    Starred:     1 query
    Submitter:   5 queries (one per user)
    Sidebar:     7 queries (5 mains, 2 total for all group calculations)
    Total:       14 queries
    
    MUCH BETTER
    Compared my dashboard here before and after. Everything looked identical.
    
    Added a unit test for the various sidebar counts based on data before the change. Everything passed.
    
    Checked the profiling logs with all the SQL statements and saw a huge reduction in the number of queries.
    
    Checked that the main datagrid query that's made with all the augmented queries ran fine against huge datasets. Unlike the really old, complicated queries that led to massive database slowdown way back when (the last time some of this was attempted, poorly), this doesn't seem to have any noticeable speed difference from the more basic query. EXPLAIN SELECT ... showed the same times, ranging from 0.01ms to 0.02ms.