Optimize our datagrid queries when sorting data.

Review Request #883 — Created June 7, 2009 and submitted

Information

Navi (deprecated)
trunk

Reviewers

Optimize our datagrid queries when sorting data.

We were doing one big query for the data in the datagrid, but when
sorting, it could take a long time to fetch the results. In MySQL (our
test case for this) we were hitting several slowdowns, as it needed to
set up a temporary table and do a filesort on the data.

To improve the speed, we now query for the IDs only, factoring out all
other parts of the provided query. We can then perform a second query,
fetching the data we need, by querying specifically for entries with
those IDs, after all sorting has been applied.

A side-effect of this optimization is that things may be dropped from
the initial query. This isn't a problem for most uses, but if subqueries
are used and the datagrid cells are expected to be able to access this
data, that data will end up stripped away and the cells will fail.

This can be worked around by either turning off optimize_sorts for the
datagrid, or overriding post_process_queryset to specify those extra
subqueries (the latter being significantly better for larger datasets).
Tested with all the datagrids in Review Board, and saw significant performance increases, especially in conjunction with the Review Board changes in /r/884.
david
  1. This mostly looks pretty good. My comments are mostly about documentation.
  2. /trunk/djblets/djblets/datagrid/grids.py (Diff revision 1)
     
     
     
     
    I'm not really sure what this comment means.
  3. /trunk/djblets/djblets/datagrid/grids.py (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
     
     
    Can you mention in here that this function is meant to be overridden? It took me a while to figure out that I had to look at your other changeset for this to make sense.
  4. 
      
chipx86
Review request changed
Change Summary:
Fixed the sort order of the results. This was broken, since the database doesn't actually keep results in the ID list provided (nor should it). We now process this ourselves after getting the results.

Also enhanced the comments to describe what we're doing a bit better, and to explain how post_process_queryset() is useful.
david
  1. 
      
  2. /trunk/djblets/djblets/datagrid/grids.py (Diff revisions 1 - 2)
     
     
    This should just be "if id_list"
    1. Right. Originally I was thinking that we wanted to still handle this logic if id_list was [], but that was really before I knew exactly what my logic was going to be.
  3. It seems like there should be a more elegant way to do the ID sorting, but it's not jumping out at me right away. I'll continue thinking about it, but this is good as-is.