Optimize search indexing for users and review requests.
Review Request #15014 — Created April 15, 2026 and updated
The code for search indexing was old and lacked sufficient unit tests,
and this led to some missed performance issues when indexing content.First of all, we had previously had an N+4 issue when indexing users
without a profile, due to an issue whereUser.get_profile()wasn't
respecting aNoneprofile returned from aselect_related(). This fix
has been rolled up into its own change, but is notable as it's part of
this work.Our main queries used to return results for indexing weren't using
modern Django ORM capabilities to fetch related results. We were
stuffing a lot of objects intoselect_related(), when
prefetch_related()would have been more appropriate. If 20 items all
referenced the same repository (for example), we'd be re-fetching that
repository information as part of each review request query, instead of
batching the repository IDs and fetching only unique ones, which
prefetch_related()gives us. We now have more queries, but smaller
ones that can better utilize indexes.The prefetches make use of
Prefetch()andonly(), which lets us
gather just the fields we need on each object, instead of fetching
every column. This means smaller results and less processing. We use
this in every prefetch.
only()is also used to limit the fields on the mainUserand
ReviewRequestobjects we're indexing. We now only fetch the fields we
know we'll need for indexing or for computing data for indexing. This
also covers anyselect_related()needed on the objects.
LocalSite-related queries is only included when we know the server has
Local Sites. In the typical case, this means fewer, slimmer queries and
less processing time. There is a caveat in that if a server goes from no
Local Sites to 1+ Local Sites while indexing is occurring, various ACL
logic may end up fetching associated Local Sites as independent queries
while processing ACLs and such, but this wouldn't happen at the next
re-index, and is pretty unlikely to pose a problem in real usage.Most importantly, we now have extensive unit tests for the indexing
logic that asserts queries, making sure we don't have any N+1 issues now
or later as logic evolves. This will also help us keep things manageable
as we add new indexing improvements.To keep all this logic maintainable and to further avoid errors, typing
and modern docstrings have been added.
All unit tests pass.
| Summary | ID |
|---|---|
| fd8b273b01112a3b49376b1e4da61e1d3d2f1c11 |
| Description | From | Last Updated |
|---|---|---|
|
Here and below should say UserIndex.update. |
|
|
|
Sort order is off here. |
|
|
|
This doesn't match the name in the args. We should probably rename that one to review_request for consistency with the … |
|
|
|
Copy/pasteo: should be user IDs |
|
|
|
Can you add this attribute error to the docstring. |
|
|
|
Same here can you add this to the docstring. |
|
-
-
I have a question that's not strictly about this change, but this made me think of it: How can we account for hooks in these sorts of database queries? Like for example we have the
FileDiffACLHook, which if present will come into play from theprepare_private()method callingreview_request.is_accessible_by()which will then cause more queries by doingget_diffsets().Was also thinking about this when working on the dashboard stuff, trying to fetch approved review requests in a performant way while still taking into account the review request approval hook.
-
-
-