• 
      

    Optimize search indexing for users and review requests.

    Review Request #15014 — Created April 15, 2026 and updated

    Information

    Review Board
    release-7.1.x

    Reviewers

    The code for search indexing was old and lacked sufficient unit tests,
    and this led to some missed performance issues when indexing content.

    First of all, we had previously had an N+4 issue when indexing users
    without a profile, due to an issue where User.get_profile() wasn't
    respecting a None profile returned from a select_related(). This fix
    has been rolled up into its own change, but is notable as it's part of
    this work.

    Our main queries used to return results for indexing weren't using
    modern Django ORM capabilities to fetch related results. We were
    stuffing a lot of objects into select_related(), when
    prefetch_related() would have been more appropriate. If 20 items all
    referenced the same repository (for example), we'd be re-fetching that
    repository information as part of each review request query, instead of
    batching the repository IDs and fetching only unique ones, which
    prefetch_related() gives us. We now have more queries, but smaller
    ones that can better utilize indexes.

    The prefetches make use of Prefetch() and only(), which lets us
    gather just the fields we need on each object, instead of fetching
    every column. This means smaller results and less processing. We use
    this in every prefetch.

    only() is also used to limit the fields on the main User and
    ReviewRequest objects we're indexing. We now only fetch the fields we
    know we'll need for indexing or for computing data for indexing. This
    also covers any select_related() needed on the objects.

    LocalSite-related queries is only included when we know the server has
    Local Sites. In the typical case, this means fewer, slimmer queries and
    less processing time. There is a caveat in that if a server goes from no
    Local Sites to 1+ Local Sites while indexing is occurring, various ACL
    logic may end up fetching associated Local Sites as independent queries
    while processing ACLs and such, but this wouldn't happen at the next
    re-index, and is pretty unlikely to pose a problem in real usage.

    Most importantly, we now have extensive unit tests for the indexing
    logic that asserts queries, making sure we don't have any N+1 issues now
    or later as logic evolves. This will also help us keep things manageable
    as we add new indexing improvements.

    To keep all this logic maintainable and to further avoid errors, typing
    and modern docstrings have been added.

    All unit tests pass.

    Summary ID
    Optimize search indexing for users and review requests.
    The code for search indexing was old and lacked sufficient unit tests, and this led to some missed performance issues when indexing content. First of all, we had previously had an N+4 issue when indexing users without a profile, due to an issue where `User.get_profile()` wasn't respecting a `None` profile returned from a `select_related()`. This fix has been rolled up into its own change, but is notable as it's part of this work. Our main queries used to return results for indexing weren't using modern Django ORM capabilities to fetch related results. We were stuffing a lot of objects into `select_related()`, when `prefetch_related()` would have been more appropriate. If 20 items all referenced the same repository (for example), we'd be re-fetching that repository information as part of each review request query, instead of batching the repository IDs and fetching only unique ones, which `prefetch_related()` gives us. We now have more queries, but smaller ones that can better utilize indexes. The prefetches make use of `Prefetch()` and `only()`, which lets us gather just the fields we need on each object, instead of fetching every column. This means smaller results and less processing. We use this in every prefetch. `only()` is also used to limit the fields on the main `User` and `ReviewRequest` objects we're indexing. We now only fetch the fields we know we'll need for indexing or for computing data for indexing. This also covers any `select_related()` needed on the objects. `LocalSite`-related queries is only included when we know the server has Local Sites. In the typical case, this means fewer, slimmer queries and less processing time. There is a caveat in that if a server goes from no Local Sites to 1+ Local Sites while indexing is occurring, various ACL logic may end up fetching associated Local Sites as independent queries while processing ACLs and such, but this wouldn't happen at the next re-index, and is pretty unlikely to pose a problem in real usage. Most importantly, we now have extensive unit tests for the indexing logic that asserts queries, making sure we don't have any N+1 issues now or later as logic evolves. This will also help us keep things manageable as we add new indexing improvements. To keep all this logic maintainable and to further avoid errors, typing and modern docstrings have been added.
    fd8b273b01112a3b49376b1e4da61e1d3d2f1c11
    Description From Last Updated

    Here and below should say UserIndex.update.

    maubin maubin

    Sort order is off here.

    david david

    This doesn't match the name in the args. We should probably rename that one to review_request for consistency with the …

    david david

    Copy/pasteo: should be user IDs

    david david

    Can you add this attribute error to the docstring.

    maubin maubin

    Same here can you add this to the docstring.

    maubin maubin
    Checks run (2 succeeded)
    flake8 passed.
    JSHint passed.
    maubin
    1. 
        
    2. I have a question that's not strictly about this change, but this made me think of it: How can we account for hooks in these sorts of database queries? Like for example we have the FileDiffACLHook, which if present will come into play from the prepare_private() method calling review_request.is_accessible_by() which will then cause more queries by doing get_diffsets().

      Was also thinking about this when working on the dashboard stuff, trying to fetch approved review requests in a performant way while still taking into account the review request approval hook.

    3. Show all issues

      Here and below should say UserIndex.update.

    4. reviewboard/search/indexes.py (Diff revision 1)
       
       
      Show all issues

      Can you add this attribute error to the docstring.

    5. reviewboard/search/indexes.py (Diff revision 1)
       
       
      Show all issues

      Same here can you add this to the docstring.

    6. 
        
    david
    1. 
        
    2. reviewboard/reviews/search_indexes.py (Diff revision 1)
       
       
       
      Show all issues

      Sort order is off here.

    3. reviewboard/reviews/search_indexes.py (Diff revision 1)
       
       
      Show all issues

      This doesn't match the name in the args. We should probably rename that one to review_request for consistency with the other methods around here.

    4. reviewboard/reviews/search_indexes.py (Diff revision 1)
       
       
       
      Show all issues

      Copy/pasteo: should be user IDs

    5.