Fix a search indexing performance regression with ACL diff checks.

Review Request #14165 — Created Sept. 12, 2024 and submitted

Information

Review Board
release-7.x

Reviewers

When indexing review requests, we determine if a review request is
accessible via review_request.is_accessible_by(). Part of this checks
the files in the diffsets for accessibility, using ACL diff checks. This
ends up querying the list of diffsets for each review request and then
each file within it, which drastically slows down diff indexing.

We now employ a couple layers of protection against this:

  1. ReviewRequest.get_diffsets() will now use the prefetch_related()
    and select_related() caches to determine if it needs to even
    perform a new query, or if it can short-cut the result. This saves
    the per-diffset and prefetched-files queries.

  2. ReviewRequest._are_diffs_accessible_by() no longer even calls
    get_diffsets() if there aren't any FileDiffACLHooks registered,
    which will be the common case.

This ensures we don't do any more work than we need to do at any stage
of these checks, and makes a major difference in indexing performance.

All unit tests pass.

Added SQL-level debugging for the database backend and performed a
search index. With get_diffsets() being invoked for every diff, I
verified that it used the prefetch cache, avoiding new queries. With
ACL hooks factored in, I verified it never even got to the diffset
query code without faking hooks being available.

Summary ID
Fix a search indexing performance regression with ACL diff checks.
When indexing review requests, we determine if a review request is accessible via `review_request.is_accessible_by()`. Part of this checks the files in the diffsets for accessibility, using ACL diff checks. This ends up querying the list of diffsets for each review request and then each file within it, which drastically slows down diff indexing. We now employ a couple layers of protection against this: 1. `ReviewRequest.get_diffsets()` will now use the `prefetch_related()` and `select_related()` caches to determine if it needs to even perform a new query, or if it can short-cut the result. This saves the per-diffset and prefetched-files queries. 2. `ReviewRequest._are_diffs_accessible_by()` no longer even calls `get_diffsets()` if there aren't any `FileDiffACLHook`s registered, which will be the common case. This ensures we don't do any more work than we need to do at any stage of these checks, and makes a major difference in indexing performance.
15968ceda0cfdacd5df7f8b56007b9ab1cd5e532
Description From Last Updated

Nit: You could add a comment in the form of # Query description. # # X queries: # # 1. …

maubinmaubin
maubin
  1. 
      
  2. Show all issues

    Nit: You could add a comment in the form of

    # Query description.
    #
    # X queries:
    #
    # 1. Operation description
    # ...
    

    above the query list. Same below.

  3. 
      
david
  1. Ship It!
  2. 
      
chipx86
david
  1. Ship It!
  2. 
      
maubin
  1. Ship It!
  2. 
      
chipx86
Review request changed
Status:
Completed
Change Summary:
Pushed to release-7.x (e05199e)