Fix a search indexing performance regression with ACL diff checks.

Review Request #14165 — Created Sept. 12, 2024 and submitted — Latest diff uploaded

Information

Review Board
release-7.x

When indexing review requests, we determine whether each review request
is accessible via review_request.is_accessible_by(). Part of this check
covers the files in the diffsets, using ACL diff checks. This ends up
querying the list of diffsets for each review request and then the files
within each diffset, which drastically slows down indexing.

We now employ a couple layers of protection against this:

  1. ReviewRequest.get_diffsets() now uses the prefetch_related() and
    select_related() caches to determine whether it needs to perform
    a new query at all, or whether it can short-cut the result. This
    saves the per-diffset and prefetched-files queries.

  2. ReviewRequest._are_diffs_accessible_by() no longer calls
    get_diffsets() at all if there aren't any FileDiffACLHooks
    registered, which will be the common case.

This ensures we don't do any more work than we need to do at any stage
of these checks, and makes a major difference in indexing performance.
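
The two layers described above can be modeled in plain Python. This is a minimal sketch under stated assumptions, not Review Board's implementation: FakeReviewRequest, its prefetched dict, the query_count counter, and the callable hooks are all hypothetical stand-ins, and a counter takes the place of real database queries.

```python
from dataclasses import dataclass, field


@dataclass
class FakeReviewRequest:
    """Hypothetical stand-in for a review request (not the real model)."""

    stored_diffsets: list
    # Mimics a prefetch cache: filled only when diffsets were loaded
    # up front for a whole batch of review requests.
    prefetched: dict = field(default_factory=dict)
    query_count: int = 0

    def get_diffsets(self):
        # Layer 1: reuse prefetched results instead of querying again.
        if 'diffsets' in self.prefetched:
            return self.prefetched['diffsets']

        self.query_count += 1  # Simulates one database round trip.
        return list(self.stored_diffsets)

    def are_diffs_accessible_by(self, user, acl_hooks):
        # Layer 2: with no hooks registered, nothing can deny access,
        # so the diffsets are never loaded.
        if not acl_hooks:
            return True

        return all(
            hook(user, diffset)
            for diffset in self.get_diffsets()
            for hook in acl_hooks
        )


def allow_all(user, diffset):
    """Hypothetical hook that grants access to every diffset."""
    return True


# No hooks registered: the check returns before touching diffsets.
no_hooks = FakeReviewRequest(stored_diffsets=['d1', 'd2'])
no_hooks.are_diffs_accessible_by('alice', acl_hooks=[])

# Hooks registered, diffsets prefetched: the cache is used.
prefetched = FakeReviewRequest(stored_diffsets=['d1'],
                               prefetched={'diffsets': ['d1']})
prefetched.are_diffs_accessible_by('alice', acl_hooks=[allow_all])

# Hooks registered, nothing prefetched: one simulated query is needed.
uncached = FakeReviewRequest(stored_diffsets=['d1'])
uncached.are_diffs_accessible_by('alice', acl_hooks=[allow_all])
```

In this model, only the last case increments query_count, which mirrors the intent that a query happens only when hooks exist and nothing was prefetched.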

All unit tests pass.

Added SQL-level debugging for the database backend and performed a
search indexing run. With get_diffsets() being invoked for every diff,
I verified that it used the prefetch cache, avoiding new queries. With
ACL hooks factored in, I verified that it never reached the diffset
query code unless I faked hooks being available.
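
The kind of statement-level check described above can be sketched with the standard library alone. This example swaps in sqlite3 and its set_trace_callback() in place of the Django database backend used for the actual testing. The table layout and the two helper functions are assumptions made for illustration and do not reflect Review Board's schema.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE diffset (id INTEGER PRIMARY KEY, review_request_id INTEGER);
    CREATE TABLE filediff (id INTEGER PRIMARY KEY, diffset_id INTEGER);
    INSERT INTO diffset VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO filediff VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")
conn.commit()

# Record every SQL statement executed from this point on.
statements = []
conn.set_trace_callback(statements.append)


def files_per_diffset_naive(review_request_ids):
    """One diffset query per review request, one file query per diffset."""
    result = {}
    for rr_id in review_request_ids:
        diffsets = conn.execute(
            'SELECT id FROM diffset WHERE review_request_id = ? ORDER BY id',
            (rr_id,),
        ).fetchall()
        for (diffset_id,) in diffsets:
            files = conn.execute(
                'SELECT id FROM filediff WHERE diffset_id = ? ORDER BY id',
                (diffset_id,),
            ).fetchall()
            result[diffset_id] = [file_id for (file_id,) in files]
    return result


def files_per_diffset_batched(review_request_ids):
    """Fetch every diffset and file for the batch in a single query."""
    placeholders = ', '.join('?' for _ in review_request_ids)
    rows = conn.execute(
        'SELECT d.id, f.id FROM diffset d '
        'LEFT JOIN filediff f ON f.diffset_id = d.id '
        f'WHERE d.review_request_id IN ({placeholders}) '
        'ORDER BY d.id, f.id',
        list(review_request_ids),
    ).fetchall()
    result = {}
    for diffset_id, file_id in rows:
        files = result.setdefault(diffset_id, [])
        if file_id is not None:  # A diffset may have no files.
            files.append(file_id)
    return result


def count_statements(func, *args):
    """Run func and return (its result, number of SQL statements traced)."""
    statements.clear()
    result = func(*args)
    return result, len(statements)
```

Calling count_statements() on each helper with the same review request IDs shows how the per-diffset approach multiplies statements while the batched one stays constant, which is the sort of difference the SQL-level debugging was used to confirm.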

Commits

Fix a search indexing performance regression with ACL diff checks.
6a3972a18daf667439c36c061c435aafb0e8b974 Christian Hammond
reviewboard/reviews/models/review_request.py
reviewboard/reviews/tests/test_review_request.py