Make accessibility checks more robust for Reviews and Comments queries

Review Request #12424 — Created June 29, 2022 and submitted

Information

Review Board
release-5.0.x

Reviewers

Recently, it was discovered that when querying for reviews there may be
duplicate Q(public=True)'s that appear in the queries to fetch the reviews.
The cause of this duplication lies in ReviewManager's _query method, which
handles the logic for querying for reviews. If an unauthenticated or anonymous
user was querying for the reviews, the public query field would be set twice,
once to the public value that was passed and once to True, with the latter
ensuring that only public reviews get returned to unauthenticated/anonymous
users. This however makes the query less efficient and can lead to the wrong
results being included in the returned queryset if a False value for public
was passed in.

To fix this, we ensure that the public query field is only set once, for any
type of user, in ReviewManager._query. This fix was applied to
CommentManager._query as well.

This change also adds a public parameter to ReviewManager.accessible to
ensure that the public parameter in ReviewManager._query handles all of
the logic surrounding public and unpublished reviews. Previously, you would
have to set the public query field using the extra_query parameter in
ReviewManager.accessible, which could also lead to duplicate or
conflicting Q(public= queries.

Finally, this change also improves the tests for the ReviewManager and
CommentManager. The tests are more clear, organized and well documented,
they use assertQueries, and have more comprehensive test coverage surrounding
access control.

  • Added new unit tests for ReviewManager.accessible with setting the
    public parameter.
  • Ran all unit tests with success.
Summary ID
Make accessibility checks more robust for Reviews and Comments queries
321568393659002f0d6d41524cdf9e1a4e9c19cd
Description From Last Updated

How should we check that the diffset related to the review is accessible by the user? I see the logic …

maubinmaubin

I think we can have a more performant query if this becomes something like: q &= ( Q(user=user) | (Q(public=True) …

chipx86chipx86

Seems like a good performance improvement. Let's document these conditions for an empty result with a comment.

chipx86chipx86

So going through the other changes for some duplicate object fixes I'm putting together, I think we need to take …

chipx86chipx86

For booleans, we allow ('1', 'true', 'True'). Not ideal to repeat this everywhere, but let's call that a problem for …

chipx86chipx86

This should be review__public, right? If so, and no tests failed, this means we're missing a test.

chipx86chipx86

Imports should be in alphabetical order.

chipx86chipx86

Comments should generally end with a period.

chipx86chipx86

Blank line before new blocks (same below).

chipx86chipx86

Should I consolidate these common queries into a variable or something? Then use the variable instead of repeating the full …

maubinmaubin

For multi-line lists, we generally do: [ review1, review2, ..., review5, ..., ] Or, when appropriate, one item per row …

chipx86chipx86

Given this is one Q, we can just put this on the same line. Same in other places.

chipx86chipx86

Another example of where we want periods at the end of the sentence. Really applies to all such comments.

chipx86chipx86

trailing whitespace Column: 38 Error code: W291

reviewbotreviewbot

Missing a trailing comma (we always want to terminate multi-column list items with a trailing comma to minimize line changes …

chipx86chipx86
chipx86
  1. 
      
  2. reviewboard/reviews/managers.py (Diff revision 1)
     
     
     
     
    Show all issues

    I think we can have a more performant query if this becomes something like:

    q &= (
        Q(user=user) |
        (Q(public=True) &
         (repo_q &
          Q(review_request__target_people=user) |
          group_q))
    )
    

    That allows the query planner to have to consider only one set of rows for a public=True, and short-circuits that if we know the user matches.

    It does mean repeating some of that above, but we can just take the (repo_q ...) bits and put that into a reusable acl_check_q or something.

  3. reviewboard/reviews/managers.py (Diff revision 1)
     
     
     
    Show all issues

    Seems like a good performance improvement. Let's document these conditions for an empty result with a comment.

  4. reviewboard/reviews/tests/test_review_manager.py (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
     
    Show all issues

    So going through the other changes for some duplicate object fixes I'm putting together, I think we need to take the opportunity to better organize these tests so it's explicitly clear what data we're working with.

    As such, with these tests (and ideally we'd update the others to match, but let's start here), let's have a pattern of putting a comment above each distinct group of related data explaining what access policy that's testing. For example:

    # Published review on a publicly-accessible review request.
    review1 = self.create_review(review_request, publish=True)
    
    ...
    
    # Draft review on a publicly-accessible review request.
    ...
    
    # Published review on a private repository the user has access to.
    ...
    
    # Published review on a private repository the user does not have access to.
    ...
    

    etc.

    That way it's very easy for us to see what data we're working with, and it's very easy to know where the code should live.

    Bonus points for a comment above each assertQuerysetEqual that says what we're testing for, and usage of assertQueries() as part of that to ensure we're building exactly the queries we want.

  5. reviewboard/webapi/resources/root_review.py (Diff revision 1)
     
     
     
    Show all issues

    For booleans, we allow ('1', 'true', 'True').

    Not ideal to repeat this everywhere, but let's call that a problem for another day.

  6. 
      
maubin
maubin
  1. 
      
  2. Show all issues

    How should we check that the diffset related to the review is accessible by the user? I see the logic for it in ReviewRequest._is_diffset_accessible_by(user, diffset) but how can we efficiently do this in the Managers through a query?

    1. There's no way to, because this isn't database-bound. The computations are Python code, which may do things like check a third-party service.

      That sort of check has to be done separately from accessible results.

  3. reviewboard/reviews/tests/test_review_manager.py (Diff revision 2)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
    Show all issues

    Should I consolidate these common queries into a variable or something? Then use the variable instead of repeating the full queries

    1. I've debated this myself. We could. We had a similar situation for the Cliosoft SOS tests in Power Pack, and what I ended doing there is providing a class of "rule" properties/functions that generated expected results for a particular type of operation.

      In this case, many of these sorts of queries would be repeated outside of an individual test suite.

      So maybe we coudl do something similar to the SOS tests, but centralized in something like reviewboard.testing.query_rules. A class per thing, like GroupQueryRules with properties for common static rules and functions for common dynamic rules (that take in, say, a user ID).

      I can see some benefits to that as we scale out these sorts of queries. Might be worth the tradeoff of added complexity of the properties/functions.

    2. I like that idea. For now I will continue writing out the full queries, and sometime in the future I can revisit this and think about it more. Btw, what class are you referring to in the Cliosoft SOS tests?

  4. 
      
chipx86
  1. This is a lot of work! Thanks for doing this :)

    Some small nits and one thing that may be lacking a test.

  2. reviewboard/reviews/managers.py (Diff revision 2)
     
     
    Show all issues

    This should be review__public, right?

    If so, and no tests failed, this means we're missing a test.

    1. That's right, however this code is never going to be reached when using the CommentManager.accessible method because we always set public=None. This wasn't the case for the ReviewManager.accessible since we allow public to be None, False or True, and had tests checking this. For the CommentManager, I could add tests for checking CommentManager.from_user(superuser, public=True) and public=False which would trigger this code, but that seems kind of random. Should I test CommentManager._query directly instead?

    2. If we have code to build this part of the query, we'll want test coverage for it. However, if there's no chance it can ever be executed through normal means, then it's probably not worth building this part of the query.

  3. Show all issues

    Imports should be in alphabetical order.

  4. Show all issues

    Comments should generally end with a period.

  5. Show all issues

    Blank line before new blocks (same below).

  6. Show all issues

    For multi-line lists, we generally do:

    [
        review1, review2, ...,
        review5, ...,
    ]
    

    Or, when appropriate, one item per row (generally more appropriate when we may be operating more on the list over time, to minimize changes to the code).

  7. reviewboard/reviews/tests/test_review_manager.py (Diff revision 2)
     
     
     
     
    Show all issues

    Given this is one Q, we can just put this on the same line.

    Same in other places.

  8. Show all issues

    Another example of where we want periods at the end of the sentence.

    Really applies to all such comments.

  9. 
      
maubin
Review request changed
Change Summary:
  • Applied feedback from last review.
  • Tuned up the CommentManager tests.
Description:
   

Recently, it was discovered that when querying for reviews there may be

    duplicate Q(public=True)'s that appear in the queries to fetch the reviews.
    The cause of this duplication lies in ReviewManager's _query method, which
    handles the logic for querying for reviews. If an unauthenticated or anonymous
    user was querying for the reviews, the public query field would be set twice,
    once to the public value that was passed and once to True, with the latter
    ensuring that only public reviews get returned to unauthenticated/anonymous
    users. This however makes the query less efficient and can lead to the wrong
    results being included in the returned queryset if a False value for public
    was passed in.

   
   

To fix this, we ensure that the public query field is only set once, for any

    type of user, in ReviewManager._query. This fix was applied to
    CommentManager._query as well.

   
   

This change also adds a public parameter to ReviewManager.accessible to

    ensure that the public parameter in ReviewManager._query handles all of
    the logic surrounding public and unpublished reviews. Previously, you would
    have to set the public query field using the extra_query parameter in
    ReviewManager.accessible, which could also lead to duplicate or
    conflicting Q(public= queries.

  +
  +

Finally, this change also improves the tests for the ReviewManager and

  + CommentManager. The tests are more clear, organized and well documented,
  + they use assertQueries, and have more comprehensive test coverage surrounding
  + access control.

Commits:
Summary ID
Make accessibility checks more robust for Reviews and Comments queries
bf75c8ca6cfd33c66bc646a65510865999ff9b7e
Make accessibility checks more robust for Reviews and Comments queries
edfbeab3d8c44767d124900a020a30af93b5cbf1

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

maubin
chipx86
  1. Awesome, home stretch. One tiny code styling nit (in a handful of places), but everything else looks ready to go.

  2. Show all issues

    Missing a trailing comma (we always want to terminate multi-column list items with a trailing comma to minimize line changes if we add to the list).

    This applies to other queries as well.

  3. 
      
maubin
david
  1. Ship It!
  2. 
      
chipx86
  1. Ship It!
  2. 
      
maubin
Review request changed
Status:
Completed
Change Summary:
Pushed to release-5.0.x (7128e09)