Add a method for returning reviews that are accessible by a given user to ReviewManager

Review Request #12302 — Created May 25, 2022 and submitted

Information

Review Board
release-5.0.x

Reviewers

Currently there is no efficient way to query for reviews that are
accessible by a given user when using Reviews.objects. This change adds a
method to ReviewManager such that Reviews.objects.accessible(user) returns a
queryset of reviews that are accessible by the given user.

  • Created test_review_manager.py file and moved in the ReviewManager
    tests from test_review.py.
  • Created tests for Review.objects.accessible and ran all tests in
    test_review_manager.py with success.
Summary ID Author
Add a method for returning reviews that are accessible by a given user
to ReviewManager
932f8f52bcbe23fa2e680cd5af257f04ec4bcb00 Michelle
Description From Last Updated

F401 'django.contrib.auth.models.AnonymousUser' imported but unused

reviewbotreviewbot

F401 'djblets.testing.decorators.add_fixtures' imported but unused

reviewbotreviewbot

F401 'reviewboard.reviews.errors.RevokeShipItError' imported but unused

reviewbotreviewbot

F401 'reviewboard.reviews.models.Review' imported but unused

reviewbotreviewbot

F401 'reviewboard.reviews.models.ReviewRequest' imported but unused

reviewbotreviewbot

F401 'reviewboard.reviews.signals.review_ship_it_revoked' imported but unused

reviewbotreviewbot

F401 'reviewboard.reviews.signals.review_ship_it_revoking' imported but unused

reviewbotreviewbot

what do you mean by "should be consolidated?" perhaps I can address this TODO in my changes

maubinmaubin

i feel iffy about changing the logic of the _query() function to this extent. now if we're querying for unpublished …

maubinmaubin

i'm wondering if there's a way to add users to a local site

maubinmaubin

Let's add a period at the end of this.

daviddavid

E402 module level import not at top of file

reviewbotreviewbot

We can get rid of this now, in favor of LocalSite.ALL (which we'll want to document as a valid type …

chipx86chipx86

Since this method is new, we can default this to off. It's a bit inconsistent with other changes, but is …

chipx86chipx86

I think we should just require a User. Something I want to phase out is the whole user-or-username aspect of …

chipx86chipx86

When we have functions with multiple keyword arguments, it's best to do one per line. That helps with both readability …

chipx86chipx86

These queries are very expensive, full of nested joins and stuff. We don't want to blindly OR those together, as …

chipx86chipx86

We can just do an assert for str inside the else, to elimate an isinstance check.

chipx86chipx86

The more fixtures we load, the more expensive per-test setup. We should avoid test_site except for tests that specifically need …

chipx86chipx86

Two important things here: Private functions always go after public functions. This function is very specific to certain tests we …

chipx86chipx86

To keep code consistent, let's always use review_request and review as variable names.

chipx86chipx86

assertIn / assertNotIn can be very misleading. It doesn't always tell us the whole story. For instance, we might expect …

chipx86chipx86

continuation line over-indented for visual indent Column: 38 Error code: E127

reviewbotreviewbot

continuation line over-indented for visual indent Column: 38 Error code: E127

reviewbotreviewbot

No blank line after method-level docstrings.

chipx86chipx86

Let's change this to "Added the ..." That'll be more consistent, and reads a bit better.

chipx86chipx86
Checks run (1 failed, 1 succeeded)
flake8 failed.
JSHint passed.

flake8

maubin
maubin
maubin
  1. 
      
  2. reviewboard/reviews/managers.py (Diff revision 3)
     
     
    Show all issues

    what do you mean by "should be consolidated?" perhaps I can address this TODO in my changes

    1. Oh don't worry about this for now. It's part of a much larger series of things that need to be fixed carefully, and is firmly on my plate.

  3. reviewboard/reviews/managers.py (Diff revision 3)
     
     
    Show all issues

    i feel iffy about changing the logic of the _query() function to this extent. now if we're querying for unpublished reviews and a user is given, it will only return unpublished reviews that are owned by that user. i made this change because that's what i need for the accessible() function to work properly, otherwise it would return drafts that are owned by other users. i tried to think of other ways for accessible() to only return drafts owned by the given user, but none worked out. if you see any ways to achieve this without having to modify _query(), or if it's fine to modify _query() like this let me know.

    1. It looks to me like both of the blocks of code being added are the same.

      I don't think you need to modify this method at all. The wrapper function just needs to pass in user to _query().

      I may be missing something, so let's talk about it in detail in Slack.

  4. Show all issues

    i'm wondering if there's a way to add users to a local site

    1. If you mean through the UI, then it can be done through Admin UI -> Database -> Local Sites.

      Review Board doesn't have a full Local Site management UI. That's available in RBCommons, but is pretty tailored to that service.

  5. 
      
maubin
david
  1. 
      
  2. reviewboard/reviews/managers.py (Diff revision 4)
     
     
    Show all issues

    Let's add a period at the end of this.

  3. 
      
maubin
Review request changed
Change Summary:

cleaned up query code, added asserts for user_or_username's type

Commits:
Summary ID Author
Add a method for returning reviews that are accessible by a given user
to ReviewManager
4b6a9d6c995a4cd6c8524d5ccf2ab4b0cae1adac Michelle
Add a method for returning reviews that are accessible by a given user
to ReviewManager
974b7bb0f9741a07688a93305bbfc5905bec9034 Michelle

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

maubin
chipx86
  1. 
      
  2. reviewboard/reviews/managers.py (Diff revision 6)
     
     
     
     
     
     
    Show all issues

    We can get rid of this now, in favor of LocalSite.ALL (which we'll want to document as a valid type for local_site -- I think I have this in one of my changes).

  3. reviewboard/reviews/managers.py (Diff revision 6)
     
     
     
     
     
     
     
    Show all issues

    Since this method is new, we can default this to off. It's a bit inconsistent with other changes, but is probably the correct way to go.

  4. reviewboard/reviews/managers.py (Diff revision 6)
     
     
     
     
     
     
    Show all issues

    I think we should just require a User. Something I want to phase out is the whole user-or-username aspect of the other methods, because it eventually leads to unwanted extra queries.

    The reason is that, for the kinds of checks we perform, we need a User anyway. So the query function has to get one. However, if the code calling this eventually needs the same User, they have to query for it again, which is therefore slower.

    Requiring a User ensures the caller has to think about when it's getting a user and how.

  5. reviewboard/reviews/managers.py (Diff revision 6)
     
     
     
     
     
     
    Show all issues

    When we have functions with multiple keyword arguments, it's best to do one per line. That helps with both readability and maintainability.

  6. reviewboard/reviews/managers.py (Diff revision 6)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
    Show all issues

    These queries are very expensive, full of nested joins and stuff. We don't want to blindly OR those together, as it'll get pretty hairy pretty quickly.

    Let's look at how this looks. We'll go with a simple example. Just one _query():

    str(Review.objects._query(some_non_admin_user, public=True, status=None, filter_private=True).query)
    

    Which gives us something like the following (after I run it through a formatter):

    SELECT
      DISTINCT "reviews_review"."id",
      "reviews_review"."review_request_id",
      "reviews_review"."user_id",
      "reviews_review"."timestamp",
      "reviews_review"."public",
      "reviews_review"."ship_it",
      "reviews_review"."base_reply_to_id",
      "reviews_review"."email_message_id",
      "reviews_review"."time_emailed",
      "reviews_review"."body_top",
      "reviews_review"."body_top_rich_text",
      "reviews_review"."body_bottom",
      "reviews_review"."body_bottom_rich_text",
      "reviews_review"."body_top_reply_to_id",
      "reviews_review"."body_bottom_reply_to_id",
      "reviews_review"."extra_data",
      "reviews_review"."rich_text",
      "reviews_review"."reviewed_diffset_id"
    FROM
      "reviews_review"
      INNER JOIN "reviews_reviewrequest" ON (
        "reviews_review"."review_request_id" = "reviews_reviewrequest"."id"
      )
      LEFT OUTER JOIN "reviews_reviewrequest_target_people" ON (
        "reviews_reviewrequest"."id" = "reviews_reviewrequest_target_people"."reviewrequest_id"
      )
      LEFT OUTER JOIN "reviews_reviewrequest_target_groups" ON (
        "reviews_reviewrequest"."id" = "reviews_reviewrequest_target_groups"."reviewrequest_id"
      )
    WHERE
      (
        "reviews_review"."public" = TRUE
        AND "reviews_review"."base_reply_to_id" IS NULL
        AND "reviews_reviewrequest"."local_site_id" IS NULL
        AND (
          "reviews_review"."user_id" = 1
          OR (
            (
              "reviews_reviewrequest"."repository_id" IS NULL
              OR "reviews_reviewrequest"."repository_id" IN (1, 3, 4,)
            )
            AND (
              "reviews_reviewrequest_target_people"."user_id" = 1
              OR "reviews_reviewrequest_target_groups"."group_id" IS NULL
              OR "reviews_reviewrequest_target_groups"."group_id" IN (1, 2, 3, 4)
            )
          )
        )
      )
    ORDER BY
      "reviews_review"."timestamp" ASC
    

    If we run another (with public=False), we get something similar.

    Now let's do the OR:

    str((Review.objects._query(non_admin_user, public=True, status=None, filter_private=True) | Review.objects._query(non_admin_user, public=False, status=None, filter_private=True)).query)
    

    We get:

    SELECT
      DISTINCT "reviews_review"."id",
      "reviews_review"."review_request_id",
      "reviews_review"."user_id",
      "reviews_review"."timestamp",
      "reviews_review"."public",
      "reviews_review"."ship_it",
      "reviews_review"."base_reply_to_id",
      "reviews_review"."email_message_id",
      "reviews_review"."time_emailed",
      "reviews_review"."body_top",
      "reviews_review"."body_top_rich_text",
      "reviews_review"."body_bottom",
      "reviews_review"."body_bottom_rich_text",
      "reviews_review"."body_top_reply_to_id",
      "reviews_review"."body_bottom_reply_to_id",
      "reviews_review"."extra_data",
      "reviews_review"."rich_text",
      "reviews_review"."reviewed_diffset_id"
    FROM
      "reviews_review"
      INNER JOIN "reviews_reviewrequest" ON (
        "reviews_review"."review_request_id" = "reviews_reviewrequest"."id"
      )
      LEFT OUTER JOIN "reviews_reviewrequest_target_people" ON (
        "reviews_reviewrequest"."id" = "reviews_reviewrequest_target_people"."reviewrequest_id"
      )
      LEFT OUTER JOIN "reviews_reviewrequest_target_groups" ON (
        "reviews_reviewrequest"."id" = "reviews_reviewrequest_target_groups"."reviewrequest_id"
      )
    WHERE
      (
        (
          "reviews_review"."public" = TRUE
          AND "reviews_review"."base_reply_to_id" IS NULL
          AND "reviews_reviewrequest"."local_site_id" IS NULL
          AND (
            "reviews_review"."user_id" = 1
            OR (
              (
                "reviews_reviewrequest"."repository_id" IS NULL
                OR "reviews_reviewrequest"."repository_id" IN (1, 2, 3, 4)
              )
              AND (
                "reviews_reviewrequest_target_people"."user_id" = 1
                OR "reviews_reviewrequest_target_groups"."group_id" IS NULL
                OR "reviews_reviewrequest_target_groups"."group_id" IN (1, 2, 3, 4)
              )
            )
          )
        )
        OR (
          "reviews_review"."public" = FALSE
          AND "reviews_review"."base_reply_to_id" IS NULL
          AND "reviews_reviewrequest"."local_site_id" IS NULL
          AND (
            "reviews_review"."user_id" = 1
            OR (
              (
                "reviews_reviewrequest"."repository_id" IS NULL
                OR "reviews_reviewrequest"."repository_id" IN (1, 2, 3 3, 4)
              )
              AND (
                "reviews_reviewrequest_target_people"."user_id" = 1
                OR "reviews_reviewrequest_target_groups"."group_id" IS NULL
                OR "reviews_reviewrequest_target_groups"."group_id" IN (1, 2, 3, 4)
              )
            )
          )
        )
      )
    ORDER BY
      "reviews_review"."timestamp" ASC
    

    Look how expensive that just go with the OR. Way more complicated, and the database is going to take who knows what strategy to check things -- probably a row scan, meaning it searches nearly every single row in the database. Sort of a worst-case scenario. So let's help with that.

    Since public=True | public=False really just means "any value for public, we really just want to eliminate that part of the query. And then have one block of ACL checks.

    We can do this by allowing public to be None, and just omitting the Q(public=...) if it is None. That gets us everything we need, keeping the simpler, faster (still expensive but soon less so) version of the query.

    Database query work tends to be more complex than it appears on the surface, especially since the Django ORM hides a lot of complexity from us. There isn't a great way to test complexity of queries in Django, so it boils down to running the queries like above and seeing what we get (and possibly using SQL like EXPLAIN to figure out what the database would do with it). Happy to go into those tricks offline.

    1. Good to know, thanks for the explanation

  7. reviewboard/reviews/managers.py (Diff revision 6)
     
     
     
     
     
     
     
    Show all issues

    We can just do an assert for str inside the else, to elimate an isinstance check.

  8. Show all issues

    The more fixtures we load, the more expensive per-test setup. We should avoid test_site except for tests that specifically need it, and same with test_scmtools.

  9. Show all issues

    Two important things here:

    1. Private functions always go after public functions.
    2. This function is very specific to certain tests we have now. There are more coming, where these won't be appropriate, and that can lead to shoe-horning in additional behaviors (same problem our existing fixtures have).

    I suggest just passing in AnonymousUser() where we need it, and looking up the users where we need them.

  10. Show all issues

    To keep code consistent, let's always use review_request and review as variable names.

  11. reviewboard/reviews/tests/test_review_manager.py (Diff revision 6)
     
     
     
     
    Show all issues

    assertIn / assertNotIn can be very misleading. It doesn't always tell us the whole story. For instance, we might expect that because an item is or is not in the results that we've gotten the results we expect, but what if we get a review request we don't expect? Or assumed there'd be one in there where there isn't?

    Not necessarily an issue for all of these tests, but since we're testing for accessibility, we want to be very explicit. Let's instead do:

    self.assertQuerysetEqual(
        Review.objects.accessible(...),
        [list of review request options])
    

    Same applies to all tests.

  12. 
      
maubin
maubin
maubin
maubin
Review request changed
Change Summary:
  • Moved access control logic for removing drafts that do not belong to the user into the _query method
  • Removed unnecessary transform parameter from assertQuerysetEqual
Commits:
Summary ID Author
Add a method for returning reviews that are accessible by a given user
to ReviewManager
3c723a5d88344567e29e790391daa4e3cb22033d Michelle
Add a method for returning reviews that are accessible by a given user
to ReviewManager
77093de89e972b8029468b7725402c5b6f3de496 Michelle

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

maubin
maubin
maubin
maubin
maubin
chipx86
  1. Looking great! Thanks for bearing with me on the changes :)

    Two small nits, and then Ship It!

  2. reviewboard/reviews/managers.py (Diff revision 15)
     
     
     
     
    Show all issues

    No blank line after method-level docstrings.

  3. reviewboard/reviews/managers.py (Diff revision 15)
     
     
    Show all issues

    Let's change this to "Added the ..." That'll be more consistent, and reads a bit better.

  4. 
      
maubin
maubin
david
  1. Ship It!
  2. 
      
maubin
Review request changed
Status:
Completed
Change Summary:
Pushed to release-5.0.x (e95107b)