Adding ability to drop old issues on new review.

Review Request #10061 — Created July 3, 2018 and updated

jcannon
ReviewBot
master
4709
10075
95c3607...
reviewbot

Adding ability to drop old issues on new review.

When a user publishes a new review request, all bots re-run and open new issues. If the publisher didn't fix the old issues (maybe they just added a new file) they now have double the issues, half of which are duplicates. The bot should be responsible for dropping the old issues, if it is configured as such.

This change adds support for configuring each bot to drop old issues when a new review request is published. All changes are done server-side for better performance (closest to the database) and better user experience (issues dropped as soon as extension is executed).

Note this change also changes the service_id to be based off the configuration name. (This is required so that the same tools configured 2 different ways doesn't drop the other's issues).

Review with previous issues:
- Post new review with issues fixed -> Drops old issues. (*)
- Post new review with issues not fixed -> Drops old issues
Review with no previous issues:
- Post a new review with issues -> Opens new issues.
- Post a new review without issues -> Still passing, doesn't drop anything.

Two configurations using the same tool, one dropping old one without:
- One tool drops its old issues, the other doesn't.

In all cases:
- Only dropped issues when box was checked

(*) - We might think of a better solution to this. If the user fixes the issues, by the time they publish the new review, the old issues will be "dropped" instead of "fixed". That might be OK (mark the issues fixed before you repost?). I'd like to avoid having to semantically deduce if the issue was "fixed" or dropped. (Maybe they just moved everything down a line).

  • 0
  • 0
  • 7
  • 0
  • 7
Description From Last Updated
Checks run (1 failed, 1 succeeded)
flake8 failed.
JSHint passed.

flake8

jcannon
jcannon
  1. 
      
  2. extension/reviewbotext/integration.py (Diff revision 2)
     
     

    I guess I assumed that the names had to be unique, but I just tried it and that's not true.
    Perhaps we should also mix in the integration_id?

    1. Thinking about this some more, I think we should only use the integration_id since the admin could change the name of a configuration.

    2. Slight correction, it should use config.id.

  3. 
      
jcannon
chipx86
  1. 
      
  2. The idea is interesting, but I had to read the code to really get what this was going for. Can you update the description of the change to more clearly describe the scenario that this addresses?

    We have a guide for how we like to see descriptions for the commits and review requests, which provides some examples of the format.

    https://www.notion.so/reviewboard/Writing-Good-Change-Descriptions-10529e7c207743fa8ca90153d4b21fea

  3. extension/reviewbotext/integration.py (Diff revision 3)
     
     
     
     
     
     

    You can group all the filters together in one filter() call. We also have a chaining format we like to use for complex queries, which helps with readability. We also know we're going to be fetching the reviews, so we can include that along with the results here, reducing query counts.

    This would look like:

    status_updates = (
        StatusUpdates.objects
        .filter(user=user,
                service_id=service_id,
                review_request=review_request)
        .exclude(review__isnull=True)
        .select_related('review')
    )
    
    1. For whatever a reason I had come under the assumption putting several keyword arguments in the filter wasn't possible, and was a little frustrated. Glad to see the world makes sense again.

      select_related is interesting. Thanks for the tip.

  4. extension/reviewbotext/integration.py (Diff revision 3)
     
     
     

    Can you add a blank line between these? We separate out blocks (for loops, if statements, etc.) from other neighboring statements.

  5. extension/reviewbotext/integration.py (Diff revision 3)
     
     
     
     

    Going to leave a large comment here. Bear with me :)

    This code has to do an awful lot, more than it seems, to update the various states around issues. We can do a bit right now to simplify this, but there's a lot we can't do cleanly from within Review Bot (and I have suggestions below).

    Let's put together a scenario. Say Review Bot opened 50 diff comments on your code before (this is not an unrealistic number). 10 of those issues were fixed, 40 remain. A new change is then put up. (We're also going to pretend that this is only querying diff comments, just to help keep this simple -- really, the fact that other comment types are involved means the query counts listed below will be under the real numbers).

    So, we get to this function and we begin closing comments. The original code does (for just diff comments):

    # From the for loop iteration...
    comments = update.review.comments.all()
    
    for comment in comments:
        if comment.issue_status == BaseComment.OPEN:
            comment.issue_status = BaseComment.DROPPED
            comment.save()
    

    To start, we're fetching every single comment, and all its data, just so we can begin checking the issue statuses and possibly modifying them. That means 50 results coming back, even though we only care about 40. This is expensive, and not really necessary. So let's simplify:

    # From the for loop iteration...
    comments = update.review.comments.all()
    
    comments = comments.filter(issue_status=BaseComment.OPEN)
    
    for comment in comments:
        comment.issue_status = BaseComment.DROPPED
        comment.save()
    

    That's a little better. We're down to 40 objects worth of data, because we're letting the database take care of checking issue statuses. Still, we're fetching a lot -- comment text, extra_data content, various IDs and stats... but we can't do much about that right now, because of the code in Review Board.

    Before I continue, let's talk about what happens when you save a comment. We don't just write new fields to the database. Rather, we're writing and then updating state on both the review (the latest activity timestamp) and the review request (issue state counters). This means that every comment.save() comes out to 3 queries, and we're doing this for every item in the list.

    That means that even though we're down to fetching 40 comments, we end up performing a total of 160 database queries (40 * (1 read + 3 save)). Not ideal.

    If we were simply updating state in the database and not tracking the rest of this, we could just do:

    comments.filter(issue_status=BaseComment.OPEN).update(issue_status=BaseComment.DROPPED)
    

    And that would be great! One single SQL query per comment type. A fixed count that doesn't increase with the number of comments. That'd be doable if we could pull out the review request and review saves. And that is doable if we had a function that was all about setting just issue states and doing the review/review request updating after all those were done.

    I think that'd what I'd ideally like to see. I'd prefer it live in Review Board, so this can be usable outside of Review Bot, but then Review Bot would require a newer version of Review Board... Could still be okay if we check for this and disable the option if not new enough...

    Anyway, so what would that look like... Well, it'd first need to go through all the comments we want to update on each review and perform the above SQL statement. Once all are updated, we'd want Review Board to re-compute the issue counts for the review request (ideally we'd track old/new counts, but that requires more complicated queries that I'd rather not have you do for this). Then we need to update the timestamps on the reviews.

    So something like:

    now = timezone.now()
    any_updated = False
    
    for update in status_updates:
        review = update.review
        review_updated = False
    
        for comments in (review.comments,
                         review.screenshot_comments,
                         review.file_attachment_comments,
                         review.general_comments):
            comments = comments.filter(issue_status=BaseComment.OPEN)
            count = comments.update(issue_status=BaseComment.DROPPED,
                                    timestamp=now)
    
            if count > 0:
                review_updated = True
                any_updated = True
    
        if review_updated:
            review.last_review_activity_timestamp = now
            review.save(update_fields=('last_review_activity_timestamp',))
    
    if any_updated:
        # Due to how review request issue counters initialize, this will
        # actually trigger updating every issue counter.
        review_request.reinit_issue_open_count()
    

    I think that would do it, but it'd need testing (including comprehensive unit tests).

    The end result is that updating just diff comments out of 50 filed comments across one review would result in 9 queries (4 for the comment queries, 4 for the diff comment updates, 1 for the review update) + a recalculation of the issue counters (a fixed cost as well). Bump that up to 1000 comments filed across 1 review, and the counts don't change. Split that into two reviews, and we go up to 18 queries.

    More work could be done to further reduce (pre-fetching counts for all types across all reviews and then further limiting what we're fetching based on those counts), but this would be the first step toward that. Further optimization could be made once we know for sure that the above works.

    (Hope you don't regret posting this change 😄 Crash-course on query optimization in Django and Review Board.)

    1. Don't regret at all 😛

      It's unfortunate what is Pythonic isn't Django-ic (the for loop is a Python backbone, but is seemingly very noisy in Django).

      The Django optimizations make sense (filter instead of for-if). The Review Board ones are harder to follow (smaller domain) but I think I get it. Saving a comment performs more than one operation (has to update info on the review request itself), and there isn't a way to "save" several comments at once.

      I'd prefer it live in Review Board

      If I understand correctly, you'd like to see something like drop_open_issues added to StatusUpdate (not sure if this functionality makes as much sense in the context of a "review", but I might be wrong here), where the body is essentially the last code snippit? That makes sense.

      As far as Review Bot and Review Board versioning goes, what does that look like to you? Is there a way to toggle this feature based on version? Is it as easy as if hasattr(status_update, 'drop_open_issues)`?

  6. 
      
jcannon
jcannon
Review request changed

Change Summary:

  • Punting the bulk of the logic to Review Board (will be posted in a new review).
  • Changed the boolean field to be disabled if Review Board isn't new enough (missing the added method)
  • Changed the service_id to be based off the config id

Commit:

-a9871230be10d1b99340233c0169853317eef02d
+0f286e3b6619637b1051a74d8828f28d02c5b1bd

Diff:

Revision 4 (+36 -1)

Show changes

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

jcannon
jcannon
Review request changed
jcannon
  1. bump

  2. 
      
jcannon
  1. bump

  2. 
      
Loading...