Adding ability to drop old issues on new review.
Review Request #10061 — Created July 3, 2018 and submitted
Adding ability to drop old issues on new review.
When a user publishes a new review request, all bots re-run and open new issues. If the publisher didn't fix the old issues (maybe they just added a new file) they now have double the issues, half of which are duplicates. The bot should be responsible for dropping the old issues, if it is configured as such.
This change adds support for configuring each bot to drop old issues when a new review request is published. All changes are done server-side for better performance (closest to the database) and better user experience (issues dropped as soon as extension is executed).
Note this change also changes the service_id to be based off the configuration name. (This is required so that the same tools configured 2 different ways doesn't drop the other's issues).
Review with previous issues:
- Post new review with issues fixed -> Drops old issues. (*)
- Post new review with issues not fixed -> Drops old issues
Review with no previous issues:
- Post a new review with issues -> Opens new issues.
- Post a new review without issues -> Still passing, doesn't drop anything.Two configurations using the same tool, one dropping old one without:
- One tool drops its old issues, the other doesn't.In all cases:
- Only dropped issues when box was checked(*) - We might think of a better solution to this. If the user fixes the issues, by the time they publish the new review, the old issues will be "dropped" instead of "fixed". That might be OK (mark the issues fixed before you repost?). I'd like to avoid having to semantically deduce if the issue was "fixed" or dropped. (Maybe they just moved everything down a line).
Description | From | Last Updated |
---|---|---|
The idea is interesting, but I had to read the code to really get what this was going for. Can … |
chipx86 | |
E501 line too long (110 > 79 characters) |
reviewbot | |
I guess I assumed that the names had to be unique, but I just tried it and that's not true. … |
jcannon | |
You can group all the filters together in one filter() call. We also have a chaining format we like to … |
chipx86 | |
Can you add a blank line between these? We separate out blocks (for loops, if statements, etc.) from other neighboring … |
chipx86 | |
Going to leave a large comment here. Bear with me :) This code has to do an awful lot, more … |
chipx86 | |
E501 line too long (96 > 79 characters) |
reviewbot | |
This should probably be "Drop old open issues" instead of "Drops old open issues" |
david | |
This needs a docstring. |
david |
- Change Summary:
-
Now with less logging!
- Commit:
-
e5da34d56fcd923021e59ffa6136d6cada30c3e0d58269e6cecdee5e361d47fc38cc6147fb3f00c8
Checks run (2 succeeded)
- Change Summary:
-
Removed period from form label.
- Commit:
-
d58269e6cecdee5e361d47fc38cc6147fb3f00c8a9871230be10d1b99340233c0169853317eef02d
Checks run (2 succeeded)
-
-
The idea is interesting, but I had to read the code to really get what this was going for. Can you update the description of the change to more clearly describe the scenario that this addresses?
We have a guide for how we like to see descriptions for the commits and review requests, which provides some examples of the format.
https://www.notion.so/reviewboard/Writing-Good-Change-Descriptions-10529e7c207743fa8ca90153d4b21fea
-
You can group all the filters together in one
filter()
call. We also have a chaining format we like to use for complex queries, which helps with readability. We also know we're going to be fetching the reviews, so we can include that along with the results here, reducing query counts.This would look like:
status_updates = ( StatusUpdates.objects .filter(user=user, service_id=service_id, review_request=review_request) .exclude(review__isnull=True) .select_related('review') )
-
Can you add a blank line between these? We separate out blocks (for loops, if statements, etc.) from other neighboring statements.
-
Going to leave a large comment here. Bear with me :)
This code has to do an awful lot, more than it seems, to update the various states around issues. We can do a bit right now to simplify this, but there's a lot we can't do cleanly from within Review Bot (and I have suggestions below).
Let's put together a scenario. Say Review Bot opened 50 diff comments on your code before (this is not an unrealistic number). 10 of those issues were fixed, 40 remain. A new change is then put up. (We're also going to pretend that this is only querying diff comments, just to help keep this simple -- really, the fact that other comment types are involved means the query counts listed below will be under the real numbers).
So, we get to this function and we begin closing comments. The original code does (for just diff comments):
# From the for loop iteration... comments = update.review.comments.all() for comment in comments: if comment.issue_status == BaseComment.OPEN: comment.issue_status = BaseComment.DROPPED comment.save()
To start, we're fetching every single comment, and all its data, just so we can begin checking the issue statuses and possibly modifying them. That means 50 results coming back, even though we only care about 40. This is expensive, and not really necessary. So let's simplify:
# From the for loop iteration... comments = update.review.comments.all() comments = comments.filter(issue_status=BaseComment.OPEN) for comment in comments: comment.issue_status = BaseComment.DROPPED comment.save()
That's a little better. We're down to 40 objects worth of data, because we're letting the database take care of checking issue statuses. Still, we're fetching a lot -- comment text, extra_data content, various IDs and stats... but we can't do much about that right now, because of the code in Review Board.
Before I continue, let's talk about what happens when you save a comment. We don't just write new fields to the database. Rather, we're writing and then updating state on both the review (the latest activity timestamp) and the review request (issue state counters). This means that every
comment.save()
comes out to 3 queries, and we're doing this for every item in the list.That means that even though we're down to fetching 40 comments, we end up performing a total of 160 database queries (40 * (1 read + 3 save)). Not ideal.
If we were simply updating state in the database and not tracking the rest of this, we could just do:
comments.filter(issue_status=BaseComment.OPEN).update(issue_status=BaseComment.DROPPED)
And that would be great! One single SQL query per comment type. A fixed count that doesn't increase with the number of comments. That'd be doable if we could pull out the review request and review saves. And that is doable if we had a function that was all about setting just issue states and doing the review/review request updating after all those were done.
I think that'd what I'd ideally like to see. I'd prefer it live in Review Board, so this can be usable outside of Review Bot, but then Review Bot would require a newer version of Review Board... Could still be okay if we check for this and disable the option if not new enough...
Anyway, so what would that look like... Well, it'd first need to go through all the comments we want to update on each review and perform the above SQL statement. Once all are updated, we'd want Review Board to re-compute the issue counts for the review request (ideally we'd track old/new counts, but that requires more complicated queries that I'd rather not have you do for this). Then we need to update the timestamps on the reviews.
So something like:
now = timezone.now() any_updated = False for update in status_updates: review = update.review review_updated = False for comments in (review.comments, review.screenshot_comments, review.file_attachment_comments, review.general_comments): comments = comments.filter(issue_status=BaseComment.OPEN) count = comments.update(issue_status=BaseComment.DROPPED, timestamp=now) if count > 0: review_updated = True any_updated = True if review_updated: review.last_review_activity_timestamp = now review.save(update_fields=('last_review_activity_timestamp',)) if any_updated: # Due to how review request issue counters initialize, this will # actually trigger updating every issue counter. review_request.reinit_issue_open_count()
I think that would do it, but it'd need testing (including comprehensive unit tests).
The end result is that updating just diff comments out of 50 filed comments across one review would result in 9 queries (4 for the comment queries, 4 for the diff comment updates, 1 for the review update) + a recalculation of the issue counters (a fixed cost as well). Bump that up to 1000 comments filed across 1 review, and the counts don't change. Split that into two reviews, and we go up to 18 queries.
More work could be done to further reduce (pre-fetching counts for all types across all reviews and then further limiting what we're fetching based on those counts), but this would be the first step toward that. Further optimization could be made once we know for sure that the above works.
(Hope you don't regret posting this change Crash-course on query optimization in Django and Review Board.)
- Description:
-
~ Adding ability to drop old issues on new review.
~ Changing the service_id to be based off the configuration name. (This is required so that the same tools configured 2 different ways doesn't drop the other's issues). ~ Adding ability to drop old issues on new review.
~ + When a user publishes a new review request, all bots re-run and open new issues. If the publisher didn't fix the old issues (maybe they just added a new file) they now have double the issues, half of which are duplicates. The bot should be responsible for dropping the old issues, if it is configured as such.
+ + This change adds support for configuring each bot to drop old issues when a new review request is published. All changes are done server-side for better performance (closest to the database) and better user experience (issues dropped as soon as extension is executed).
+ + Note this change also changes the service_id to be based off the configuration name. (This is required so that the same tools configured 2 different ways doesn't drop the other's issues).
- Bugs:
- Change Summary:
-
- Punting the bulk of the logic to Review Board (will be posted in a new review).
- Changed the boolean field to be disabled if Review Board isn't new enough (missing the added method)
- Changed the
service_id
to be based off the config id
- Commit:
-
a9871230be10d1b99340233c0169853317eef02d0f286e3b6619637b1051a74d8828f28d02c5b1bd
- Change Summary:
-
Fixing line length
- Commit:
-
0f286e3b6619637b1051a74d8828f28d02c5b1bd95c36074047625824d92669020aa391d52a709a8