Summary

Adding ability to drop old issues on new review.

Review Request #10061 — Created July 3, 2018 and submitted July 31, 2020, 1:51 p.m.

Information

Owner

jcannon

Repository

ReviewBot

Branch

master

Bugs

4709

Depends On

~~10075~~

Commit

95c3607...

Reviewers

Groups

reviewbot

People

Description

Adding ability to drop old issues on new review.

When a user publishes a new review request, all bots re-run and open new issues. If the publisher didn't fix the old issues (maybe they just added a new file) they now have double the issues, half of which are duplicates. The bot should be responsible for dropping the old issues, if it is configured as such.

This change adds support for configuring each bot to drop old issues when a new review request is published. All changes are done server-side for better performance (closest to the database) and better user experience (issues dropped as soon as extension is executed).

Note this change also changes the service_id to be based off the configuration name. (This is required so that the same tools configured 2 different ways doesn't drop the other's issues).

Testing Done

Review with previous issues:
- Post new review with issues fixed -> Drops old issues. (*)
- Post new review with issues not fixed -> Drops old issues
Review with no previous issues:
- Post a new review with issues -> Opens new issues.
- Post a new review without issues -> Still passing, doesn't drop anything.

Two configurations using the same tool, one dropping old one without:
- One tool drops its old issues, the other doesn't.

In all cases:
- Only dropped issues when box was checked

(*) - We might think of a better solution to this. If the user fixes the issues, by the time they publish the new review, the old issues will be "dropped" instead of "fixed". That might be OK (mark the issues fixed before you repost?). I'd like to avoid having to semantically deduce if the issue was "fixed" or dropped. (Maybe they just moved everything down a line).

Issues

Description	From	Last Updated
The idea is interesting, but I had to read the code to really get what this was going for. Can …	chipx86	July 9, 2018, 7:06 a.m.
E501 line too long (110 > 79 characters)	reviewbot	July 3, 2018, 8:16 a.m.
I guess I assumed that the names had to be unique, but I just tried it and that's not true. …	jcannon	July 9, 2018, 2:27 p.m.
You can group all the filters together in one filter() call. We also have a chaining format we like to …	chipx86	July 9, 2018, 2:27 p.m.
Can you add a blank line between these? We separate out blocks (for loops, if statements, etc.) from other neighboring …	chipx86	July 9, 2018, 2:27 p.m.
Going to leave a large comment here. Bear with me :) This code has to do an awful lot, more …	chipx86	July 9, 2018, 2:27 p.m.
E501 line too long (96 > 79 characters)	reviewbot	July 9, 2018, 2:29 p.m.
This should probably be "Drop old open issues" instead of "Drops old open issues"	david	July 31, 2020, 1:48 p.m.
This needs a docstring.	david	July 31, 2020, 1:48 p.m.

flake8 failed.

JSHint passed.

flake8

extension/reviewbotext/integration.py (Diff revision 1)
The issue has been resolved. Show all issues
```
E501 line too long (110 > 79 characters)
```

Change Summary:

Now with less logging!

Commit:

e5da34d56fcd923021e59ffa6136d6cada30c3e0

d58269e6cecdee5e361d47fc38cc6147fb3f00c8

Diff:

Revision 2 (+31 -1)

Show changes

	extension/reviewbotext/forms.py
	extension/reviewbotext/integration.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

extension/reviewbotext/integration.py (Diff revision 2)

The issue has been resolved. Show all issues

I guess I assumed that the names had to be unique, but I just tried it and that's not true.
Perhaps we should also mix in the integration_id?

jcannon

July 3, 2018, 2:29 p.m.

Thinking about this some more, I think we should only use the integration_id since the admin could change the name of a configuration.

jcannon

July 9, 2018, 11:51 a.m.

Slight correction, it should use config.id.

Change Summary:

Removed period from form label.

Commit:

d58269e6cecdee5e361d47fc38cc6147fb3f00c8

a9871230be10d1b99340233c0169853317eef02d

Diff:

Revision 3 (+31 -1)

Show changes

	extension/reviewbotext/forms.py
	extension/reviewbotext/integration.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

The issue has been resolved. Show all issues

The idea is interesting, but I had to read the code to really get what this was going for. Can you update the description of the change to more clearly describe the scenario that this addresses?

We have a guide for how we like to see descriptions for the commits and review requests, which provides some examples of the format.

https://www.notion.so/reviewboard/Writing-Good-Change-Descriptions-10529e7c207743fa8ca90153d4b21fea

extension/reviewbotext/integration.py (Diff revision 3)

The issue has been resolved. Show all issues

You can group all the filters together in one filter() call. We also have a chaining format we like to use for complex queries, which helps with readability. We also know we're going to be fetching the reviews, so we can include that along with the results here, reducing query counts.
This would look like:

status_updates = (
    StatusUpdates.objects
    .filter(user=user,
            service_id=service_id,
            review_request=review_request)
    .exclude(review__isnull=True)
    .select_related('review')
)

jcannon

July 9, 2018, 7:11 a.m.

For whatever a reason I had come under the assumption putting several keyword arguments in the filter wasn't possible, and was a little frustrated. Glad to see the world makes sense again.

select_related is interesting. Thanks for the tip.

extension/reviewbotext/integration.py (Diff revision 3)

The issue has been resolved. Show all issues

Can you add a blank line between these? We separate out blocks (for loops, if statements, etc.) from other neighboring statements.

extension/reviewbotext/integration.py (Diff revision 3)

The issue has been resolved. Show all issues

Going to leave a large comment here. Bear with me :)

This code has to do an awful lot, more than it seems, to update the various states around issues. We can do a bit right now to simplify this, but there's a lot we can't do cleanly from within Review Bot (and I have suggestions below).

Let's put together a scenario. Say Review Bot opened 50 diff comments on your code before (this is not an unrealistic number). 10 of those issues were fixed, 40 remain. A new change is then put up. (We're also going to pretend that this is only querying diff comments, just to help keep this simple -- really, the fact that other comment types are involved means the query counts listed below will be under the real numbers).

So, we get to this function and we begin closing comments. The original code does (for just diff comments):

# From the for loop iteration...
comments = update.review.comments.all()

for comment in comments:
    if comment.issue_status == BaseComment.OPEN:
        comment.issue_status = BaseComment.DROPPED
        comment.save()

To start, we're fetching every single comment, and all its data, just so we can begin checking the issue statuses and possibly modifying them. That means 50 results coming back, even though we only care about 40. This is expensive, and not really necessary. So let's simplify:

# From the for loop iteration...
comments = update.review.comments.all()

comments = comments.filter(issue_status=BaseComment.OPEN)

for comment in comments:
    comment.issue_status = BaseComment.DROPPED
    comment.save()

That's a little better. We're down to 40 objects worth of data, because we're letting the database take care of checking issue statuses. Still, we're fetching a lot -- comment text, extra_data content, various IDs and stats... but we can't do much about that right now, because of the code in Review Board.

Before I continue, let's talk about what happens when you save a comment. We don't just write new fields to the database. Rather, we're writing and then updating state on both the review (the latest activity timestamp) and the review request (issue state counters). This means that every comment.save() comes out to 3 queries, and we're doing this for every item in the list.

That means that even though we're down to fetching 40 comments, we end up performing a total of 160 database queries (40 * (1 read + 3 save)). Not ideal.

If we were simply updating state in the database and not tracking the rest of this, we could just do:

comments.filter(issue_status=BaseComment.OPEN).update(issue_status=BaseComment.DROPPED)

And that would be great! One single SQL query per comment type. A fixed count that doesn't increase with the number of comments. That'd be doable if we could pull out the review request and review saves. And that is doable if we had a function that was all about setting just issue states and doing the review/review request updating after all those were done.

I think that'd what I'd ideally like to see. I'd prefer it live in Review Board, so this can be usable outside of Review Bot, but then Review Bot would require a newer version of Review Board... Could still be okay if we check for this and disable the option if not new enough...

Anyway, so what would that look like... Well, it'd first need to go through all the comments we want to update on each review and perform the above SQL statement. Once all are updated, we'd want Review Board to re-compute the issue counts for the review request (ideally we'd track old/new counts, but that requires more complicated queries that I'd rather not have you do for this). Then we need to update the timestamps on the reviews.

So something like:

now = timezone.now()
any_updated = False

for update in status_updates:
    review = update.review
    review_updated = False

    for comments in (review.comments,
                     review.screenshot_comments,
                     review.file_attachment_comments,
                     review.general_comments):
        comments = comments.filter(issue_status=BaseComment.OPEN)
        count = comments.update(issue_status=BaseComment.DROPPED,
                                timestamp=now)

        if count > 0:
            review_updated = True
            any_updated = True

    if review_updated:
        review.last_review_activity_timestamp = now
        review.save(update_fields=('last_review_activity_timestamp',))

if any_updated:
    # Due to how review request issue counters initialize, this will
    # actually trigger updating every issue counter.
    review_request.reinit_issue_open_count()

I think that would do it, but it'd need testing (including comprehensive unit tests).

The end result is that updating just diff comments out of 50 filed comments across one review would result in 9 queries (4 for the comment queries, 4 for the diff comment updates, 1 for the review update) + a recalculation of the issue counters (a fixed cost as well). Bump that up to 1000 comments filed across 1 review, and the counts don't change. Split that into two reviews, and we go up to 18 queries.

More work could be done to further reduce (pre-fetching counts for all types across all reviews and then further limiting what we're fetching based on those counts), but this would be the first step toward that. Further optimization could be made once we know for sure that the above works.

(Hope you don't regret posting this change Crash-course on query optimization in Django and Review Board.)

jcannon

July 9, 2018, 8:57 a.m.

Don't regret at all

It's unfortunate what is Pythonic isn't Django-ic (the for loop is a Python backbone, but is seemingly very noisy in Django).

The Django optimizations make sense (filter instead of for-if). The Review Board ones are harder to follow (smaller domain) but I think I get it. Saving a comment performs more than one operation (has to update info on the review request itself), and there isn't a way to "save" several comments at once.

I'd prefer it live in Review Board

If I understand correctly, you'd like to see something like drop_open_issues added to StatusUpdate (not sure if this functionality makes as much sense in the context of a "review", but I might be wrong here), where the body is essentially the last code snippit? That makes sense.

As far as Review Bot and Review Board versioning goes, what does that look like to you? Is there a way to toggle this feature based on version? Is it as easy as if hasattr(status_update, 'drop_open_issues)`?

Description:: ~
Adding ability to drop old issues on new review.
~ Changing the service_id to be based off the configuration name. (This is required so that the same tools configured 2 different ways doesn't drop the other's issues).
~
Adding ability to drop old issues on new review.
~
+
When a user publishes a new review request, all bots re-run and open new issues. If the publisher didn't fix the old issues (maybe they just added a new file) they now have double the issues, half of which are duplicates. The bot should be responsible for dropping the old issues, if it is configured as such.
+
+
This change adds support for configuring each bot to drop old issues when a new review request is published. All changes are done server-side for better performance (closest to the database) and better user experience (issues dropped as soon as extension is executed).
+
+
Note this change also changes the service_id to be based off the configuration name. (This is required so that the same tools configured 2 different ways doesn't drop the other's issues).
Bugs:: 4709

Change Summary:


Punting the bulk of the logic to Review Board (will be posted in a new review).
Changed the boolean field to be disabled if Review Board isn't new enough (missing the added method)
Changed the service_id to be based off the config id

Commit:

a9871230be10d1b99340233c0169853317eef02d

0f286e3b6619637b1051a74d8828f28d02c5b1bd

Diff:

Revision 4 (+36 -1)

Show changes

	extension/reviewbotext/forms.py
	extension/reviewbotext/integration.py

Checks run (1 failed, 1 succeeded)

flake8 failed.

JSHint passed.

flake8

extension/reviewbotext/forms.py (Diff revision 4)
The issue has been resolved. Show all issues
```
E501 line too long (96 > 79 characters)
```

Change Summary:

Fixing line length

Commit:

0f286e3b6619637b1051a74d8828f28d02c5b1bd

95c36074047625824d92669020aa391d52a709a8

Diff:

Revision 5 (+37 -1)

Show changes

	extension/reviewbotext/forms.py
	extension/reviewbotext/integration.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Depends On:: ~~10075 - Adding 'drop_open_issues' to StatusUpdate~~

```
bump
```

```
bump
```

extension/reviewbotext/forms.py (Diff revision 5)
The issue has been resolved. Show all issues
```
This should probably be "Drop old open issues" instead of "Drops old open issues"
```
extension/reviewbotext/integration.py (Diff revision 5)
The issue has been resolved. Show all issues
```
This needs a docstring.
```

any progress here? :-)

jcannon

Jan. 29, 2020, 5:28 a.m.

My company is no longer planning on upgrading to ReviewBoard 3.0 for the forseeable future. Feel free to steal my code and continue the changes if you'd like.

Ship it!

Going to make some minor changes and get this landed. Thanks!

Status:: Completed
Change Summary:: Pushed to master (1ccc5f5)

~		Adding ability to drop old issues on new review.
~		Changing the service_id to be based off the configuration name. (This is required so that the same tools configured 2 different ways doesn't drop the other's issues).
	~	Adding ability to drop old issues on new review.
	~
	+	When a user publishes a new review request, all bots re-run and open new issues. If the publisher didn't fix the old issues (maybe they just added a new file) they now have double the issues, half of which are duplicates. The bot should be responsible for dropping the old issues, if it is configured as such.
	+
	+	This change adds support for configuring each bot to drop old issues when a new review request is published. All changes are done server-side for better performance (closest to the database) and better user experience (issues dropped as soon as extension is executed).
	+
	+	Note this change also changes the service_id to be based off the configuration name. (This is required so that the same tools configured 2 different ways doesn't drop the other's issues).