Support on-the-fly search indexing
Review Request #8601 — Created Jan. 5, 2017 and submitted
On-the-fly search indexing is now supported for the Elasticsearch search
backend. When the option is enabled (from the search settings
administration form), the search index will be updated whenever review
requests are updated or deleted.Additionally, the default value of the Whoosh search index has been
renamed to "search-index", which was the original default; it was
accidentally changed to "search_index" in a previous change.
Ran unit tests.
Description | From | Last Updated |
---|---|---|
You probably didn't mean to include cat.jpg. |
chipx86 | |
'elasticsearch' imported but unused |
reviewbot | |
'elasticsearch' imported but unused |
reviewbot | |
One blank line. |
chipx86 | |
Each item should be sentence casing, and no "; and" or any other delimiter between items. This is the standard … |
chipx86 | |
We also need to index Users when created/modified/deleted or when group membership changes (see accounts/search_indexes.py). |
chipx86 | |
Missing docstring. |
chipx86 | |
Missing docstring. |
chipx86 | |
**kwargs |
chipx86 | |
Can we put review_request in the function argument list instead? By the way, if this function's going to be specific … |
chipx86 | |
**kwargs |
chipx86 | |
Alphabetical order. |
chipx86 | |
As discussed in Slack, we're going to want to allow this for Whoosh. It may not be great for large … |
chipx86 | |
Probably better as self.assertIsInstance(...) |
chipx86 | |
assertIsInstance |
chipx86 | |
'django_reset' imported but unused |
reviewbot | |
'from settings_local import *' used; unable to detect undefined names |
reviewbot | |
"whe" -> "when" |
chipx86 | |
Two blank lines. |
chipx86 | |
I don't think the way this is all working is thread-safe. If multiple threads are all going through the signal … |
chipx86 | |
Can we move this below, so it's closer to where we connect it? |
chipx86 | |
Just wondering, but is there any chance that this can negatively impact consumers today? Or does this situation emit a … |
chipx86 | |
:py:attr: here? |
chipx86 | |
"wasn't instead" ? |
chipx86 | |
Col: 80 E501 line too long (101 > 79 characters) |
reviewbot | |
Blank line between these. |
chipx86 | |
Only one "g" in "arguments" :) |
chipx86 | |
This looks like a Shortest Middle Snake comment. |
chipx86 | |
Since it's the search indexing specifically that needs this, I'd prefer not to put this variable in the signal emission. … |
chipx86 | |
'elasticsearch' imported but unused |
reviewbot | |
"on-the-fly indexing", I think. |
chipx86 | |
Can be one statement. |
chipx86 | |
'django_reset' imported but unused |
reviewbot | |
'from settings_local import *' used; unable to detect undefined names |
reviewbot | |
Do we need both? Did the name change? |
chipx86 | |
'elasticsearch' imported but unused |
reviewbot | |
Just to help future expandability, can we put Group into a list like we do with save/delete signals? |
chipx86 | |
Col: 80 E501 line too long (86 > 79 characters) |
reviewbot | |
You can wrap before the < to satisfy pyflakes. |
chipx86 | |
Alignment is off by one. |
chipx86 | |
Wasn't what? |
chipx86 | |
Col: 80 E501 line too long (105 > 79 characters) |
reviewbot | |
Lists can't be indented, or they'll render incorrectly. The * must be aligned with the preceding paragraph. |
chipx86 | |
Looks like some spaces snuck in here. |
chipx86 | |
There's an extra space after pk_set. |
chipx86 | |
'django_reset' imported but unused |
reviewbot | |
'from settings_local import *' used; unable to detect undefined names |
reviewbot | |
'elasticsearch' imported but unused |
reviewbot | |
Col: 80 E501 line too long (105 > 79 characters) |
reviewbot | |
Col: 1 W391 blank line at end of file |
reviewbot | |
'django_reset' imported but unused |
reviewbot | |
'from settings_local import *' used; unable to detect undefined names |
reviewbot | |
'elasticsearch' imported but unused |
reviewbot | |
Why this change? |
chipx86 | |
"on-the-fly" |
chipx86 | |
Col: 80 E501 line too long (105 > 79 characters) |
reviewbot | |
reviewboard.reviews.models.group.Group, here and anywhere else. |
chipx86 | |
'django_reset' imported but unused |
reviewbot | |
'from settings_local import *' used; unable to detect undefined names |
reviewbot | |
I would say "... for large or multi-server installs." |
chipx86 |
- Testing Done:
-
~ Updated & deleted review requests and saw the resulting changes in the
~ Ran unit tests.
- search index when the option was enabled. - Diff:
-
Revision 2 (+169 -3)
-
Tool: Pyflakes Processed Files: reviewboard/settings.py reviewboard/search/search_backends/registry.py reviewboard/search/tests.py reviewboard/search/signal_processor.py reviewboard/search/search_backends/base.py reviewboard/search/search_backends/elasticsearch.py Ignored Files: reviewboard/htdocs/media/cat.jpg Tool: PEP8 Style Checker Processed Files: reviewboard/settings.py reviewboard/search/search_backends/registry.py reviewboard/search/tests.py reviewboard/search/signal_processor.py reviewboard/search/search_backends/base.py reviewboard/search/search_backends/elasticsearch.py Ignored Files: reviewboard/htdocs/media/cat.jpg
-
-
-
-
-
-
Each item should be sentence casing, and no "; and" or any other delimiter between items. This is the standard form for lists in documentation.
-
We also need to index Users when created/modified/deleted or when group membership changes (see
accounts/search_indexes.py
). -
-
-
-
Can we put
review_request
in the function argument list instead?By the way, if this function's going to be specific to review requests, we should rename the function and update how it's connected.
-
-
-
As discussed in Slack, we're going to want to allow this for Whoosh. It may not be great for large deployments, but smaller ones should definitely be able to start out with on-the-fly indexing.
-
-
- Change Summary:
-
- Users are now indexed on the fly.
- M2M fields for users are now indexed correctly.
- The on the fly indexing option is now a generic search option and is off by default (because Whoosh is the default engine). A note is displayed below the option indicating it is not intended for large installs when using the Whoosh engine.
- Diff:
-
Revision 3 (+355 -8)
-
Tool: Pyflakes Processed Files: reviewboard/admin/siteconfig.py reviewboard/admin/forms.py reviewboard/settings.py reviewboard/search/search_backends/registry.py reviewboard/search/tests.py reviewboard/reviews/models/review_request.py reviewboard/reviews/__init__.py reviewboard/search/signal_processor.py reviewboard/reviews/signals.py reviewboard/search/search_backends/elasticsearch.py Ignored Files: reviewboard/templates/admin/search_settings.html reviewboard/htdocs/media/cat.jpg Tool: PEP8 Style Checker Processed Files: reviewboard/admin/siteconfig.py reviewboard/admin/forms.py reviewboard/settings.py reviewboard/search/search_backends/registry.py reviewboard/search/tests.py reviewboard/reviews/models/review_request.py reviewboard/reviews/__init__.py reviewboard/search/signal_processor.py reviewboard/reviews/signals.py reviewboard/search/search_backends/elasticsearch.py Ignored Files: reviewboard/templates/admin/search_settings.html reviewboard/htdocs/media/cat.jpg
-
-
-
-
-
Got some thoughts on the signal-related changes. The rest are mostly trivial changes.
-
-
-
-
I don't think the way this is all working is thread-safe. If multiple threads are all going through the signal logic below with the same PK as the key, they're going to stomp over each other. You'll need a
threading.local()
for all this.Should also help prevent the same level of leakage that could occur if there's any kind of error in-between the
pre_clear
/post_clear
signals. -
-
Just wondering, but is there any chance that this can negatively impact consumers today? Or does this situation emit a signal that's otherwise never sent?
My concern is that this code, which is really intended for the search index specifically, could end up with side-effects that we're not anticipating. It sounds like it's inconsistent with how signals normally work for these relations, which potentially could trip someone up down the road?
I wonder if instead of having all this be general, we could make this really about the search index. That is, instead of having this be global here, we have this live in the search indexing code and have the results be function calls into the indexing code instead of signal emissions.
-
-
-
-
-
-
Since it's the search indexing specifically that needs this, I'd prefer not to put this variable in the signal emission. It's a bit confusing and redundant, and sets a precedent for future signals we have to listen to (in our app and maybe others down the road). Instead, the search indexing code should be able to handle the different keyword names.
What you could do in the indexing signal connection code is something like:
from functools import partial ... save_signals = [ (ReviewRequest, review_request_published, 'review_request'), (User, post_save, 'instance'), (User, m2m_changed, 'instance'), ] ... save_handler = partial(self.check_handle_save, instance_kwarg= for cls, signal, instance_kwarg in self.save_signals: signal.connect( partial(self.check_handle_save, instance_kwarg=instance_kwarg), sender=cls) ... def check_handle_save(self, instance_kwarg, **kwargs): instance = kwargs[instance_kwarg] ...
(Maybe
weak=False
will have to be passed toconnect
. I'm not sure.) -
-
-
Tool: Pyflakes Processed Files: reviewboard/admin/siteconfig.py reviewboard/admin/forms.py reviewboard/settings.py reviewboard/search/search_backends/registry.py reviewboard/search/tests.py reviewboard/search/signal_processor.py reviewboard/reviews/signals.py reviewboard/search/search_backends/elasticsearch.py Ignored Files: reviewboard/templates/admin/search_settings.html .gitignore Tool: PEP8 Style Checker Processed Files: reviewboard/admin/siteconfig.py reviewboard/admin/forms.py reviewboard/settings.py reviewboard/search/search_backends/registry.py reviewboard/search/tests.py reviewboard/search/signal_processor.py reviewboard/reviews/signals.py reviewboard/search/search_backends/elasticsearch.py Ignored Files: reviewboard/templates/admin/search_settings.html .gitignore
-
-
-
-
-
- Description:
-
On-the-fly search indexing is now supported for the Elasticsearch search
backend. When the option is enabled (from the search settings administration form), the search index will be updated whenever review requests are updated or deleted. + + Additionally, the default value of the Whoosh search index has been
+ renamed to "search-index", which was the original default; it was + accidentally changed to "search_index" in a previous change. - Diff:
-
Revision 5 (+360 -8)
-
Tool: Pyflakes Processed Files: reviewboard/admin/siteconfig.py reviewboard/admin/forms.py reviewboard/settings.py reviewboard/search/search_backends/registry.py reviewboard/search/search_backends/whoosh.py reviewboard/search/tests.py reviewboard/search/signal_processor.py reviewboard/reviews/signals.py reviewboard/search/search_backends/elasticsearch.py Ignored Files: reviewboard/templates/admin/search_settings.html Tool: PEP8 Style Checker Processed Files: reviewboard/admin/siteconfig.py reviewboard/admin/forms.py reviewboard/settings.py reviewboard/search/search_backends/registry.py reviewboard/search/search_backends/whoosh.py reviewboard/search/tests.py reviewboard/search/signal_processor.py reviewboard/reviews/signals.py reviewboard/search/search_backends/elasticsearch.py Ignored Files: reviewboard/templates/admin/search_settings.html
-
-
-
-
-
-
Tool: Pyflakes Processed Files: reviewboard/admin/siteconfig.py reviewboard/admin/forms.py reviewboard/settings.py reviewboard/search/search_backends/registry.py reviewboard/search/search_backends/whoosh.py reviewboard/search/tests.py reviewboard/search/signal_processor.py reviewboard/reviews/signals.py reviewboard/search/search_backends/elasticsearch.py Ignored Files: reviewboard/templates/admin/search_settings.html Tool: PEP8 Style Checker Processed Files: reviewboard/admin/siteconfig.py reviewboard/admin/forms.py reviewboard/settings.py reviewboard/search/search_backends/registry.py reviewboard/search/search_backends/whoosh.py reviewboard/search/tests.py reviewboard/search/signal_processor.py reviewboard/reviews/signals.py reviewboard/search/search_backends/elasticsearch.py Ignored Files: reviewboard/templates/admin/search_settings.html
-
-
-
-
- Change Summary:
-
Addressed Christian's issues.
- Diff:
-
Revision 7 (+359 -8)