Support on-the-fly search indexing

Review Request #8601 — Created Jan. 5, 2017 and submitted

Information

Review Board
release-3.0.x

Reviewers

On-the-fly search indexing is now supported for the Elasticsearch search
backend. When the option is enabled (from the search settings
administration form), the search index will be updated whenever review
requests are updated or deleted.

Additionally, the default value of the Whoosh search index has been
renamed to "search-index", which was the original default; it was
accidentally changed to "search_index" in a previous change.

Ran unit tests.

Description From Last Updated

You probably didn't mean to include cat.jpg.

chipx86chipx86

'elasticsearch' imported but unused

reviewbotreviewbot

'elasticsearch' imported but unused

reviewbotreviewbot

One blank line.

chipx86chipx86

Each item should be sentence casing, and no "; and" or any other delimiter between items. This is the standard …

chipx86chipx86

We also need to index Users when created/modified/deleted or when group membership changes (see accounts/search_indexes.py).

chipx86chipx86

Missing docstring.

chipx86chipx86

Missing docstring.

chipx86chipx86

**kwargs

chipx86chipx86

Can we put review_request in the function argument list instead? By the way, if this function's going to be specific …

chipx86chipx86

**kwargs

chipx86chipx86

Alphabetical order.

chipx86chipx86

As discussed in Slack, we're going to want to allow this for Whoosh. It may not be great for large …

chipx86chipx86

Probably better as self.assertIsInstance(...)

chipx86chipx86

assertIsInstance

chipx86chipx86

'django_reset' imported but unused

reviewbotreviewbot

'from settings_local import *' used; unable to detect undefined names

reviewbotreviewbot

"whe" -> "when"

chipx86chipx86

Two blank lines.

chipx86chipx86

I don't think the way this is all working is thread-safe. If multiple threads are all going through the signal …

chipx86chipx86

Can we move this below, so it's closer to where we connect it?

chipx86chipx86

Just wondering, but is there any chance that this can negatively impact consumers today? Or does this situation emit a …

chipx86chipx86

:py:attr: here?

chipx86chipx86

"wasn't instead" ?

chipx86chipx86

Col: 80 E501 line too long (101 > 79 characters)

reviewbotreviewbot

Blank line between these.

chipx86chipx86

Only one "g" in "arguments" :)

chipx86chipx86

This looks like a Shortest Middle Snake comment.

chipx86chipx86

Since it's the search indexing specifically that needs this, I'd prefer not to put this variable in the signal emission. …

chipx86chipx86

'elasticsearch' imported but unused

reviewbotreviewbot

"on-the-fly indexing", I think.

chipx86chipx86

Can be one statement.

chipx86chipx86

'django_reset' imported but unused

reviewbotreviewbot

'from settings_local import *' used; unable to detect undefined names

reviewbotreviewbot

Do we need both? Did the name change?

chipx86chipx86

'elasticsearch' imported but unused

reviewbotreviewbot

Just to help future expandability, can we put Group into a list like we do with save/delete signals?

chipx86chipx86

Col: 80 E501 line too long (86 > 79 characters)

reviewbotreviewbot

You can wrap before the < to satisfy pyflakes.

chipx86chipx86

Alignment is off by one.

chipx86chipx86

Wasn't what?

chipx86chipx86

Col: 80 E501 line too long (105 > 79 characters)

reviewbotreviewbot

Lists can't be indented, or they'll render incorrectly. The * must be aligned with the preceding paragraph.

chipx86chipx86

Looks like some spaces snuck in here.

chipx86chipx86

There's an extra space after pk_set.

chipx86chipx86

'django_reset' imported but unused

reviewbotreviewbot

'from settings_local import *' used; unable to detect undefined names

reviewbotreviewbot

'elasticsearch' imported but unused

reviewbotreviewbot

Col: 80 E501 line too long (105 > 79 characters)

reviewbotreviewbot

Col: 1 W391 blank line at end of file

reviewbotreviewbot

'django_reset' imported but unused

reviewbotreviewbot

'from settings_local import *' used; unable to detect undefined names

reviewbotreviewbot

'elasticsearch' imported but unused

reviewbotreviewbot

Why this change?

chipx86chipx86

"on-the-fly"

chipx86chipx86

Col: 80 E501 line too long (105 > 79 characters)

reviewbotreviewbot

reviewboard.reviews.models.group.Group, here and anywhere else.

chipx86chipx86

'django_reset' imported but unused

reviewbotreviewbot

'from settings_local import *' used; unable to detect undefined names

reviewbotreviewbot

I would say "... for large or multi-server installs."

chipx86chipx86
reviewbot
  1. Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/search/search_backends/base.py
        reviewboard/search/signal_processor.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/htdocs/media/cat.jpg
    
    
    
    Tool: Pyflakes
    Processed Files:
        reviewboard/search/search_backends/base.py
        reviewboard/search/signal_processor.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/htdocs/media/cat.jpg
    
    
  2. Show all issues
     'elasticsearch' imported but unused
    
  3. 
      
brennie
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/search/search_backends/base.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/htdocs/media/cat.jpg
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/search/search_backends/base.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/htdocs/media/cat.jpg
    
    
  2. Show all issues
     'elasticsearch' imported but unused
    
  3. reviewboard/settings.py (Diff revision 2)
     
     
    Show all issues
     'django_reset' imported but unused
    
  4. reviewboard/settings.py (Diff revision 2)
     
     
    Show all issues
     'from settings_local import *' used; unable to detect undefined names
    
  5. 
      
chipx86
  1. 
      
  2. reviewboard/search/search_backends/registry.py (Diff revision 2)
     
     
     
     
     
    Show all issues

    One blank line.

  3. reviewboard/search/signal_processor.py (Diff revision 2)
     
     
     
    Show all issues

    Each item should be sentence casing, and no "; and" or any other delimiter between items. This is the standard form for lists in documentation.

  4. reviewboard/search/signal_processor.py (Diff revision 2)
     
     
     
     
     
     
     
     
    Show all issues

    We also need to index Users when created/modified/deleted or when group membership changes (see accounts/search_indexes.py).

  5. reviewboard/search/signal_processor.py (Diff revision 2)
     
     
    Show all issues

    Missing docstring.

  6. reviewboard/search/signal_processor.py (Diff revision 2)
     
     
    Show all issues

    Missing docstring.

  7. reviewboard/search/signal_processor.py (Diff revision 2)
     
     
    Show all issues

    **kwargs

  8. reviewboard/search/signal_processor.py (Diff revision 2)
     
     
    Show all issues

    Can we put review_request in the function argument list instead?

    By the way, if this function's going to be specific to review requests, we should rename the function and update how it's connected.

  9. reviewboard/search/signal_processor.py (Diff revision 2)
     
     
    Show all issues

    **kwargs

  10. reviewboard/search/tests.py (Diff revision 2)
     
     
     
     
     
    Show all issues

    Alphabetical order.

  11. reviewboard/search/tests.py (Diff revision 2)
     
     
     
    Show all issues

    As discussed in Slack, we're going to want to allow this for Whoosh. It may not be great for large deployments, but smaller ones should definitely be able to start out with on-the-fly indexing.

  12. reviewboard/search/tests.py (Diff revision 2)
     
     
     
     
    Show all issues

    Probably better as self.assertIsInstance(...)

  13. reviewboard/search/tests.py (Diff revision 2)
     
     
     
    Show all issues

    assertIsInstance

  14. 
      
brennie
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/reviews/models/review_request.py
        reviewboard/reviews/__init__.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
        reviewboard/htdocs/media/cat.jpg
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/reviews/models/review_request.py
        reviewboard/reviews/__init__.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
        reviewboard/htdocs/media/cat.jpg
    
    
  2. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
    Show all issues
    Col: 80
     E501 line too long (101 > 79 characters)
    
  3. Show all issues
     'elasticsearch' imported but unused
    
  4. reviewboard/settings.py (Diff revision 3)
     
     
    Show all issues
     'django_reset' imported but unused
    
  5. reviewboard/settings.py (Diff revision 3)
     
     
    Show all issues
     'from settings_local import *' used; unable to detect undefined names
    
  6. 
      
chipx86
  1. Got some thoughts on the signal-related changes. The rest are mostly trivial changes.

  2. Show all issues

    You probably didn't mean to include cat.jpg.

    1. cat.jpg is integral to this feature.

  3. reviewboard/admin/forms.py (Diff revision 3)
     
     
    Show all issues

    "whe" -> "when"

  4. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
     
     
    Show all issues

    Two blank lines.

  5. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
     
    Show all issues

    I don't think the way this is all working is thread-safe. If multiple threads are all going through the signal logic below with the same PK as the key, they're going to stomp over each other. You'll need a threading.local() for all this.

    Should also help prevent the same level of leakage that could occur if there's any kind of error in-between the pre_clear/post_clear signals.

  6. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
    Show all issues

    Can we move this below, so it's closer to where we connect it?

  7. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
    Show all issues

    Just wondering, but is there any chance that this can negatively impact consumers today? Or does this situation emit a signal that's otherwise never sent?

    My concern is that this code, which is really intended for the search index specifically, could end up with side-effects that we're not anticipating. It sounds like it's inconsistent with how signals normally work for these relations, which potentially could trip someone up down the road?

    I wonder if instead of having all this be general, we could make this really about the search index. That is, instead of having this be global here, we have this live in the search indexing code and have the results be function calls into the indexing code instead of signal emissions.

  8. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
    Show all issues

    :py:attr: here?

  9. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
    Show all issues

    "wasn't instead" ?

  10. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
     
    Show all issues

    Blank line between these.

  11. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
    Show all issues

    Only one "g" in "arguments" :)

  12. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
    Show all issues

    This looks like a Shortest Middle Snake comment.

  13. reviewboard/reviews/signals.py (Diff revision 3)
     
     
     
     
     
    Show all issues

    Since it's the search indexing specifically that needs this, I'd prefer not to put this variable in the signal emission. It's a bit confusing and redundant, and sets a precedent for future signals we have to listen to (in our app and maybe others down the road). Instead, the search indexing code should be able to handle the different keyword names.

    What you could do in the indexing signal connection code is something like:

    from functools import partial
    
    ...
    
    save_signals = [
        (ReviewRequest, review_request_published, 'review_request'),
        (User, post_save, 'instance'),
        (User, m2m_changed, 'instance'),
    ]
    
    ...
    
    save_handler = partial(self.check_handle_save, instance_kwarg=
    
    for cls, signal, instance_kwarg in self.save_signals:
        signal.connect(
            partial(self.check_handle_save, instance_kwarg=instance_kwarg),
            sender=cls)
    
    ...
    
    def check_handle_save(self, instance_kwarg, **kwargs):
        instance = kwargs[instance_kwarg]
        ...
    

    (Maybe weak=False will have to be passed to connect. I'm not sure.)

  14. Show all issues

    "on-the-fly indexing", I think.

  15. Show all issues

    Can be one statement.

  16. 
      
brennie
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
        .gitignore
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
        .gitignore
    
    
  2. Show all issues
     'elasticsearch' imported but unused
    
  3. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
    Show all issues
    Col: 80
     E501 line too long (86 > 79 characters)
    
  4. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
    Show all issues
    Col: 80
     E501 line too long (105 > 79 characters)
    
  5. reviewboard/settings.py (Diff revision 4)
     
     
    Show all issues
     'django_reset' imported but unused
    
  6. reviewboard/settings.py (Diff revision 4)
     
     
    Show all issues
     'from settings_local import *' used; unable to detect undefined names
    
  7. 
      
chipx86
  1. Looks great! All my comments here are very minor things. (Mostly typos/ReST errors.)

  2. .gitignore (Diff revision 4)
     
     
     
    Show all issues

    Do we need both? Did the name change?

    1. I accidentally changed the default in a previous patch. I've undone that change now.

  3. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
     
    Show all issues

    Just to help future expandability, can we put Group into a list like we do with save/delete signals?

  4. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
    Show all issues

    You can wrap before the < to satisfy pyflakes.

  5. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
     
    Show all issues

    Alignment is off by one.

  6. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
    Show all issues

    Wasn't what?

  7. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
     
     
     
     
     
     
     
    Show all issues

    Lists can't be indented, or they'll render incorrectly. The * must be aligned with the preceding paragraph.

  8. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
    Show all issues

    Looks like some spaces snuck in here.

  9. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
    Show all issues

    There's an extra space after pk_set.

  10. 
      
brennie
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/whoosh.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/whoosh.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
    
    
  2. Show all issues
     'elasticsearch' imported but unused
    
  3. reviewboard/search/signal_processor.py (Diff revision 5)
     
     
    Show all issues
    Col: 80
     E501 line too long (105 > 79 characters)
    
  4. reviewboard/search/signal_processor.py (Diff revision 5)
     
     
    Show all issues
    Col: 1
     W391 blank line at end of file
    
  5. reviewboard/settings.py (Diff revision 5)
     
     
    Show all issues
     'django_reset' imported but unused
    
  6. reviewboard/settings.py (Diff revision 5)
     
     
    Show all issues
     'from settings_local import *' used; unable to detect undefined names
    
  7. 
      
brennie
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/whoosh.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/whoosh.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
    
    
  2. Show all issues
     'elasticsearch' imported but unused
    
  3. reviewboard/search/signal_processor.py (Diff revision 6)
     
     
    Show all issues
    Col: 80
     E501 line too long (105 > 79 characters)
    
  4. reviewboard/settings.py (Diff revision 6)
     
     
    Show all issues
     'django_reset' imported but unused
    
  5. reviewboard/settings.py (Diff revision 6)
     
     
    Show all issues
     'from settings_local import *' used; unable to detect undefined names
    
  6. 
      
chipx86
  1. 
      
  2. Show all issues

    Why this change?

    1. It originally was search-index, but when the patch for search backends went it in it changed to search_index. This moves us back to the original.

  3. reviewboard/search/signal_processor.py (Diff revision 6)
     
     
    Show all issues

    "on-the-fly"

  4. reviewboard/search/signal_processor.py (Diff revision 6)
     
     
    Show all issues

    reviewboard.reviews.models.group.Group, here and anywhere else.

  5. 
      
brennie
chipx86
  1. 
      
  2. reviewboard/admin/forms.py (Diff revision 7)
     
     
    Show all issues

    I would say "... for large or multi-server installs."

  3. 
      
brennie
Review request changed
Status:
Completed
Change Summary:
Pushed to release-3.0.x (7de3a90)