Support on-the-fly search indexing

Review Request #8601 — Created Jan. 5, 2017 and submitted

brennie
Review Board
release-3.0.x
8600
reviewboard

On-the-fly search indexing is now supported for the Elasticsearch search
backend. When the option is enabled (from the search settings
administration form), the search index will be updated whenever review
requests are updated or deleted.

Additionally, the default value of the Whoosh search index has been
renamed to "search-index", which was the original default; it was
accidentally changed to "search_index" in a previous change.

Ran unit tests.

Description From Last Updated

You probably didn't mean to include cat.jpg.

chipx86chipx86

'elasticsearch' imported but unused

reviewbotreviewbot

'elasticsearch' imported but unused

reviewbotreviewbot

One blank line.

chipx86chipx86

Each item should be sentence casing, and no "; and" or any other delimiter between items. This is the standard ...

chipx86chipx86

We also need to index Users when created/modified/deleted or when group membership changes (see accounts/search_indexes.py).

chipx86chipx86

Missing docstring.

chipx86chipx86

Missing docstring.

chipx86chipx86

**kwargs

chipx86chipx86

Can we put review_request in the function argument list instead? By the way, if this function's going to be specific ...

chipx86chipx86

**kwargs

chipx86chipx86

Alphabetical order.

chipx86chipx86

As discussed in Slack, we're going to want to allow this for Whoosh. It may not be great for large ...

chipx86chipx86

Probably better as self.assertIsInstance(...)

chipx86chipx86

assertIsInstance

chipx86chipx86

'django_reset' imported but unused

reviewbotreviewbot

'from settings_local import *' used; unable to detect undefined names

reviewbotreviewbot

"whe" -> "when"

chipx86chipx86

Two blank lines.

chipx86chipx86

I don't think the way this is all working is thread-safe. If multiple threads are all going through the signal ...

chipx86chipx86

Can we move this below, so it's closer to where we connect it?

chipx86chipx86

Just wondering, but is there any chance that this can negatively impact consumers today? Or does this situation emit a ...

chipx86chipx86

:py:attr: here?

chipx86chipx86

"wasn't instead" ?

chipx86chipx86

Col: 80 E501 line too long (101 > 79 characters)

reviewbotreviewbot

Blank line between these.

chipx86chipx86

Only one "g" in "arguments" :)

chipx86chipx86

This looks like a Shortest Middle Snake comment.

chipx86chipx86

Since it's the search indexing specifically that needs this, I'd prefer not to put this variable in the signal emission. ...

chipx86chipx86

'elasticsearch' imported but unused

reviewbotreviewbot

"on-the-fly indexing", I think.

chipx86chipx86

Can be one statement.

chipx86chipx86

'django_reset' imported but unused

reviewbotreviewbot

'from settings_local import *' used; unable to detect undefined names

reviewbotreviewbot

Do we need both? Did the name change?

chipx86chipx86

'elasticsearch' imported but unused

reviewbotreviewbot

Just to help future expandability, can we put Group into a list like we do with save/delete signals?

chipx86chipx86

Col: 80 E501 line too long (86 > 79 characters)

reviewbotreviewbot

You can wrap before the < to satisfy pyflakes.

chipx86chipx86

Alignment is off by one.

chipx86chipx86

Wasn't what?

chipx86chipx86

Col: 80 E501 line too long (105 > 79 characters)

reviewbotreviewbot

Lists can't be indented, or they'll render incorrectly. The * must be aligned with the preceding paragraph.

chipx86chipx86

Looks like some spaces snuck in here.

chipx86chipx86

There's an extra space after pk_set.

chipx86chipx86

'django_reset' imported but unused

reviewbotreviewbot

'from settings_local import *' used; unable to detect undefined names

reviewbotreviewbot

'elasticsearch' imported but unused

reviewbotreviewbot

Col: 80 E501 line too long (105 > 79 characters)

reviewbotreviewbot

Col: 1 W391 blank line at end of file

reviewbotreviewbot

'django_reset' imported but unused

reviewbotreviewbot

'from settings_local import *' used; unable to detect undefined names

reviewbotreviewbot

'elasticsearch' imported but unused

reviewbotreviewbot

Why this change?

chipx86chipx86

"on-the-fly"

chipx86chipx86

Col: 80 E501 line too long (105 > 79 characters)

reviewbotreviewbot

reviewboard.reviews.models.group.Group, here and anywhere else.

chipx86chipx86

'django_reset' imported but unused

reviewbotreviewbot

'from settings_local import *' used; unable to detect undefined names

reviewbotreviewbot

I would say "... for large or multi-server installs."

chipx86chipx86
reviewbot
  1. Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/search/search_backends/base.py
        reviewboard/search/signal_processor.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/htdocs/media/cat.jpg
    
    
    
    Tool: Pyflakes
    Processed Files:
        reviewboard/search/search_backends/base.py
        reviewboard/search/signal_processor.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/htdocs/media/cat.jpg
    
    
  2.  'elasticsearch' imported but unused
    
  3. 
      
brennie
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/search/search_backends/base.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/htdocs/media/cat.jpg
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/search/search_backends/base.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/htdocs/media/cat.jpg
    
    
  2.  'elasticsearch' imported but unused
    
  3. reviewboard/settings.py (Diff revision 2)
     
     
     'django_reset' imported but unused
    
  4. reviewboard/settings.py (Diff revision 2)
     
     
     'from settings_local import *' used; unable to detect undefined names
    
  5. 
      
chipx86
  1. 
      
  2. reviewboard/search/search_backends/registry.py (Diff revision 2)
     
     
     
     
     

    One blank line.

  3. reviewboard/search/signal_processor.py (Diff revision 2)
     
     
     

    Each item should be sentence casing, and no "; and" or any other delimiter between items. This is the standard form for lists in documentation.

  4. reviewboard/search/signal_processor.py (Diff revision 2)
     
     
     
     
     
     
     
     

    We also need to index Users when created/modified/deleted or when group membership changes (see accounts/search_indexes.py).

  5. reviewboard/search/signal_processor.py (Diff revision 2)
     
     

    Missing docstring.

  6. reviewboard/search/signal_processor.py (Diff revision 2)
     
     

    Missing docstring.

  7. reviewboard/search/signal_processor.py (Diff revision 2)
     
     

    **kwargs

  8. reviewboard/search/signal_processor.py (Diff revision 2)
     
     

    Can we put review_request in the function argument list instead?

    By the way, if this function's going to be specific to review requests, we should rename the function and update how it's connected.

  9. reviewboard/search/signal_processor.py (Diff revision 2)
     
     

    **kwargs

  10. reviewboard/search/tests.py (Diff revision 2)
     
     
     
     
     

    Alphabetical order.

  11. reviewboard/search/tests.py (Diff revision 2)
     
     
     

    As discussed in Slack, we're going to want to allow this for Whoosh. It may not be great for large deployments, but smaller ones should definitely be able to start out with on-the-fly indexing.

  12. reviewboard/search/tests.py (Diff revision 2)
     
     
     
     

    Probably better as self.assertIsInstance(...)

  13. reviewboard/search/tests.py (Diff revision 2)
     
     
     

    assertIsInstance

  14. 
      
brennie
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/reviews/models/review_request.py
        reviewboard/reviews/__init__.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
        reviewboard/htdocs/media/cat.jpg
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/reviews/models/review_request.py
        reviewboard/reviews/__init__.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
        reviewboard/htdocs/media/cat.jpg
    
    
  2. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
    Col: 80
     E501 line too long (101 > 79 characters)
    
  3.  'elasticsearch' imported but unused
    
  4. reviewboard/settings.py (Diff revision 3)
     
     
     'django_reset' imported but unused
    
  5. reviewboard/settings.py (Diff revision 3)
     
     
     'from settings_local import *' used; unable to detect undefined names
    
  6. 
      
chipx86
  1. Got some thoughts on the signal-related changes. The rest are mostly trivial changes.

  2. You probably didn't mean to include cat.jpg.

    1. cat.jpg is integral to this feature.

  3. reviewboard/admin/forms.py (Diff revision 3)
     
     

    "whe" -> "when"

  4. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
     
     

    Two blank lines.

  5. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
     

    I don't think the way this is all working is thread-safe. If multiple threads are all going through the signal logic below with the same PK as the key, they're going to stomp over each other. You'll need a threading.local() for all this.

    Should also help prevent the same level of leakage that could occur if there's any kind of error in-between the pre_clear/post_clear signals.

  6. reviewboard/reviews/__init__.py (Diff revision 3)
     
     

    Can we move this below, so it's closer to where we connect it?

  7. reviewboard/reviews/__init__.py (Diff revision 3)
     
     

    Just wondering, but is there any chance that this can negatively impact consumers today? Or does this situation emit a signal that's otherwise never sent?

    My concern is that this code, which is really intended for the search index specifically, could end up with side-effects that we're not anticipating. It sounds like it's inconsistent with how signals normally work for these relations, which potentially could trip someone up down the road?

    I wonder if instead of having all this be general, we could make this really about the search index. That is, instead of having this be global here, we have this live in the search indexing code and have the results be function calls into the indexing code instead of signal emissions.

  8. reviewboard/reviews/__init__.py (Diff revision 3)
     
     

    :py:attr: here?

  9. reviewboard/reviews/__init__.py (Diff revision 3)
     
     

    "wasn't instead" ?

  10. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
     

    Blank line between these.

  11. reviewboard/reviews/__init__.py (Diff revision 3)
     
     

    Only one "g" in "arguments" :)

  12. reviewboard/reviews/__init__.py (Diff revision 3)
     
     

    This looks like a Shortest Middle Snake comment.

  13. reviewboard/reviews/signals.py (Diff revision 3)
     
     
     
     
     

    Since it's the search indexing specifically that needs this, I'd prefer not to put this variable in the signal emission. It's a bit confusing and redundant, and sets a precedent for future signals we have to listen to (in our app and maybe others down the road). Instead, the search indexing code should be able to handle the different keyword names.

    What you could do in the indexing signal connection code is something like:

    from functools import partial
    
    ...
    
    save_signals = [
        (ReviewRequest, review_request_published, 'review_request'),
        (User, post_save, 'instance'),
        (User, m2m_changed, 'instance'),
    ]
    
    ...
    
    save_handler = partial(self.check_handle_save, instance_kwarg=
    
    for cls, signal, instance_kwarg in self.save_signals:
        signal.connect(
            partial(self.check_handle_save, instance_kwarg=instance_kwarg),
            sender=cls)
    
    ...
    
    def check_handle_save(self, instance_kwarg, **kwargs):
        instance = kwargs[instance_kwarg]
        ...
    

    (Maybe weak=False will have to be passed to connect. I'm not sure.)

  14. "on-the-fly indexing", I think.

  15. Can be one statement.

  16. 
      
brennie
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
        .gitignore
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
        .gitignore
    
    
  2.  'elasticsearch' imported but unused
    
  3. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
    Col: 80
     E501 line too long (86 > 79 characters)
    
  4. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
    Col: 80
     E501 line too long (105 > 79 characters)
    
  5. reviewboard/settings.py (Diff revision 4)
     
     
     'django_reset' imported but unused
    
  6. reviewboard/settings.py (Diff revision 4)
     
     
     'from settings_local import *' used; unable to detect undefined names
    
  7. 
      
chipx86
  1. Looks great! All my comments here are very minor things. (Mostly typos/ReST errors.)

  2. .gitignore (Diff revision 4)
     
     
     

    Do we need both? Did the name change?

    1. I accidentally changed the default in a previous patch. I've undone that change now.

  3. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
     

    Just to help future expandability, can we put Group into a list like we do with save/delete signals?

  4. reviewboard/search/signal_processor.py (Diff revision 4)
     
     

    You can wrap before the < to satisfy pyflakes.

  5. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
     

    Alignment is off by one.

  6. reviewboard/search/signal_processor.py (Diff revision 4)
     
     

    Wasn't what?

  7. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
     
     
     
     
     
     
     

    Lists can't be indented, or they'll render incorrectly. The * must be aligned with the preceding paragraph.

  8. reviewboard/search/signal_processor.py (Diff revision 4)
     
     

    Looks like some spaces snuck in here.

  9. reviewboard/search/signal_processor.py (Diff revision 4)
     
     

    There's an extra space after pk_set.

  10. 
      
brennie
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/whoosh.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/whoosh.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
    
    
  2.  'elasticsearch' imported but unused
    
  3. reviewboard/search/signal_processor.py (Diff revision 5)
     
     
    Col: 80
     E501 line too long (105 > 79 characters)
    
  4. reviewboard/search/signal_processor.py (Diff revision 5)
     
     
    Col: 1
     W391 blank line at end of file
    
  5. reviewboard/settings.py (Diff revision 5)
     
     
     'django_reset' imported but unused
    
  6. reviewboard/settings.py (Diff revision 5)
     
     
     'from settings_local import *' used; unable to detect undefined names
    
  7. 
      
brennie
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/whoosh.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/whoosh.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
    
    
  2.  'elasticsearch' imported but unused
    
  3. reviewboard/search/signal_processor.py (Diff revision 6)
     
     
    Col: 80
     E501 line too long (105 > 79 characters)
    
  4. reviewboard/settings.py (Diff revision 6)
     
     
     'django_reset' imported but unused
    
  5. reviewboard/settings.py (Diff revision 6)
     
     
     'from settings_local import *' used; unable to detect undefined names
    
  6. 
      
chipx86
  1. 
      
  2. Why this change?

    1. It originally was search-index, but when the patch for search backends went it in it changed to search_index. This moves us back to the original.

  3. reviewboard/search/signal_processor.py (Diff revision 6)
     
     

    "on-the-fly"

  4. reviewboard/search/signal_processor.py (Diff revision 6)
     
     

    reviewboard.reviews.models.group.Group, here and anywhere else.

  5. 
      
brennie
chipx86
  1. 
      
  2. reviewboard/admin/forms.py (Diff revision 7)
     
     

    I would say "... for large or multi-server installs."

  3. 
      
brennie
Review request changed

Status: Closed (submitted)

Change Summary:

Pushed to release-3.0.x (7de3a90)
Loading...