Support on-the-fly search indexing

Review Request #8601 - Created Jan. 5, 2017 and submitted

Barret Rennie
Review Board
release-3.0.x
8600
reviewboard

On-the-fly search indexing is now supported for the Elasticsearch search
backend. When the option is enabled (from the search settings
administration form), the search index will be updated whenever review
requests are updated or deleted.

Additionally, the default value of the Whoosh search index has been
renamed to "search-index", which was the original default; it was
accidentally changed to "search_index" in a previous change.

Ran unit tests.

  • 0
  • 0
  • 41
  • 19
  • 60
Description From Last Updated
Review Bot
  1. Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/search/search_backends/base.py
        reviewboard/search/signal_processor.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/htdocs/media/cat.jpg
    
    
    
    Tool: Pyflakes
    Processed Files:
        reviewboard/search/search_backends/base.py
        reviewboard/search/signal_processor.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/htdocs/media/cat.jpg
    
    
  2.  'elasticsearch' imported but unused
    
  3. 
      
Barret Rennie
Review Bot
  1. Tool: Pyflakes
    Processed Files:
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/search/search_backends/base.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/htdocs/media/cat.jpg
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/search/search_backends/base.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/htdocs/media/cat.jpg
    
    
  2.  'elasticsearch' imported but unused
    
  3. reviewboard/settings.py (Diff revision 2)
     
     
     'django_reset' imported but unused
    
  4. reviewboard/settings.py (Diff revision 2)
     
     
     'from settings_local import *' used; unable to detect undefined names
    
  5. 
      
Christian Hammond
  1. 
      
  2. reviewboard/search/search_backends/registry.py (Diff revision 2)
     
     
     
     
     

    One blank line.

  3. reviewboard/search/signal_processor.py (Diff revision 2)
     
     
     

    Each item should be sentence casing, and no "; and" or any other delimiter between items. This is the standard form for lists in documentation.

  4. reviewboard/search/signal_processor.py (Diff revision 2)
     
     
     
     
     
     
     
     

    We also need to index Users when created/modified/deleted or when group membership changes (see accounts/search_indexes.py).

  5. reviewboard/search/signal_processor.py (Diff revision 2)
     
     

    Missing docstring.

  6. reviewboard/search/signal_processor.py (Diff revision 2)
     
     

    Missing docstring.

  7. reviewboard/search/signal_processor.py (Diff revision 2)
     
     

    **kwargs

  8. reviewboard/search/signal_processor.py (Diff revision 2)
     
     

    Can we put review_request in the function argument list instead?

    By the way, if this function's going to be specific to review requests, we should rename the function and update how it's connected.

  9. reviewboard/search/signal_processor.py (Diff revision 2)
     
     

    **kwargs

  10. reviewboard/search/tests.py (Diff revision 2)
     
     
     
     
     

    Alphabetical order.

  11. reviewboard/search/tests.py (Diff revision 2)
     
     
     

    As discussed in Slack, we're going to want to allow this for Whoosh. It may not be great for large deployments, but smaller ones should definitely be able to start out with on-the-fly indexing.

  12. reviewboard/search/tests.py (Diff revision 2)
     
     
     
     

    Probably better as self.assertIsInstance(...)

  13. reviewboard/search/tests.py (Diff revision 2)
     
     
     

    assertIsInstance

  14. 
      
Barret Rennie
Review Bot
  1. Tool: Pyflakes
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/reviews/models/review_request.py
        reviewboard/reviews/__init__.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
        reviewboard/htdocs/media/cat.jpg
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/reviews/models/review_request.py
        reviewboard/reviews/__init__.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
        reviewboard/htdocs/media/cat.jpg
    
    
  2. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
    Col: 80
     E501 line too long (101 > 79 characters)
    
  3.  'elasticsearch' imported but unused
    
  4. reviewboard/settings.py (Diff revision 3)
     
     
     'django_reset' imported but unused
    
  5. reviewboard/settings.py (Diff revision 3)
     
     
     'from settings_local import *' used; unable to detect undefined names
    
  6. 
      
Christian Hammond
  1. Got some thoughts on the signal-related changes. The rest are mostly trivial changes.

  2. You probably didn't mean to include cat.jpg.

    1. cat.jpg is integral to this feature.

  3. reviewboard/admin/forms.py (Diff revision 3)
     
     

    "whe" -> "when"

  4. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
     
     

    Two blank lines.

  5. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
     

    I don't think the way this is all working is thread-safe. If multiple threads are all going through the signal logic below with the same PK as the key, they're going to stomp over each other. You'll need a threading.local() for all this.

    Should also help prevent the same level of leakage that could occur if there's any kind of error in-between the pre_clear/post_clear signals.

  6. reviewboard/reviews/__init__.py (Diff revision 3)
     
     

    Can we move this below, so it's closer to where we connect it?

  7. reviewboard/reviews/__init__.py (Diff revision 3)
     
     

    Just wondering, but is there any chance that this can negatively impact consumers today? Or does this situation emit a signal that's otherwise never sent?

    My concern is that this code, which is really intended for the search index specifically, could end up with side-effects that we're not anticipating. It sounds like it's inconsistent with how signals normally work for these relations, which potentially could trip someone up down the road?

    I wonder if instead of having all this be general, we could make this really about the search index. That is, instead of having this be global here, we have this live in the search indexing code and have the results be function calls into the indexing code instead of signal emissions.

  8. reviewboard/reviews/__init__.py (Diff revision 3)
     
     

    :py:attr: here?

  9. reviewboard/reviews/__init__.py (Diff revision 3)
     
     

    "wasn't instead" ?

  10. reviewboard/reviews/__init__.py (Diff revision 3)
     
     
     

    Blank line between these.

  11. reviewboard/reviews/__init__.py (Diff revision 3)
     
     

    Only one "g" in "arguments" :)

  12. reviewboard/reviews/__init__.py (Diff revision 3)
     
     

    This looks like a Shortest Middle Snake comment.

  13. reviewboard/reviews/signals.py (Diff revision 3)
     
     
     
     
     

    Since it's the search indexing specifically that needs this, I'd prefer not to put this variable in the signal emission. It's a bit confusing and redundant, and sets a precedent for future signals we have to listen to (in our app and maybe others down the road). Instead, the search indexing code should be able to handle the different keyword names.

    What you could do in the indexing signal connection code is something like:

    from functools import partial
    
    ...
    
    save_signals = [
        (ReviewRequest, review_request_published, 'review_request'),
        (User, post_save, 'instance'),
        (User, m2m_changed, 'instance'),
    ]
    
    ...
    
    save_handler = partial(self.check_handle_save, instance_kwarg=
    
    for cls, signal, instance_kwarg in self.save_signals:
        signal.connect(
            partial(self.check_handle_save, instance_kwarg=instance_kwarg),
            sender=cls)
    
    ...
    
    def check_handle_save(self, instance_kwarg, **kwargs):
        instance = kwargs[instance_kwarg]
        ...
    

    (Maybe weak=False will have to be passed to connect. I'm not sure.)

  14. "on-the-fly indexing", I think.

  15. Can be one statement.

  16. 
      
Barret Rennie
Review Bot
  1. Tool: Pyflakes
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
        .gitignore
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
        .gitignore
    
    
  2.  'elasticsearch' imported but unused
    
  3. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
    Col: 80
     E501 line too long (86 > 79 characters)
    
  4. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
    Col: 80
     E501 line too long (105 > 79 characters)
    
  5. reviewboard/settings.py (Diff revision 4)
     
     
     'django_reset' imported but unused
    
  6. reviewboard/settings.py (Diff revision 4)
     
     
     'from settings_local import *' used; unable to detect undefined names
    
  7. 
      
Christian Hammond
  1. Looks great! All my comments here are very minor things. (Mostly typos/ReST errors.)

  2. .gitignore (Diff revision 4)
     
     
     

    Do we need both? Did the name change?

    1. I accidentally changed the default in a previous patch. I've undone that change now.

  3. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
     

    Just to help future expandability, can we put Group into a list like we do with save/delete signals?

  4. reviewboard/search/signal_processor.py (Diff revision 4)
     
     

    You can wrap before the < to satisfy pyflakes.

  5. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
     

    Alignment is off by one.

  6. reviewboard/search/signal_processor.py (Diff revision 4)
     
     

    Wasn't what?

  7. reviewboard/search/signal_processor.py (Diff revision 4)
     
     
     
     
     
     
     
     
     

    Lists can't be indented, or they'll render incorrectly. The * must be aligned with the preceding paragraph.

  8. reviewboard/search/signal_processor.py (Diff revision 4)
     
     

    Looks like some spaces snuck in here.

  9. reviewboard/search/signal_processor.py (Diff revision 4)
     
     

    There's an extra space after pk_set.

  10. 
      
Barret Rennie
Review Bot
  1. Tool: Pyflakes
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/whoosh.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/whoosh.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
    
    
  2.  'elasticsearch' imported but unused
    
  3. reviewboard/search/signal_processor.py (Diff revision 5)
     
     
    Col: 80
     E501 line too long (105 > 79 characters)
    
  4. reviewboard/search/signal_processor.py (Diff revision 5)
     
     
    Col: 1
     W391 blank line at end of file
    
  5. reviewboard/settings.py (Diff revision 5)
     
     
     'django_reset' imported but unused
    
  6. reviewboard/settings.py (Diff revision 5)
     
     
     'from settings_local import *' used; unable to detect undefined names
    
  7. 
      
Barret Rennie
Review Bot
  1. Tool: Pyflakes
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/whoosh.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        reviewboard/admin/siteconfig.py
        reviewboard/admin/forms.py
        reviewboard/settings.py
        reviewboard/search/search_backends/registry.py
        reviewboard/search/search_backends/whoosh.py
        reviewboard/search/tests.py
        reviewboard/search/signal_processor.py
        reviewboard/reviews/signals.py
        reviewboard/search/search_backends/elasticsearch.py
    
    Ignored Files:
        reviewboard/templates/admin/search_settings.html
    
    
  2.  'elasticsearch' imported but unused
    
  3. reviewboard/search/signal_processor.py (Diff revision 6)
     
     
    Col: 80
     E501 line too long (105 > 79 characters)
    
  4. reviewboard/settings.py (Diff revision 6)
     
     
     'django_reset' imported but unused
    
  5. reviewboard/settings.py (Diff revision 6)
     
     
     'from settings_local import *' used; unable to detect undefined names
    
  6. 
      
Christian Hammond
  1. 
      
  2. Why this change?

    1. It originally was search-index, but when the patch for search backends went it in it changed to search_index. This moves us back to the original.

  3. reviewboard/search/signal_processor.py (Diff revision 6)
     
     

    "on-the-fly"

  4. reviewboard/search/signal_processor.py (Diff revision 6)
     
     

    reviewboard.reviews.models.group.Group, here and anywhere else.

  5. 
      
Barret Rennie
Christian Hammond
  1. 
      
  2. reviewboard/admin/forms.py (Diff revision 7)
     
     

    I would say "... for large or multi-server installs."

  3. 
      
Barret Rennie
Review request changed

Status: Closed (submitted)

Change Summary:

Pushed to release-3.0.x (7de3a90)
Loading...