• 
      

    Support on-the-fly search indexing

    Review Request #8601 — Created Jan. 5, 2017 and submitted

    Information

    Review Board
    release-3.0.x

    Reviewers

    On-the-fly search indexing is now supported for the Elasticsearch search
    backend. When the option is enabled (from the search settings
    administration form), the search index will be updated whenever review
    requests are updated or deleted.

    Additionally, the default value of the Whoosh search index has been
    renamed to "search-index", which was the original default; it was
    accidentally changed to "search_index" in a previous change.

    Ran unit tests.

    Description From Last Updated

    You probably didn't mean to include cat.jpg.

    chipx86chipx86

    'elasticsearch' imported but unused

    reviewbotreviewbot

    'elasticsearch' imported but unused

    reviewbotreviewbot

    One blank line.

    chipx86chipx86

    Each item should be sentence casing, and no "; and" or any other delimiter between items. This is the standard …

    chipx86chipx86

    We also need to index Users when created/modified/deleted or when group membership changes (see accounts/search_indexes.py).

    chipx86chipx86

    Missing docstring.

    chipx86chipx86

    Missing docstring.

    chipx86chipx86

    **kwargs

    chipx86chipx86

    Can we put review_request in the function argument list instead? By the way, if this function's going to be specific …

    chipx86chipx86

    **kwargs

    chipx86chipx86

    Alphabetical order.

    chipx86chipx86

    As discussed in Slack, we're going to want to allow this for Whoosh. It may not be great for large …

    chipx86chipx86

    Probably better as self.assertIsInstance(...)

    chipx86chipx86

    assertIsInstance

    chipx86chipx86

    'django_reset' imported but unused

    reviewbotreviewbot

    'from settings_local import *' used; unable to detect undefined names

    reviewbotreviewbot

    "whe" -> "when"

    chipx86chipx86

    Two blank lines.

    chipx86chipx86

    I don't think the way this is all working is thread-safe. If multiple threads are all going through the signal …

    chipx86chipx86

    Can we move this below, so it's closer to where we connect it?

    chipx86chipx86

    Just wondering, but is there any chance that this can negatively impact consumers today? Or does this situation emit a …

    chipx86chipx86

    :py:attr: here?

    chipx86chipx86

    "wasn't instead" ?

    chipx86chipx86

    Col: 80 E501 line too long (101 > 79 characters)

    reviewbotreviewbot

    Blank line between these.

    chipx86chipx86

    Only one "g" in "arguments" :)

    chipx86chipx86

    This looks like a Shortest Middle Snake comment.

    chipx86chipx86

    Since it's the search indexing specifically that needs this, I'd prefer not to put this variable in the signal emission. …

    chipx86chipx86

    'elasticsearch' imported but unused

    reviewbotreviewbot

    "on-the-fly indexing", I think.

    chipx86chipx86

    Can be one statement.

    chipx86chipx86

    'django_reset' imported but unused

    reviewbotreviewbot

    'from settings_local import *' used; unable to detect undefined names

    reviewbotreviewbot

    Do we need both? Did the name change?

    chipx86chipx86

    'elasticsearch' imported but unused

    reviewbotreviewbot

    Just to help future expandability, can we put Group into a list like we do with save/delete signals?

    chipx86chipx86

    Col: 80 E501 line too long (86 > 79 characters)

    reviewbotreviewbot

    You can wrap before the < to satisfy pyflakes.

    chipx86chipx86

    Alignment is off by one.

    chipx86chipx86

    Wasn't what?

    chipx86chipx86

    Col: 80 E501 line too long (105 > 79 characters)

    reviewbotreviewbot

    Lists can't be indented, or they'll render incorrectly. The * must be aligned with the preceding paragraph.

    chipx86chipx86

    Looks like some spaces snuck in here.

    chipx86chipx86

    There's an extra space after pk_set.

    chipx86chipx86

    'django_reset' imported but unused

    reviewbotreviewbot

    'from settings_local import *' used; unable to detect undefined names

    reviewbotreviewbot

    'elasticsearch' imported but unused

    reviewbotreviewbot

    Col: 80 E501 line too long (105 > 79 characters)

    reviewbotreviewbot

    Col: 1 W391 blank line at end of file

    reviewbotreviewbot

    'django_reset' imported but unused

    reviewbotreviewbot

    'from settings_local import *' used; unable to detect undefined names

    reviewbotreviewbot

    'elasticsearch' imported but unused

    reviewbotreviewbot

    Why this change?

    chipx86chipx86

    "on-the-fly"

    chipx86chipx86

    Col: 80 E501 line too long (105 > 79 characters)

    reviewbotreviewbot

    reviewboard.reviews.models.group.Group, here and anywhere else.

    chipx86chipx86

    'django_reset' imported but unused

    reviewbotreviewbot

    'from settings_local import *' used; unable to detect undefined names

    reviewbotreviewbot

    I would say "... for large or multi-server installs."

    chipx86chipx86
    reviewbot
    1. Tool: PEP8 Style Checker
      Processed Files:
          reviewboard/search/search_backends/base.py
          reviewboard/search/signal_processor.py
          reviewboard/search/search_backends/registry.py
          reviewboard/search/search_backends/elasticsearch.py
      
      Ignored Files:
          reviewboard/htdocs/media/cat.jpg
      
      
      
      Tool: Pyflakes
      Processed Files:
          reviewboard/search/search_backends/base.py
          reviewboard/search/signal_processor.py
          reviewboard/search/search_backends/registry.py
          reviewboard/search/search_backends/elasticsearch.py
      
      Ignored Files:
          reviewboard/htdocs/media/cat.jpg
      
      
    2. Show all issues
       'elasticsearch' imported but unused
      
    3. 
        
    brennie
    reviewbot
    1. Tool: Pyflakes
      Processed Files:
          reviewboard/settings.py
          reviewboard/search/search_backends/registry.py
          reviewboard/search/tests.py
          reviewboard/search/signal_processor.py
          reviewboard/search/search_backends/base.py
          reviewboard/search/search_backends/elasticsearch.py
      
      Ignored Files:
          reviewboard/htdocs/media/cat.jpg
      
      
      
      Tool: PEP8 Style Checker
      Processed Files:
          reviewboard/settings.py
          reviewboard/search/search_backends/registry.py
          reviewboard/search/tests.py
          reviewboard/search/signal_processor.py
          reviewboard/search/search_backends/base.py
          reviewboard/search/search_backends/elasticsearch.py
      
      Ignored Files:
          reviewboard/htdocs/media/cat.jpg
      
      
    2. Show all issues
       'elasticsearch' imported but unused
      
    3. reviewboard/settings.py (Diff revision 2)
       
       
      Show all issues
       'django_reset' imported but unused
      
    4. reviewboard/settings.py (Diff revision 2)
       
       
      Show all issues
       'from settings_local import *' used; unable to detect undefined names
      
    5. 
        
    chipx86
    1. 
        
    2. reviewboard/search/search_backends/registry.py (Diff revision 2)
       
       
       
       
       
      Show all issues

      One blank line.

    3. reviewboard/search/signal_processor.py (Diff revision 2)
       
       
       
      Show all issues

      Each item should be sentence casing, and no "; and" or any other delimiter between items. This is the standard form for lists in documentation.

    4. reviewboard/search/signal_processor.py (Diff revision 2)
       
       
       
       
       
       
       
       
      Show all issues

      We also need to index Users when created/modified/deleted or when group membership changes (see accounts/search_indexes.py).

    5. reviewboard/search/signal_processor.py (Diff revision 2)
       
       
      Show all issues

      Missing docstring.

    6. reviewboard/search/signal_processor.py (Diff revision 2)
       
       
      Show all issues

      Missing docstring.

    7. reviewboard/search/signal_processor.py (Diff revision 2)
       
       
      Show all issues

      **kwargs

    8. reviewboard/search/signal_processor.py (Diff revision 2)
       
       
      Show all issues

      Can we put review_request in the function argument list instead?

      By the way, if this function's going to be specific to review requests, we should rename the function and update how it's connected.

    9. reviewboard/search/signal_processor.py (Diff revision 2)
       
       
      Show all issues

      **kwargs

    10. reviewboard/search/tests.py (Diff revision 2)
       
       
       
       
       
      Show all issues

      Alphabetical order.

    11. reviewboard/search/tests.py (Diff revision 2)
       
       
       
      Show all issues

      As discussed in Slack, we're going to want to allow this for Whoosh. It may not be great for large deployments, but smaller ones should definitely be able to start out with on-the-fly indexing.

    12. reviewboard/search/tests.py (Diff revision 2)
       
       
       
       
      Show all issues

      Probably better as self.assertIsInstance(...)

    13. reviewboard/search/tests.py (Diff revision 2)
       
       
       
      Show all issues

      assertIsInstance

    14. 
        
    brennie
    reviewbot
    1. Tool: Pyflakes
      Processed Files:
          reviewboard/admin/siteconfig.py
          reviewboard/admin/forms.py
          reviewboard/settings.py
          reviewboard/search/search_backends/registry.py
          reviewboard/search/tests.py
          reviewboard/reviews/models/review_request.py
          reviewboard/reviews/__init__.py
          reviewboard/search/signal_processor.py
          reviewboard/reviews/signals.py
          reviewboard/search/search_backends/elasticsearch.py
      
      Ignored Files:
          reviewboard/templates/admin/search_settings.html
          reviewboard/htdocs/media/cat.jpg
      
      
      
      Tool: PEP8 Style Checker
      Processed Files:
          reviewboard/admin/siteconfig.py
          reviewboard/admin/forms.py
          reviewboard/settings.py
          reviewboard/search/search_backends/registry.py
          reviewboard/search/tests.py
          reviewboard/reviews/models/review_request.py
          reviewboard/reviews/__init__.py
          reviewboard/search/signal_processor.py
          reviewboard/reviews/signals.py
          reviewboard/search/search_backends/elasticsearch.py
      
      Ignored Files:
          reviewboard/templates/admin/search_settings.html
          reviewboard/htdocs/media/cat.jpg
      
      
    2. reviewboard/reviews/__init__.py (Diff revision 3)
       
       
      Show all issues
      Col: 80
       E501 line too long (101 > 79 characters)
      
    3. Show all issues
       'elasticsearch' imported but unused
      
    4. reviewboard/settings.py (Diff revision 3)
       
       
      Show all issues
       'django_reset' imported but unused
      
    5. reviewboard/settings.py (Diff revision 3)
       
       
      Show all issues
       'from settings_local import *' used; unable to detect undefined names
      
    6. 
        
    chipx86
    1. Got some thoughts on the signal-related changes. The rest are mostly trivial changes.

    2. Show all issues

      You probably didn't mean to include cat.jpg.

      1. cat.jpg is integral to this feature.

    3. reviewboard/admin/forms.py (Diff revision 3)
       
       
      Show all issues

      "whe" -> "when"

    4. reviewboard/reviews/__init__.py (Diff revision 3)
       
       
       
       
      Show all issues

      Two blank lines.

    5. reviewboard/reviews/__init__.py (Diff revision 3)
       
       
       
      Show all issues

      I don't think the way this is all working is thread-safe. If multiple threads are all going through the signal logic below with the same PK as the key, they're going to stomp over each other. You'll need a threading.local() for all this.

      Should also help prevent the same level of leakage that could occur if there's any kind of error in-between the pre_clear/post_clear signals.

    6. reviewboard/reviews/__init__.py (Diff revision 3)
       
       
      Show all issues

      Can we move this below, so it's closer to where we connect it?

    7. reviewboard/reviews/__init__.py (Diff revision 3)
       
       
      Show all issues

      Just wondering, but is there any chance that this can negatively impact consumers today? Or does this situation emit a signal that's otherwise never sent?

      My concern is that this code, which is really intended for the search index specifically, could end up with side-effects that we're not anticipating. It sounds like it's inconsistent with how signals normally work for these relations, which potentially could trip someone up down the road?

      I wonder if instead of having all this be general, we could make this really about the search index. That is, instead of having this be global here, we have this live in the search indexing code and have the results be function calls into the indexing code instead of signal emissions.

    8. reviewboard/reviews/__init__.py (Diff revision 3)
       
       
      Show all issues

      :py:attr: here?

    9. reviewboard/reviews/__init__.py (Diff revision 3)
       
       
      Show all issues

      "wasn't instead" ?

    10. reviewboard/reviews/__init__.py (Diff revision 3)
       
       
       
      Show all issues

      Blank line between these.

    11. reviewboard/reviews/__init__.py (Diff revision 3)
       
       
      Show all issues

      Only one "g" in "arguments" :)

    12. reviewboard/reviews/__init__.py (Diff revision 3)
       
       
      Show all issues

      This looks like a Shortest Middle Snake comment.

    13. reviewboard/reviews/signals.py (Diff revision 3)
       
       
       
       
       
      Show all issues

      Since it's the search indexing specifically that needs this, I'd prefer not to put this variable in the signal emission. It's a bit confusing and redundant, and sets a precedent for future signals we have to listen to (in our app and maybe others down the road). Instead, the search indexing code should be able to handle the different keyword names.

      What you could do in the indexing signal connection code is something like:

      from functools import partial
      
      ...
      
      save_signals = [
          (ReviewRequest, review_request_published, 'review_request'),
          (User, post_save, 'instance'),
          (User, m2m_changed, 'instance'),
      ]
      
      ...
      
      save_handler = partial(self.check_handle_save, instance_kwarg=
      
      for cls, signal, instance_kwarg in self.save_signals:
          signal.connect(
              partial(self.check_handle_save, instance_kwarg=instance_kwarg),
              sender=cls)
      
      ...
      
      def check_handle_save(self, instance_kwarg, **kwargs):
          instance = kwargs[instance_kwarg]
          ...
      

      (Maybe weak=False will have to be passed to connect. I'm not sure.)

    14. Show all issues

      "on-the-fly indexing", I think.

    15. Show all issues

      Can be one statement.

    16. 
        
    brennie
    reviewbot
    1. Tool: Pyflakes
      Processed Files:
          reviewboard/admin/siteconfig.py
          reviewboard/admin/forms.py
          reviewboard/settings.py
          reviewboard/search/search_backends/registry.py
          reviewboard/search/tests.py
          reviewboard/search/signal_processor.py
          reviewboard/reviews/signals.py
          reviewboard/search/search_backends/elasticsearch.py
      
      Ignored Files:
          reviewboard/templates/admin/search_settings.html
          .gitignore
      
      
      
      Tool: PEP8 Style Checker
      Processed Files:
          reviewboard/admin/siteconfig.py
          reviewboard/admin/forms.py
          reviewboard/settings.py
          reviewboard/search/search_backends/registry.py
          reviewboard/search/tests.py
          reviewboard/search/signal_processor.py
          reviewboard/reviews/signals.py
          reviewboard/search/search_backends/elasticsearch.py
      
      Ignored Files:
          reviewboard/templates/admin/search_settings.html
          .gitignore
      
      
    2. Show all issues
       'elasticsearch' imported but unused
      
    3. reviewboard/search/signal_processor.py (Diff revision 4)
       
       
      Show all issues
      Col: 80
       E501 line too long (86 > 79 characters)
      
    4. reviewboard/search/signal_processor.py (Diff revision 4)
       
       
      Show all issues
      Col: 80
       E501 line too long (105 > 79 characters)
      
    5. reviewboard/settings.py (Diff revision 4)
       
       
      Show all issues
       'django_reset' imported but unused
      
    6. reviewboard/settings.py (Diff revision 4)
       
       
      Show all issues
       'from settings_local import *' used; unable to detect undefined names
      
    7. 
        
    chipx86
    1. Looks great! All my comments here are very minor things. (Mostly typos/ReST errors.)

    2. .gitignore (Diff revision 4)
       
       
       
      Show all issues

      Do we need both? Did the name change?

      1. I accidentally changed the default in a previous patch. I've undone that change now.

    3. reviewboard/search/signal_processor.py (Diff revision 4)
       
       
       
      Show all issues

      Just to help future expandability, can we put Group into a list like we do with save/delete signals?

    4. reviewboard/search/signal_processor.py (Diff revision 4)
       
       
      Show all issues

      You can wrap before the < to satisfy pyflakes.

    5. reviewboard/search/signal_processor.py (Diff revision 4)
       
       
       
      Show all issues

      Alignment is off by one.

    6. reviewboard/search/signal_processor.py (Diff revision 4)
       
       
      Show all issues

      Wasn't what?

    7. reviewboard/search/signal_processor.py (Diff revision 4)
       
       
       
       
       
       
       
       
       
      Show all issues

      Lists can't be indented, or they'll render incorrectly. The * must be aligned with the preceding paragraph.

    8. reviewboard/search/signal_processor.py (Diff revision 4)
       
       
      Show all issues

      Looks like some spaces snuck in here.

    9. reviewboard/search/signal_processor.py (Diff revision 4)
       
       
      Show all issues

      There's an extra space after pk_set.

    10. 
        
    brennie
    reviewbot
    1. Tool: Pyflakes
      Processed Files:
          reviewboard/admin/siteconfig.py
          reviewboard/admin/forms.py
          reviewboard/settings.py
          reviewboard/search/search_backends/registry.py
          reviewboard/search/search_backends/whoosh.py
          reviewboard/search/tests.py
          reviewboard/search/signal_processor.py
          reviewboard/reviews/signals.py
          reviewboard/search/search_backends/elasticsearch.py
      
      Ignored Files:
          reviewboard/templates/admin/search_settings.html
      
      
      
      Tool: PEP8 Style Checker
      Processed Files:
          reviewboard/admin/siteconfig.py
          reviewboard/admin/forms.py
          reviewboard/settings.py
          reviewboard/search/search_backends/registry.py
          reviewboard/search/search_backends/whoosh.py
          reviewboard/search/tests.py
          reviewboard/search/signal_processor.py
          reviewboard/reviews/signals.py
          reviewboard/search/search_backends/elasticsearch.py
      
      Ignored Files:
          reviewboard/templates/admin/search_settings.html
      
      
    2. Show all issues
       'elasticsearch' imported but unused
      
    3. reviewboard/search/signal_processor.py (Diff revision 5)
       
       
      Show all issues
      Col: 80
       E501 line too long (105 > 79 characters)
      
    4. reviewboard/search/signal_processor.py (Diff revision 5)
       
       
      Show all issues
      Col: 1
       W391 blank line at end of file
      
    5. reviewboard/settings.py (Diff revision 5)
       
       
      Show all issues
       'django_reset' imported but unused
      
    6. reviewboard/settings.py (Diff revision 5)
       
       
      Show all issues
       'from settings_local import *' used; unable to detect undefined names
      
    7. 
        
    brennie
    reviewbot
    1. Tool: Pyflakes
      Processed Files:
          reviewboard/admin/siteconfig.py
          reviewboard/admin/forms.py
          reviewboard/settings.py
          reviewboard/search/search_backends/registry.py
          reviewboard/search/search_backends/whoosh.py
          reviewboard/search/tests.py
          reviewboard/search/signal_processor.py
          reviewboard/reviews/signals.py
          reviewboard/search/search_backends/elasticsearch.py
      
      Ignored Files:
          reviewboard/templates/admin/search_settings.html
      
      
      
      Tool: PEP8 Style Checker
      Processed Files:
          reviewboard/admin/siteconfig.py
          reviewboard/admin/forms.py
          reviewboard/settings.py
          reviewboard/search/search_backends/registry.py
          reviewboard/search/search_backends/whoosh.py
          reviewboard/search/tests.py
          reviewboard/search/signal_processor.py
          reviewboard/reviews/signals.py
          reviewboard/search/search_backends/elasticsearch.py
      
      Ignored Files:
          reviewboard/templates/admin/search_settings.html
      
      
    2. Show all issues
       'elasticsearch' imported but unused
      
    3. reviewboard/search/signal_processor.py (Diff revision 6)
       
       
      Show all issues
      Col: 80
       E501 line too long (105 > 79 characters)
      
    4. reviewboard/settings.py (Diff revision 6)
       
       
      Show all issues
       'django_reset' imported but unused
      
    5. reviewboard/settings.py (Diff revision 6)
       
       
      Show all issues
       'from settings_local import *' used; unable to detect undefined names
      
    6. 
        
    chipx86
    1. 
        
    2. Show all issues

      Why this change?

      1. It originally was search-index, but when the patch for search backends went it in it changed to search_index. This moves us back to the original.

    3. reviewboard/search/signal_processor.py (Diff revision 6)
       
       
      Show all issues

      "on-the-fly"

    4. reviewboard/search/signal_processor.py (Diff revision 6)
       
       
      Show all issues

      reviewboard.reviews.models.group.Group, here and anywhere else.

    5. 
        
    brennie
    chipx86
    1. 
        
    2. reviewboard/admin/forms.py (Diff revision 7)
       
       
      Show all issues

      I would say "... for large or multi-server installs."

    3. 
        
    brennie
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to release-3.0.x (7de3a90)