• 
      

    Only index full names for Users with public profiles

    Review Request #9924 — Created May 11, 2018 and submitted

    Information

    Review Board
    release-3.0.x
    ed0c3ef...

    Reviewers

    Previously we were indexing the full name, which allowed users to search
    by full name. However, if a user has marked their profile as private, we
    do not want them to be searchable by their name.

    Additionally, this patch addresses two further issues:

    1. The show_profile attribute was stored as a BooleanField, but
      there is a longstanding Haystack bug that results in boolean fields
      always being returned from Whoosh as True. This has been replaced with
      a custom BooleanField with the correct behaviour.

    2. The show_profile as indexing the result of User.is_profile_visible(),
      which will always return False when there is no user argument
      supplied. Instead, we now store the is_private attribute of
      Profile directly on the index.

    Finally, the search API has been updated to strip the full name out of
    results when using the database query backend (i.e., when either search
    or on-the-fly indexing is disabled).

    Ran unit tests.

    Description From Last Updated

    E303 too many blank lines (2)

    reviewbotreviewbot

    E303 too many blank lines (2)

    reviewbotreviewbot

    E303 too many blank lines (2)

    reviewbotreviewbot

    E303 too many blank lines (2)

    reviewbotreviewbot

    This should be stripped out for private profiles as well.

    chipx86chipx86

    This function returns a single unicode string, not a list.

    chipx86chipx86

    Backticks in summaries can render strangely. We should just show the text as-is.

    chipx86chipx86

    Missing "Returns".

    chipx86chipx86

    We don't want to use get_profile() in search indexes, because any users without a profile will end up triggering database …

    chipx86chipx86

    Same as above regarding backticks.

    chipx86chipx86

    This is missing a description.

    chipx86chipx86

    Same comment about backticks.

    chipx86chipx86

    Same regarding get_profile usage.

    chipx86chipx86

    Blank line between these.

    chipx86chipx86

    What license is that under?

    chipx86chipx86

    This bit would be faster as: try: return self.value_map[value] except KeyError: return bool(value)

    chipx86chipx86

    How about: if isinstance(instance, Profile): kwargs['sender'] = User instance = instance.user self.handle_save(...)

    chipx86chipx86

    I don't believe filters are allowed in a blocktrans. This may need more closer testing. Instead, I think you can …

    chipx86chipx86

    The e-mail address also needs to be wrapped in this.

    chipx86chipx86

    Can you print the resulting expression and make sure the grouping for the full name query ends up looking like …

    chipx86chipx86

    Can this be moved into the initial query = above?

    chipx86chipx86

    Missing the unicode_literals import.

    chipx86chipx86
    Checks run (1 failed, 1 succeeded)
    flake8 failed.
    JSHint passed.

    flake8

    brennie
    brennie
    brennie
    Review request changed
    Change Summary:

    Fix actual indexed text

    Commit:
    f04c5287bc45de7b705c6312e986b12d717283e8
    a1549989c90a625a90c5d0ec5e005e68a41fd02b

    Checks run (1 failed, 1 succeeded)

    flake8 failed.
    JSHint passed.

    flake8

    brennie
    Review request changed
    Change Summary:

    Fix actual indexed text

    Commit:
    a1549989c90a625a90c5d0ec5e005e68a41fd02b
    7d8a61dcaa6cd7f183c95330f332d1a3554dc45f

    Checks run (1 failed, 1 succeeded)

    flake8 failed.
    JSHint passed.

    flake8

    brennie
    Review request changed
    brennie
    chipx86
    1. 
        
    2. reviewboard/accounts/search_indexes.py (Diff revision 7)
       
       
      Show all issues

      This should be stripped out for private profiles as well.

    3. reviewboard/accounts/search_indexes.py (Diff revision 7)
       
       
       
      Show all issues

      This function returns a single unicode string, not a list.

    4. reviewboard/accounts/search_indexes.py (Diff revision 7)
       
       
      Show all issues

      Backticks in summaries can render strangely. We should just show the text as-is.

    5. reviewboard/accounts/search_indexes.py (Diff revision 7)
       
       
       
       
       
      Show all issues

      Missing "Returns".

    6. reviewboard/accounts/search_indexes.py (Diff revision 7)
       
       
       
       
       
      Show all issues

      We don't want to use get_profile() in search indexes, because any users without a profile will end up triggering database changes. This will be slow and disruptive. Instead, we need to see if a profile was fetched for the user at all (query cache has a value) and operate based on that.

      We should probably do that in a private method in this class, or maybe add a cached_only= argument to get_profile()?

    7. reviewboard/accounts/search_indexes.py (Diff revision 7)
       
       
      Show all issues

      Same as above regarding backticks.

    8. reviewboard/accounts/search_indexes.py (Diff revision 7)
       
       
      Show all issues

      This is missing a description.

    9. reviewboard/reviews/search_indexes.py (Diff revision 7)
       
       
      Show all issues

      Same comment about backticks.

    10. reviewboard/reviews/search_indexes.py (Diff revision 7)
       
       
       
      Show all issues

      Same regarding get_profile usage.

    11. reviewboard/search/fields.py (Diff revision 7)
       
       
       
      Show all issues

      Blank line between these.

    12. reviewboard/search/fields.py (Diff revision 7)
       
       
       
      Show all issues

      What license is that under?

      1. There isn't a copyright notice. Should I try to get a hold of the user to ask them to release it under CC0 or should I remove the attribution since the code is so simple?

      2. We're not using his original code anymore, with the changes I requested.

    13. reviewboard/search/fields.py (Diff revision 7)
       
       
       
       
       
      Show all issues

      This bit would be faster as:

      try:
          return self.value_map[value]
      except KeyError:
          return bool(value)
      
    14. reviewboard/search/signal_processor.py (Diff revision 7)
       
       
       
       
       
       
       
      Show all issues

      How about:

      if isinstance(instance, Profile):
          kwargs['sender'] = User
          instance = instance.user
      
      self.handle_save(...)
      
    15. Show all issues

      I don't believe filters are allowed in a blocktrans. This may need more closer testing.

      Instead, I think you can do: and result.author|default:result.username as author.

    16. Show all issues

      The e-mail address also needs to be wrapped in this.

    17. reviewboard/webapi/resources/search.py (Diff revision 7)
       
       
       
       
       
       
       
       
       
       
       
       
       
      Show all issues

      Can you print the resulting expression and make sure the grouping for the full name query ends up looking like ((a | b) & is_private=True)? I want to be sure the grouping is going to be future-proof.

      1. It does:

        In [1]: from django.db.models.query import Q
        
        In [2]: parts = ['parts[0]', 'parts[1]']
        
        In [3]: q = 'q'
        
        In [4]:query = (
           ...:    (Q(first_name__istartswith=parts[0]) &
           ...:     Q(last_name__istartswith=parts[1])) |
           ...:    (Q(first_name__istartswith=parts[1]) &
           ...:     Q(last_name__istartswith=parts[0]))
           ...:)
        
        In [5]: str(query)
        Out[5]: "(OR: (AND: ('first_name__istartswith', 'parts[0]'), ('last_name__istartswith', 'parts[1]')), (AND: ('first_name__istartswith', 'parts[1]'), ('last_name__istartswith', 'parts[0]')))"
        
        In [6]: query |= (Q(first_name__istartswith=q) | Q(last_name__istartswith=q))
        
        In [7]: str(query)
        Out[7]: "(OR: (AND: ('first_name__istartswith', 'parts[0]'), ('last_name__istartswith', 'parts[1]')), (AND: ('first_name__istartswith', 'parts[1]'), ('last_name__istartswith', 'parts[0]')), ('first_name__istartswith', 'q'), ('last_name__istartswith', 'q'))"
        
        In [8]: query &= Q(profile__is_private=False)
        
        In [9]: str(query)
        Out[9]: "(AND: (OR: (AND: ('first_name__istartswith', 'parts[0]'), ('last_name__istartswith', 'parts[1]')), (AND: ('first_name__istartswith', 'parts[1]'), ('last_name__istartswith', 'parts[0]')), ('first_name__istartswith', 'q'), ('last_name__istartswith', 'q')), ('profile__is_private', False))"
        
    18. reviewboard/webapi/resources/search.py (Diff revision 7)
       
       
      Show all issues

      Can this be moved into the initial query = above?

      1. No. The way that Q works is that it builds a tree of nodes. Each operator makes a new tree with the operator at the root and the expressions on the LHS and RHS of the tree.

        If I put this into the intial query we would get:

        ((big_first_query) & Q(profile__is_private=False)) | (Q(first_name__istartswith=q) | Q(last_name__istartswith=q))

        Which is not the same.

    19. 
        
    brennie
    brennie
    brennie
    chipx86
    1. 
        
    2. reviewboard/search/fields.py (Diff revision 10)
       
       
       
       
      Show all issues

      Missing the unicode_literals import.

    3. 
        
    brennie
    david
    1. Ship It!
    2. 
        
    brennie
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to release-3.0.x (dc27fa4)