Only index full names for Users with public profiles

Review Request #9924 — Created May 10, 2018 and submitted

brennie
Review Board
release-3.0.x
9930
9932
ed0c3ef...
reviewboard

Previously we were indexing the full name, which allowed users to search
by full name. However, if a user has marked their profile as private, we
do not want them to be searchable by their name.

Additionally, this patch addresses two further issues:

  1. The show_profile attribute was stored as a BooleanField, but
    there is a longstanding Haystack bug that results in boolean fields
    always being returned from Whoosh as True. This has been replaced with
    a custom BooleanField with the correct behaviour.

  2. The show_profile as indexing the result of User.is_profile_visible(),
    which will always return False when there is no user argument
    supplied. Instead, we now store the is_private attribute of
    Profile directly on the index.

Finally, the search API has been updated to strip the full name out of
results when using the database query backend (i.e., when either search
or on-the-fly indexing is disabled).

Ran unit tests.

  • 0
  • 0
  • 21
  • 1
  • 22
Description From Last Updated
Checks run (1 failed, 1 succeeded)
flake8 failed.
JSHint passed.

flake8

brennie
brennie
brennie
Review request changed

Change Summary:

Fix actual indexed text

Commit:

-f04c5287bc45de7b705c6312e986b12d717283e8
+a1549989c90a625a90c5d0ec5e005e68a41fd02b

Diff:

Revision 4 (+187 -7)

Show changes

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

brennie
Review request changed

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

brennie
Review request changed
brennie
chipx86
  1. 
      
  2. reviewboard/accounts/search_indexes.py (Diff revision 7)
     
     

    This should be stripped out for private profiles as well.

  3. reviewboard/accounts/search_indexes.py (Diff revision 7)
     
     
     

    This function returns a single unicode string, not a list.

  4. reviewboard/accounts/search_indexes.py (Diff revision 7)
     
     

    Backticks in summaries can render strangely. We should just show the text as-is.

  5. reviewboard/accounts/search_indexes.py (Diff revision 7)
     
     
     
     
     

    Missing "Returns".

  6. reviewboard/accounts/search_indexes.py (Diff revision 7)
     
     
     
     
     

    We don't want to use get_profile() in search indexes, because any users without a profile will end up triggering database changes. This will be slow and disruptive. Instead, we need to see if a profile was fetched for the user at all (query cache has a value) and operate based on that.

    We should probably do that in a private method in this class, or maybe add a cached_only= argument to get_profile()?

  7. reviewboard/accounts/search_indexes.py (Diff revision 7)
     
     

    Same as above regarding backticks.

  8. reviewboard/accounts/search_indexes.py (Diff revision 7)
     
     

    This is missing a description.

  9. reviewboard/reviews/search_indexes.py (Diff revision 7)
     
     

    Same comment about backticks.

  10. reviewboard/reviews/search_indexes.py (Diff revision 7)
     
     
     

    Same regarding get_profile usage.

  11. reviewboard/search/fields.py (Diff revision 7)
     
     
     

    Blank line between these.

  12. reviewboard/search/fields.py (Diff revision 7)
     
     
     

    What license is that under?

    1. There isn't a copyright notice. Should I try to get a hold of the user to ask them to release it under CC0 or should I remove the attribution since the code is so simple?

    2. We're not using his original code anymore, with the changes I requested.

  13. reviewboard/search/fields.py (Diff revision 7)
     
     
     
     
     

    This bit would be faster as:

    try:
        return self.value_map[value]
    except KeyError:
        return bool(value)
    
  14. reviewboard/search/signal_processor.py (Diff revision 7)
     
     
     
     
     
     
     

    How about:

    if isinstance(instance, Profile):
        kwargs['sender'] = User
        instance = instance.user
    
    self.handle_save(...)
    
  15. I don't believe filters are allowed in a blocktrans. This may need more closer testing.

    Instead, I think you can do: and result.author|default:result.username as author.

  16. The e-mail address also needs to be wrapped in this.

  17. reviewboard/webapi/resources/search.py (Diff revision 7)
     
     
     
     
     
     
     
     
     
     
     
     
     

    Can you print the resulting expression and make sure the grouping for the full name query ends up looking like ((a | b) & is_private=True)? I want to be sure the grouping is going to be future-proof.

    1. It does:

      In [1]: from django.db.models.query import Q
      
      In [2]: parts = ['parts[0]', 'parts[1]']
      
      In [3]: q = 'q'
      
      In [4]:query = (
         ...:    (Q(first_name__istartswith=parts[0]) &
         ...:     Q(last_name__istartswith=parts[1])) |
         ...:    (Q(first_name__istartswith=parts[1]) &
         ...:     Q(last_name__istartswith=parts[0]))
         ...:)
      
      In [5]: str(query)
      Out[5]: "(OR: (AND: ('first_name__istartswith', 'parts[0]'), ('last_name__istartswith', 'parts[1]')), (AND: ('first_name__istartswith', 'parts[1]'), ('last_name__istartswith', 'parts[0]')))"
      
      In [6]: query |= (Q(first_name__istartswith=q) | Q(last_name__istartswith=q))
      
      In [7]: str(query)
      Out[7]: "(OR: (AND: ('first_name__istartswith', 'parts[0]'), ('last_name__istartswith', 'parts[1]')), (AND: ('first_name__istartswith', 'parts[1]'), ('last_name__istartswith', 'parts[0]')), ('first_name__istartswith', 'q'), ('last_name__istartswith', 'q'))"
      
      In [8]: query &= Q(profile__is_private=False)
      
      In [9]: str(query)
      Out[9]: "(AND: (OR: (AND: ('first_name__istartswith', 'parts[0]'), ('last_name__istartswith', 'parts[1]')), (AND: ('first_name__istartswith', 'parts[1]'), ('last_name__istartswith', 'parts[0]')), ('first_name__istartswith', 'q'), ('last_name__istartswith', 'q')), ('profile__is_private', False))"
      
  18. reviewboard/webapi/resources/search.py (Diff revision 7)
     
     

    Can this be moved into the initial query = above?

    1. No. The way that Q works is that it builds a tree of nodes. Each operator makes a new tree with the operator at the root and the expressions on the LHS and RHS of the tree.

      If I put this into the intial query we would get:

      ((big_first_query) & Q(profile__is_private=False)) | (Q(first_name__istartswith=q) | Q(last_name__istartswith=q))

      Which is not the same.

  19. 
      
brennie
brennie
brennie
chipx86
  1. 
      
  2. reviewboard/search/fields.py (Diff revision 10)
     
     
     
     

    Missing the unicode_literals import.

  3. 
      
brennie
david
  1. Ship It!
  2. 
      
brennie
Review request changed

Status: Closed (submitted)

Change Summary:

Pushed to release-3.0.x (dc27fa4)
Loading...