Only index full names for Users with public profiles

Review Request #9924 — Created May 10, 2018 and submitted

Information

Review Board
release-3.0.x
ed0c3ef...

Reviewers

Previously we were indexing the full name, which allowed users to search
by full name. However, if a user has marked their profile as private, we
do not want them to be searchable by their name.

Additionally, this patch addresses two further issues:

  1. The show_profile attribute was stored as a BooleanField, but
    there is a longstanding Haystack bug that results in boolean fields
    always being returned from Whoosh as True. This has been replaced with
    a custom BooleanField with the correct behaviour.

  2. The show_profile as indexing the result of User.is_profile_visible(),
    which will always return False when there is no user argument
    supplied. Instead, we now store the is_private attribute of
    Profile directly on the index.

Finally, the search API has been updated to strip the full name out of
results when using the database query backend (i.e., when either search
or on-the-fly indexing is disabled).

Ran unit tests.

Description From Last Updated

E303 too many blank lines (2)

reviewbotreviewbot

E303 too many blank lines (2)

reviewbotreviewbot

E303 too many blank lines (2)

reviewbotreviewbot

E303 too many blank lines (2)

reviewbotreviewbot

This should be stripped out for private profiles as well.

chipx86chipx86

This function returns a single unicode string, not a list.

chipx86chipx86

Backticks in summaries can render strangely. We should just show the text as-is.

chipx86chipx86

Missing "Returns".

chipx86chipx86

We don't want to use get_profile() in search indexes, because any users without a profile will end up triggering database …

chipx86chipx86

Same as above regarding backticks.

chipx86chipx86

This is missing a description.

chipx86chipx86

Same comment about backticks.

chipx86chipx86

Same regarding get_profile usage.

chipx86chipx86

Blank line between these.

chipx86chipx86

What license is that under?

chipx86chipx86

This bit would be faster as: try: return self.value_map[value] except KeyError: return bool(value)

chipx86chipx86

How about: if isinstance(instance, Profile): kwargs['sender'] = User instance = instance.user self.handle_save(...)

chipx86chipx86

I don't believe filters are allowed in a blocktrans. This may need more closer testing. Instead, I think you can …

chipx86chipx86

The e-mail address also needs to be wrapped in this.

chipx86chipx86

Can you print the resulting expression and make sure the grouping for the full name query ends up looking like …

chipx86chipx86

Can this be moved into the initial query = above?

chipx86chipx86

Missing the unicode_literals import.

chipx86chipx86
Checks run (1 failed, 1 succeeded)
flake8 failed.
JSHint passed.

flake8

brennie
brennie
brennie
Review request changed

Change Summary:

Fix actual indexed text

Commit:

-f04c5287bc45de7b705c6312e986b12d717283e8
+a1549989c90a625a90c5d0ec5e005e68a41fd02b

Diff:

Revision 4 (+187 -7)

Show changes

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

brennie
Review request changed

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

brennie
Review request changed
brennie
chipx86
  1. 
      
  2. reviewboard/accounts/search_indexes.py (Diff revision 7)
     
     
    Show all issues

    This should be stripped out for private profiles as well.

  3. reviewboard/accounts/search_indexes.py (Diff revision 7)
     
     
     
    Show all issues

    This function returns a single unicode string, not a list.

  4. reviewboard/accounts/search_indexes.py (Diff revision 7)
     
     
    Show all issues

    Backticks in summaries can render strangely. We should just show the text as-is.

  5. reviewboard/accounts/search_indexes.py (Diff revision 7)
     
     
     
     
     
    Show all issues

    Missing "Returns".

  6. reviewboard/accounts/search_indexes.py (Diff revision 7)
     
     
     
     
     
    Show all issues

    We don't want to use get_profile() in search indexes, because any users without a profile will end up triggering database changes. This will be slow and disruptive. Instead, we need to see if a profile was fetched for the user at all (query cache has a value) and operate based on that.

    We should probably do that in a private method in this class, or maybe add a cached_only= argument to get_profile()?

  7. reviewboard/accounts/search_indexes.py (Diff revision 7)
     
     
    Show all issues

    Same as above regarding backticks.

  8. reviewboard/accounts/search_indexes.py (Diff revision 7)
     
     
    Show all issues

    This is missing a description.

  9. reviewboard/reviews/search_indexes.py (Diff revision 7)
     
     
    Show all issues

    Same comment about backticks.

  10. reviewboard/reviews/search_indexes.py (Diff revision 7)
     
     
     
    Show all issues

    Same regarding get_profile usage.

  11. reviewboard/search/fields.py (Diff revision 7)
     
     
     
    Show all issues

    Blank line between these.

  12. reviewboard/search/fields.py (Diff revision 7)
     
     
     
    Show all issues

    What license is that under?

    1. There isn't a copyright notice. Should I try to get a hold of the user to ask them to release it under CC0 or should I remove the attribution since the code is so simple?

    2. We're not using his original code anymore, with the changes I requested.

  13. reviewboard/search/fields.py (Diff revision 7)
     
     
     
     
     
    Show all issues

    This bit would be faster as:

    try:
        return self.value_map[value]
    except KeyError:
        return bool(value)
    
  14. reviewboard/search/signal_processor.py (Diff revision 7)
     
     
     
     
     
     
     
    Show all issues

    How about:

    if isinstance(instance, Profile):
        kwargs['sender'] = User
        instance = instance.user
    
    self.handle_save(...)
    
  15. Show all issues

    I don't believe filters are allowed in a blocktrans. This may need more closer testing.

    Instead, I think you can do: and result.author|default:result.username as author.

  16. Show all issues

    The e-mail address also needs to be wrapped in this.

  17. reviewboard/webapi/resources/search.py (Diff revision 7)
     
     
     
     
     
     
     
     
     
     
     
     
     
    Show all issues

    Can you print the resulting expression and make sure the grouping for the full name query ends up looking like ((a | b) & is_private=True)? I want to be sure the grouping is going to be future-proof.

    1. It does:

      In [1]: from django.db.models.query import Q
      
      In [2]: parts = ['parts[0]', 'parts[1]']
      
      In [3]: q = 'q'
      
      In [4]:query = (
         ...:    (Q(first_name__istartswith=parts[0]) &
         ...:     Q(last_name__istartswith=parts[1])) |
         ...:    (Q(first_name__istartswith=parts[1]) &
         ...:     Q(last_name__istartswith=parts[0]))
         ...:)
      
      In [5]: str(query)
      Out[5]: "(OR: (AND: ('first_name__istartswith', 'parts[0]'), ('last_name__istartswith', 'parts[1]')), (AND: ('first_name__istartswith', 'parts[1]'), ('last_name__istartswith', 'parts[0]')))"
      
      In [6]: query |= (Q(first_name__istartswith=q) | Q(last_name__istartswith=q))
      
      In [7]: str(query)
      Out[7]: "(OR: (AND: ('first_name__istartswith', 'parts[0]'), ('last_name__istartswith', 'parts[1]')), (AND: ('first_name__istartswith', 'parts[1]'), ('last_name__istartswith', 'parts[0]')), ('first_name__istartswith', 'q'), ('last_name__istartswith', 'q'))"
      
      In [8]: query &= Q(profile__is_private=False)
      
      In [9]: str(query)
      Out[9]: "(AND: (OR: (AND: ('first_name__istartswith', 'parts[0]'), ('last_name__istartswith', 'parts[1]')), (AND: ('first_name__istartswith', 'parts[1]'), ('last_name__istartswith', 'parts[0]')), ('first_name__istartswith', 'q'), ('last_name__istartswith', 'q')), ('profile__is_private', False))"
      
  18. reviewboard/webapi/resources/search.py (Diff revision 7)
     
     
    Show all issues

    Can this be moved into the initial query = above?

    1. No. The way that Q works is that it builds a tree of nodes. Each operator makes a new tree with the operator at the root and the expressions on the LHS and RHS of the tree.

      If I put this into the intial query we would get:

      ((big_first_query) & Q(profile__is_private=False)) | (Q(first_name__istartswith=q) | Q(last_name__istartswith=q))

      Which is not the same.

  19. 
      
brennie
brennie
brennie
chipx86
  1. 
      
  2. reviewboard/search/fields.py (Diff revision 10)
     
     
     
     
    Show all issues

    Missing the unicode_literals import.

  3. 
      
brennie
david
  1. Ship It!
  2. 
      
brennie
Review request changed

Status: Closed (submitted)

Change Summary:

Pushed to release-3.0.x (dc27fa4)
Loading...