Only index full names for Users with public profiles

Review Request #9924 — Created May 10, 2018 and submitted

Review Board

Previously we were indexing the full name, which allowed users to search
by full name. However, if a user has marked their profile as private, we
do not want them to be searchable by their name.

Additionally, this patch addresses two further issues:

  1. The show_profile attribute was stored as a BooleanField, but
    there is a longstanding Haystack bug that results in boolean fields
    always being returned from Whoosh as True. This has been replaced with
    a custom BooleanField with the correct behaviour.

  2. The show_profile as indexing the result of User.is_profile_visible(),
    which will always return False when there is no user argument
    supplied. Instead, we now store the is_private attribute of
    Profile directly on the index.

Finally, the search API has been updated to strip the full name out of
results when using the database query backend (i.e., when either search
or on-the-fly indexing is disabled).

Ran unit tests.

  • 0
  • 0
  • 21
  • 1
  • 22
Description From Last Updated
Checks run (1 failed, 1 succeeded)
flake8 failed.
JSHint passed.


Review request changed

Change Summary:

Fix actual indexed text




Revision 4 (+187 -7)

Show changes

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.


Review request changed

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.


Review request changed
  2. reviewboard/accounts/ (Diff revision 7)

    This should be stripped out for private profiles as well.

  3. reviewboard/accounts/ (Diff revision 7)

    This function returns a single unicode string, not a list.

  4. reviewboard/accounts/ (Diff revision 7)

    Backticks in summaries can render strangely. We should just show the text as-is.

  5. reviewboard/accounts/ (Diff revision 7)

    Missing "Returns".

  6. reviewboard/accounts/ (Diff revision 7)

    We don't want to use get_profile() in search indexes, because any users without a profile will end up triggering database changes. This will be slow and disruptive. Instead, we need to see if a profile was fetched for the user at all (query cache has a value) and operate based on that.

    We should probably do that in a private method in this class, or maybe add a cached_only= argument to get_profile()?

  7. reviewboard/accounts/ (Diff revision 7)

    Same as above regarding backticks.

  8. reviewboard/accounts/ (Diff revision 7)

    This is missing a description.

  9. reviewboard/reviews/ (Diff revision 7)

    Same comment about backticks.

  10. reviewboard/reviews/ (Diff revision 7)

    Same regarding get_profile usage.

  11. reviewboard/search/ (Diff revision 7)

    Blank line between these.

  12. reviewboard/search/ (Diff revision 7)

    What license is that under?

    1. There isn't a copyright notice. Should I try to get a hold of the user to ask them to release it under CC0 or should I remove the attribution since the code is so simple?

    2. We're not using his original code anymore, with the changes I requested.

  13. reviewboard/search/ (Diff revision 7)

    This bit would be faster as:

        return self.value_map[value]
    except KeyError:
        return bool(value)
  14. reviewboard/search/ (Diff revision 7)

    How about:

    if isinstance(instance, Profile):
        kwargs['sender'] = User
        instance = instance.user
  15. I don't believe filters are allowed in a blocktrans. This may need more closer testing.

    Instead, I think you can do: and|default:result.username as author.

  16. The e-mail address also needs to be wrapped in this.

  17. reviewboard/webapi/resources/ (Diff revision 7)

    Can you print the resulting expression and make sure the grouping for the full name query ends up looking like ((a | b) & is_private=True)? I want to be sure the grouping is going to be future-proof.

    1. It does:

      In [1]: from django.db.models.query import Q
      In [2]: parts = ['parts[0]', 'parts[1]']
      In [3]: q = 'q'
      In [4]:query = (
         ...:    (Q(first_name__istartswith=parts[0]) &
         ...:     Q(last_name__istartswith=parts[1])) |
         ...:    (Q(first_name__istartswith=parts[1]) &
         ...:     Q(last_name__istartswith=parts[0]))
      In [5]: str(query)
      Out[5]: "(OR: (AND: ('first_name__istartswith', 'parts[0]'), ('last_name__istartswith', 'parts[1]')), (AND: ('first_name__istartswith', 'parts[1]'), ('last_name__istartswith', 'parts[0]')))"
      In [6]: query |= (Q(first_name__istartswith=q) | Q(last_name__istartswith=q))
      In [7]: str(query)
      Out[7]: "(OR: (AND: ('first_name__istartswith', 'parts[0]'), ('last_name__istartswith', 'parts[1]')), (AND: ('first_name__istartswith', 'parts[1]'), ('last_name__istartswith', 'parts[0]')), ('first_name__istartswith', 'q'), ('last_name__istartswith', 'q'))"
      In [8]: query &= Q(profile__is_private=False)
      In [9]: str(query)
      Out[9]: "(AND: (OR: (AND: ('first_name__istartswith', 'parts[0]'), ('last_name__istartswith', 'parts[1]')), (AND: ('first_name__istartswith', 'parts[1]'), ('last_name__istartswith', 'parts[0]')), ('first_name__istartswith', 'q'), ('last_name__istartswith', 'q')), ('profile__is_private', False))"
  18. reviewboard/webapi/resources/ (Diff revision 7)

    Can this be moved into the initial query = above?

    1. No. The way that Q works is that it builds a tree of nodes. Each operator makes a new tree with the operator at the root and the expressions on the LHS and RHS of the tree.

      If I put this into the intial query we would get:

      ((big_first_query) & Q(profile__is_private=False)) | (Q(first_name__istartswith=q) | Q(last_name__istartswith=q))

      Which is not the same.

  2. reviewboard/search/ (Diff revision 10)

    Missing the unicode_literals import.

  1. Ship It!
Review request changed

Status: Closed (submitted)

Change Summary:

Pushed to release-3.0.x (dc27fa4)