- Testing Done:
-
Tested searching using the following:
~ id (single-digit IDs do not work), summary, description, testing done, bugs, author, submitter. ~ id (single-digit IDs do not work), summary, description, testing done, bugs, author, submitter, diff filenames.
Replace Lucene search for Review Requests with Haystack and Whoosh.
Review Request #4935 — Created Nov. 11, 2013 and submitted
Replace Lucene search for Review Requests with Haystack and Whoosh.
Full-text search with Lucene is a hassle to set up. Haystack and Whoosh are
pure python so it's easy to include them with Review Board and have full-text
search available out of the box. Regular maintenance of the search index is
still required for the search function to return up-to-date results.These Review Request fields are indexed : id, summary, description, commit_id,
testing done, bugs, author (submitter's full name), submitter (username),
filenames of the files in the diffs (diff content is not indexed).
Enabled and disabled search in General Settings and changing index file path and results per page,
followed by rebuilding the index.Search using the following fields:
id, summary, description, testing done, bugs, author, submitter, diff filenames.Built and inspected Sphinx docs.
Built and updated the index through rb-site.
Unit tests passed.
Description | From | Last Updated |
---|---|---|
Hrm. Does this render to a form with a bunch of input elements? I think we want a more google-like … |
david | |
What is this file used for, and how is it referenced? I don't see this filename in any of your … |
david | |
Haystack loads HAYSTACK_CONNECTIONS from settings.py in its init.py. Overwriting HAYSTACK_CONNECTIONS through load_site_config() does not work for search because the new … |
ED edwlee | |
This is a little bit confusing to me, because when I see "depends" I think "dependency". How about "An up-to-date … |
david | |
Can we call this get_diff_filenames? |
david | |
I'm not sure where the best place to put it is (here, in the view, in SearchQuerySet, etc), but at … |
david | |
Revert this removed line. |
david | |
How about just making filenames a set to begin with? |
david | |
This should probably render a template with the error message instead of giving a (mostly) blank page. This text should … |
david | |
Do you need to include this method if it doesn't do anything? |
david | |
We should probably limit sqs here to some maximum number of results (maybe a couple hundred?). This could be a … |
david | |
Including this in here makes haystack a hard dependency. We should probably include it in setup.py and change the docs/UI … |
david | |
This was poorly localized to begin with, but it's worse now. How about using blocktrans instead? {% blocktrans %} <b>{{hits}}</b> … |
david | |
It's probably ok to have an empty div in the case that there are no page links. I don't think … |
david | |
The text in here should be inside {% trans %} tags. |
david | |
Can we just depend on these as part of Review Board? I think, given that these are pure-Python, that it's … |
chipx86 | |
Can we move the import up around the others? |
chipx86 | |
If we make these dependencies, this whole function and all conditionals involving it can just go away. |
chipx86 | |
Mind adding a comment just explaining what we're doing here? |
chipx86 | |
Not sure I love having this here, but I understand why.. Can we rename this to get_all_diff_filenames? |
chipx86 | |
You can reduce the number of queries to 1 by doing: for filediff in FileDiff.objects.filter(diffset_history=self.diffset_history_id): In fact, you can probably … |
chipx86 | |
This is a start, but it's not enough. We also need to limit this only to those review requests that … |
chipx86 | |
This template is old, but the indentation for template tags and HTML should be independent. HTML should be indented relative … |
chipx86 | |
No spaces around variable names. So, {{hits}}, not {{ hits }}. |
chipx86 | |
No spaces. What is this file for, btw? |
chipx86 | |
Should be one statement, and immediately after the {% extends %}. |
chipx86 | |
Should apply indentation rules here too. |
chipx86 | |
Can you put a blank line above this? |
chipx86 | |
Does assigning these to self.results_per_page/max_search_results not work? Might as well also move this below the search_enable check. |
david | |
Can you prefix this comment with "XXX:" to make it searchable as something we should fix in the future? This … |
david | |
Hmm. Does SearchQuerySet order by relevance? This probably loses the ordering. We might want to turn it into a normal … |
david |
- Change Summary:
-
Support searching on specific fields.
- Description:
-
Prototype indexed search for Review Requests using Haystack and Whoosh.
Work in progress:
To test this patch, you need django-haystack and whoosh installed. Run ./reviewboard/manage.py rebuild_index
to build the search index for thefirst time. Run ./reviewboard/manage.py update_index
to update the search index.~ These Review Request fields are indexed : id, summary, description, changenum,
~ These Review Request fields are indexed : id, summary, description, commit_id,
testing done, bugs, author (submitter's full name), submitter (username), filenames of the files in the diffs (diff content is not indexed). ~ A temporary search page is set up at
/search
for testing.~ A temporary search page is set up at
/search
for testing. The Full-Text+ form field searches all indexed fields. You can further filter the full-text + search result using the field-specific search filters. If the Full-Text filter + is not specified, the field-specific filters apply to all indexed + Review Requests. To-do:
~ Add support for boolean search (AND, OR, NOT) and searching specific fields. ~ Support boolean search (AND, OR, NOT).
-
-
Hrm. Does this render to a form with a bunch of input elements?
I think we want a more google-like experience with just the single search box that can search everything. Part of that might require the query parsing, though...
-
-
What is this file used for, and how is it referenced? I don't see this filename in any of your code.
- Change Summary:
-
Removed field specific form fields, since we only want a single search field.
Removed the temporary search field and integrate search into the main RB search box. - Description:
-
Prototype indexed search for Review Requests using Haystack and Whoosh.
Work in progress:
To test this patch, you need django-haystack and whoosh installed. Run ./reviewboard/manage.py rebuild_index
to build the search index for thefirst time. Run ./reviewboard/manage.py update_index
to update the search index.These Review Request fields are indexed : id, summary, description, commit_id,
testing done, bugs, author (submitter's full name), submitter (username), filenames of the files in the diffs (diff content is not indexed). - A temporary search page is set up at
/search
for testing. The Full-Text- form field searches all indexed fields. You can further filter the full-text - search result using the field-specific search filters. If the Full-Text filter - is not specified, the field-specific filters apply to all indexed - Review Requests. - To-do:
~ Support boolean search (AND, OR, NOT). ~ Figure out how to support setting the path for the search index file through the admin page. - Testing Done:
-
Tested searching using the following:
~ id (single-digit IDs do not work), summary, description, testing done, bugs, author, submitter, diff filenames. ~ id, summary, description, testing done, bugs, author, submitter, diff filenames. - Diff:
-
Revision 3 (+92 -274)
- Change Summary:
-
Added support for boolean and field searches using Whoosh's default query parser.
Updated documentation. - Testing Done:
-
+ Tested enabling and disabling search in General Settings and changing index file path,
+ followed by rebuilding the index. + Tested searching using the following:
id, summary, description, testing done, bugs, author, submitter, diff filenames. + + Not yet tested: building and updating the index through rb-site.
- Diff:
-
Revision 4 (+167 -384)
- Added Files:
-
clean.png: This is what the page looks like on the initial visit.[old] - This is what the page looks like on the initial visit.search_results.png: This shows some search results from a full-text search.[old] - This shows some search results from a full-text search.
- Testing Done:
-
Tested enabling and disabling search in General Settings and changing index file path,
followed by rebuilding the index. Tested searching using the following:
id, summary, description, testing done, bugs, author, submitter, diff filenames. + Built and inspected Sphinx docs.
+ Not yet tested: building and updating the index through rb-site.
-
-
This is a little bit confusing to me, because when I see "depends" I think "dependency".
How about "An up-to-date index is required to provide useful search results. See :ref:
search-indexing
for more information" -
-
I'm not sure where the best place to put it is (here, in the view, in SearchQuerySet, etc), but at some point we should definitely filter based on
result.is_accessible_by(request.user)
- Description:
-
Prototype indexed search for Review Requests using Haystack and Whoosh.
Work in progress:
To test this patch, you need django-haystack and whoosh installed. Run ./reviewboard/manage.py rebuild_index
to build the search index for thefirst time. Run ./reviewboard/manage.py update_index
to update the search index.These Review Request fields are indexed : id, summary, description, commit_id,
testing done, bugs, author (submitter's full name), submitter (username), filenames of the files in the diffs (diff content is not indexed). To-do:
~ Figure out how to support setting the path for the search index file through the admin page. ~ Paginate search results.
- Change Summary:
-
Added pagination to search results and fixed some small issues.
Search results are now filtered by user accessibility. - Summary:
-
Prototype indexed search for Review Requests using Haystack and Whoosh.Indexed search for Review Requests with Haystack and Whoosh.
- Description:
-
~ Prototype indexed search for Review Requests using Haystack and Whoosh.
~ Indexed search for Review Requests with Haystack and Whoosh.
~ Work in progress:
~ To test this patch, you need django-haystack and whoosh installed.
- To test this patch, you need django-haystack and whoosh installed. Run ./reviewboard/manage.py rebuild_index
to build the search index for thefirst time. Run ./reviewboard/manage.py update_index
to update the search index.These Review Request fields are indexed : id, summary, description, commit_id,
testing done, bugs, author (submitter's full name), submitter (username), filenames of the files in the diffs (diff content is not indexed). - - To-do:
- Paginate search results. - Testing Done:
-
~ Tested enabling and disabling search in General Settings and changing index file path,
~ Enabled and disabled search in General Settings and changing index file path and results per page,
followed by rebuilding the index. ~ Tested searching using the following:
~ Search using the following fields:
id, summary, description, testing done, bugs, author, submitter, diff filenames. Built and inspected Sphinx docs.
Not yet tested: building and updating the index through rb-site.
- Diff:
-
Revision 5 (+207 -403)
- Added Files:
- Change Summary:
-
Load site config in ManageCommand.
- Testing Done:
-
Enabled and disabled search in General Settings and changing index file path and results per page,
followed by rebuilding the index. Search using the following fields:
id, summary, description, testing done, bugs, author, submitter, diff filenames. Built and inspected Sphinx docs.
~ Not yet tested: building and updating the index through rb-site.
~ Built and updated the index through rb-site.
- Diff:
-
Revision 6 (+209 -403)
- Change Summary:
-
Fixed a Haystack settings issue causing unit tests to fail.
- Testing Done:
-
Enabled and disabled search in General Settings and changing index file path and results per page,
followed by rebuilding the index. Search using the following fields:
id, summary, description, testing done, bugs, author, submitter, diff filenames. Built and inspected Sphinx docs.
Built and updated the index through rb-site.
+ + Unit tests passed.
- Diff:
-
Revision 7 (+208 -403)
-
-
-
-
This should probably render a template with the error message instead of giving a (mostly) blank page. This text should also me marked for localization.
-
-
Including this in here makes haystack a hard dependency. We should probably include it in setup.py and change the docs/UI to only talk about installing whoosh.
-
This was poorly localized to begin with, but it's worse now. How about using blocktrans instead?
{% blocktrans %} <b>{{hits}}</b> results for <b>{{query}}</b> {% endblocktrans %}
-
It's probably ok to have an empty div in the case that there are no page links. I don't think we need this check.
-
-
Oh, one more thing.
-
We should probably limit sqs here to some maximum number of results (maybe a couple hundred?). This could be a big performance hit if someone searches for "a" or "the" and it comes back with 500000 results, and we fetch them all from the db to make this list.
If we limit the list of results, we should include a small note on the results page telling the user that there are more potential hits but they need to make their query more specific.
- Change Summary:
-
Rebased on Master.
Fixed several open issues and revised some docstrings / comments. - Diff:
-
Revision 8 (+221 -404)
- Change Summary:
-
Updated the search result paginator to only show the Next/Previous text if there there are next/previous pages, respectively.
- Diff:
-
Revision 9 (+219 -404)
-
A handful of comments, but I also have another thought:
We have some setups out there that are calling
manage index
, and I don't want these to break. As it is, upgrading to this release will cause errors. I think, ideally, we'd continue to have a management command calledindex
that supported the old parameters, but called the new ones. We could mark this as deprecated for, say, 2.0, and then 2.1 or 3.0 or whatever could remove it, giving people time to migrate. -
Can we just depend on these as part of Review Board? I think, given that these are pure-Python, that it's worth just offering search out of the box.
-
-
If we make these dependencies, this whole function and all conditionals involving it can just go away.
-
-
Not sure I love having this here, but I understand why..
Can we rename this to
get_all_diff_filenames
? -
You can reduce the number of queries to 1 by doing:
for filediff in FileDiff.objects.filter(diffset_history=self.diffset_history_id):
In fact, you can probably simplify this whole thing even more by doing something like:
q = FileDiff.objects.filter(diffset_history=self.diffset_history_id) return set(q.values_list('source_file', 'dest_file', flat=True))
-
This is a start, but it's not enough.
We also need to limit this only to those review requests that are not private to private repositories or invite-only groups.
Fortunately, we have some nice query methods that take care of this for you. Just change this to:
return self.get_model().objects.public( status=None, extra_query=Q(status='P') | Q(status='S'))
That should do the job.
-
This template is old, but the indentation for template tags and HTML should be independent.
HTML should be indented relative to their parent tags by one space.
Template tags should be indented 1 relative to their parent template tag, but that indentation should occur within the tag body. So:
{% block ... %} {% if foo %} {% for blah %} {% endfor %} {% endif %} {% endblock %}
-
-
-
-
- Change Summary:
-
Fixed several open issues.
- Diff:
-
Revision 10 (+226 -412)
- Change Summary:
-
Fixed an import issue that broke the rb-site command.
Restored the original index command and modified for Haystack.
Added the --noinput option to the rebuild_index command in the cron sample file to turn off iteractive mode. - Diff:
-
Revision 12 (+224 -432)
- Change Summary:
-
Added a setting for maximum number of search results to return (default = 200).
- Diff:
-
Revision 13 (+264 -430)
- Change Summary:
-
Fixed bug for searching a review request by its ID.
- Diff:
-
Revision 14 (+264 -430)
- Change Summary:
-
Limit indexing to only those review requests that are not private to private repositories or invite-only groups.
- Summary:
-
Indexed search for Review Requests with Haystack and Whoosh.Replace Lucene search for Review Requests with Haystack and Whoosh.
- Description:
-
~ Indexed search for Review Requests with Haystack and Whoosh.
~ Replace Lucene search for Review Requests with Haystack and Whoosh.
~ To test this patch, you need django-haystack and whoosh installed.
~ Run ./reviewboard/manage.py rebuild_index
to build the search index for the~ first time. ~ Run ./reviewboard/manage.py update_index
to update the search index.~ Full-text search with Lucene is a hassle to set up. Haystack and Whoosh are
~ pure python so it's easy to include them with Review Board and have full-text ~ search available out of the box. Regular maintenance of the search index is ~ still required for the search function to return up-to-date results. These Review Request fields are indexed : id, summary, description, commit_id,
testing done, bugs, author (submitter's full name), submitter (username), filenames of the files in the diffs (diff content is not indexed). - Diff:
-
Revision 15 (+272 -431)
-
-
Does assigning these to self.results_per_page/max_search_results not work?
Might as well also move this below the search_enable check.
-
Can you prefix this comment with "XXX:" to make it searchable as something we should fix in the future?
This comment should probably also move to be right above the
result_ids = set(...)
line. -
Hmm. Does SearchQuerySet order by relevance? This probably loses the ordering.
We might want to turn it into a normal list, and then call distinct() on the QuerySet to remove duplicates.
- Change Summary:
-
Reorganized comments on the Haystack/Whoosh bug.
Preserve SearchQuerySet result order when extracting unique IDs.
Made max_search_results an instance variable of ReviewRequestSearchView instead of a class variable. - Diff:
-
Revision 16 (+277 -432)
- Change Summary:
-
Fixed bug with duplicate results after updating the search index.
- Diff:
-
Revision 17 (+252 -432)