Better control search indexing of datagrids.
Review Request #14362 — Created March 5, 2025 and submitted
Some search index bots have a tendency to get stuck on datagrid pages.
They navigate through pages, click sort/unsort and column edit links,
and end up generating mass amounts of URLs to index.

This change works to control this through sane defaults and full opt-out
of search indexing.

The sort and column choice links now use
`role="button"` and `rel="nofollow noindex"`. The `role="button"` should
prevent indexing by itself, but we use both to cover bases.

Pagination buttons now use the same
`nofollow noindex` for the "Last" link, to prevent jumping to the end of
the list where queries are most expensive.

Other pagination buttons set
`rel` to `first` for the first page, `last` for the last page, `next`
for the next page, and `prev` for the previous page. Not all engines
care about these (Google ignores them), but some do, and the hints can
help search engines with their indexing choices.

Datagrid pages define a canonical URL that excludes
`columns`, `sort`, and others, helping to de-index previously-indexed
pages with these options set, and reducing the indexing queue.

Standard query string arguments for sorting/unsorting now use sort order
instead of dictionary order, keeping URLs stable.

Search indexing can also be disabled entirely for a datagrid by setting
`allow_search_indexing = False`. In this case, all pagination links will
use `rel="nofollow noindex"`, and the datagrid pages will set the same in
a `<meta>` tag. This is particularly useful when there are multiple
datagrids that ultimately cover subsets of a larger list of items.

In the process, a small optimization was made to the code determining if
there should be a First Page or Last Page link shown. We were searching
the entire list of pages for page 1 or page <length>, which was
unnecessary. We now just check the first and last page numbers in the
display list, respectively.
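
As a rough illustration of the rules described above (a sketch, not the
code in this change), the canonical URL, the pagination `rel` values, and
the First/Last link check could look something like the following. The
function names, the `EXCLUDED_ARGS` set, and the precedence of
`first`/`last` over `next`/`prev` are all assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: the description names only "columns" and "sort"
# ("and others"), so this set is illustrative rather than complete.
EXCLUDED_ARGS = {"columns", "sort"}


def canonical_url(url):
    """Drop view-only query arguments and emit the rest in sorted order."""
    parts = urlsplit(url)
    args = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in EXCLUDED_ARGS
    ]
    # Sorting keeps the URL stable no matter how the arguments arrived.
    query = urlencode(sorted(args))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def pagination_rel(target_page, current_page, num_pages,
                   is_last_link=False, allow_search_indexing=True):
    """Return the rel value for a pagination link, or None for no rel."""
    if not allow_search_indexing or is_last_link:
        # Full opt-out, or the "Last" link, where queries are most expensive.
        return "nofollow noindex"
    # Assumed precedence: first/last win over next/prev.
    if target_page == 1:
        return "first"
    if target_page == num_pages:
        return "last"
    if target_page == current_page + 1:
        return "next"
    if target_page == current_page - 1:
        return "prev"
    return None


def first_last_visibility(display_pages, num_pages):
    """Check only the ends of the displayed page list, not every entry."""
    if not display_pages:
        return False, False
    show_first = display_pages[0] != 1
    show_last = display_pages[-1] != num_pages
    return show_first, show_last
```

For example, `canonical_url("https://example.com/r/?sort=-id&page=2&columns=a,b")`
yields `https://example.com/r/?page=2`, since `sort` and `columns` are
dropped (`example.com` is a placeholder host).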
Unit tests pass.
Tested this with Review Board's datagrids, checking the resulting
HTML with search indexing on and off.