Better control search indexing of datagrids.

Review Request #14362 — Created March 5, 2025 and submitted

Information

Djblets
release-5.x

Reviewers

Some search index bots have a tendency to get stuck on datagrid pages.
They navigate through pages, click sort/unsort and column edit links,
and end up generating a massive number of URLs to index.

This change works to control this through sane defaults and full opt-out
of search indexing.

The sort and column choice links now use role="button" and
rel="nofollow noindex". The role="button" should prevent indexing
by itself, but we use both to cover bases.

Pagination buttons now use the same nofollow noindex for the "Last"
link, to prevent jumping to the end of the list where queries are most
expensive.

Other pagination buttons set rel to first for the first page, last
for the last page, next for the next page, and prev for the previous
page. Not all engines care about these (Google ignores them), but some
do, and the hints can help those engines with their indexing choices.
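As a rough illustration of the rel choices described above, here is a minimal sketch. The function names and the mapping are hypothetical and are not taken from the Djblets code. In particular, combining `last` with `nofollow noindex` on the "Last" button is an assumption about how the two paragraphs above fit together.

```python
import html
from typing import Optional

# Hypothetical mapping from a pagination button kind to its rel value.
_DEFAULT_REL = {
    'first': 'first',
    'prev': 'prev',
    'next': 'next',
    # Assumption: the "Last" button keeps rel="last" and also carries
    # the crawl hints, since it leads to the most expensive queries.
    'last': 'last nofollow noindex',
}


def pagination_rel(kind: str) -> Optional[str]:
    """Return the rel value for a pagination button, or None if unknown."""
    return _DEFAULT_REL.get(kind)


def render_pagination_link(kind: str, url: str, label: str) -> str:
    """Render an <a> tag, adding rel only when one applies to this kind."""
    rel = pagination_rel(kind)
    rel_attr = f' rel="{rel}"' if rel else ''

    return (
        f'<a href="{html.escape(url, quote=True)}"{rel_attr}>'
        f'{html.escape(label)}</a>'
    )
```

A plain numbered page link (any kind not in the mapping) gets no rel attribute in this sketch.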

Datagrid pages define a canonical URL that excludes columns, sort,
and others, helping to de-index previously-indexed pages with these
options set, and reducing the indexing queue.
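The canonical URL idea can be sketched with the standard library alone. This is not the Djblets implementation. `canonical_url` and `EXCLUDED_ARGS` are hypothetical names, and only the two arguments named above are excluded, since the full list ("and others") is not given here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only `columns` and `sort` are listed; the real set of
# excluded arguments is larger but is not spelled out in this change.
EXCLUDED_ARGS = ('columns', 'sort')


def canonical_url(url: str) -> str:
    """Return the URL with display-only query arguments removed."""
    parts = urlsplit(url)

    # Keep every argument that does not merely change presentation.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in EXCLUDED_ARGS
    ]

    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Every sort and column variation of a page then maps to the same canonical URL, which is what lets an engine fold previously indexed variants together.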

Standard query string arguments for sorting/unsorting are now emitted in
sorted order instead of dictionary order, keeping URLs stable.
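To show why ordering matters for URL stability, here is a small sketch, assuming the change amounts to sorting arguments by key before encoding. `stable_query_string` is a hypothetical helper, not a Djblets function.

```python
from urllib.parse import urlencode


def stable_query_string(params: dict) -> str:
    """Encode query arguments sorted by key.

    Sorting means two dictionaries with the same contents always
    produce the same string, whatever their insertion order was.
    """
    return urlencode(sorted(params.items()))
```

Without the sort, the same logical page could be reachable under several differently ordered query strings, each of which a crawler would treat as a separate URL.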

Search indexing can also be disabled entirely for a datagrid by setting
allow_search_indexing = False. In this case, all pagination links will
use rel="nofollow noindex", and the datagrid pages will set the same in
a <meta> tag. This is particularly useful when there are multiple
datagrids that ultimately cover subsets of a larger list of items.
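The opt-out behavior might look roughly like the following. `SubsetGrid` is a stand-in class rather than a real Djblets `DataGrid` subclass, both helper functions are hypothetical, and `name="robots"` on the `<meta>` tag is an assumption based on the usual robots convention. Only the `allow_search_indexing` attribute and the `nofollow noindex` value come from the description above.

```python
from typing import Optional


class SubsetGrid:
    """Stand-in for a datagrid that covers a subset of a larger list."""

    # Opt this grid out of search indexing entirely.
    allow_search_indexing = False


def robots_meta_tag(grid: object) -> str:
    """Return a robots <meta> tag for the page, or '' if indexing is allowed."""
    # Assumption: grids allow indexing unless they explicitly opt out.
    if getattr(grid, 'allow_search_indexing', True):
        return ''

    return '<meta name="robots" content="nofollow noindex">'


def pagination_link_rel(grid: object, default_rel: Optional[str]) -> Optional[str]:
    """Return the rel for a pagination link, honoring the opt-out."""
    if not getattr(grid, 'allow_search_indexing', True):
        return 'nofollow noindex'

    return default_rel
```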

In the process, a small optimization was made to the code determining if
there should be a First Page or Last Page link shown. We were searching
the entire list of pages for page 1 or page <length>, which was
unnecessary. We now just check the first and last page numbers in the
display list, respectively.
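The optimization can be sketched as below. The function name and arguments are hypothetical, and the sketch assumes the display list is sorted in ascending order, so page 1 can only ever appear at the front and the final page only at the end.

```python
from typing import List, Tuple


def show_first_last_links(pages_to_show: List[int],
                          num_pages: int) -> Tuple[bool, bool]:
    """Decide whether First Page and Last Page links are needed.

    Only the two ends of the display list are inspected, instead of
    scanning the whole list with `1 not in pages_to_show` and
    `num_pages not in pages_to_show`.
    """
    if not pages_to_show:
        return False, False

    show_first = pages_to_show[0] != 1
    show_last = pages_to_show[-1] != num_pages

    return show_first, show_last
```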

Unit tests pass.

Tested this with Review Board's datagrids, checking the resulting
HTML with search indexing on and off.

Summary ID
Better control search indexing of datagrids.
6bc288b34ff4fa8be5e30107d58d16e4e4bd55d0
Description From Last Updated

Seems like a set would be better here.

david

Blank lines between bullet points is weird.

david
There are no open issues
david
  2. djblets/datagrid/grids.py (Diff revision 1)

    Seems like a set would be better here.

    1. Where would a set come in? We're operating off of a QueryDict, which gives us .pop() for deleting. The tuple is just items to delete, so a set doesn't buy us anything.

  3. djblets/datagrid/grids.py (Diff revision 1)

    Blank lines between bullet points is weird.

    1. I find it so much easier to read that way, especially as they become multi-paragraph.

david
  1. Ship It!
maubin
  1. Ship It!
chipx86
Review request changed
Status:
Completed
Change Summary:
Pushed to release-5.x (26772ac)