Add a safer method of constructing cache keys.

Review Request #14477 — Created June 25, 2025

Information

Djblets
release-5.x


Cache keys sometimes need to involve input that comes from another
source, such as a value in a database. While we're pretty careful about
how we compose these keys, there's always a risk of a cache key
poisoning vulnerability.

That sort of bug could manifest as a key structured like:

path:to:<value>[:optional_param]

Where value could potentially be my_value:forced_param.
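A minimal sketch of the collision (helper name hypothetical): joining unescaped parts lets a crafted value impersonate a key with extra parameters:

```python
def make_key_naive(*parts):
    """Join key parts with ':' and no escaping (unsafe)."""
    return ':'.join(parts)


# A value containing ':' forges the same key as a legitimate
# key that includes an extra parameter:
forged = make_key_naive('path', 'to', 'my_value:forced_param')
legit = make_key_naive('path', 'to', 'my_value', 'forced_param')

assert forged == legit  # Cache key collision.
```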

To avoid this sort of issue, we need to escape the values going into
the key. The safest way to do that consistently is to leave it up to a
central method.

All of Djblets's caching methods now accept a cache key as a sequence of
strings. This includes cache_memoize, cache_memoize_iter, and
make_cache_key.

Each part in the sequence will be escaped: first any % characters are
converted to %25 (the escape code for %), and then any : characters to
%3A.

This is opt-in. Callers passing a plain string won't see any difference
in behavior. Those keys will be used as-is.
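The opt-in behavior can be sketched with a stand-in (hypothetical; the real make_cache_key in djblets.cache.backend does more, such as handling the site-specific prefixing discussed below):

```python
def make_cache_key(key):
    """Stand-in showing the opt-in sequence handling."""
    if isinstance(key, str):
        # Plain strings are used as-is (existing behavior).
        return key

    # Sequence parts are escaped ('%' first, then ':') and joined.
    return ':'.join(
        part.replace('%', '%25').replace(':', '%3A')
        for part in key
    )


assert make_cache_key('a:b:c') == 'a:b:c'          # Unchanged.
assert make_cache_key(['a', 'b:c']) == 'a:b%3Ac'   # Escaped.
```

Escaping % before : matters: doing it in the other order would turn `:` into `%3A` and then re-escape that `%`, mangling the result.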

It does mean that keys built from parts that have already been
normalized, or that contain escapable characters, may produce an
unexpected final key.

Going forward, we'll want to use these sequences rather than
hard-coded strings, to ensure the safety of all keys.

Unit tests pass.

Made use of this in some in-progress code.

Summary ID
Add a safer method of constructing cache keys.
48d7ea0d2d65b79b96fc86598a7e2a1dad9f2e70
maubin

  djblets/cache/backend.py (Diff revision 1)

    Need to update the type here too.
david

  djblets/cache/backend.py (Diff revision 1)

    We have a mix of tuples and lists here. While it all works out because of the * unpacking at the end, I think it would be nicer to be consistent and just use lists.

    1. I wanted to dig in and make sure the numbers matched my understanding of this. We're in a hot path (this code is accessed a lot, 115 times in just one Dashboard request), and I knew I was introducing a small performance penalty in this change by dealing with escaping and joining, so my goal was to keep performance as close to the original as possible.

      Tuples are faster and smaller than lists most of the time, and typically measure as such.

      There's a caveat to that: list comprehensions are faster than tuple(x for x in y), because the latter sets up a generator and then constructs a tuple around it, and there's an expense to that (I wish there were native tuple-comprehension syntax). tuple([x for x in y]) ends up being faster, but then you might as well use a list in that case.
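      As a sanity check, that construction-cost difference can be measured directly (a sketch; absolute timings are machine-dependent):

```python
import timeit

SETUP = 'data = ["a", "b", "c", "d"]'

# tuple(genexpr): builds a generator, then a tuple around it.
t_gen = timeit.timeit('tuple(x for x in data)', setup=SETUP, number=100_000)

# tuple([listcomp]): builds a list, then copies it into a tuple.
t_wrap = timeit.timeit('tuple([x for x in data])', setup=SETUP, number=100_000)

# Plain list comprehension: no extra tuple construction step.
t_list = timeit.timeit('[x for x in data]', setup=SETUP, number=100_000)

print(f'genexpr: {t_gen:.4f}s  wrapped: {t_wrap:.4f}s  list: {t_list:.4f}s')
```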

      But if you're just building up a static tuple, or accessing a tuple, it's faster. At least in theory.

      Decided to measure it to be sure.

      Measurements

      These tests were run on my 2021 M1 Max. For each case, I ran timeit once and threw out that first result (bytecode compilation), then ran it 4 more times and averaged the results.

      Tests were run with:

      # String keys
      timeit.timeit('make_cache_key("a:b:c")', setup='from djblets.cache.backend import make_cache_key')
      
      # List keys
      timeit.timeit('make_cache_key(["a", "b", "c"])', setup='from djblets.cache.backend import make_cache_key')
      
      # Tuple keys
      timeit.timeit('make_cache_key(("a", "b", "c"))', setup='from djblets.cache.backend import make_cache_key')
      

      Here's what I got:

      Original code

      Keys   | Time (seconds)
      -------|-------------------
      String | 3.333815235999888

      Tuple-wrapped keys

      Keys   | Time (seconds)
      -------|--------------------
      String | 3.4278556352501255
      List   | 3.6817527916664403
      Tuple  | 3.659096694666611

      List-wrapped keys

      Keys   | Time (seconds)
      -------|--------------------
      String | 3.4579977917501306
      List   | 4.244010402999872
      Tuple  | 4.055572395750005

      Results

      Overall, tuple-wrapped keys were faster than list-wrapped keys, and passing in tuples was faster than passing in lists.

      Tuple-wrapped keys were also much closer in performance to the original code. The list-based results were up to ~1 second slower across tests.

      That's for 10,000 calls, which is about ~86 concurrent Dashboard views at 115 cache requests per view. It's not seconds of performance gain, but it's 0.1-0.2s of loss for tuples vs. 0.1-0.9s for lists, at least in the tests I ran.

      (I re-ran the battery of tests a few times, but got roughly the same results. There's a margin for error, but the results told the same story.)

      I also tried alternative approaches, like building norm_key as a string and then using f-strings or %-formatting to build the final result. It was slower than even the list-wrapped version of the current method. Interestingly, in order of speed, %-based formatting is the fastest (0.14s across 10,000 tests), followed by f-strings (0.17s), followed by .format() (0.22s).
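      That formatting comparison is easy to reproduce (a sketch; absolute timings are machine-dependent):

```python
import timeit

SETUP = 'site = "example.com"; norm_key = "a:b:c"'

# %-based formatting.
t_percent = timeit.timeit('"%s:%s" % (site, norm_key)',
                          setup=SETUP, number=100_000)

# f-strings.
t_fstring = timeit.timeit('f"{site}:{norm_key}"',
                          setup=SETUP, number=100_000)

# str.format().
t_format = timeit.timeit('"{}:{}".format(site, norm_key)',
                         setup=SETUP, number=100_000)

print(f'%: {t_percent:.4f}s  f-string: {t_fstring:.4f}s  '
      f'.format: {t_format:.4f}s')
```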

      Anyway, given that, I'm confident tuples are the right choice for this code.

chipx86
Review request changed
Change Summary:
  • Fixed some missing updates to the docs.
  • Simplified the logic around joining of key parts when there's no Site (uncommon case).
Commits:
Summary ID
Add a safer method of constructing cache keys.
b26178c0e411d8a61b8bb81bb28f91b5c8a53c77
Add a safer method of constructing cache keys.
48d7ea0d2d65b79b96fc86598a7e2a1dad9f2e70

Checks run (2 succeeded)

flake8 passed.
JSHint passed.
david
  Ship It!