    Add a safer method of constructing cache keys.

    Review Request #14477 — Created June 25, 2025 and submitted

    Information

    Djblets
    release-5.x

    Reviewers

    Cache keys sometimes need to involve input that comes from another
    source, such as a value in a database. While we're pretty careful about
    how we compose these keys, there's always a risk of a cache key
    poisoning vulnerability.

    That sort of bug could manifest as a key structured like:

    path:to:<value>[:optional_param]
    

    Where value could potentially be my_value:forced_param.
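The collision is easy to demonstrate in plain Python, using the naive `':'.join` composition this change guards against:

```python
# Naive composition: join raw parts with ':' and no escaping.
# An attacker-controlled value containing ':' makes two logically
# different keys collide.
parts_trusted = ['path', 'to', 'my_value', 'forced_param']
parts_poisoned = ['path', 'to', 'my_value:forced_param']

key_trusted = ':'.join(parts_trusted)
key_poisoned = ':'.join(parts_poisoned)

print(key_trusted == key_poisoned)  # True: both are 'path:to:my_value:forced_param'
```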

    To avoid this sort of issue, we need to escape the values going into
    the key. The safest way of doing that consistently is to leave it up to
    a central method.

    All of Djblets's caching methods now accept a cache key as a sequence of
    strings. This includes cache_memoize, cache_memoize_iter, and
    make_cache_key.

    Each key in the sequence will be escaped, first converting any %
    characters to %25 (escape code for %) and then any : characters to
    %3A.
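The escaping order matters: `%` must be handled first so the `%` introduced by `%3A` isn't itself re-escaped. A minimal sketch of the behavior described above (the helper names here are illustrative, not the Djblets API):

```python
def escape_key_part(part):
    # Escape '%' first, so that a literal ':' escaped to '%3A'
    # isn't then re-escaped to '%253A'.
    return part.replace('%', '%25').replace(':', '%3A')


def join_key_parts(parts):
    # Escape each part, then join with ':' as the separator.
    return ':'.join(escape_key_part(part) for part in parts)


print(join_key_parts(['path', 'to', 'my_value:forced_param']))
# path:to:my_value%3Aforced_param
```

With escaping in place, the poisoned value above can no longer masquerade as an extra key segment.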

    This is opt-in. Callers passing a plain string won't see any difference
    in behavior. Those keys will be used as-is.

    It does mean that callers passing a sequence whose parts have already
    been normalized, or which contain escapable characters, may get back an
    unexpected key.

    Going forward, we'll want to make use of these sequences rather than
    hard-coding strings, in order to ensure safety of all keys.

    Unit tests pass.

    Made use of this in some in-progress code.

    Summary ID
    Add a safer method of constructing cache keys.
    48d7ea0d2d65b79b96fc86598a7e2a1dad9f2e70
    maubin
    1. djblets/cache/backend.py (Diff revision 1)

       Need to update the type here too.
    david
    1. djblets/cache/backend.py (Diff revision 1)
      We have a mix of tuples and lists here. While it all works out because of the * unpacking at the end, I think it would be nicer to be consistent and just use lists.

      1. I wanted to dig in and make sure the numbers matched my understanding of this. We're in a hot path (this code is accessed a lot, 115 times in just one Dashboard request), and I knew I was introducing a small performance penalty in this change by dealing with escaping and joining, so my goal was to keep performance as close to the original as possible.

        Tuples are faster and smaller than lists most of the time, and typically measure as such.

        There's a caveat to that: list comprehensions are faster than doing a tuple(x for x in y), because the latter sets up a generator and then constructs a tuple around it, and there's an expense to this (I wish there were native tuple comprehension syntax). tuple([x for x in y]) ends up being faster, but then you might as well use a list in that case.
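That claim is easy to check with timeit (a sketch; absolute numbers will vary by machine and Python version):

```python
import timeit

setup = 'data = list(range(10))'

# tuple(generator): builds a generator, then a tuple around it.
t_gen = timeit.timeit('tuple(x for x in data)', setup=setup, number=100_000)

# List comprehension: a single specialized construction path.
t_listcomp = timeit.timeit('[x for x in data]', setup=setup, number=100_000)

# tuple([list comprehension]): often beats tuple(generator).
t_wrapped = timeit.timeit('tuple([x for x in data])', setup=setup,
                          number=100_000)

print(t_gen, t_listcomp, t_wrapped)
```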

        But if you're just building up a static tuple, or accessing a tuple, it's faster. At least in theory.

        Decided to measure it to be sure.

        Measurements

        These tests were run on my 2021 M1 Max. I ran each timeit once, threw out the first result (bytecode compilation), and ran it 4 more times, getting the average.

        Tests were run with:

        # String keys
        timeit.timeit('make_cache_key("a:b:c")', setup='from djblets.cache.backend import make_cache_key')
        
        # List keys
        timeit.timeit('make_cache_key(["a", "b", "c"])', setup='from djblets.cache.backend import make_cache_key')
        
        # Tuple keys
        timeit.timeit('make_cache_key(("a", "b", "c"))', setup='from djblets.cache.backend import make_cache_key')
        

        Here's what I got:

        Original code

        Keys     Time (seconds)
        String   3.333815235999888

        Tuple-wrapped keys

        Keys     Time (seconds)
        String   3.4278556352501255
        List     3.6817527916664403
        Tuple    3.659096694666611

        List-wrapped keys

        Keys     Time (seconds)
        String   3.4579977917501306
        List     4.244010402999872
        Tuple    4.055572395750005

        Results

        Overall, tuple-wrapped keys were faster than list-wrapped keys, and passing in tuples was faster than passing in lists.

        Tuple-wrapped keys were also much closer in performance to the original code. The list-based results were up to ~1 second slower across tests.

        That's for 10,000 requests, which is about ~86 concurrent Dashboard views at 115 cache requests per view. It's not seconds of performance gain, but it's 0.1-0.2s of loss for tuples vs. 0.1-0.9s for lists, at least in the tests I ran.

        (I re-ran the battery of tests a few times, but got roughly the same results. There's a margin for error, but the results told the same story.)

        I also tried alternative approaches, like building norm_key as a string and then using f-strings or % to build the final result. It was slower than the current method with list-wrapped keys. Interestingly, in order of speed, %-based formatting is the fastest (0.14s across 10,000 tests), followed by f-strings (0.17s), followed by .format (0.22s).
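The formatting comparison can be reproduced with something like the following (the exact expressions used in the original runs aren't shown, so these operands are stand-ins):

```python
import timeit

setup = 'prefix = "example.com"; norm_key = "path:to:my_value"'

# %-based formatting
t_percent = timeit.timeit('"%s:%s" % (prefix, norm_key)', setup=setup,
                          number=100_000)

# f-strings
t_fstring = timeit.timeit('f"{prefix}:{norm_key}"', setup=setup,
                          number=100_000)

# str.format
t_format = timeit.timeit('"{}:{}".format(prefix, norm_key)', setup=setup,
                         number=100_000)

print(t_percent, t_fstring, t_format)
```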

        Anyway, given that, I'm confident tuples are the right choice for this code.

    chipx86
    david
    1. Ship It!
    chipx86
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to release-5.x (f962c15)