• 
      

    Add safer cache key normalization, switch to SHA256, and fix return type.

    Review Request #12376 — Created June 15, 2022 and submitted — Latest diff uploaded

    Information

    Djblets
    release-3.x

    Reviewers

    make_cache_key() has been with us for a long time, and though it
    generally worked as expected, there's been room for improvement.

    This change starts by fixing up the normalization code for invalid
    characters. Rather than stripping a few select characters (space, tab,
    newlines), which wasn't sufficient, we now encode the hex values of any
    invalid characters. According to Django's own modern validation code,
    these are any characters in the ASCII range of 0-20, or ASCII 127 (DEL).
    These become literal '\x00' through '\x20' and '\x7f'. The new
    code uses regexes to do this, which should also be a bit faster.

    The hash for the key now uses SHA256 instead of MD5, which is less
    likely to generate collisions. This will of course invalidate any
    existing keys, but that won't matter because of the final point.

    We now return Unicode strings instead of byte strings. Turns out we made
    a mistake when porting this to Python 3. We began returning byte
    strings when we should have returned native strings. Django expects
    Unicode strings in the case of Python 3. Behind the scenes, what ended
    up happening was that a key like 'abc123' was turning into
    b'abc123' in the cache. This could in theory result in keys that were
    too big for the cache, when dealing with large cache keys + large data
    split in cache_memoize_iter() (but in practice this hasn't happened).

    Comprehensive unit tests have been added for make_cache_key().

    Unit tests pass.

    Commits

    Files