Add safer cache key normalization, switch to SHA256, and fix return type.

Review Request #12376 — Created June 15, 2022 and submitted — Latest diff uploaded

Information

Djblets
release-3.x

Reviewers

make_cache_key() has been with us for a long time, and though it
generally worked as expected, there's been room for improvement.

This change starts by fixing up the normalization code for invalid
characters. Rather than stripping a few select characters (space, tab,
newlines), which wasn't sufficient, we now encode the hex values of any
invalid characters. According to Django's own modern validation code,
these are any characters in the ASCII range of 0-20, or ASCII 127 (DEL).
These become literal '\x00' through '\x20' and '\x7f'. The new
code uses regexes to do this, which should also be a bit faster.

The hash for the key now uses SHA256 instead of MD5, which is less
likely to generate collisions. This will of course invalidate any
existing keys, but that won't matter because of the final point.

We now return Unicode strings instead of byte strings. Turns out we made
a mistake when porting this to Python 3. We began returning byte
strings when we should have returned native strings. Django expects
Unicode strings in the case of Python 3. Behind the scenes, what ended
up happening was that a key like 'abc123' was turning into
b'abc123' in the cache. This could in theory result in keys that were
too big for the cache, when dealing with large cache keys + large data
split in cache_memoize_iter() (but in practice this hasn't happened).

Comprehensive unit tests have been added for make_cache_key().

Unit tests pass.

Commits

Files