Add safer cache key normalization, switch to SHA256, and fix return type.
Review Request #12376 — Created June 15, 2022 and submitted
make_cache_key()
has been with us for a long time, and though it
generally worked as expected, there's been room for improvement.This change starts by fixing up the normalization code for invalid
characters. Rather than stripping a few select characters (space, tab,
newlines), which wasn't sufficient, we now encode the hex values of any
invalid characters. According to Django's own modern validation code,
these are any characters in the ASCII range of 0-20, or ASCII 127 (DEL).
These become literal'\x00'
through'\x20'
and'\x7f'
. The new
code uses regexes to do this, which should also be a bit faster.The hash for the key now uses SHA256 instead of MD5, which is less
likely to generate collisions. This will of course invalidate any
existing keys, but that won't matter because of the final point.We now return Unicode strings instead of byte strings. Turns out we made
a mistake when porting this to Python 3. We began returning byte
strings when we should have returned native strings. Django expects
Unicode strings in the case of Python 3. Behind the scenes, what ended
up happening was that a key like'abc123'
was turning into
b'abc123'
in the cache. This could in theory result in keys that were
too big for the cache, when dealing with large cache keys + large data
split incache_memoize_iter()
(but in practice this hasn't happened).Comprehensive unit tests have been added for
make_cache_key()
.
Unit tests pass.
Summary | ID |
---|---|
8fd4c1b28702b8aee9df7fc517fce293c7d97e88 |
- Change Summary:
-
Improved the regex, and added an unstaged fix for a unit test.
- Commits:
-
Summary ID cfadd6cd3bb2049fdef33ce7c19fba59d0eea79f d3bb9f3ed7e6e532c2022eb9eeaddf2fba300388