Keep intermediate files when building static media.

Review Request #12407 — Created June 24, 2022 and submitted

Information

Djblets
release-3.x

Reviewers

When Django builds static media files, it's common to go through a
hashing process, which generates a final file named
<filename>.<md5hash>.<ext>. This is built by:

  1. Opening the original (potentially compiled) filename (e.g.,
    common.min.css).

  2. Generating an intermediate MD5 hash on the contents of that file
    (e.g., b026324c6904b2a9cb4b88d6d61c81d1).

  3. Processing the file to transform any URL references to include hashes
    and formalize basic syntax (e.g., url(image.png) ->
    url("image.abcd12345.png")).

  4. Generating another MD5 hash based on the resulting file
    (e.g., 26ab0db90d72e28ad0ba1e22ee510510).

  5. Writing out a filename comprising the original filename and 12
    characters of the resulting MD5 hash (e.g.,
    common.min.26ab0db90d72.css).
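
The five steps above can be sketched in plain Python. This is only an illustration of the naming scheme, not Django's actual implementation: the `hashed_name` and `build` helpers, the simplified `url()` regular expression, and the sample CSS and asset bytes are all assumptions made for the example.

```python
import hashlib
import os
import re

# Matches url(...) references with optional quotes (a simplification).
URL_RE = re.compile(r'url\(\s*["\']?([^"\')]+?)["\']?\s*\)')


def hashed_name(filename, content):
    """Return filename with 12 characters of the content's MD5 inserted."""
    root, ext = os.path.splitext(filename)
    digest = hashlib.md5(content).hexdigest()[:12]
    return f'{root}.{digest}{ext}'


def build(filename, content, assets):
    """Return (intermediate_name, final_name, processed_content)."""
    # Steps 1-2: hash the original contents.
    intermediate_name = hashed_name(filename, content)

    # Step 3: rewrite known url() references to hashed, quoted names.
    def rewrite(match):
        ref = match.group(1)
        if ref not in assets:
            return match.group(0)
        return 'url("%s")' % hashed_name(ref, assets[ref])

    processed = URL_RE.sub(rewrite, content.decode('utf-8')).encode('utf-8')

    # Steps 4-5: hash the processed contents and derive the final name.
    final_name = hashed_name(filename, processed)
    return intermediate_name, final_name, processed


# Hypothetical inputs, used only to exercise the sketch.
css = b'body { background: url(image.png); }'
assets = {'image.png': b'placeholder image bytes'}
intermediate_name, final_name, processed = build('common.min.css', css, assets)
```

Because step 3 changes the file's contents, the intermediate and final names differ whenever a reference is rewritten.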

Later, when Django goes to look up a referenced file (e.g.,
common.min.css), it opens the written file, generates a hash, and then
tries to find the resulting file.

This goes poorly.

The referenced file resolves to the first (intermediate) MD5 hash, so
Django attempts to look up common.min.b026324c6904.css, which doesn't
exist, rather than generating a hash for the final file.
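
A rough model of the failed lookup, assuming a package is represented as a plain dict of filename to contents. The `resolve` helper, the placeholder hash inside the processed CSS, and the sample contents are hypothetical and stand in for Django's real lookup logic.

```python
import hashlib
import os


def hashed_name(filename, content):
    """Return filename with 12 characters of the content's MD5 inserted."""
    root, ext = os.path.splitext(filename)
    digest = hashlib.md5(content).hexdigest()[:12]
    return f'{root}.{digest}{ext}'


def resolve(filename, shipped):
    """Hash the shipped original and look for the name that hash implies."""
    candidate = hashed_name(filename, shipped[filename])
    return candidate if candidate in shipped else None


original = b'body { background: url(image.png); }'
# Placeholder for the contents after url() references were rewritten.
processed = b'body { background: url("image.0123456789ab.png"); }'

# Package holding only the original and the final hashed file.
final_only = {
    'common.min.css': original,
    hashed_name('common.min.css', processed): processed,
}

# Same package plus an intermediate copy named after the original's hash.
with_intermediate = dict(final_only)
with_intermediate[hashed_name('common.min.css', original)] = processed

missing = resolve('common.min.css', final_only)
found = resolve('common.min.css', with_intermediate)
```

In this model the lookup only succeeds when a file named after the original contents' hash is present in the package.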

Part of the reason it does this is that it thinks it can just look up
the final file through a staticfiles.json, which is normally written
to the root. We don't actually package this, but even if we did, this
would do no good for extensions.

This is not a new problem, actually. Previous versions of Django would
write out "intermediary" files. These are copies of the final file but
with the intermediate MD5 hash. The newer ManifestStorage classes we
use turn this off, though.

This change turns that back on.

This is not a great long-term solution. We are losing out on caching,
meaning that every lookup has to re-parse the files. We are not
benefiting from staticfiles.json, which would be the better option
going forward. We are back to storing two copies of every static media
file, which takes up a lot of space.

We will need to address this through new storage classes that allow for
re-generating staticfiles.json files on demand, in a writable area.

For now, this gets us back to a working state where we can ship and look
up static media files.

Successfully built packages with the intermediate files, and verified
that web servers could look them up.

Summary ID
Keep intermediate files when building static media.
8f339291fb33bf5d674ec6864c59290586cf6916
david
  1. Ship It!
chipx86
Review request changed
Status:
Completed
Change Summary:
Pushed to release-3.x (04dee8b)