Add a method to check if a mimetype is supported and improve guess_mimetype.

Review Request #14310 — Created Jan. 27, 2025 and submitted

Information

Review Board
release-7.1.x

Reviewers

This adds a method for checking whether a mimetype is supported or not.
This is useful for code that deals with uploading binary files, to help
decide which files should be uploaded. We have similar logic for this in
RBTools. Valid and invalid mimetypes are saved to the cache and are
checked against when available, which should help speed things up when
the caller is checking a sequence of mimetypes.

This change also fixes a fault with guess_mimetype. The function runs
the file command in a subprocess to determine the given file's mimetype.
The function doesn't properly close and clean up the subprocess's resources
(the file descriptors for stdin and stdout). In the Review Board server
process which runs for a long time, we're relying on Python's garbage
collector to close everything. The garbage collector is not deterministic
and it could take a while before the resources are cleaned up. It's better
to deal with these ourselves to avoid any unexpected side effects. This
change makes sure the resources get cleaned up when they're no longer used.

  • Ran unit tests.
  • Used in the Bitbucket Server Pull Request integration code in Power
    Pack.
  • Saw the ResourceWarning: unclosed file coming from guess_mimetype
    go away in the logs.
Summary ID
Add a method to check if a mimetype is supported and improve guess_mimetype.
This adds a method for checking whether a mimetype is supported or not. This is useful for code that deals with uploading binary files, to help decide which files should be uploaded. We have similar logic for this in RBTools. Valid and invalid mimetypes are saved to the cache and are checked against when available, which should help speed things up when the caller is checking a sequence of mimetypes. This change also fixes a fault with `guess_mimetype`. The function runs the `file` command in a subprocess to determine the given file's mimetype. The function doesn't properly close and clean up the subprocess's resources (the file descriptors for stdin and stdout). In the Review Board server process which runs for a long time, we're relying on Python's garbage collector to close everything. The garbage collector is not deterministic and it could take a while before the resources are cleaned up. It's better to deal with these ourselves to avoid any unexpected side effects. This change makes sure the resources get cleaned up when they're no longer used.
dd965637ff55487c8ee49a120c19ef6aa04185ab
Description From Last Updated

Maybe indent this as: ... stdin=subprocess.PIPE, ) as p: assert p.stdin is not None ... to make the block clearer?

daviddavid

This blank line should go away now.

daviddavid

I feel like this might be simplified if we did individual cache entries with a key like review-uis:supported-mimetype:{mimetype} and then …

daviddavid

It might be nice to have some in-memory caching (like a global) for this, so we don't have to re-parse …

daviddavid

Blank line after this.

daviddavid

I feel like this file would be better served using cache_memoize(), since that's designed for this kind of use case.

chipx86chipx86
david
  1. 
      
  2. reviewboard/attachments/mimetypes.py (Diff revision 1)
     
     
    Show all issues

    Maybe indent this as:

        ...
        stdin=subprocess.PIPE,
    ) as p:
        assert p.stdin is not None
        ...
    

    to make the block clearer?

  3. reviewboard/attachments/mimetypes.py (Diff revision 1)
     
     
    Show all issues

    This blank line should go away now.

  4. reviewboard/attachments/mimetypes.py (Diff revision 1)
     
     
     
     
     
     
    Show all issues

    I feel like this might be simplified if we did individual cache entries with a key like review-uis:supported-mimetype:{mimetype} and then just stored a True or False.

  5. reviewboard/attachments/mimetypes.py (Diff revision 1)
     
     
     
     
     
     
    Show all issues

    It might be nice to have some in-memory caching (like a global) for this, so we don't have to re-parse the mime types if this gets called repeatedly.

  6. Show all issues

    Blank line after this.

  7. 
      
maubin
david
  1. Ship It!
  2. 
      
chipx86
  1. 
      
  2. reviewboard/attachments/mimetypes.py (Diff revision 2)
     
     
    Show all issues

    I feel like this file would be better served using cache_memoize(), since that's designed for this kind of use case.

  3. 
      
maubin
david
  1. Ship It!
  2. 
      
maubin
Review request changed
Status:
Completed
Change Summary:
Pushed to release-7.1.x (5bcb517)