Add a method to check if a mimetype is supported and improve guess_mimetype.

Review Request #14310 — Created Jan. 27, 2025 and updated

Information

Review Board
release-7.x

Reviewers

This adds a method for checking whether a mimetype is supported or not.
This is useful for code that deals with uploading binary files, to help
decide which files should be uploaded. We have similar logic for this in
RBTools. Valid and invalid mimetypes are saved to the cache and are
checked against when available, which should help speed things up when
the caller is checking a sequence of mimetypes.

This change also fixes a fault with guess_mimetype. The function runs
the file command in a subprocess to determine the given file's mimetype.
The function doesn't properly close and clean up the subprocess's resources
(the file descriptors for stdin and stdout). In the Review Board server
process which runs for a long time, we're relying on Python's garbage
collector to close everything. The garbage collector is not deterministic
and it could take a while before the resources are cleaned up. It's better
to deal with these ourselves to avoid any unexpected side effects. This
change makes sure the resources get cleaned up when they're no longer used.

  • Ran unit tests.
  • Used in the Bitbucket Server Pull Request integration code in Power
    Pack.
  • Saw the ResourceWarning: unclosed file coming from guess_mimetype
    go away in the logs.
Summary ID
Add a method to check if a mimetype is supported and improve guess_mimetype.
This adds a method for checking whether a mimetype is supported or not. This is useful for code that deals with uploading binary files, to help decide which files should be uploaded. We have similar logic for this in RBTools. Valid and invalid mimetypes are saved to the cache and are checked against when available, which should help speed things up when the caller is checking a sequence of mimetypes. This change also fixes a fault with `guess_mimetype`. The function runs the `file` command in a subprocess to determine the given file's mimetype. The function doesn't properly close and clean up the subprocess's resources (the file descriptors for stdin and stdout). In the Review Board server process which runs for a long time, we're relying on Python's garbage collector to close everything. The garbage collector is not deterministic and it could take a while before the resources are cleaned up. It's better to deal with these ourselves to avoid any unexpected side effects. This change makes sure the resources get cleaned up when they're no longer used.
1e49515439845acb8737028c2d9d4f7cc517133b
Description From Last Updated

Maybe indent this as: ... stdin=subprocess.PIPE, ) as p: assert p.stdin is not None ... to make the block clearer?

daviddavid

This blank line should go away now.

daviddavid

I feel like this might be simplified if we did individual cache entries with a key like review-uis:supported-mimetype:{mimetype} and then …

daviddavid

It might be nice to have some in-memory caching (like a global) for this, so we don't have to re-parse …

daviddavid

Blank line after this.

daviddavid
There are no open issues
david
  1. 
      
  2. reviewboard/attachments/mimetypes.py (Diff revision 1)
     
     
    Show all issues

    Maybe indent this as:

        ...
        stdin=subprocess.PIPE,
    ) as p:
        assert p.stdin is not None
        ...
    

    to make the block clearer?

  3. reviewboard/attachments/mimetypes.py (Diff revision 1)
     
     
    Show all issues

    This blank line should go away now.

  4. reviewboard/attachments/mimetypes.py (Diff revision 1)
     
     
     
     
     
     
    Show all issues

    I feel like this might be simplified if we did individual cache entries with a key like review-uis:supported-mimetype:{mimetype} and then just stored a True or False.

  5. reviewboard/attachments/mimetypes.py (Diff revision 1)
     
     
     
     
     
     
    Show all issues

    It might be nice to have some in-memory caching (like a global) for this, so we don't have to re-parse the mime types if this gets called repeatedly.

  6. Show all issues

    Blank line after this.

  7. 
      
maubin
Review request changed
Commits:
Summary ID
Add a method to check if a mimetype is supported and improve guess_mimetype.
This adds a method for checking whether a mimetype is supported or not. This is useful for code that deals with uploading binary files, to help decide which files should be uploaded. We have similar logic for this in RBTools. Valid and invalid mimetypes are saved to the cache and are checked against when available, which should help speed things up when the caller is checking a sequence of mimetypes. This change also fixes a fault with `guess_mimetype`. The function runs the `file` command in a subprocess to determine the given file's mimetype. The function doesn't properly close and clean up the subprocess's resources (the file descriptors for stdin and stdout). In the Review Board server process which runs for a long time, we're relying on Python's garbage collector to close everything. The garbage collector is not deterministic and it could take a while before the resources are cleaned up. It's better to deal with these ourselves to avoid any unexpected side effects. This change makes sure the resources get cleaned up when they're no longer used.
28b5db79689bfe033eb31a4be91b56ae6c771903
Add a method to check if a mimetype is supported and improve guess_mimetype.
This adds a method for checking whether a mimetype is supported or not. This is useful for code that deals with uploading binary files, to help decide which files should be uploaded. We have similar logic for this in RBTools. Valid and invalid mimetypes are saved to the cache and are checked against when available, which should help speed things up when the caller is checking a sequence of mimetypes. This change also fixes a fault with `guess_mimetype`. The function runs the `file` command in a subprocess to determine the given file's mimetype. The function doesn't properly close and clean up the subprocess's resources (the file descriptors for stdin and stdout). In the Review Board server process which runs for a long time, we're relying on Python's garbage collector to close everything. The garbage collector is not deterministic and it could take a while before the resources are cleaned up. It's better to deal with these ourselves to avoid any unexpected side effects. This change makes sure the resources get cleaned up when they're no longer used.
1e49515439845acb8737028c2d9d4f7cc517133b
Diff:

Revision 2 (+286 -62)

Show changes

reviewboard/attachments/mimetypes.py
reviewboard/attachments/tests/test_mimetypes.py

Checks run (2 succeeded)

flake8 passed.
JSHint passed.
david
  1. Ship It!
  2. 
      
Loading...