• 
      

    Add run_process(), a modern replacement for execute().

    Review Request #12564 — Created Aug. 23, 2022 and submitted

    Information

    RBTools
    release-4.x

    Reviewers

    Our execute() function, used for calling out into other processes and
    capturing results, has grown out of control over the years, providing 4
    sets of inputs that produce different outputs in different combinations.
    An attempt to add type annotations for these combinations quickly grew
    out of control, and along with causing numerous type-related regressions
    in the codebase over the years, it's finally become time to put
    execute() to rest.

    This change introduces the successor, run_process(). The interface is
    like a slimmed-down version of execute(), allowing basic options like
    the working directory and environment to be set, stderr to be redirected
    to stdout, expected output encoding, and control over error handling and
    logging.

    There is no control over typing. Instead, this always returns a
    RunProcessResult object, which stores a string version of the command
    that was run, the exit code, and byte streams for the standard output
    and error output.

    From there, a caller can access the raw bytes (via stdout_bytes.read()
    and stderr_bytes.read()), Unicode-decoded content (via stdout.read()
    and stderr.read()), or line-based versions (via .readlines() on
    any of the streams).

    Results are always byte strings. Decoding to Unicode strings happens
    on-the-fly via io.TextIOWrapper (which is what Python uses for
    sys.stdout and sys.stderr). These wrappers are created only on
    access.

    While these are all stream objects, they don't stream output from the
    program. They always contain the full results from the program. If we
    were to add streaming someday for some SCM, we could do this with a new
    argument to run_process() and new flags on RunProcessResult without
    a redesign.

    Like with execute(), all errors (non-0 exit codes) or selective errors
    can be ignored. Unlike execute(), the caller can still get results
    when an error is raised. The exception, RunProcessError, contains the
    results as an attribute (result), containing all the same information
    that would normally be returned. Given that, the support for ignoring
    errors is really just a convenience around catching an exception.

    This new object has full type safety, so type checkers can make sure
    that consumers are processing results correctly.

    run_process() uses run_process_exec() to perform the actual execution
    of a command. This is a wrapper around subprocess.run() that takes just
    a few flags and returns a tuple of (exit_code, stdout, stderr). Spies
    should connect to this instead of run_process(), as it's as low-level
    as a test would need to go, and the results from this are checked against
    the parameters passed to run_process().

    Logging has been improved to provide better output, distinguishing non-0
    exit codes representing failure from those with ignored errors (to help
    alleviate concerns when people look at debug logs).

    Old Python 2 compatibility code around subprocess have been removed,
    and new defaults (like close_fds) utilized.

    execute() has been updated to wrap run_process(). This function is
    now deprecated, though still widely used in the codebase. Upcoming work
    will transition over to run_process() instead.

    All unit tests pass on Python 3.7-3.11.

    Summary ID
    Add run_process(), a modern replacement for execute().
    Our `execute()` function, used for calling out into other processes and capturing results, has grown out of control over the years, providing 4 sets of inputs that produce different outputs in different combinations. An attempt to add type annotations for these combinations quickly grew out of control, and along with causing numerous type-related regressions in the codebase over the years, it's finally become time to put `execute()` to rest. This change introduces the successor, `run_process()`. The interface is like a slimmed-down version of `execute()`, allowing basic options like the working directory and environment to be set, stderr to be redirected to stdout, expected output encoding, and control over error handling and logging. There is no control over typing. Instead, this always returns a `RunProcessResult` object, which stores a string version of the command that was run, the exit code, and byte streams for the standard output and error output. From there, a caller can access the raw bytes (via `stdout_bytes.read()` and `stderr_bytes.read()`), Unicode-decoded content (via `stdout.read()` and `stderr.read()`), or line-based versions (via `.readlines()` on any of the streams). Results are *always* byte strings. Decoding to Unicode strings happens on-the-fly via `io.TextIOWrapper` (which is what Python uses for `sys.stdout` and `sys.stderr`). These wrappers are created only on access. While these are all stream objects, they don't stream output from the program. They always contain the full results from the program. If we were to add streaming someday for some SCM, we could do this with a new argument to `run_process()` and new flags on `RunProcessResult` without a redesign. Like with `execute()`, all errors (non-0 exit codes) or selective errors can be ignored. Unlike `execute()`, the caller can still get results when an error is raised. The exception, `RunProcessError`, contains the results as an attribute (`result`), containing all the same information that would normally be returned. Given that, the support for ignoring errors is really just a convenience around catching an exception. This new object has full type safety, so type checkers can make sure that consumers are processing results correctly. Logging has been improved to provide better output, distinguishing non-0 exit codes representing failure from those with ignored errors (to help alleviate concerns when people look at debug logs). Old Python 2 compatibility code around `subprocess` have been removed, and new defaults (like `close_fds`) utilized. `execute()` has been updated to wrap `run_process()`. This function is now deprecated, though still widely used in the codebase. Upcoming work will transition over to `run_process()` instead.
    1290949c07fc9ea0765bfe999262036f3cd11262
    Description From Last Updated

    'rbtools.deprecation.RemovedInRBTools50Warning' imported but unused Column: 1 Error code: F401

    reviewbotreviewbot

    local variable 'e' is assigned to but never used Column: 5 Error code: F841

    reviewbotreviewbot

    This should be Raises: and should list the TypeError for the command arg. Should we also list Exception in here …

    maubinmaubin

    Wrap run_process with :py:func:.

    maubinmaubin

    Shouldn't these 2 come before RunProcessError?

    maubinmaubin

    unicode -> string

    daviddavid

    This seems a little mangled.

    daviddavid
    Checks run (1 failed, 1 succeeded)
    flake8 failed.
    JSHint passed.

    flake8

    chipx86
    chipx86
    Review request changed
    Change Summary:

    Added documentation, tests, and logging for PermissionError, FileNotFoundError, and general exceptions when running a command.

    Commits:
    Summary ID
    Add run_process(), a modern replacement for execute().
    Our `execute()` function, used for calling out into other processes and capturing results, has grown out of control over the years, providing 4 sets of inputs that produce different outputs in different combinations. An attempt to add type annotations for these combinations quickly grew out of control, and along with causing numerous type-related regressions in the codebase over the years, it's finally become time to put `execute()` to rest. This change introduces the successor, `run_process()`. The interface is like a slimmed-down version of `execute()`, allowing basic options like the working directory and environment to be set, stderr to be redirected to stdout, expected output encoding, and control over error handling and logging. There is no control over typing. Instead, this always returns a `RunProcessResult` object, which stores a string version of the command that was run, the exit code, and byte streams for the standard output and error output. From there, a caller can access the raw bytes (via `stdout_bytes.read()` and `stderr_bytes.read()`), Unicode-decoded content (via `stdout.read()` and `stderr.read()`), or line-based versions (via `.readlines()` on any of the streams). Results are *always* byte strings. Decoding to Unicode strings happens on-the-fly via `io.TextIOWrapper` (which is what Python uses for `sys.stdout` and `sys.stderr`). These wrappers are created only on access. While these are all stream objects, they don't stream output from the program. They always contain the full results from the program. If we were to add streaming someday for some SCM, we could do this with a new argument to `run_process()` and new flags on `RunProcessResult` without a redesign. Like with `execute()`, all errors (non-0 exit codes) or selective errors can be ignored. Unlike `execute()`, the caller can still get results when an error is raised. The exception, `RunProcessError`, contains the results as an attribute (`result`), containing all the same information that would normally be returned. Given that, the support for ignoring errors is really just a convenience around catching an exception. This new object has full type safety, so type checkers can make sure that consumers are processing results correctly. Logging has been improved to provide better output, distinguishing non-0 exit codes representing failure from those with ignored errors (to help alleviate concerns when people look at debug logs). Old Python 2 compatibility code around `subprocess` have been removed, and new defaults (like `close_fds`) utilized. `execute()` has been updated to wrap `run_process()`. This function is now deprecated, though still widely used in the codebase. Upcoming work will transition over to `run_process()` instead.
    7c4bccdf5facf0d9c9413d3b52db2ac14f65e900
    Add run_process(), a modern replacement for execute().
    Our `execute()` function, used for calling out into other processes and capturing results, has grown out of control over the years, providing 4 sets of inputs that produce different outputs in different combinations. An attempt to add type annotations for these combinations quickly grew out of control, and along with causing numerous type-related regressions in the codebase over the years, it's finally become time to put `execute()` to rest. This change introduces the successor, `run_process()`. The interface is like a slimmed-down version of `execute()`, allowing basic options like the working directory and environment to be set, stderr to be redirected to stdout, expected output encoding, and control over error handling and logging. There is no control over typing. Instead, this always returns a `RunProcessResult` object, which stores a string version of the command that was run, the exit code, and byte streams for the standard output and error output. From there, a caller can access the raw bytes (via `stdout_bytes.read()` and `stderr_bytes.read()`), Unicode-decoded content (via `stdout.read()` and `stderr.read()`), or line-based versions (via `.readlines()` on any of the streams). Results are *always* byte strings. Decoding to Unicode strings happens on-the-fly via `io.TextIOWrapper` (which is what Python uses for `sys.stdout` and `sys.stderr`). These wrappers are created only on access. While these are all stream objects, they don't stream output from the program. They always contain the full results from the program. If we were to add streaming someday for some SCM, we could do this with a new argument to `run_process()` and new flags on `RunProcessResult` without a redesign. Like with `execute()`, all errors (non-0 exit codes) or selective errors can be ignored. Unlike `execute()`, the caller can still get results when an error is raised. The exception, `RunProcessError`, contains the results as an attribute (`result`), containing all the same information that would normally be returned. Given that, the support for ignoring errors is really just a convenience around catching an exception. This new object has full type safety, so type checkers can make sure that consumers are processing results correctly. Logging has been improved to provide better output, distinguishing non-0 exit codes representing failure from those with ignored errors (to help alleviate concerns when people look at debug logs). Old Python 2 compatibility code around `subprocess` have been removed, and new defaults (like `close_fds`) utilized. `execute()` has been updated to wrap `run_process()`. This function is now deprecated, though still widely used in the codebase. Upcoming work will transition over to `run_process()` instead.
    973715d8ed5bbcba1b4ec09628f19267c0604b87

    Checks run (1 failed, 1 succeeded)

    flake8 failed.
    JSHint passed.

    flake8

    chipx86
    maubin
    1. 
        
    2. rbtools/utils/process.py (Diff revision 4)
       
       

      Question: What does the single * here do?

      1. This enforces that all following arguments are only ever passed as keyword arguments. Any attempt at passing as positional arguments will result in an error. Helps create a clean dividing line and means we don't need to worry about preserving any ordering after the *.

        Also allows for a concept of required keyword arguments (ones without a default value).

      2. Cool.

    3. rbtools/utils/process.py (Diff revision 4)
       
       
      Show all issues

      This should be Raises: and should list the TypeError for the command arg. Should we also list Exception in here for the unexpected errors that may happen when running a command?

      Also need to add a Returns: section.

    4. rbtools/utils/process.py (Diff revision 4)
       
       
      Show all issues

      Wrap run_process with :py:func:.

      1. This is fine in the description, but shouldn't be done in the summary (that should be plain text, no formatting, or some things get wonky).

    5. rbtools/utils/tests/test_process.py (Diff revision 4)
       
       
       
      Show all issues

      Shouldn't these 2 come before RunProcessError?

      1. We do sorting as case-sensitive (effectively ASCII character code sorting), which has capitals first. If you feed the value into |sort, that's the result you'll get.

      2. Got it.

    6. 
        
    chipx86
    maubin
    1. Ship It!
    2. 
        
    david
    1. 
        
    2. rbtools/utils/encoding.py (Diff revision 5)
       
       
      Show all issues

      unicode -> string

    3. rbtools/utils/process.py (Diff revision 5)
       
       
       

      It would be pretty cool if we had a decorator that allowed us to mark something as not-to-be-spied-upon.

      1. That would be neat. Want to add that to Asana for me?

    4. rbtools/utils/process.py (Diff revision 5)
       
       
       
       
       
       
       
      Show all issues

      This seems a little mangled.

    5. 
        
    chipx86
    david
    1. Ship It!
    2. 
        
    chipx86
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to release-4.x (768dcdc)