• 
      

    Fix issues in rbssh that could lead to race conditions or bad logging.

    Review Request #12834 — Created Feb. 20, 2023 and submitted

    Information

    Review Board
    release-5.0.x

    Reviewers

    A few problems were encountered during some testing with a company that
    resulted in missing results from rbssh and problems logging.

    The logging issue was a combination of Django overwriting our logging
    setup, and logging not always flushing. We now set up Django as the
    first thing in main(), and we explicitly shut down before exit.

    The data loss issue could occur if the stdin stream was opened,
    selected for new data, yet empty. This was tricky to reproduce, but we
    hit this on CentOS 7.9.2009 with a custom-built Python 3.7.9 running
    within Apache 2.4.6 and mod_wsgi in embedded mode.

    When stdin was closed, we finished selecting for data. This was
    definitely a bug. We hadn't seen this come up in production before, but
    had a reliable test case to verify this now.

    Now, when closed, we simply exclude it from future selects, and only
    close out the stream when the remote process itself ends.

    The loop for reading data has been made a bit less expensive. We now
    wait 0.01 seconds between reads, using Paramiko's own constant and
    following its logic.

    There are a few other small fixes/improvements:

    1. Errors are now outputted to stderr, and not just the log, helping to
      catch issues.

    2. Setting streams to non-blocking now uses some handy Python helpers,
      rather than going through fcntl.

    3. More conditions and data are now logged, to help with debugging,
      and log messages have been cleaned up.

    Locally tested rbssh with shell mode and remote execution with
    short-running processes, long-running processes, and streaming
    processes.

    Locally tested within a web server, interfacing with Cliosoft SOS
    over SSH mode.

    Tested remotely on the CentOS system that exhibited the data loss
    issue. Verified that all interaction worked correctly under reliable
    failure conditions.

    Summary ID
    Fix issues in rbssh that could lead to race conditions or bad logging.
    A few problems were encountered during some testing with a company that resulted in missing results from rbssh and problems logging. The logging issue was a combination of Django overwriting our logging setup, and logging not always flushing. We now set up Django as the first thing in `main()`, and we explicitly shut down before exit. The data loss issue could occur if the `stdin` stream was opened, selected for new data, yet empty. This was tricky to reproduce, but we hit this on CentOS 7.9.2009 with a custom-built Python 3.7.9 running within Apache 2.4.6 and mod_wsgi in embedded mode. When `stdin` was closed, we finished selecting for data. This was definitely a bug. We hadn't seen this come up in production before, but had a reliable test case to verify this now. Now, when closed, we simply exclude it from future selects, and only close out the stream when the remote process itself ends. The loop for reading data has been made a bit less expensive. We now wait 0.01 seconds between reads, using Paramiko's own constant and following its logic. There are a few other small fixes/improvements: 1. Errors are now outputted to stderr, and not just the log, helping to catch issues. 2. Setting streams to non-blocking now uses some handy Python helpers, rather than going through `fcntl`. 3. More conditions and data are now logged, to help with debugging, and log messages have been cleaned up.
    7c153135ff6858391bfe10516d00b188e2be145c
    david
    1. Ship It!
    2. 
        
    chipx86
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to release-5.0.x (2b3345a)