• 
      

    Fix issues in rbssh that could lead to race conditions or bad logging.

    Review Request #12834 — Created Feb. 20, 2023 and submitted — Latest diff uploaded

    Information

    Review Board
    release-5.0.x

    Reviewers

    A few problems were encountered during some testing with a company that
    resulted in missing results from rbssh and problems logging.

    The logging issue was a combination of Django overwriting our logging
    setup, and logging not always flushing. We now set up Django as the
    first thing in main(), and we explicitly shut down before exit.

    The data loss issue could occur if the stdin stream was opened,
    selected for new data, yet empty. This was tricky to reproduce, but we
    hit this on CentOS 7.9.2009 with a custom-built Python 3.7.9 running
    within Apache 2.4.6 and mod_wsgi in embedded mode.

    When stdin was closed, we finished selecting for data. This was
    definitely a bug. We hadn't seen this come up in production before, but
    had a reliable test case to verify this now.

    Now, when closed, we simply exclude it from future selects, and only
    close out the stream when the remote process itself ends.

    The loop for reading data has been made a bit less expensive. We now
    wait 0.01 seconds between reads, using Paramiko's own constant and
    following its logic.

    There are a few other small fixes/improvements:

    1. Errors are now outputted to stderr, and not just the log, helping to
      catch issues.

    2. Setting streams to non-blocking now uses some handy Python helpers,
      rather than going through fcntl.

    3. More conditions and data are now logged, to help with debugging,
      and log messages have been cleaned up.

    Locally tested rbssh with shell mode and remote execution with
    short-running processes, long-running processes, and streaming
    processes.

    Locally tested within a web server, interfacing with Cliosoft SOS
    over SSH mode.

    Tested remotely on the CentOS system that exhibited the data loss
    issue. Verified that all interaction worked correctly under reliable
    failure conditions.

    Commits

    Files