Fix issues in rbssh that could lead to race conditions or bad logging.

Review Request #12834 — Created Feb. 20, 2023 and submitted

Information

Review Board
release-5.0.x

Reviewers

A few problems were encountered during some testing with a company that
resulted in missing results from rbssh and problems logging.

The logging issue was a combination of Django overwriting our logging
setup, and logging not always flushing. We now set up Django as the
first thing in main(), and we explicitly shut down before exit.

The data loss issue could occur if the stdin stream was opened,
selected for new data, yet empty. This was tricky to reproduce, but we
hit this on CentOS 7.9.2009 with a custom-built Python 3.7.9 running
within Apache 2.4.6 and mod_wsgi in embedded mode.

When stdin was closed, we finished selecting for data. This was
definitely a bug. We hadn't seen this come up in production before, but
had a reliable test case to verify this now.

Now, when closed, we simply exclude it from future selects, and only
close out the stream when the remote process itself ends.

The loop for reading data has been made a bit less expensive. We now
wait 0.01 seconds between reads, using Paramiko's own constant and
following its logic.

There are a few other small fixes/improvements:

  1. Errors are now outputted to stderr, and not just the log, helping to
    catch issues.

  2. Setting streams to non-blocking now uses some handy Python helpers,
    rather than going through fcntl.

  3. More conditions and data are now logged, to help with debugging,
    and log messages have been cleaned up.

Locally tested rbssh with shell mode and remote execution with
short-running processes, long-running processes, and streaming
processes.

Locally tested within a web server, interfacing with Cliosoft SOS
over SSH mode.

Tested remotely on the CentOS system that exhibited the data loss
issue. Verified that all interaction worked correctly under reliable
failure conditions.

Summary ID
Fix issues in rbssh that could lead to race conditions or bad logging.
A few problems were encountered during some testing with a company that resulted in missing results from rbssh and problems logging. The logging issue was a combination of Django overwriting our logging setup, and logging not always flushing. We now set up Django as the first thing in `main()`, and we explicitly shut down before exit. The data loss issue could occur if the `stdin` stream was opened, selected for new data, yet empty. This was tricky to reproduce, but we hit this on CentOS 7.9.2009 with a custom-built Python 3.7.9 running within Apache 2.4.6 and mod_wsgi in embedded mode. When `stdin` was closed, we finished selecting for data. This was definitely a bug. We hadn't seen this come up in production before, but had a reliable test case to verify this now. Now, when closed, we simply exclude it from future selects, and only close out the stream when the remote process itself ends. The loop for reading data has been made a bit less expensive. We now wait 0.01 seconds between reads, using Paramiko's own constant and following its logic. There are a few other small fixes/improvements: 1. Errors are now outputted to stderr, and not just the log, helping to catch issues. 2. Setting streams to non-blocking now uses some handy Python helpers, rather than going through `fcntl`. 3. More conditions and data are now logged, to help with debugging, and log messages have been cleaned up.
7c153135ff6858391bfe10516d00b188e2be145c
david
  1. Ship It!
  2. 
      
chipx86
Review request changed
Status:
Completed
Change Summary:
Pushed to release-5.0.x (2b3345a)