Fix data truncation, blocking I/O, and performance issues in rbssh.

Review Request #12574 — Created Sept. 2, 2022 and submitted — Latest diff uploaded


Review Board


rbssh has been in use for many, many years, and has worked in the cases
that people most often use it for (Git, Subversion, etc.). However, a
new integration that remote-controls a server over SSH has unveiled a
problem where we can stop processing the channel before we finish
handling all stdout/stderr content.

This will happen if the other end sends a large payload of data
(extending a buffer beyond our read limit of 4096) without blocking, and
then closes the stream with an EOF.

Paramiko will see the EOF and then mark the channel as "closed". This is
sort of a lie. "closed" does not mean "stop reading", as we've
interpreted it, but rather "no more data will be come in." There's a
difference there, as the buffers may still contain content.

This was one issue. Another is that we'd try to output data and,
sometimes this would result in a Resource Not Available error. This
depends on the version of Python and how the data is being consumed
(e.g., outputting to iTerm on macOS using Python 2.7 can trigger this,
but redirecting to a file does not).

These have been solved by doing two things:

  1. We no longer assume we're done reading if the channel is marked as
    closed. Instead, we only close if we've received an exit code, we did
    not receive new data, and we don't have any data left in the buffers.

  2. stdout/stderr are now non-blocking, like stdin. This ensures data
    gets sent immediately to the caller, and forces us to handle writes
    accordingly (which could be an issue if a caller were to set our
    blocking flags for those streams).

With these fixed, we're also able to increase the buffer sizes to
improve performance.

For stdout/stderr data, we now read up to 16KB at a time (it will not
wait for a full 16KB, it'll just read up to that).

For stdin, we also read up to 16KB, instead of 1 byte. This is far more
efficient. We use the same blocking-aware logic we now have for working
with stdout/stderr streams.

Unit tests pass.

Tested the new integration that occasionally triggered the buffer
truncation issue. With this fix, the issue no longer occurred across
many dozens of repeated attempts.

Ran many tests using Python 2.7 and various 3.x versions where I ran
commands on multiple remote machines, checking for blocking I/O issues
and truncated data. These included using cat to transfer data to a
remote server and to transfer from a remote server, verifying hashes.

Tested performance and verified a pretty substantial (several seconds)
decrease in time when transferring 100MB+ files.