Ensure rb-site console output is always written as UTF-8.

Review Request #11707 — Created July 6, 2021 and submitted

Review Board

Some users noticed an issue where, depending on their local encoding
preferences, Python 3 might not be able to write rb-site output to the
console. This could be reproduced by running with
PYTHONIOENCODING=latin1. With such an encoding, any Unicode character that
the target encoding cannot represent would raise a UnicodeEncodeError before
its bytes were sent to the underlying stdout/stderr streams.

This happened because, by default, sys.stdout and sys.stderr are
io.TextIOWrapper instances that wrap the stream, encoding any Unicode
strings with the configured encoding. PYTHONIOENCODING, amongst other
environmental factors, could influence the default encoding.
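
The failure mode can be illustrated without touching the real console. The following is a minimal sketch, not rb-site code: `try_write` is a hypothetical helper, and an in-memory `io.BytesIO` buffer stands in for the terminal so that the wrapper's encoding can be chosen explicitly, much as PYTHONIOENCODING would choose it for sys.stdout.

```python
import io


def try_write(text, encoding):
    """Write text through a TextIOWrapper using the given encoding.

    Returns the encoded bytes, or None if the text could not be encoded.
    (Hypothetical helper for illustration only.)
    """
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding=encoding)

    try:
        wrapper.write(text)
        wrapper.flush()
    except UnicodeEncodeError:
        # The character has no representation in the target encoding.
        return None

    return raw.getvalue()


message = 'Site installed \u2713'

# A latin1 wrapper cannot represent U+2713, so the write fails.
latin1_result = try_write(message, 'latin1')

# A utf-8 wrapper can represent any Unicode character.
utf8_result = try_write(message, 'utf-8')
```

Here `latin1_result` ends up as `None`, while `utf8_result` holds the UTF-8 bytes of the message.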

The solution is to avoid using this wrapper, and instead write to the
output stream directly. To do this correctly, we have to wrap the stream
with our own wrapper, using a hard-coded encoding of utf-8. Since
we're already taking in stream objects, this is trivial to do.
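
As a rough sketch of this approach (the actual patch is not reproduced here), the underlying binary buffer of a text stream can be re-wrapped with a fixed encoding. The name `make_utf8_stream` is an assumption made for illustration, and the sketch assumes the stream is either a text stream exposing a `buffer` attribute or already a binary stream.

```python
import io


def make_utf8_stream(stream):
    """Return a text stream that always encodes its output as UTF-8.

    Hypothetical helper. If `stream` is a text wrapper such as sys.stdout,
    its underlying binary buffer is used; otherwise `stream` is assumed to
    be a binary stream already.
    """
    buffer = getattr(stream, 'buffer', stream)

    # write_through=True hands encoded bytes to the buffer immediately.
    return io.TextIOWrapper(buffer, encoding='utf-8', write_through=True)


# Simulate a console configured with PYTHONIOENCODING=latin1.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding='latin1')

# Bypass the latin1 wrapper and write UTF-8 to the same underlying buffer.
out = make_utf8_stream(console)
out.write('Site installed \u2713')
out.flush()

written = raw.getvalue()
```

In real use the argument would be something like `sys.stdout`. One caveat with this sketch: a `TextIOWrapper` closes its underlying buffer when it is closed or garbage-collected, so the new wrapper should either be kept alive for as long as the stream is needed or be released with `detach()`.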

With this change, users should no longer hit encoding errors when running
rb-site, whether its output is displayed on a terminal or redirected to a
file.

Unit tests were added that check for this condition and ensure that
content is correctly encoded.
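
A test of this kind might look roughly like the following. This is an illustrative sketch rather than the tests added in the change: `write_utf8` and `UTF8OutputTests` are hypothetical names, and an `io.BytesIO` buffer stands in for the console stream.

```python
import io
import unittest


def write_utf8(stream, text):
    """Write text to a binary stream as UTF-8 (hypothetical helper)."""
    wrapper = io.TextIOWrapper(stream, encoding='utf-8', write_through=True)
    wrapper.write(text)
    wrapper.flush()

    # Detach so the underlying stream stays open for the caller.
    wrapper.detach()


class UTF8OutputTests(unittest.TestCase):
    """Checks that output is encoded as UTF-8 regardless of content."""

    def test_non_latin1_characters(self):
        stream = io.BytesIO()
        write_utf8(stream, 'Ship It! \u2713')

        self.assertEqual(stream.getvalue(), b'Ship It! \xe2\x9c\x93')

    def test_ascii_is_unchanged(self):
        stream = io.BytesIO()
        write_utf8(stream, 'Unit tests pass.')

        self.assertEqual(stream.getvalue(), b'Unit tests pass.')
```

The tests compare raw bytes rather than decoded text, so a regression to a different encoding would show up as a byte mismatch.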

Reproduced the original problem by setting PYTHONIOENCODING=latin1 and
attempting to install a site. Verified that it crashed with the reported
encoding errors prior to this patch, but succeeded after.

Repeated the same test with console output redirected.

Unit tests pass.

  1. Ship It!
Review request changed

Status: Closed (submitted)

Change Summary:

Pushed to release-4.0.x (edefd2b)