Ensure rb-site console output is always written as UTF-8.
Review Request #11707 — Created July 6, 2021 and submitted — Latest diff uploaded
Some users noticed an issue where, depending on their local encoding
preferences, Python 3 might not be able to write rb-site output to the
console. This could be reproduced by running with
PYTHONIOENCODING=latin1. In such encoding modes, Unicode characters
would fail to encode to the target encoding before their bytes were sent
to the underlying stdout/stderr streams.

This happened because, by default, sys.stdout and sys.stderr are
io.TextIOWrapper instances that wrap the stream, encoding any Unicode
strings with the configured encoding. PYTHONIOENCODING, amongst other
environmental factors, could influence the default encoding.

The solution is to avoid using this wrapper, and instead write to the
output stream directly. To do this correctly, we have to wrap the stream
with our own wrapper, using a hard-coded encoding of utf-8. Since
we're already taking in stream objects, this is trivial to do.

With this change, users should no longer hit issues going through
rb-site or displaying any other output on their terminals, or when
outputting to files.

Unit tests were added that check for this condition and ensure that
content is correctly encoded.
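
The wrapping approach described above can be sketched roughly as follows.
This is a minimal illustration and not the code from the patch: the
helper name make_utf8_writer is hypothetical, and an in-memory
io.BytesIO stands in for the real console stream so the bytes written
can be inspected.

```python
import io


def make_utf8_writer(stream):
    """Wrap a stream in a text writer that always encodes as UTF-8.

    Hypothetical helper for illustration. Accepts either a text stream
    that exposes its binary stream as ``.buffer`` (as sys.stdout does)
    or a binary stream.
    """
    raw = getattr(stream, 'buffer', stream)

    # write_through=True hands encoded bytes to the binary stream
    # immediately instead of holding them in the wrapper.
    return io.TextIOWrapper(raw, encoding='utf-8', write_through=True)


# Stand-in for a console configured with PYTHONIOENCODING=latin1.
console_bytes = io.BytesIO()
latin1_console = io.TextIOWrapper(console_bytes, encoding='latin-1')

try:
    # U+2713 (check mark) has no latin-1 representation.
    latin1_console.write('\u2713 done')
    default_failed = False
except UnicodeEncodeError:
    default_failed = True

# Bypass the configured wrapper and write UTF-8 to the same stream.
writer = make_utf8_writer(latin1_console)
writer.write('\u2713 done')
writer.flush()
```

One caveat with this pattern: an io.TextIOWrapper closes the stream it
wraps when it is closed or garbage-collected, so a wrapper around
sys.stdout.buffer should either be kept alive for the life of the
process or released with detach().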
Reproduced the original problem by setting PYTHONIOENCODING=latin1 and
attempting to install a site. Verified that it crashed with the reported
encoding errors prior to this patch, but succeeded after.

Repeated the same test with console output redirected.

Unit tests pass.
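
The failure mode can also be reproduced without rb-site. The sketch below
is an assumption-laden stand-in for the manual test: it runs two small
child Python processes with PYTHONIOENCODING=latin1 and captures their
output through pipes. The child scripts and the run_with_latin1 helper
are hypothetical and are not part of the patch.

```python
import os
import subprocess
import sys

# Child script that writes through sys.stdout as configured by the
# environment.
DEFAULT_WRITE = "import sys; sys.stdout.write('\\u2713')"

# Child script that bypasses that configuration with its own UTF-8
# wrapper around the binary stream.
UTF8_WRITE = (
    "import io, sys; "
    "out = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8'); "
    "out.write('\\u2713'); out.flush(); out.detach()"
)


def run_with_latin1(script):
    """Run a child Python process with PYTHONIOENCODING=latin1."""
    env = dict(os.environ, PYTHONIOENCODING='latin1')

    return subprocess.run([sys.executable, '-c', script], env=env,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)


broken = run_with_latin1(DEFAULT_WRITE)
fixed = run_with_latin1(UTF8_WRITE)
```

The first child is expected to exit with a UnicodeEncodeError traceback
on stderr, while the second should exit cleanly with the UTF-8 bytes for
the check mark on stdout.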