
    Ensure rb-site console output is always written as UTF-8.

    Review Request #11707 — Created July 6, 2021 and submitted

    Information

    Review Board
    release-4.0.x

    Reviewers

    Some users noticed an issue where, depending on their local encoding
    preferences, Python 3 might not be able to write rb-site output to the
    console. This could be reproduced by running with
    PYTHONIOENCODING=latin1. In such encoding modes, Unicode characters
    would fail to encode to the target encoding before their bytes were sent
    to the underlying stdout/stderr streams.

    This happened because, by default, sys.stdout and sys.stderr are
    io.TextIOWrapper instances that wrap the stream, encoding any Unicode
    strings with the configured encoding. PYTHONIOENCODING, amongst other
    environmental factors, could influence the default encoding.
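    The failure described above can be sketched in isolation. This is only an
    illustration: the in-memory `io.BytesIO` stands in for the real console
    stream, and the `try_write` helper and sample strings are hypothetical,
    not part of rb-site.

```python
import io

# Stand-in for the raw byte stream underneath sys.stdout (assumption: the
# real stream would be the console or a redirected file).
raw = io.BytesIO()

# Roughly what sys.stdout looks like when PYTHONIOENCODING=latin1 is set:
# a TextIOWrapper that encodes every string as latin1 before writing bytes.
latin1_stdout = io.TextIOWrapper(raw, encoding='latin1')


def try_write(stream, text):
    """Return True if text was written, False on an encoding error."""
    try:
        stream.write(text)
        stream.flush()
        return True
    except UnicodeEncodeError:
        return False


# Plain ASCII fits in latin1, so this write succeeds.
ascii_ok = try_write(latin1_stdout, 'Installing site...\n')

# U+2713 (check mark) has no latin1 representation, so encoding fails
# before any bytes reach the underlying stream.
unicode_ok = try_write(latin1_stdout, 'Done \u2713\n')
```

    The second write fails inside the wrapper, which matches the description:
    the text never makes it to the underlying stream as bytes.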

    The solution is to avoid using this wrapper, and instead write to the
    output stream directly. To do this correctly, we have to wrap the stream
    with our own wrapper, using a hard-coded encoding of utf-8. Since
    we're already taking in stream objects, this is trivial to do.
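    A minimal sketch of this approach, assuming the stream passed in is either
    a text stream exposing `.buffer` (as `sys.stdout` does) or a raw byte
    stream. The helper name `wrap_utf8` is hypothetical, and the actual patch
    may structure this differently.

```python
import io


def wrap_utf8(stream):
    """Return a text stream that always encodes output as UTF-8.

    Hypothetical helper for illustration; not the name used in the patch.
    """
    # Text streams such as sys.stdout expose their byte stream as .buffer.
    # If there is no .buffer, assume we were handed a byte stream already.
    raw = getattr(stream, 'buffer', stream)

    # Our own wrapper with a hard-coded encoding, so PYTHONIOENCODING and
    # locale settings no longer affect how the text is encoded.
    return io.TextIOWrapper(raw, encoding='utf-8', line_buffering=True)


# Simulate a console configured for latin1, then bypass its wrapper.
latin1_stdout = io.TextIOWrapper(io.BytesIO(), encoding='latin1')
out = wrap_utf8(latin1_stdout)

out.write('Done \u2713\n')
out.flush()

# The bytes that reached the underlying stream.
written = latin1_stdout.buffer.getvalue()
```

    One design caveat with `io.TextIOWrapper`: closing the wrapper, or letting
    it be garbage-collected, also closes the underlying byte stream. Code that
    wraps `sys.stdout.buffer` this way would need to keep the wrapper alive
    for as long as the stream is in use, or call `detach()` before discarding
    it.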

    With this change, users should no longer hit issues going through
    rb-site or displaying any other output on their terminals, or when
    outputting to files.

    Unit tests were added that check for this condition and ensure that
    content is correctly encoded.

    Reproduced the original problem by setting PYTHONIOENCODING=latin1 and
    attempting to install a site. Verified that it crashed with the reported
    encoding errors prior to this patch, but succeeded after.

    Repeated the same test with console output redirected.

    Unit tests pass.

    Summary ID
    Ensure rb-site console output is always written as UTF-8.
    f08efa9a9a9dcb883c26d08a55429091a6f14e78
    david
    1. Ship It!
    chipx86
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to release-4.0.x (edefd2b)