Summary

Rework encoding of Revisions during diff processing to avoid regressions.

Review Request #12825 — Created Feb. 7, 2023 and submitted Feb. 7, 2023, 4:51 a.m. — Latest diff uploaded Feb. 7, 2023, 3:44 a.m.

Information

Owner

chipx86

Repository

Review Board

Branch

release-4.0.x

Bugs

4988

Depends On

Reviewers

Groups

reviewboard

People

Description

Due to a bug during a refactor late in 4.0.x, we were attempting to
store a raw Revision object in extra_data. The intent of the code
was to convert a revision to something we could store by using
convert_to_unicode() and factoring in the repository's configured
encoding list. However, this didn't know how to convert Revision
objects.

We received a patch that addresses this by performing a conversion
before passing it in, which guaranteed we'd get a string out of it. This
used force_str()/force_text(). However, that performs its decoding
from byte strings to Unicode strings, and makes its own assumptions
about the encoding type that should be used. That could result in an
incorrectly-decoded string.

This isn't a major issue for revisions on most SCMs, which are generally
using ASCII (though, there's no guarantee here). However, the filename
was also force-decoded prematurely, which could impact some setups.

This change removes the force-decoding for filenames entirely, which was
probably just added as a "just in case" measure, and then does a very
explicit Revision to bytes conversion before passing to
convert_to_unicode(). This is the safest way to handle this, avoiding
the heuristics in Django's string conversion machinery, and gets us the
result we want.

Unit tests were added to cover this case.

Testing Done

Unit tests pass. Verified the new test case failed with the reported
error without the fix.

Review Board + Power Pack 7.1 alpha 0 (dev)

Rework encoding of Revisions during diff processing to avoid regressions.

Information

Reviewers

Commits

Files