Fix a Base64Field regression on Postgres leading to bad stored data.

Review Request #11341 — Created Dec. 21, 2020 and submitted

Information

Djblets
release-2.0.x

Reviewers

Base64Field in Djblets 2.0 was updated for Python 2/3 string
compatibility, but it ended up returning the wrong string type when
preparing data for the database. It generated Base64 content, which
comes in the form of a byte string, and returned it for storage. This
was handled fine by SQLite and MySQL, but Postgres replaced it with a
literal \x (a 2-byte string, not an escape sequence). Upon loading the data
again, this would lead to an error about invalid Base64 content.
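
The type mismatch described above can be reproduced in isolation with the standard library. This is only a minimal sketch, not the Djblets code: `encode_for_storage_buggy` is a hypothetical name standing in for the field's database-preparation step, and the empty input mirrors the failure condition used in manual testing.

```python
import base64


def encode_for_storage_buggy(value):
    """Hypothetical stand-in for the regressed preparation step.

    It hands back the raw result of b64encode(), which is a byte
    string rather than a Unicode string.
    """
    return base64.b64encode(value)


encoded = encode_for_storage_buggy(b'')

# On Python 3 this is a bytes object, which is what the database
# backend would receive for storage.
print(type(encoded).__name__, repr(encoded))
```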

The correct thing to do was to return a Unicode string. On Python 2,
either byte strings or Unicode strings are handled fine as a return
value, so long as the database backend can handle the content. On Python
3, the accepted string type seems to depend heavily on the backend. Since
we're storing pure ASCII content (a Base64-encoded value), a Unicode
string is the safest option.
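
A sketch of that approach, assuming the input is already a byte string: encode to Base64, then decode the ASCII result so a Unicode string is returned. `encode_for_storage` is a hypothetical helper for illustration, not the actual field method.

```python
import base64


def encode_for_storage(value):
    """Hypothetical preparation step that returns a Unicode string.

    Base64 output only contains ASCII characters, so decoding it as
    ASCII is lossless.
    """
    return base64.b64encode(value).decode('ascii')


print(repr(encode_for_storage(b'hello')))
print(repr(encode_for_storage(b'')))
```

Because the result is plain text on both Python 2 and 3, the backend no longer has to decide how to adapt a byte string.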

Since Djblets 2.0 went live, and we've had some customers test an
install of Review Board 4.0 that ended up breaking due to this bug, we
need to ensure we're compatible with this bad stored data. We now look
for this when pulling from the database, and if we see it, we turn it
back into an empty string.
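
The compatibility check could look roughly like the following. This is a sketch under assumptions: `decode_from_storage` and `BAD_POSTGRES_VALUE` are hypothetical names, and the bad value is assumed to be exactly the two characters `\x` with nothing else stored alongside it.

```python
import base64

# The 2-character literal stored by Postgres in place of the value:
# a backslash followed by "x" (not an escape sequence).
BAD_POSTGRES_VALUE = '\\x'


def decode_from_storage(stored):
    """Hypothetical load step that tolerates the bad stored data."""
    if stored == BAD_POSTGRES_VALUE:
        # Treat the bad data as the empty value it replaced.
        stored = ''

    return base64.b64decode(stored)


print(repr(decode_from_storage('\\x')))
print(repr(decode_from_storage('aGVsbG8=')))
```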

Unit tests pass on all versions of Python.

Manually tested the failure condition (storing an empty byte string) on
SQLite, MySQL, and Postgres. Verified it only failed on Postgres.

Manually tested the fix on all three databases, along with the
workaround for loading the previously stored bad data.

Commit ID:
54aa6c601304e22974a641b5219249c338ddac91
david
Ship It!
chipx86
Review request changed
Status:
Completed
Change Summary:
Pushed to release-2.0.x (5acdba5)