Normalize, encode, and decode cert hostnames for storage.
Review Request #15016 — Created April 16, 2026 and submitted
When a certificate represents a hostname, or a client requests one, that
hostname may be presented in any casing (uppercase, lowercase, mixed
case), which can pose issues for comparison. Further, when dealing with
filesystem storage, we may encounter hostnames with non-ASCII characters
in them, which may pose challenges depending on the filesystem.

This change introduces casing normalization of hostnames in the storage
objects to ease comparisons, and normalization/encoding/decoding in the
file storage backend to handle encoding and representation differences.

The base storage objects that deal with hostnames now keep a version of
the hostname normalized for comparison purposes. This is a Unicode
string that can resolve to a hostname, but with casing converted to
lowercase. This eases comparison and gives a consistent representation
of these hostnames.

The file storage backend handles its own normalization and translation
behavior when computing filenames for a given hostname. Encoding
involves removing any trailing period on the hostname and then
converting to an IDNA 2008 representation to handle Unicode
characters. The result is an ASCII filename safe for all filesystems.
Decoding does the inverse of this.

IDNA handling depends on the `idna` library, which is a new dependency
added to Review Board 8. This supports IDNA 2008 standards with UTS46
normalization, which amongst other things handles casing differences.

Note that the standard `Certificate`, `CertificateFingerprints`, etc.
objects do not normalize hostnames. They are a representation of their
source. Whether that source is caller-supplied input, an X.509
certificate, or a storage object, it will reflect the version of the
hostname on there. That allows for creating an object that can represent
a piece of state that can then be introspected or validated, which we do
today.
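
The filename encoding and decoding described above can be sketched roughly as follows. This is a minimal illustration, not the code under review: the function names `encode_hostname_for_filename` and `decode_hostname_from_filename` are hypothetical, and the sketch uses Python's built-in `idna` codec (IDNA 2003) plus an explicit `lower()` call as a stand-in for the `idna` package's IDNA 2008/UTS46 handling, purely so it has no third-party dependency.

```python
def encode_hostname_for_filename(hostname: str) -> str:
    """Return an ASCII-only form of a hostname (hypothetical helper).

    Assumption: the stdlib "idna" codec (IDNA 2003) stands in for the
    IDNA 2008 + UTS46 encoding that the change itself relies on.
    """
    # Drop a single trailing period ("example.com." -> "example.com").
    if hostname.endswith('.'):
        hostname = hostname[:-1]

    # The stdlib codec leaves ASCII labels untouched, so lowercase
    # explicitly before encoding.
    hostname = hostname.lower()

    # Non-ASCII labels become "xn--" Punycode labels.
    return hostname.encode('idna').decode('ascii')


def decode_hostname_from_filename(encoded: str) -> str:
    """Return the Unicode hostname for an encoded name (hypothetical)."""
    return encoded.encode('ascii').decode('idna')
```

Under these assumptions, `encode_hostname_for_filename('BÜCHER.Example.')` yields `xn--bcher-kva.example`, and decoding that value gives back `bücher.example`. The decoded form is the normalized hostname, not the original casing.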
Unit tests pass.
| Summary | ID |
|---|---|
| | 7787cc93803afad7474d670b33238f41ef1aa460 |
| Description | From | Last Updated |
|---|---|---|
| Looks like `reviewboard.certs.cert.Certificate.__init__` is also storing hostname, we should normalize there. | | |
| Could we make a central helper for normalizing the hostname? Right now we only do casefold but it seems like … | | |
| Can we add additional tests to verify mixed-case comparisons with certificates and fingerprints? | | |
| This comparison is happening before we do any `casefold()`ing | | |
| We should normalize here too. | | |
| We should normalize here too. | | |
| We should probably be normalizing the hostname here at the ingress point instead of deep inside `_build*` | | |
| Same with the hostname here. | | |
| And here. | | |
| Is there a reason to use `lower()` here instead of `casefold()`? | | |
| Apparently Python's idna codec is old. We already have the idna package available because it's a dependency of cryptography, so … | | |
| Typo: an decoded -> a decoded | | |
| We don't return here anymore, we raise. | | |
| This is a little confusing. How about "The hostname {hostname} contains invalid characters and cannot be stored"? | | |
| Given that this isn't really a fatal error, we should probably use warning instead of error | | |
| redefinition of unused 'test_init_with_unicode_hostname' from line 830 (Column: 5, Error code: F811) | | |
| SyntaxError: unterminated string literal (detected at line 2223) (Column: 22, Error code: E999) | | |
- Looks like `reviewboard.certs.cert.Certificate.__init__` is also storing hostname, we should normalize there.
- Could we make a central helper for normalizing the hostname? Right now we only do `casefold` but it seems like there might be other things we'd want to do in the future (trimming, IDNA conversion, etc).
- We should probably be normalizing the hostname here at the ingress point instead of deep inside `_build*`
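
As a rough illustration of the central helper suggested in the comments above, the sketch below keeps hostname normalization in one place so that callers at the ingress point can share it. The name `normalize_hostname` is hypothetical and not taken from the change; the body only case-folds, which is what the comment says the code did at that point.

```python
def normalize_hostname(hostname: str) -> str:
    """Return a hostname normalized for comparison (hypothetical helper).

    Only case folding is applied here. Other steps (trimming, IDNA
    conversion) could later be added in this one place.
    """
    return hostname.casefold()


# casefold() is more aggressive than lower() for some non-ASCII
# characters: lower() keeps the German sharp s, casefold() expands it.
assert normalize_hostname('Example.COM') == 'example.com'
assert 'Straße.example'.lower() == 'straße.example'
assert normalize_hostname('Straße.example') == 'strasse.example'
```

The `lower()` versus `casefold()` difference shown here is the kind of detail that is easier to settle once when every call site goes through a single helper.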
- Change Summary:
  - Expanded the scope of the change a bit.
  - Changed the normalization at the filename level to handle things like Unicode characters.
  - Added hostname casing normalization before comparison in `BaseStoredCertificate.__init__`.
  - Added hostname casing normalization to the rest of the base storage objects.
  - Added a whole lot of new unit tests for hostname casing normalization and storage encoding.
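
A minimal sketch of the idea behind the storage-object change: keep the hostname as supplied, plus a lowercase form used for comparison. The class `StoredCertSketch`, its attributes, and its `matches()` method are hypothetical stand-ins, not the real `BaseStoredCertificate` API, and the use of `lower()` is an assumption based on the description's wording.

```python
from dataclasses import dataclass, field


@dataclass
class StoredCertSketch:
    """Hypothetical stand-in for a stored certificate record."""

    hostname: str
    port: int

    # Derived in __post_init__ and used for all comparisons.
    normalized_hostname: str = field(init=False)

    def __post_init__(self) -> None:
        # Assumption: plain lowercasing. casefold() would be the more
        # aggressive Unicode-aware alternative.
        self.normalized_hostname = self.hostname.lower()

    def matches(self, hostname: str, port: int) -> bool:
        """Return whether this record is for the given hostname and port."""
        return (self.normalized_hostname == hostname.lower() and
                self.port == port)
```

For example, `StoredCertSketch('Example.COM', 443).matches('EXAMPLE.com', 443)` returns `True`, while the original `hostname` attribute still reads `Example.COM`.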
- Summary:
  - Old: Normalize cert hostnames for storage.
  - New: Normalize, encode, and decode cert hostnames for storage.
- Description:

  Old:

  Certificates may present hostnames in any casing (uppercase, lowercase,
  mixed case) in both the Subject and SAN fields, so it's important to
  look up and store certificates in a normalized form.

  Now, all stored certificates case fold the hostname, converting to
  lowercase in a Unicode-safe manner. This is done for both storage and
  lookup.

  New:

  When a certificate represents a hostname, or a client requests one, that
  hostname may be presented in any casing (uppercase, lowercase, mixed
  case), which can pose issues for comparison. Further, when dealing with
  filesystem storage, we may encounter hostnames with non-ASCII characters
  in them, which may pose challenges depending on the filesystem.

  This change introduces casing normalization of hostnames in the storage
  objects to ease comparisons, and normalization/encoding/decoding in the
  file storage backend to handle encoding and representation differences.

  The base storage objects that deal with hostnames now keep a version of
  the hostname normalized for comparison purposes. This is a Unicode
  string that can resolve to a hostname, but with casing converted to
  lowercase. This eases comparison and gives a consistent representation
  of these hostnames.

  The file storage backend handles its own normalization and translation
  behavior when computing filenames for a given hostname. Encoding
  involves removing any trailing period on the hostname, converting to
  lowercase, and then converting to an IDNA representation to handle
  Unicode characters. The result is an ASCII filename safe for all
  filesystems. Decoding does the inverse of this.

  Note that the standard `Certificate`, `CertificateFingerprints`, etc.
  objects do not normalize hostnames. They are a representation of their
  source. Whether that source is caller-supplied input, an X.509
  certificate, or a storage object, it will reflect the version of the
  hostname on there. That allows for creating an object that can represent
  a piece of state that can then be introspected or validated, which we do
  today.
- Commits:
  ID changed from 72783c2820dd7e081847de3ab03059a18116a6e6 to 8ef5b55bd70f83aa73a5c59d1bcaa217d5bc32c6
- Branch:
  Changed from release-7.1.x to release-8.x
- Diff:
  Revision 2 (+2226 -212)
  Checks run (2 succeeded)
- Change Summary:
  - Added a dependency on the `idna` library, which we now use for the IDNA encoding/decoding (including case normalization).
  - Added new error handling if encoding/decoding fails.
  - Added unit tests and data files that cover these new failure conditions.
  - Fixed a typo in a docstring.
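
The error handling mentioned in this change summary might look something like the following sketch. The exception class `InvalidHostnameError`, the function name, and the logger name are hypothetical. The stdlib `idna` codec (IDNA 2003) is used as a stand-in for the `idna` package so the example has no third-party dependency. The message text and the use of a warning-level log follow the reviewer suggestions listed in the issue table.

```python
import logging

# Hypothetical logger name, for illustration only.
logger = logging.getLogger('certs.storage.sketch')


class InvalidHostnameError(ValueError):
    """Hypothetical error for hostnames that cannot be encoded."""


def encode_hostname_or_raise(hostname: str) -> str:
    """Encode a hostname to ASCII, raising a clear error on failure."""
    try:
        # Assumption: the stdlib codec stands in for idna.encode().
        return hostname.lower().encode('idna').decode('ascii')
    except UnicodeError as e:
        # Not fatal to the process, so log a warning rather than an error.
        logger.warning('Unable to encode hostname %r: %s', hostname, e)

        raise InvalidHostnameError(
            f'The hostname {hostname} contains invalid characters and '
            f'cannot be stored'
        ) from e
```

With the stdlib codec, a label longer than 63 characters or an empty label (as in `a..example`) triggers the failure path. The real `idna` package rejects a different set of inputs, so treat the failing cases here as illustrative only.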
- Description:

  When a certificate represents a hostname, or a client requests one, that
  hostname may be presented in any casing (uppercase, lowercase, mixed
  case), which can pose issues for comparison. Further, when dealing with
  filesystem storage, we may encounter hostnames with non-ASCII characters
  in them, which may pose challenges depending on the filesystem.

  This change introduces casing normalization of hostnames in the storage
  objects to ease comparisons, and normalization/encoding/decoding in the
  file storage backend to handle encoding and representation differences.

  The base storage objects that deal with hostnames now keep a version of
  the hostname normalized for comparison purposes. This is a Unicode
  string that can resolve to a hostname, but with casing converted to
  lowercase. This eases comparison and gives a consistent representation
  of these hostnames.

  ~ Old: The file storage backend handles its own normalization and
  translation behavior when computing filenames for a given hostname.
  Encoding involves removing any trailing period on the hostname,
  converting to lowercase, and then converting to an IDNA representation
  to handle Unicode characters. The result is an ASCII filename safe for
  all filesystems. Decoding does the inverse of this.

  ~ New: The file storage backend handles its own normalization and
  translation behavior when computing filenames for a given hostname.
  Encoding involves removing any trailing period on the hostname and then
  converting to an IDNA 2008 representation to handle Unicode characters.
  The result is an ASCII filename safe for all filesystems. Decoding does
  the inverse of this.

  + IDNA handling depends on the `idna` library, which is a new dependency
  added to Review Board 8. This supports IDNA 2008 standards with UTS46
  normalization, which amongst other things handles casing differences.

  Note that the standard `Certificate`, `CertificateFingerprints`, etc.
  objects do not normalize hostnames. They are a representation of their
  source. Whether that source is caller-supplied input, an X.509
  certificate, or a storage object, it will reflect the version of the
  hostname on there. That allows for creating an object that can represent
  a piece of state that can then be introspected or validated, which we do
  today.
- Commits:
  ID changed from 8ef5b55bd70f83aa73a5c59d1bcaa217d5bc32c6 to 7d5a227b55167c5f78acd8999437877bc1539693
- Diff:
  Revision 3 (+2654 -230)
- Change Summary:
  Removed a duplicate unit test.
- Commits:
  ID changed from 7d5a227b55167c5f78acd8999437877bc1539693 to a3938a0688f72a0326a9e502725faccbae72b81c
- Diff:
  Revision 4 (+2586 -206)
  Checks run (2 succeeded)
- Change Summary:
  Improved logging, errors, and comments.
- Commits:
  ID changed from a3938a0688f72a0326a9e502725faccbae72b81c to 7787cc93803afad7474d670b33238f41ef1aa460
- Diff:
  Revision 5 (+2584 -206)