Summary

Normalize, encode, and decode cert hostnames for storage.

Review Request #15016 — Created April 16, 2026 and submitted May 6, 2026, 1:31 a.m.

Information

Owner

chipx86

Repository

Review Board

Branch

release-8.x

Bugs

Depends On

Blocks

15017

Reviewers

Groups

reviewboard

People

Description

This change introduces casing normalization of hostnames in the storage
objects to ease comparisons, and normalization/encoding/decoding in the
file storage backend to handle encoding and representation differences.

The base storage objects that deal with hostnames now keep a version of
the hostname normalized for comparison purposes. This is a Unicode
string that can resolve to a hostname, but with casing converted to
lowercase. This eases comparison and gives a consistent representation
of these hostnames.

The file storage backend handles its own normalization and translation
behavior when computing filenames for a given hostname. Encoding
involves removing any trailing period on the hostname and then
converting to an IDNA 2008 representation to handle Unicode
characters. The result is an ASCII filename safe for all filesystems.
Decoding does the inverse of this.

IDNA handling depends on the idna library, which is a new dependency
added to Review Board 8. This supports IDNA 2008 standards with UTS46
normalization, which amongst other things handles casing differences.
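
As a rough illustration of the encode/decode steps described above, here is a minimal sketch. It is a stand-in under stated assumptions, not the code in this change: it uses Python's built-in `idna` codec (IDNA 2003) so that it runs without extra dependencies, whereas the change itself uses the `idna` library with IDNA 2008 and UTS46 handling. The function names are hypothetical.

```python
def encode_hostname_for_filename(hostname: str) -> str:
    """Return an ASCII form of a hostname for use in a filename.

    Hypothetical helper. Assumes Python's built-in ``idna`` codec as a
    stand-in for the ``idna`` library used by the actual change.
    """
    # Drop a trailing period (the fully-qualified form of the name).
    if hostname.endswith('.'):
        hostname = hostname[:-1]

    # The built-in codec passes ASCII labels through unchanged, so
    # lowercase explicitly before encoding.
    return hostname.lower().encode('idna').decode('ascii')


def decode_hostname_from_filename(encoded: str) -> str:
    """Return the Unicode hostname for an encoded filename component."""
    return encoded.encode('ascii').decode('idna')
```

With the built-in codec, some characters are mapped during encoding (for example, `ß` becomes `ss`), so decoding is not guaranteed to reproduce the original input exactly.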

Note that the standard Certificate, CertificateFingerprints, etc.
objects do not normalize hostnames. They are a representation of their
source. Whether that source is caller-supplied input, an X.509
certificate, or a storage object, it will reflect the version of the
hostname on there. That allows for creating an object that can represent
a piece of state that can then be introspected or validated, which we do
today.

Testing Done

Unit tests pass.

Commits

Summary	ID
Normalize, encode, and decode cert hostnames for storage. When a certificate represents a hostname, or a client requests one, that hostname may be presented in any casing (uppercase, lowercase, mixed case), which can pose issues for comparison. Further, when dealing with filesystem storage, we may encounter hostnames with non-ASCII characters in them, which may pose challenges depending on the filesystem. This change introduces casing normalization of hostnames in the storage objects to ease comparisons, and normalization/encoding/decoding in the file storage backend to handle encoding and representation differences. The base storage objects that deal with hostnames now keep a version of the hostname normalized for comparison purposes. This is a Unicode string that can resolve to a hostname, but with casing converted to lowercase. This eases comparison and gives a consistent representation of these hostnames. The file storage backend handles its own normalization and translation behavior when computing filenames for a given hostname. Encoding involves removing any trailing period on the hostname and then converting to an IDNA 2008 representation to handle Unicode characters. The result is an ASCII filename safe for all filesystems. Decoding does the inverse of this. IDNA handling depends on the `idna` library, which is a new dependency added to Review Board 8. This supports IDNA 2008 standards with UTS46 normalization, which amongst other things handles casing differences. Note that the standard `Certificate`, `CertificateFingerprints`, etc. objects do not normalize hostnames. They are a representation of their source. Whether that source is caller-supplied input, an X.509 certificate, or a storage object, it will reflect the version of the hostname on there. That allows for creating an object that can represent a piece of state that can then be introspected or validated, which we do today.	7787cc93803afad7474d670b33238f41ef1aa460

Issues

Description	From	Last Updated
Looks like reviewboard.certs.cert.Certificate.__init__ is also storing hostname, we should normalize there.	david	April 20, 2026, 1:31 a.m.
Could we make a central helper for normalizing the hostname? Right now we only do casefold but it seems like …	david	April 20, 2026, 2:43 p.m.
Can we add additional tests to verify mixed-case comparisons with certificates and fingerprints?	david	April 20, 2026, 9:04 p.m.
This comparison is happening before we do any casefold()ing	david	April 20, 2026, 2:48 p.m.
We should normalize here too.	david	April 20, 2026, 2:50 p.m.
We should normalize here too.	david	April 20, 2026, 2:50 p.m.
We should probably be normalizing the hostname here at the ingress point instead of deep inside _build*	david	April 20, 2026, 3:52 p.m.
Same with the hostname here.	david	April 20, 2026, 3:52 p.m.
And here.	david	April 20, 2026, 3:52 p.m.
Is there a reason to use lower() here instead of casefold()?	david	April 28, 2026, 10:21 p.m.
Apparently python's idna codec is old. We already have the idna package available because it's a dependency of cryptography, so …	david	April 28, 2026, 10:21 p.m.
Typo: an decoded -> a decoded	david	April 27, 2026, 3:53 p.m.
We don't return here anymore, we raise.	david	April 29, 2026, 2:49 p.m.
This is a little confusing. How about "The hostname {hostname} contains invalid characters and cannot be stored"?	david	April 29, 2026, 2:49 p.m.
Given that this isn't really a fatal error, we should probably use warning instead of error	david	April 29, 2026, 2:50 p.m.
redefinition of unused 'test_init_with_unicode_hostname' from line 830 Column: 5 Error code: F811	reviewbot	April 28, 2026, 10:22 p.m.
SyntaxError: unterminated string literal (detected at line 2223) Column: 22 Error code: E999	reviewbot	May 6, 2026, 1:24 a.m.

flake8 passed.

JSHint passed.

The issue has been dropped.

Looks like reviewboard.certs.cert.Certificate.__init__ is also storing hostname, we should normalize there.

chipx86

April 20, 2026, 3:52 p.m.

I intentionally left that out there in my final version. Originally I had it normalize there, but ultimately decided that the initially-constructed Certificate should be fully representative of any parsed state. Once it's been committed to the database and loaded back out, we'll be dealing with a normalized version, but we use a newly-constructed Certificate partly for verification and then logging purposes, so keeping it as consistent with the source data at that stage is important.

The issue has been resolved.

Could we make a central helper for normalizing the hostname? Right now we only do casefold but it seems like there might be other things we'd want to do in the future (trimming, IDNA conversion, etc).

chipx86

April 20, 2026, 3:52 p.m.

Went ahead and put this in the match_host() change.

The issue has been resolved.

Can we add additional tests to verify mixed-case comparisons with certificates and fingerprints?

reviewboard/certs/storage/base.py (Diff revision 1)

The issue has been resolved.

This comparison is happening before we do any casefold()ing

reviewboard/certs/storage/base.py (Diff revision 1)

The issue has been resolved.

We should normalize here too.

reviewboard/certs/storage/file_storage.py (Diff revision 1)

The issue has been resolved.

We should normalize here too.

reviewboard/certs/storage/file_storage.py (Diff revision 1)

The issue has been dropped.

We should probably be normalizing the hostname here at the ingress point instead of deep inside _build*

chipx86

April 20, 2026, 3:52 p.m.

So here are my thoughts on normalization.

There are two reasons to normalize:

When comparing hostnames, making sure that two different-cased but otherwise identical hostnames are equal.
When dealing with the filesystem.

These ultimately may have very different normalization approaches as we go forward. We will likely, for instance, need to normalize Unicode characters in a cross-filesystem manner, encoding/decoding characters, but we wouldn't want to represent that version in our objects that way. Similarly, as above, we want to avoid straying too much from the original input too high up.

So I see the hostname handling as follows:

On the high-level classes (Certificate, etc.), we take the raw input as-is, and do not normalize.
In the storage classes (FileStoredCertificate, FileStoredCertificateFingerprints, etc.), we can work with a normalized form for comparison purposes, because we're dealing with a wrapper around certificates and not certificates themselves. We're dealing with comparisons and lookup. So those objects can take that form.
When dealing with the filesystem, we're dealing with a separate normalization, one for storage purposes. This is independent of any normalization that may have been done for comparison purposes. I'm going to make that more clear in the code.

Given that, we don't want to normalize here. Normalization will happen when building FileStoredCertificateFingerprints, and storage normalization will happen in _build*.

I'm also going to be improving storage normalization, so they'll be differing anyway.
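
The layering described above might be sketched roughly as follows. The class names here are hypothetical stand-ins rather than the real `Certificate` or `FileStoredCertificate` classes, and lowercase conversion is assumed as the comparison normalization.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SketchCertificate:
    """Hypothetical high-level object: keeps the hostname exactly as given."""

    hostname: str
    port: int = 443


@dataclass
class SketchStoredCertificate:
    """Hypothetical storage wrapper: keeps a normalized hostname for lookups."""

    certificate: SketchCertificate
    hostname: str = field(init=False)

    def __post_init__(self) -> None:
        # Normalize only in the wrapper. The wrapped certificate still
        # reflects its source data.
        self.hostname = self.certificate.hostname.lower()


cert = SketchCertificate(hostname='WWW.Example.COM')
stored = SketchStoredCertificate(certificate=cert)
```

Any filename encoding would then be a third, separate step applied only when the storage backend builds a path.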

reviewboard/certs/storage/file_storage.py (Diff revision 1)

The issue has been dropped.

Same with the hostname here.

reviewboard/certs/storage/file_storage.py (Diff revision 1)

The issue has been dropped.

And here.

Change Summary:


Expanded the scope of the change a bit.
Changed the normalization at the filename level to handle things like Unicode characters.
Added hostname casing normalization before comparison in BaseStoredCertificate.__init__.
Added hostname casing normalization to the rest of the base storage objects.
Added a whole lot of new unit tests for hostname casing normalization and storage encoding.

Summary:

Normalize cert hostnames for storage.

Normalize, encode, and decode cert hostnames for storage.

Description:

~		Certificates may present hostnames in any casing (uppercase, lowercase,
~		mixed case) in both the Subject and SAN fields, so it's important to
~		look up and store certificates in a normalized form.
	~	When a certificate represents a hostname, or a client requests one, that
	~	hostname may be presented in any casing (uppercase, lowercase, mixed
	~	case), which can pose issues for comparison. Further, when dealing with
	+	filesystem storage, we may encounter hostnames with non-ASCII characters
	+	in them, which may pose challenges depending on the filesystem.

~		Now, all stored certificates case fold the hostname, converting to
~		lowercase in a Unicode-safe manner. This is done for both storage and
~		lookup.
	~	This change introduces casing normalization of hostnames in the storage
	~	objects to ease comparisons, and normalization/encoding/decoding in the
	~	file storage backend to handle encoding and representation differences.
	+
	+	The base storage objects that deal with hostnames now keep a version of
	+	the hostname normalized for comparison purposes. This is a Unicode
	+	string that can resolve to a hostname, but with casing converted to
	+	lowercase. This eases comparison and gives a consistent representation
	+	of these hostnames.
	+
	+	The file storage backend handles its own normalization and translation
	+	behavior when computing filenames for a given hostname. Encoding
	+	involves removing any trailing period on the hostname, converting to
	+	lowercase, and then converting to an IDNA representation to handle
	+	Unicode characters. The result is an ASCII filename safe for all
	+	filesystems. Decoding does the inverse of this.
	+
	+	Note that the standard `Certificate`, `CertificateFingerprints`, etc.
	+	objects do not normalize hostnames. They are a representation of their
	+	source. Whether that source is caller-supplied input, an X.509
	+	certificate, or a storage object, it will reflect the version of the
	+	hostname on there. That allows for creating an object that can represent
	+	a piece of state that can then be introspected or validated, which we do
	+	today.

Commits:

	Summary	ID
	Normalize cert hostnames for storage. Certificates may present hostnames in any casing (uppercase, lowercase, mixed case) in both the Subject and SAN fields, so it's important to look up and store certificates in a normalized form. Now, all stored certificates case fold the hostname, converting to lowercase in a Unicode-safe manner. This is done for both storage and lookup.	72783c2820dd7e081847de3ab03059a18116a6e6
	Normalize, encode, and decode cert hostnames for storage. When a certificate represents a hostname, or a client requests one, that hostname may be presented in any casing (uppercase, lowercase, mixed case), which can pose issues for comparison. Further, when dealing with filesystem storage, we may encounter hostnames with non-ASCII characters in them, which may pose challenges depending on the filesystem. This change introduces casing normalization of hostnames in the storage objects to ease comparisons, and normalization/encoding/decoding in the file storage backend to handle encoding and representation differences. The base storage objects that deal with hostnames now keep a version of the hostname normalized for comparison purposes. This is a Unicode string that can resolve to a hostname, but with casing converted to lowercase. This eases comparison and gives a consistent representation of these hostnames. The file storage backend handles its own normalization and translation behavior when computing filenames for a given hostname. Encoding involves removing any trailing period on the hostname, converting to lowercase, and then converting to an IDNA representation to handle Unicode characters. The result is an ASCII filename safe for all filesystems. Decoding does the inverse of this. Note that the standard `Certificate`, `CertificateFingerprints`, etc. objects do not normalize hostnames. They are a representation of their source. Whether that source is caller-supplied input, an X.509 certificate, or a storage object, it will reflect the version of the hostname on there. That allows for creating an object that can represent a piece of state that can then be introspected or validated, which we do today.	8ef5b55bd70f83aa73a5c59d1bcaa217d5bc32c6

Branch:

release-7.1.x

release-8.x

Diff:

Revision 2 (+2226 -212)

	reviewboard/certs/storage/base.py
	reviewboard/certs/storage/file_storage.py
	reviewboard/certs/tests/test_file_storage_backend.py
	reviewboard/certs/tests/testdata/file_storage/certs/client/__.eng.example.xn--cm-fka__443.crt
	reviewboard/certs/tests/testdata/file_storage/certs/client/__.eng.example.xn--cm-fka__443.key
	reviewboard/certs/tests/testdata/file_storage/certs/trust/__.eng.example.xn--cm-fka__443.crt
	reviewboard/certs/tests/testdata/file_storage/fingerprints/www.example.xn--cm-fka__443.json

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

reviewboard/certs/storage/file_storage.py (Diff revision 2)

The issue has been resolved.

Is there a reason to use lower() here instead of casefold()?

chipx86

April 27, 2026, 3:52 p.m.

So the following is based on using .encode('idna'), and I spent all this time writing this, but things change with the idna package. I'm looking into that now.
We're encoding for IDNA, and it ultimately doesn't matter in this case which we use when using .encode('idna'). lower() is cheaper (casefold() is more aggressive in how it modifies things), but .encode('idna') will do the right thing with either. When Unicode characters are present, it'll ultimately lowercase it all correctly, but when not, it'll leave it alone. So here's how it ends up working:

# Differences between lower() and casefold() for a string with certain mixed-case unicode characters:
>>> 'Straße.de'.lower()
'straße.de'

>>> 'Straße.de'.casefold()
'strasse.de'

>>> 'Éxamplé.COM'.lower()
'éxamplé.com'

>>> 'Éxamplé.COM'.casefold()
'éxamplé.com'


# IDNA encoding with mixed-case Unicode and mixed-case ASCII:
>>> 'Straße.de'.encode('idna')
b'strasse.de'

>>> 'FooBar'.encode('idna')
b'FooBar'

>>> 'Éxamplé.COM'.encode('idna')
b'xn--xampl-9raf.COM'


# Combining the two:
>>> 'Straße.de'.lower().encode('idna')
b'strasse.de'

>>> 'Straße.de'.casefold().encode('idna')
b'strasse.de'

>>> 'Éxamplé.COM'.lower().encode('idna')
b'xn--xampl-9raf.com'

>>> 'Éxamplé.COM'.casefold().encode('idna')
b'xn--xampl-9raf.com'


Okay, that was fun. Now all that said, the idna package has a uts46=True mode, which is probably the right answer if we go with this package. It handles further normalization for user-provided domains.

reviewboard/certs/storage/file_storage.py (Diff revision 2)

The issue has been resolved.

Apparently python's idna codec is old. We already have the idna package available because it's a dependency of cryptography, so we should probably use idna.encode() here instead.

chipx86

April 27, 2026, 3:52 p.m.

I can't find anything indicating it's a dependency of cryptography. I only have it through requests. The cryptography source doesn't reference it except as a suggestion in an error message when failing to encode a provided hostname. If we want to use this, we'll need to add it explicitly.

chipx86

April 27, 2026, 4:01 p.m.

Worth pointing out, this module's a little bit heavyweight compared to the built-in stuff.

I'm trying to decide if this is overkill or not. We ultimately just need something we can store in a predictable format. The IDNA version may not matter too much, but I need to learn the differences between them (and figure out what happens if the encoding changes).

reviewboard/certs/storage/file_storage.py (Diff revision 2)

The issue has been resolved.

Typo: an decoded -> a decoded

Change Summary:


Added a dependency on the idna library, which we now use for the IDNA encoding/decoding (including case normalization).
Added new error handling if encoding/decoding fails.
Added unit tests and data files that cover these new failure conditions.
Fixed a typo in a docstring.

Description:

		When a certificate represents a hostname, or a client requests one, that
		hostname may be presented in any casing (uppercase, lowercase, mixed
		case), which can pose issues for comparison. Further, when dealing with
		filesystem storage, we may encounter hostnames with non-ASCII characters
		in them, which may pose challenges depending on the filesystem.

		This change introduces casing normalization of hostnames in the storage
		objects to ease comparisons, and normalization/encoding/decoding in the
		file storage backend to handle encoding and representation differences.

		The base storage objects that deal with hostnames now keep a version of
		the hostname normalized for comparison purposes. This is a Unicode
		string that can resolve to a hostname, but with casing converted to
		lowercase. This eases comparison and gives a consistent representation
		of these hostnames.

		The file storage backend handles its own normalization and translation
		behavior when computing filenames for a given hostname. Encoding
~		involves removing any trailing period on the hostname, converting to
~		lowercase, and then converting to an IDNA representation to handle
~		Unicode characters. The result is an ASCII filename safe for all
~		filesystems. Decoding does the inverse of this.
	~	involves removing any trailing period on the hostname and then
	~	converting to an IDNA 2008 representation to handle Unicode
	~	characters. The result is an ASCII filename safe for all filesystems.
	~	Decoding does the inverse of this.
	+
	+	IDNA handling depends on the `idna` library, which is a new dependency
	+	added to Review Board 8. This supports IDNA 2008 standards with UTS46
	+	normalization, which amongst other things handles casing differences.

		Note that the standard `Certificate`, `CertificateFingerprints`, etc.
		objects do not normalize hostnames. They are a representation of their
		source. Whether that source is caller-supplied input, an X.509
		certificate, or a storage object, it will reflect the version of the
		hostname on there. That allows for creating an object that can represent
		a piece of state that can then be introspected or validated, which we do
		today.

Commits:

	Summary	ID
	Normalize, encode, and decode cert hostnames for storage.	8ef5b55bd70f83aa73a5c59d1bcaa217d5bc32c6
	Normalize, encode, and decode cert hostnames for storage.	7d5a227b55167c5f78acd8999437877bc1539693

Diff:

Revision 3 (+2654 -230)

	reviewboard/dependencies.py
	reviewboard/certs/storage/base.py
	reviewboard/certs/storage/file_storage.py
	reviewboard/certs/tests/test_file_storage_backend.py
	reviewboard/certs/tests/testdata/file_storage/certs/client/__.eng.example.xn--cm-fka__443.crt
	reviewboard/certs/tests/testdata/file_storage/certs/client/__.eng.example.xn--cm-fka__443.key
	reviewboard/certs/tests/testdata/file_storage/certs/client/xn-----xample-9ua.com__443.crt
	reviewboard/certs/tests/testdata/file_storage/certs/trust/__.eng.example.xn--cm-fka__443.crt
	3 more

Checks run (1 failed, 1 succeeded)

flake8 failed.

JSHint passed.

flake8

reviewboard/certs/tests/test_file_storage_backend.py (Diff revision 3)
The issue has been resolved.
```
redefinition of unused 'test_init_with_unicode_hostname' from line 830

Column: 5
Error code: F811
```

Change Summary:

Removed a duplicate unit test.

Commits:

	Summary	ID
	Normalize, encode, and decode cert hostnames for storage.	7d5a227b55167c5f78acd8999437877bc1539693
	Normalize, encode, and decode cert hostnames for storage.	a3938a0688f72a0326a9e502725faccbae72b81c

Diff:

Revision 4 (+2586 -206)

	reviewboard/dependencies.py
	reviewboard/certs/storage/base.py
	reviewboard/certs/storage/file_storage.py
	reviewboard/certs/tests/test_file_storage_backend.py
	reviewboard/certs/tests/testdata/file_storage/certs/client/__.eng.example.xn--cm-fka__443.crt
	reviewboard/certs/tests/testdata/file_storage/certs/client/__.eng.example.xn--cm-fka__443.key
	reviewboard/certs/tests/testdata/file_storage/certs/client/xn-----xample-9ua.com__443.crt
	reviewboard/certs/tests/testdata/file_storage/certs/trust/__.eng.example.xn--cm-fka__443.crt
	3 more

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Ship it!

reviewboard/certs/storage/file_storage.py (Diff revisions 2 - 4)

The issue has been resolved.

We don't return here anymore, we raise.

reviewboard/certs/storage/file_storage.py (Diff revisions 2 - 4)

The issue has been resolved.

This is a little confusing. How about "The hostname {hostname} contains invalid characters and cannot be stored"?

reviewboard/certs/storage/file_storage.py (Diff revisions 2 - 4)

The issue has been resolved.

Given that this isn't really a fatal error, we should probably log this at the warning level instead of error.
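
Taken together, the three comments above (raise instead of return, a clearer message, warning-level logging) suggest a shape along these lines. This is only an illustrative sketch, not the code under review: `InvalidHostnameError` and `encode_or_raise` are hypothetical names, and the stdlib `idna` codec stands in for the `idna` library used by the actual change.

```python
import logging

logger = logging.getLogger(__name__)


class InvalidHostnameError(ValueError):
    """Hypothetical error for hostnames that cannot be stored."""


def encode_or_raise(hostname: str) -> str:
    """Encode a hostname to ASCII, raising if it cannot be encoded."""
    try:
        # Stand-in for the real encoder; the stdlib codec is IDNA 2003.
        return hostname.encode('idna').decode('ascii')
    except UnicodeError as e:
        # Not fatal to the application, so log a warning, not an error.
        logger.warning('Unable to encode hostname %r for storage: %s',
                       hostname, e)

        # Raise rather than return, using the suggested wording.
        raise InvalidHostnameError(
            f'The hostname {hostname} contains invalid characters and '
            f'cannot be stored'
        ) from e
```

A caller would then wrap `encode_or_raise()` in its own `try`/`except InvalidHostnameError` and decide how to report the failure to the user.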

Change Summary:

Improved logging, errors, and comments.

Commits:

	Summary	ID
	Normalize, encode, and decode cert hostnames for storage.	a3938a0688f72a0326a9e502725faccbae72b81c
	Normalize, encode, and decode cert hostnames for storage.	7787cc93803afad7474d670b33238f41ef1aa460

Diff:

Revision 5 (+2584 -206)

	reviewboard/dependencies.py
	reviewboard/certs/storage/base.py
	reviewboard/certs/storage/file_storage.py
	reviewboard/certs/tests/test_file_storage_backend.py
	reviewboard/certs/tests/testdata/file_storage/certs/client/__.eng.example.xn--cm-fka__443.crt
	reviewboard/certs/tests/testdata/file_storage/certs/client/__.eng.example.xn--cm-fka__443.key
	reviewboard/certs/tests/testdata/file_storage/certs/client/xn-----xample-9ua.com__443.crt
	reviewboard/certs/tests/testdata/file_storage/certs/trust/__.eng.example.xn--cm-fka__443.crt
	3 more

Checks run (1 failed, 1 succeeded)

flake8 failed.

JSHint passed.

flake8

reviewboard/certs/storage/file_storage.py (Diff revision 5)

The issue has been resolved.

SyntaxError: unterminated string literal (detected at line 2223)

Column: 22
Error code: E999

Status: Completed
Change Summary: Pushed to release-8.x (8983438)