Fix get_markdown_element_tree() with bad escaping in e-mail addresses.

Review Request #11618 — Created May 25, 2021 and submitted

Information

Repository: Djblets
Branch: release-1.0.x

Reviewers

When Python Markdown renders an e-mail address, it escapes each
character to make the address harder to scrape. When backslash escaping
is added at the end of the e-mail address (as in <test@example.com\\>),
this can generate invalid entities. Browsers handle these fine, but
Python's XML parser does not.

We already had code to convert some entity names to entity codes, and
separate code to convert invalid raw character codes to a ?. This change
combines the two: the entity name conversion now also converts numeric
entity values that refer to illegal characters.
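
As a rough illustration of the combined approach (not the actual Djblets
code), the sketch below handles named and numeric entities in a single
pass. The names sanitize_entities(), is_legal_xml_char(), ENTITY_RE and
XML_PREDEFINED are hypothetical, the legal character ranges are taken
from the XML 1.0 definition of Char, and the code assumes Python 3.

```python
import re
from html.entities import name2codepoint
from xml.dom.minidom import parseString

# Hypothetical pattern: matches hex (&#x40;), decimal (&#64;), and
# named (&nbsp;) entities.
ENTITY_RE = re.compile(
    r'&(?:#[xX]([0-9a-fA-F]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));')

# Entity names that XML itself predefines; these are left untouched.
XML_PREDEFINED = {'amp', 'lt', 'gt', 'quot', 'apos'}


def is_legal_xml_char(code):
    """Return whether a code point is allowed by the XML 1.0 Char rule."""
    return (code in (0x9, 0xA, 0xD) or
            0x20 <= code <= 0xD7FF or
            0xE000 <= code <= 0xFFFD or
            0x10000 <= code <= 0x10FFFF)


def sanitize_entities(html):
    """Normalize entities so the result is safe for an XML parser.

    Named entities become decimal entity codes, and any entity that
    refers to an illegal XML character becomes a literal "?".
    """
    def _replace(m):
        hex_digits, dec_digits, name = m.groups()

        if name is not None:
            if name in XML_PREDEFINED:
                return m.group(0)

            code = name2codepoint.get(name)

            if code is None:
                # Unknown entity name; leave it for the caller to handle.
                return m.group(0)
        elif hex_digits is not None:
            code = int(hex_digits, 16)
        else:
            code = int(dec_digits)

        if not is_legal_xml_char(code):
            return '?'

        return '&#%d;' % code

    return ENTITY_RE.sub(_replace, html)


# Usage: the sanitized markup can be handed to Python's XML parser.
doc = parseString('<p>%s</p>' % sanitize_entities('a&nbsp;b&#1;'))
```

Doing both conversions in one substitution means a named entity and a
numeric entity go through the same legality check, so there is a single
place where illegal characters are replaced.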

Unit tests passed.

Verified that this fixed a reported issue on Review Board with these
bad e-mail addresses when diffing fields.

Tested this fix against Python 3 as well on release-2.0.x.

Summary ID
Fix get_markdown_element_tree() with bad escaping in e-mail addresses.
aa219443d57c31831c287c4d771d9b47519a30b5
david
  1. Ship It!
chipx86
Review request changed
Status:
Completed
Change Summary:
Pushed to release-1.0.x (e93e526)