Fix get_markdown_element_tree() with bad escaping in e-mail addresses.
Review Request #11618 — Created May 25, 2021 and submitted
When Python Markdown renders an e-mail address, it escapes each
character as an entity to make the address harder to scrape. This can
generate bad entities when backslash escaping is added at the end of the
e-mail address (like <email@example.com\\>). Browsers handle these
fine, but Python's XML parser doesn't.
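The failure mode can be reproduced with a small sketch using only the standard library. The obfuscate() helper is hypothetical and simplified: it turns every character into a decimal character reference, similar in spirit to Python Markdown's e-mail escaping but not its actual code. The stray \x02 is an assumption standing in for whatever illegal character ends up in the escaped address. ElementTree (backed by expat) rejects a reference to a control character like U+0002 even though browsers tolerate it.

```python
import xml.etree.ElementTree as ET


def obfuscate(address):
    """Hypothetical helper: encode each character as &#NNN;.

    This is a simplified stand-in for Python Markdown's e-mail
    obfuscation, not its real implementation.
    """
    return ''.join('&#%d;' % ord(c) for c in address)


def parses_as_xml(fragment):
    """Return whether the fragment parses when wrapped in an element."""
    try:
        ET.fromstring('<a>%s</a>' % fragment)
        return True
    except ET.ParseError:
        return False


good = obfuscate('email@example.com')
# Assumption: a control character slipped into the escaped address.
bad = obfuscate('email@example.com\x02')

print(parses_as_xml(good))  # True
print(parses_as_xml(bad))   # False: &#2; is not a legal XML character
```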
We already had some code in place to convert certain entity names to
numeric entity codes, and separate code to convert invalid raw character
codes to a ?. This change combines the two, updating the entity name
conversion to also convert numeric entity values that refer to illegal
characters.
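A minimal sketch of what a combined conversion pass could look like, assuming illegal numeric references are replaced with ? just as the raw-character path does. The names normalize_entities() and is_legal_xml_char() are hypothetical and are not the actual get_markdown_element_tree() code. Named entities are looked up in the standard library's html.entities.name2codepoint, and legality follows the XML 1.0 Char production.

```python
import re
from html.entities import name2codepoint

# Matches &name; and &#NNN; / &#xHHH; references.
ENTITY_RE = re.compile(r'&(#(?:[0-9]+|[xX][0-9a-fA-F]+)|[A-Za-z][A-Za-z0-9]*);')

# Entities every XML parser already understands; leave them untouched.
XML_PREDEFINED = {'amp', 'lt', 'gt', 'quot', 'apos'}


def is_legal_xml_char(code):
    """Hypothetical helper: check a code point against XML 1.0's Char rule."""
    return (code in (0x9, 0xA, 0xD)
            or 0x20 <= code <= 0xD7FF
            or 0xE000 <= code <= 0xFFFD
            or 0x10000 <= code <= 0x10FFFF)


def normalize_entities(text):
    """Hypothetical sketch: make entity references safe for an XML parser.

    Named HTML entities become numeric references, and any reference
    (named or numeric) to an illegal character becomes '?'.
    """
    def _replace(m):
        ref = m.group(1)

        if ref.startswith('#'):
            if ref[1] in 'xX':
                code = int(ref[2:], 16)
            else:
                code = int(ref[1:])
        elif ref in XML_PREDEFINED:
            return m.group(0)
        elif ref in name2codepoint:
            code = name2codepoint[ref]
        else:
            # Unknown name: leave it as-is rather than guess.
            return m.group(0)

        if not is_legal_xml_char(code):
            return '?'

        return '&#%d;' % code

    return ENTITY_RE.sub(_replace, text)


print(normalize_entities('a&nbsp;b&#2;c&amp;d'))  # a&#160;b?c&amp;d
```

Handling both named and numeric references in one regex pass means every entity is checked for legality in the same place, which mirrors the consolidation described above.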
Unit tests passed.
Verified that this fixes a reported Review Board issue with these bad
e-mail addresses when diffing fields.
Tested this fix against Python 3 as well on