
    Normalize all named entities when parsing Markdown HTML.

    Review Request #11059 — Created June 26, 2020 and submitted — Latest diff uploaded

    Information

    Djblets
    release-1.0.x

    Reviewers

    When attempting to parse HTML rendered by Python Markdown, we would
    crash any time xml.dom.minidom encountered an unknown named entity.
    This has been found to happen when Python Markdown processes an e-mail
    address containing a Unicode character that has a known named HTML
    entity. Python Markdown optimistically uses the named entity instead
    of a character reference (&#...;), but the XML parser doesn't know
    about that entity.

    Overriding the XML parser is fragile and error-prone, so it's not worth
    doing. Instead, we simply undo what Python Markdown does by applying a
    regex to convert all known named entities to character references.
    These are safe to feed to the XML parser, and give us the resulting
    strings that we want.
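
    As a rough illustration of the approach described above, here is a
    minimal sketch, assuming Python 3 and the standard html.entities
    table. The names NAMED_ENTITY_RE and normalize_named_entities are
    hypothetical and are not taken from the actual change, which may
    differ in its details.

```python
import re
from html.entities import name2codepoint

# Matches named entities such as "&eacute;". Numeric references like
# "&#233;" are not matched, since "#" is not part of the name pattern.
NAMED_ENTITY_RE = re.compile(r'&([A-Za-z][A-Za-z0-9]*);')


def normalize_named_entities(html):
    """Convert known named HTML entities to numeric character references.

    Names missing from the html.entities table are left untouched.
    """
    def _replace(match):
        name = match.group(1)

        try:
            return '&#%d;' % name2codepoint[name]
        except KeyError:
            # Unknown entity name: return the original text unchanged.
            return match.group(0)

    return NAMED_ENTITY_RE.sub(_replace, html)
```

    With this sketch, a string like '<p>caf&eacute;</p>' becomes
    '<p>caf&#233;</p>', which xml.dom.minidom can parse without needing
    to know about HTML's named entities.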

    Unit tests pass.
