Add a code safety checker for some trojan source code attacks.

Review Request #11906, created Jan. 4, 2022 and submitted


Review Board


This code safety checker looks for zero-width spaces and bi-directional
control characters in lines of code, flagging each line where they
appear and putting a banner at the top of the diff.
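
A minimal sketch of this kind of line scan, assuming a simple
per-character check. The `check_lines` helper and both character sets
are hypothetical and not the checker's actual code. The bidi set is the
nine formatting controls (U+202A to U+202E, U+2066 to U+2069) discussed
in the Trojan Source research. The zero-width set goes beyond the
zero-width space named here and is an assumption.

```python
import unicodedata

# Hypothetical set: bi-directional formatting controls (LRE, RLE, PDF,
# LRO, RLO, LRI, RLI, FSI, PDI). The real checker's list may differ.
BIDI_CHARS = frozenset("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")

# Hypothetical set: ZERO WIDTH SPACE plus ZWNJ, ZWJ and WORD JOINER.
# Only U+200B is named in the description; the rest are assumptions.
ZERO_WIDTH_CHARS = frozenset("\u200b\u200c\u200d\u2060")


def check_lines(lines):
    """Return (line_number, column, kind, char_name) for each hit.

    line_number and column are 1-based; kind is "bidi" or "zero-width".
    """
    results = []
    for line_num, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CHARS:
                kind = "bidi"
            elif ch in ZERO_WIDTH_CHARS:
                kind = "zero-width"
            else:
                continue
            results.append((line_num, col, kind, unicodedata.name(ch)))
    return results


sample = ["x = 1\n", "is_\u200badmin = False\n"]
for hit in check_lines(sample):
    print(hit)  # (2, 4, 'zero-width', 'ZERO WIDTH SPACE')
```

In this sketch, any non-empty result is the kind of signal that would
drive the banner at the top of the diff.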

Certain bi-directional Unicode characters can be used together to make
malicious code display one way and execute another way. For example,
code can appear to be inside of a comment, but instead be inside of a
string, opening up opportunities to circumvent access control checks or
other logic. This is CVE-2021-42574.
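
To illustrate the "looks like a comment, is actually a string" case,
the hypothetical snippet below builds Python source containing a
right-to-left override and directional isolates, then runs it. In a
viewer that applies bidi reordering, the line may appear to read
`is_admin = access_level != "user"` followed by a `# Check if admin`
comment. Exact rendering depends on the viewer. Logically, the
"comment" text is part of the string literal, so the comparison
evaluates to `True`.

```python
# Hypothetical example for illustration; not taken from the review request.
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

# Everything between the quotes, including "# Check if admin", is one
# string literal, even though a bidi-aware display may reorder it.
source = (
    'access_level = "user"\n'
    f'is_admin = access_level != "user{RLO} {LRI}# Check if admin{PDI} {LRI}"\n'
)

namespace = {}
exec(source, namespace)
print(namespace["is_admin"])  # True: the literal is not just "user"
```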

Similarly, zero-width spaces can make code appear one way and execute
another way. Languages that allow for Unicode characters in function
names or variable names may accept zero-width spaces as a legitimate
character in a name. To reviewers, an identifier containing a zero-width
space looks identical to one without it, even though the two are
different names. This can be used, for instance, to circumvent state
checks.
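
A small sketch of why such names fool reviewers. It compares the names
as plain strings rather than executing code, because whether a given
zero-width character is accepted in identifiers varies by language. The
`reveal` helper is hypothetical: it replaces Unicode format characters
(category `Cf`, which includes U+200B) with their names so they become
visible.

```python
import unicodedata

visible = "is_admin"
hidden = "is_\u200badmin"  # U+200B ZERO WIDTH SPACE after "is_"

# The two usually render identically, but they are different names.
print(visible == hidden)          # False
print(len(visible), len(hidden))  # 8 9


def reveal(text):
    """Replace invisible format characters (category Cf) with their names."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            out.append("<%s>" % unicodedata.name(ch))
        else:
            out.append(ch)
    return "".join(out)


print(reveal(hidden))  # is_<ZERO WIDTH SPACE>admin
```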

If either issue is found in code, the file alert template will show a
section for the vulnerability with examples, so reviewers know the kind
of risks that could be hiding in the code. They're also given a link to
the CVE.

This does not currently address CVE-2021-42694, which enables attacks
similar to the zero-width space attack, but through homoglyphs
(distinct Unicode characters that closely resemble other characters,
such as the Cyrillic "Н", which looks like a Latin "H"). That will be
tackled separately.
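
For illustration only, a short sketch of a homoglyph pair. Cyrillic
U+041D is an ordinary uppercase letter (category `Lu`) and is valid in
Python identifiers, so it is neither a bidi control nor a zero-width
character. A check limited to those two categories would not flag it,
which is why homoglyphs need separate handling.

```python
import unicodedata

latin = "H"
cyrillic = "\u041d"  # CYRILLIC CAPITAL LETTER EN, usually rendered like "H"

print(latin == cyrillic)           # False
print(unicodedata.name(latin))     # LATIN CAPITAL LETTER H
print(unicodedata.name(cyrillic))  # CYRILLIC CAPITAL LETTER EN

# Both spellings are valid Python identifiers, yet they name different things.
print("Hello".isidentifier(), (cyrillic + "ello").isidentifier())  # True True
print("Hello" == cyrillic + "ello")  # False
```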

Note that there are legitimate situations in which these characters may
appear. Depending on user feedback, we may want to offer options for
disabling these checks. At the moment, we're following what other tools
are doing and unconditionally checking the code.

Unit tests pass on Python 2 and 3.

Tested with a wide collection of trojan source files available at