Add a code safety checker for some trojan source code attacks.

Review Request #11906 — Created Jan. 4, 2022 and submitted

chipx86
Review Board
release-5.0.x
11904, 11905
reviewboard

This code safety checker looks for zero-width spaces and bi-directional
text in lines of code, flagging them when they appear and putting a
banner at the top of the diff.

Certain bi-directional Unicode characters can be used together to make
malicious code display one way and execute another way. For example,
code can appear to be inside of a comment, but instead be inside of a
string, opening up opportunities to circumvent access control checks or
other logic. This is CVE-2021-42574.

Similarly, zero-width spaces can make code appear one way and execute
another way. Languages that allow for Unicode characters in function
names or variable names may accept zero-width spaces as a legitimate
character in a name. To reviewers, an identifier with a zero-width space
would appear the same as an identifier without one. This can cause, for
instance, state checks to be circumvented.

If either issue is found in code, the file alert template will show a
section for the vulnerability with examples, so reviewers know the kind
of risks that could be hiding in the code. They're also given a link to
the CVE.

This does not currently address CVE-2021-42694, which enables attacks
similar to the zero-width space attach, but through homoglyphs
(separate Unicode characters that resemble another character, like an
"H"). That will be tackled separately.

Note that there are legitimate situations in which these characters may
appear. Depending on user feedback, we may want to offer options for
disabling these checks. At the moment, we're following what other tools
are doing and unconditionally checking the code.

Unit tests pass on Python 2 and 3.

Tested with a wide collection of trojan source files available at
https://github.com/nickboucher/trojan-source/

Summary
Add a code safety checker for some trojan source code attacks.

Description From Last Updated

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

F401 're' imported but unused

reviewbotreviewbot

F841 local variable 'checks_map' is assigned to but never used

reviewbotreviewbot

F841 local variable 'checks_map' is assigned to but never used

reviewbotreviewbot

Shouldn't this be switched to SafeString as well?

maubinmaubin

Does this need to be back-ticked? Also should add a type (list of str)

maubinmaubin
Checks run (1 failed, 1 succeeded)
flake8 failed.
JSHint passed.

flake8

chipx86
chipx86
chipx86
Review request changed

Change Summary:

  • Reworked the Unicode scanning to use a more flexible loop for iteration and line changing, in preparation for confusables support.
  • Shortened the result IDs, to reduce cache space.
  • Removed the hard-coded Unicode titles, in favor of unicodedata.name().
  • Reordered some unit test functions.

Commits:

Summary
-
Add a code safety checker for some trojan source code attacks.
+
Add a code safety checker for some trojan source code attacks.

Diff:

Revision 3 (+1034 -2)

Show changes

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

chipx86
david
  1. Ship It!
  2. 
      
chipx86
david
  1. Ship It!
  2. 
      
maubin
  1. Ship It!

  2. 
      
maubin
  1. 
      
  2. Shouldn't this be switched to SafeString as well?

  3. 
      
chipx86
maubin
  1. 
      
  2. Does this need to be back-ticked? Also should add a type (list of str)

    1. It does in this form. I'll update it to use the newer Keys: section.

  3. 
      
chipx86
maubin
  1. Ship It!
  2. 
      
david
  1. Ship It!
  2. 
      
chipx86
Review request changed

Status: Closed (submitted)

Change Summary:

Pushed to release-5.0.x (3823d88)
Loading...