Add a code safety checker for some trojan source code attacks.

Review Request #11906 — Created Jan. 4, 2022 and submitted

Information

Review Board
release-5.0.x

Reviewers

This code safety checker looks for zero-width spaces and bi-directional
text in lines of code, flagging them when they appear and putting a
banner at the top of the diff.

Certain bi-directional Unicode characters can be used together to make
malicious code display one way and execute another way. For example,
code can appear to be inside of a comment, but instead be inside of a
string, opening up opportunities to circumvent access control checks or
other logic. This is CVE-2021-42574.

Similarly, zero-width spaces can make code appear one way and execute
another way. Languages that allow for Unicode characters in function
names or variable names may accept zero-width spaces as a legitimate
character in a name. To reviewers, an identifier with a zero-width space
would appear the same as an identifier without one. This can cause, for
instance, state checks to be circumvented.

If either issue is found in code, the file alert template will show a
section for the vulnerability with examples, so reviewers know the kind
of risks that could be hiding in the code. They're also given a link to
the CVE.

This does not currently address CVE-2021-42694, which enables attacks
similar to the zero-width space attach, but through homoglyphs
(separate Unicode characters that resemble another character, like an
"H"). That will be tackled separately.

Note that there are legitimate situations in which these characters may
appear. Depending on user feedback, we may want to offer options for
disabling these checks. At the moment, we're following what other tools
are doing and unconditionally checking the code.

Unit tests pass on Python 2 and 3.

Tested with a wide collection of trojan source files available at
https://github.com/nickboucher/trojan-source/

Summary ID
Add a code safety checker for some trojan source code attacks.
This code safety checker looks for zero-width spaces and bi-directional text in lines of code, flagging them when they appear and putting a banner at the top of the diff. Certain bi-directional Unicode characters can be used together to make malicious code display one way and execute another way. For example, code can appear to be inside of a comment, but instead be inside of a string, opening up opportunities to circumvent access control checks or other logic. This is CVE-2021-42574. Similarly, zero-width spaces can make code appear one way and execute another way. Languages that allow for Unicode characters in function names or variable names may accept zero-width spaces as a legitimate character in a name. To reviewers, an identifier with a zero-width space would appear the same as an identifier without one. This can cause, for instance, state checks to be circumvented. If either issue is found in code, the file alert template will show a section for the vulnerability with examples, so reviewers know the kind of risks that could be hiding in the code. They're also given a link to the CVE. This does not currently address CVE-2021-42694, which enables attacks similar to the zero-width space attach, but through homoglyphs (separate Unicode characters that resemble another character, like an "H"). That will be tackled separately. Note that there are legitimate situations in which these characters may appear. Depending on user feedback, we may want to offer options for disabling these checks. At the moment, we're following what other tools are doing and unconditionally checking the code.
dbaa4d5f211b92091b1c1fe0692d579199e8eb39

Description From Last Updated

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

F401 're' imported but unused

reviewbotreviewbot

F841 local variable 'checks_map' is assigned to but never used

reviewbotreviewbot

F841 local variable 'checks_map' is assigned to but never used

reviewbotreviewbot

Shouldn't this be switched to SafeString as well?

maubinmaubin

Does this need to be back-ticked? Also should add a type (list of str)

maubinmaubin
Checks run (1 failed, 1 succeeded)
flake8 failed.
JSHint passed.

flake8

chipx86
chipx86
chipx86
Review request changed
Change Summary:
  • Reworked the Unicode scanning to use a more flexible loop for iteration and line changing, in preparation for confusables support.
  • Shortened the result IDs, to reduce cache space.
  • Removed the hard-coded Unicode titles, in favor of unicodedata.name().
  • Reordered some unit test functions.
Commits:
Summary ID
Add a code safety checker for some trojan source code attacks.
This code safety checker looks for zero-width spaces and bi-directional text in lines of code, flagging them when they appear and putting a banner at the top of the diff. Certain bi-directional Unicode characters can be used together to make malicious code display one way and execute another way. For example, code can appear to be inside of a comment, but instead be inside of a string, opening up opportunities to circumvent access control checks or other logic. This is CVE-2021-42574. Similarly, zero-width spaces can make code appear one way and execute another way. Languages that allow for Unicode characters in function names or variable names may accept zero-width spaces as a legitimate character in a name. To reviewers, an identifier with a zero-width space would appear the same as an identifier without one. This can cause, for instance, state checks to be circumvented. If either issue is found in code, the file alert template will show a section for the vulnerability with examples, so reviewers know the kind of risks that could be hiding in the code. They're also given a link to the CVE. This does not currently address CVE-2021-42694, which enables attacks similar to the zero-width space attach, but through homoglyphs (separate Unicode characters that resemble another character, like an "H"). That will be tackled separately. Note that there are legitimate situations in which these characters may appear. Depending on user feedback, we may want to offer options for disabling these checks. At the moment, we're following what other tools are doing and unconditionally checking the code.
11a4139d174c7bb75ded5a2fcf523bd76de41c08
Add a code safety checker for some trojan source code attacks.
This code safety checker looks for zero-width spaces and bi-directional text in lines of code, flagging them when they appear and putting a banner at the top of the diff. Certain bi-directional Unicode characters can be used together to make malicious code display one way and execute another way. For example, code can appear to be inside of a comment, but instead be inside of a string, opening up opportunities to circumvent access control checks or other logic. This is CVE-2021-42574. Similarly, zero-width spaces can make code appear one way and execute another way. Languages that allow for Unicode characters in function names or variable names may accept zero-width spaces as a legitimate character in a name. To reviewers, an identifier with a zero-width space would appear the same as an identifier without one. This can cause, for instance, state checks to be circumvented. If either issue is found in code, the file alert template will show a section for the vulnerability with examples, so reviewers know the kind of risks that could be hiding in the code. They're also given a link to the CVE. This does not currently address CVE-2021-42694, which enables attacks similar to the zero-width space attach, but through homoglyphs (separate Unicode characters that resemble another character, like an "H"). That will be tackled separately. Note that there are legitimate situations in which these characters may appear. Depending on user feedback, we may want to offer options for disabling these checks. At the moment, we're following what other tools are doing and unconditionally checking the code.
5105919086bcf387da8c551632147b954d92a44d

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

chipx86
david
  1. Ship It!
  2. 
      
chipx86
david
  1. Ship It!
  2. 
      
maubin
  1. Ship It!

  2. 
      
maubin
  1. 
      
  2. Show all issues

    Shouldn't this be switched to SafeString as well?

  3. 
      
chipx86
maubin
  1. 
      
  2. Show all issues

    Does this need to be back-ticked? Also should add a type (list of str)

    1. It does in this form. I'll update it to use the newer Keys: section.

  3. 
      
chipx86
maubin
  1. Ship It!
  2. 
      
david
  1. Ship It!
  2. 
      
chipx86
Review request changed
Status:
Completed
Change Summary:
Pushed to release-5.0.x (3823d88)