Add a code safety checker for some trojan source code attacks.
Review Request #11906 — Created Jan. 4, 2022 and submitted
This code safety checker looks for zero-width spaces and bi-directional
text in lines of code, flagging them when they appear and putting a
banner at the top of the diff.Certain bi-directional Unicode characters can be used together to make
malicious code display one way and execute another way. For example,
code can appear to be inside of a comment, but instead be inside of a
string, opening up opportunities to circumvent access control checks or
other logic. This is CVE-2021-42574.Similarly, zero-width spaces can make code appear one way and execute
another way. Languages that allow for Unicode characters in function
names or variable names may accept zero-width spaces as a legitimate
character in a name. To reviewers, an identifier with a zero-width space
would appear the same as an identifier without one. This can cause, for
instance, state checks to be circumvented.If either issue is found in code, the file alert template will show a
section for the vulnerability with examples, so reviewers know the kind
of risks that could be hiding in the code. They're also given a link to
the CVE.This does not currently address CVE-2021-42694, which enables attacks
similar to the zero-width space attach, but through homoglyphs
(separate Unicode characters that resemble another character, like an
"H"). That will be tackled separately.Note that there are legitimate situations in which these characters may
appear. Depending on user feedback, we may want to offer options for
disabling these checks. At the moment, we're following what other tools
are doing and unconditionally checking the code.
Unit tests pass on Python 2 and 3.
Tested with a wide collection of trojan source files available at
https://github.com/nickboucher/trojan-source/
Summary | ID |
---|---|
dbaa4d5f211b92091b1c1fe0692d579199e8eb39 |
Description | From | Last Updated |
---|---|---|
E501 line too long (80 > 79 characters) |
reviewbot | |
E501 line too long (80 > 79 characters) |
reviewbot | |
F401 're' imported but unused |
reviewbot | |
F841 local variable 'checks_map' is assigned to but never used |
reviewbot | |
F841 local variable 'checks_map' is assigned to but never used |
reviewbot | |
Shouldn't this be switched to SafeString as well? |
maubin | |
Does this need to be back-ticked? Also should add a type (list of str) |
maubin |
- Change Summary:
-
Fixed some line length issues.
- Commits:
-
Summary ID bfaefbc2acee4ef965e6dd5aef8ce54430e515e8 11a4139d174c7bb75ded5a2fcf523bd76de41c08
Checks run (2 succeeded)
- Change Summary:
-
- Reworked the Unicode scanning to use a more flexible loop for iteration and line changing, in preparation for confusables support.
- Shortened the result IDs, to reduce cache space.
- Removed the hard-coded Unicode titles, in favor of
unicodedata.name()
. - Reordered some unit test functions.
- Commits:
-
Summary ID 11a4139d174c7bb75ded5a2fcf523bd76de41c08 5105919086bcf387da8c551632147b954d92a44d
- Change Summary:
-
- Removed unused variables and imports.
- Switched the variable used to determine how many possible warnings can be found, for short-circuiting.
- Commits:
-
Summary ID 5105919086bcf387da8c551632147b954d92a44d 1a4c67b6e5973ea3c6bd6d1bc186c17ec3b376f9
Checks run (2 succeeded)
- Change Summary:
-
Updated for Review Board 5.0:
- Removed
__future__
imports - Removed
six
usage - Changed version numbers in docstrings
- Changed
unicode
tostr
in docstrings - Changed
SafeText
toSafeString
. - Changed
ugettext_lazy
togettext_lazy
.
- Removed
- Commits:
-
Summary ID 1a4c67b6e5973ea3c6bd6d1bc186c17ec3b376f9 9fd5dc0f0360454526acb686efaa16733e3793c3 - Branch:
-
release-4.0.xrelease-5.0.x
Checks run (2 succeeded)
- Change Summary:
-
Changed
SafeText
toSafeString
. - Commits:
-
Summary ID 9fd5dc0f0360454526acb686efaa16733e3793c3 2349296d22a4d829aa07a4a5998985a0e80716ff