Add support in the Trojan Source Checker to disable confusable checks.

Review Request #12784 — Created Jan. 11, 2023 and submitted — Latest diff uploaded

Information

Review Board
release-5.0.x

Reviewers

This updates the Trojan Source code safety checker to take arguments to
turn off Unicode confusable checks as a whole, or to disable them for
specific Unicode aliases.

To make this work, the build-confusables.py script now generates a
mapping of confusable alias names to numeric indexes, along with the
reverse map. The confusable character map references these numeric
indexes, so the checker can quickly determine which alias a particular
character belongs to and skip that character if the alias was excluded
in the call.
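
As a rough illustration of the index-based lookup described above, here
is a minimal sketch. Every name in it (ALIASES, ALIAS_TO_INDEX,
INDEX_TO_ALIAS, CONFUSABLES, find_confusables, and the
check_confusables/exclude_aliases arguments) is hypothetical and chosen
for this example only. The real generated module and checker arguments
may be shaped differently.

```python
# Hypothetical stand-ins for data a generator script might emit.
ALIASES = ['LATIN', 'CYRILLIC', 'GREEK']
ALIAS_TO_INDEX = {name: index for index, name in enumerate(ALIASES)}
INDEX_TO_ALIAS = dict(enumerate(ALIASES))  # Reverse of the map above.

# Maps a confusable character to (alias index, common lookalike).
CONFUSABLES = {
    '\u0430': (ALIAS_TO_INDEX['CYRILLIC'], 'a'),  # CYRILLIC SMALL LETTER A
    '\u03bf': (ALIAS_TO_INDEX['GREEK'], 'o'),     # GREEK SMALL LETTER OMICRON
}


def find_confusables(text, check_confusables=True, exclude_aliases=()):
    """Return (position, char, alias name, lookalike) for each match."""
    if not check_confusables:
        # Confusable checks are turned off as a whole.
        return []

    # Resolve excluded alias names to indexes once, up front, so the
    # per-character check is a cheap integer set lookup.
    excluded = {
        ALIAS_TO_INDEX[alias]
        for alias in exclude_aliases
        if alias in ALIAS_TO_INDEX
    }

    results = []

    for pos, char in enumerate(text):
        entry = CONFUSABLES.get(char)

        if entry is None:
            continue

        alias_index, lookalike = entry

        if alias_index in excluded:
            continue

        results.append((pos, char, INDEX_TO_ALIAS[alias_index], lookalike))

    return results
```

With this sketch, find_confusables('p\u0430ss') reports the Cyrillic
character at position 1, while passing exclude_aliases=['CYRILLIC'] or
check_confusables=False returns an empty list.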

There's no user-level customization at this stage. This is purely an
update to the safety checker code. Additional work is still needed to
allow customization of these settings and to provide them at all the
right stages of diff generation.

There's also a fix for a bug encountered when posting this change, which
caused a crash when Python failed to identify the name of a Unicode
character.
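
For context, Python's unicodedata.name() raises ValueError when a
character has no name and no default is given, which is one way a
lookup like this can crash. The sketch below shows a defensive lookup.
The get_char_name helper and its fallback format are hypothetical, and
this is not necessarily how the checker handles the case.

```python
import unicodedata


def get_char_name(char, fallback_format='U+{:04X}'):
    """Return the Unicode name of char, or a code point label if unnamed."""
    try:
        return unicodedata.name(char)
    except ValueError:
        # Some characters (control characters, for example) have no
        # name in the Unicode database, so fall back to the code point.
        return fallback_format.format(ord(char))
```

An alternative is to pass a default as the second argument, as in
unicodedata.name(char, ''), which avoids the exception entirely.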

Unit tests pass on all supported versions of Python.

Tested this along with in-progress changes to allow customization of
which confusables are checked.

Diff Revision 1

This is not the most recent revision of the diff. The latest diff is revision 2.


Commits

Summary: Add support in the Trojan Source Checker to disable confusable checks.
ID: 227f12e5d0240146a100ea7722aebf72586bea87
Author: Christian Hammond
contrib/internal/build-confusables.py
reviewboard/codesafety/_unicode_confusables.py
reviewboard/codesafety/checkers/trojan_source.py
reviewboard/codesafety/tests/test_trojan_source_code_safety_checker.py