Add support in the Trojan Source Checker to disable confusable checks.

Review Request #12784 — Created Jan. 11, 2023 and submitted

Information

Repository: Review Board
Branch: release-5.0.x

This updates the Trojan Source code safety checker to take arguments to
turn off Unicode confusable checks as a whole, or to disable them for
specific Unicode aliases.

To allow this to work, the build-confusables.py script now provides a
mapping of confusable alias names to numeric indexes, and a reverse of
that map. The numeric indexes are then referenced in the confusable
character map. The checker can use this to quickly determine which
alias a particular character belongs to, and to skip that alias if the
caller has excluded it.
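
As a rough illustration of the lookup described above, the generated data and the check could be shaped something like the following. This is only a sketch under assumptions: the names `ALIASES`, `ALIAS_INDEXES`, `CONFUSABLES`, and `find_confusables`, along with the sample entries, are hypothetical and not the actual contents of the generated module. Only the `check_confusables` and `confusable_aliases_allowed` argument names come from the review comments on this change.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical generated data. The real module is produced by
# build-confusables.py and may be structured differently.

# Index -> alias name.
ALIASES: List[str] = ['CYRILLIC', 'GREEK']

# Reverse map: alias name -> index.
ALIAS_INDEXES: Dict[str, int] = {
    name: index for index, name in enumerate(ALIASES)
}

# Confusable character -> (alias index, look-alike character).
CONFUSABLES: Dict[str, Tuple[int, str]] = {
    '\u0430': (0, 'a'),  # CYRILLIC SMALL LETTER A
    '\u03bf': (1, 'o'),  # GREEK SMALL LETTER OMICRON
}


def find_confusables(
    text: str,
    check_confusables: bool = True,
    confusable_aliases_allowed: Optional[Sequence[str]] = None,
) -> List[Tuple[int, str, str]]:
    """Return (position, character, alias name) for each flagged character."""
    # Confusable checks can be turned off as a whole.
    if not check_confusables:
        return []

    # Resolve the allowed alias names to indexes once, up front, so the
    # per-character test is a cheap integer set lookup.
    allowed: Set[int] = {
        ALIAS_INDEXES[name]
        for name in (confusable_aliases_allowed or [])
        if name in ALIAS_INDEXES
    }

    results = []

    for pos, ch in enumerate(text):
        entry = CONFUSABLES.get(ch)

        if entry is not None and entry[0] not in allowed:
            results.append((pos, ch, ALIASES[entry[0]]))

    return results
```

Storing a small integer per character rather than the alias name keeps the generated map compact, and the reverse map lets a caller-supplied list of names be converted to indexes before scanning.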

There's no user-level customization at this stage. This is purely an
update to the safety checker code. Additional work is still needed to
allow customization of these settings and to provide them at all the
right stages of diff generation.

This also fixes a bug, encountered when posting this change, that
caused a crash when Python failed to identify the name of a Unicode
character.
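
The description does not say how the crash was fixed. For context, `unicodedata.name()` raises `ValueError` when a character has no name and no default is supplied, so one common way to guard against this looks like the sketch below. The helper name `safe_unicode_name` and the fallback string are hypothetical.

```python
import unicodedata


def safe_unicode_name(ch: str, default: str = '<unnamed>') -> str:
    """Return the Unicode name for a character, or a fallback string."""
    # Without a default, unicodedata.name() raises ValueError for
    # characters that have no name (for example, control characters).
    # Supplying a default makes the lookup return it instead.
    return unicodedata.name(ch, default)
```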

Unit tests pass on all supported versions of Python.

Tested this along with in-progress changes to allow customization of
which confusables are checked.

Summary: Add support in the Trojan Source Checker to disable confusable checks.
ID: 58190502fe5bbf9b0ac27c58a298e99f948923b6
david

    Looks like there's a bug displaying the diff for reviewboard/codesafety/_unicode_confusables.py

    1. Yep, I mentioned that in the description. The fix is included in this change.

      It's just generated output. Nothing that really needs to be reviewed itself.

    contrib/internal/build-confusables.py (Diff revision 1)

    < 3.9

maubin

    reviewboard/codesafety/checkers/trojan_source.py (Diff revision 1)

    Need to add check_confusables and confusable_aliases_allowed to the args section.

    Should probably add check_confusables and confusable_aliases_allowed to the args section here too.

chipx86
maubin
    Ship It!
david
    Ship It!
chipx86
Review request changed
Status: Completed
Change Summary: Pushed to release-5.0.x (ed18f57)