Tree Sitter: Add highlighting implementation.
Review Request #14522 — Created July 28, 2025 and updated
This change adds the main implementation for syntax highlighting using
Tree Sitter. The top-level `highlight` function takes in the file
content both as a UTF-8-encoded bytestring and a decoded list of lines,
the parsed Tree Sitter `Tree` object, and the language name to highlight
for.

The basic procedure for highlighting involves several steps:

- Get highlighted nodes for the parsed tree with the given language.
- Get injections for the parsed tree for the given language. If there
  are any injections present, reparse the file with the injected
  language and get highlighted nodes for those injections. This currently
  only does one level of injections (so JS inside of HTML inside of
  Markdown would not be highlighted).
- Process the highlighted nodes into a new list organized by line.
  Prior to this step, nodes can span multiple lines (for example, an
  entire Python docstring will result in one `comment.documentation`
  node). After, there would be separate nodes covering each line of the
  docstring.
- Turn highlighted nodes into a sequence of events. These events have a
  starting position (within the line) and HTML opening or closing tags
  to insert at that position.
- Scan through each line character by character, applying the events
  and escaping any HTML special characters, resulting in a list of HTML
  for each line in the file.

Unit tests include tests for the basic functionality of those steps, as
well as a set of sample files in common languages so we can compare
full-file highlighting results. These tests may require updating when we
update to new versions of tree-sitter-language-pack or update query
files from nvim-treesitter or grammars. The `recompute-test-data.py`
script will take the sample files and generate the expected output.

This commit includes the source files for all the full-file highlighting
tests, but for reviewability does not include the `*.expected` files
that they get compared to. Those are in a separate commit, since they're
not really suitable for review.
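For readers who want a concrete picture of the last three steps (splitting nodes by line, building events, and scanning each line), here is a minimal sketch. It is not the code from this change: `Node`, `split_by_line`, `to_events`, and `render_line` are hypothetical names, columns are assumed to be character offsets rather than byte offsets, spans are assumed to nest cleanly, and the CSS class naming is made up for illustration.

```python
from __future__ import annotations

import html
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a highlighted tree-sitter node."""

    start: tuple[int, int]  # (row, column), inclusive
    end: tuple[int, int]    # (row, column), exclusive
    name: str               # capture name, e.g. "comment.documentation"


def split_by_line(nodes: list[Node], lines: list[str]) -> list[list[tuple]]:
    """Split nodes so each resulting span covers exactly one line."""
    by_line: list[list[tuple]] = [[] for _ in lines]
    for node in nodes:
        (start_row, start_col), (end_row, end_col) = node.start, node.end
        for row in range(start_row, min(end_row, len(lines) - 1) + 1):
            col_from = start_col if row == start_row else 0
            col_to = len(lines[row])
            if row == end_row:
                col_to = min(end_col, col_to)
            if col_to > col_from:  # skip empty spans
                by_line[row].append((col_from, col_to, node.name))
    return by_line


def to_events(spans: list[tuple]) -> list[tuple]:
    """Turn one line's spans into sorted (position, ..., tag) events."""
    events = []
    for start, end, name in spans:
        # Assumed class naming; the real implementation may differ.
        css = html.escape(name.replace('.', '-'), quote=True)
        # kind 1 = open, kind 0 = close. At the same position, closes sort
        # first, longer spans open first, and inner spans close first.
        events.append((start, 1, -end, f'<span class="{css}">'))
        events.append((end, 0, -start, '</span>'))
    return sorted(events)


def render_line(line: str, spans: list[tuple]) -> str:
    """Scan a line character by character, applying events and escaping."""
    events = to_events(spans)
    out = []
    i = 0
    for col, ch in enumerate(line):
        while i < len(events) and events[i][0] <= col:
            out.append(events[i][3])
            i += 1
        out.append(html.escape(ch, quote=False))
    # Emit any events positioned at (or past) the end of the line.
    out.extend(event[3] for event in events[i:])
    return ''.join(out)


lines = ['def f():', '    """Doc <b>', '    more"""']
nodes = [
    Node((0, 0), (0, 3), 'keyword'),
    Node((1, 4), (2, 11), 'comment.documentation'),  # spans two lines
]
by_line = split_by_line(nodes, lines)
html_lines = [render_line(line, spans) for line, spans in zip(lines, by_line)]
for rendered in html_lines:
    print(rendered)
```

The two-line docstring node ends up as one span per line, and the `<b>` inside it is escaped rather than emitted as markup. Sorting events so that closes precede opens at the same position keeps the generated tags well nested under the stated nesting assumption.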
- Ran unit tests.
- Verified the appearance of syntax highlighting across a range of file
types, including ones with various injected languages.
Summary | ID |
---|---|
qxkmonknlypwrsrsulutpwoxlqvxtnrv |
Description | From | Last Updated |
---|---|---|
line too long (85 > 79 characters) Column: 80 Error code: E501 | | |
'DEFAULT_TIMEOUT' is defined but never used. Column: 7 Error code: W098 | | |
'os' imported but unused Column: 1 Error code: F401 | | |
'DEFAULT_TIMEOUT' is defined but never used. Column: 7 Error code: W098 | | |
'os' imported but unused Column: 1 Error code: F401 | | |
'functools.lru_cache' imported but unused Column: 1 Error code: F401 | | |
line break before binary operator Column: 5 Error code: W503 | | |
'DEFAULT_TIMEOUT' is defined but never used. Column: 7 Error code: W098 | | |
'os' imported but unused Column: 1 Error code: F401 | | |
'DEFAULT_TIMEOUT' is defined but never used. Column: 7 Error code: W098 | | |
'os' imported but unused Column: 1 Error code: F401 | | |
'DEFAULT_TIMEOUT' is defined but never used. Column: 7 Error code: W098 | | |
'os' imported but unused Column: 1 Error code: F401 | | |
- Change Summary: Use tuples for pytest parametrize arg names.
- Commits (by ID): 22677924f64bc345de4094d290b766e9ec0311fe, aea443cd841183b369d76b54005cc3cf91e6d9fb
- Diff: Revision 2 (+7674)
- Change Summary:
  - Update to use `reviewboard.treesitter.core` utilities.
  - Update some type hints for Python 3.10+.
- Commits (by ID): aea443cd841183b369d76b54005cc3cf91e6d9fb, qxkmonknlypwrsrsulutpwoxlqvxtnrv
- Diff: Revision 3 (+7582)
Checks run (2 failed)
flake8
JSHint
- Commits (by ID): qxkmonknlypwrsrsulutpwoxlqvxtnrv, qxkmonknlypwrsrsulutpwoxlqvxtnrv
- Diff: Revision 4 (+7578)
- Commits (by ID): qxkmonknlypwrsrsulutpwoxlqvxtnrv, qxkmonknlypwrsrsulutpwoxlqvxtnrv
- Diff: Revision 5 (+7578)