Summary

Tree Sitter: Add highlighting implementation.

Review Request #14522 — Created July 28, 2025 and submitted April 6, 2026, 10:25 a.m.

Information

Owner

david

Repository

Review Board

Branch

master

Bugs

Depends On

Reviewers

Groups

reviewboard

People

Description

This change adds the main implementation for syntax highlighting using
Tree Sitter. The top-level highlight function takes the file content
both as a UTF-8-encoded bytestring and as a decoded list of lines, along
with the parsed Tree Sitter Tree object and the name of the language to
highlight.
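Tree Sitter reports node positions as (row, column) points. The column is a byte offset into the UTF-8 source, while per-line HTML rendering works on decoded strings. That gap is one reason having both forms of the content available can be useful.

Below is a minimal sketch of mapping a byte column to a character index. `byte_col_to_char_col` is a hypothetical helper for illustration, not a function from this change. It assumes each decoded line is exactly the UTF-8 decoding of the matching row of the bytestring.

```python
def byte_col_to_char_col(line: str, byte_col: int) -> int:
    """Convert a UTF-8 byte column within ``line`` to a character index.

    Hypothetical helper: ``line`` is one decoded line of the file and
    ``byte_col`` is the column from a Tree Sitter point on that row.
    """
    # Re-encode the line, keep only the bytes before the column, and count
    # how many characters they decode to. A column that falls inside a
    # multi-byte character raises UnicodeDecodeError.
    return len(line.encode('utf-8')[:byte_col].decode('utf-8'))


# 'é' takes two bytes in UTF-8, so byte and character columns diverge
# after it.
line = 'name = "héllo"'
print(byte_col_to_char_col(line, 7))   # 7 (start of the string literal)
print(byte_col_to_char_col(line, 15))  # 14 (end of the line)
```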

The basic procedure for highlighting involves several steps:

1. Get highlighted nodes for the parsed tree with the given language.
2. Get injections for the parsed tree for the given language. If there
   are any injections present, reparse the file with the injected
   language and get highlighted nodes for those injections. This
   currently handles only one level of injection (so JS inside HTML
   inside Markdown would not be highlighted).
3. Process the highlighted nodes into a new list organized by line.
   Before this step, nodes can span multiple lines (for example, an
   entire Python docstring results in a single comment.documentation
   node). Afterward, there are separate nodes covering each line of the
   docstring.
4. Turn the highlighted nodes into a sequence of events. These events
   have a starting position (within the line) and the HTML opening or
   closing tags to insert at that position.
5. Scan through each line character by character, applying the events
   and escaping any HTML special characters. The result is a list of
   HTML strings, one for each line in the file.
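Steps 3 through 5 can be sketched in plain Python. The sketch assumes the highlighted nodes from steps 1 and 2 are already available as (name, start_point, end_point) tuples, with columns converted to character indices. Several details are assumptions for illustration, not the code in reviewboard/treesitter/highlight.py:

- the tuple layouts;
- the NODE_*/EVENT_* index values (the constant names appear in the review comments below);
- the function names;
- the "ts-" CSS class prefix.

```python
import html

# Hypothetical tuple indexes. NODE_NAME/NODE_START/NODE_END and
# EVENT_POSITION/EVENT_TAG are names mentioned in the review comments;
# their values and the tuple layouts here are assumptions.
NODE_NAME, NODE_START, NODE_END = 0, 1, 2
EVENT_POSITION, EVENT_TAG = 0, 1


def split_by_line(nodes, lines):
    """Step 3: split multi-line nodes into per-line nodes.

    Input nodes are (name, (row, col), (row, col)); output nodes are
    (name, start_col, end_col). Columns are character indices and end
    points are exclusive.
    """
    per_line = [[] for _ in lines]

    for node in nodes:
        start_row, start_col = node[NODE_START]
        end_row, end_col = node[NODE_END]

        for row in range(start_row, end_row + 1):
            # Continuation rows start at column 0; every row except the
            # last runs to the end of its line.
            start = start_col if row == start_row else 0
            end = end_col if row == end_row else len(lines[row])

            if end > start:
                per_line[row].append((node[NODE_NAME], start, end))

    return per_line


def nodes_to_events(line_nodes):
    """Step 4: turn one line's nodes into sorted (position, tag) events."""
    keyed = []

    for name, start, end in line_nodes:
        if end <= start:
            # Empty ranges would emit a close before their own open.
            continue

        css_class = 'ts-' + name.replace('.', '-')

        # Sort keys: at equal positions, closes (0) come before opens (1),
        # and longer (outer) nodes open before shorter (inner) ones.
        keyed.append(((start, 1, -end),
                      (start, f'<span class="{css_class}">')))
        keyed.append(((end, 0, 0), (end, '</span>')))

    keyed.sort(key=lambda item: item[0])

    return [event for _key, event in keyed]


def render_line(line, events):
    """Step 5: walk the line, applying events and escaping characters."""
    out = []
    i = 0

    for pos, char in enumerate(line):
        while i < len(events) and events[i][EVENT_POSITION] <= pos:
            out.append(events[i][EVENT_TAG])
            i += 1

        out.append(html.escape(char))

    # Remaining events (closing tags at the end of the line) go last.
    out.extend(event[EVENT_TAG] for event in events[i:])

    return ''.join(out)


def highlight_lines(lines, nodes):
    """Run steps 3 to 5 over already-collected highlight nodes."""
    per_line = split_by_line(nodes, lines)

    return [
        render_line(line, nodes_to_events(line_nodes))
        for line, line_nodes in zip(lines, per_line)
    ]
```

For example, `highlight_lines(['x = 1 < 2'], [('number', (0, 4), (0, 5))])` returns `['x = <span class="ts-number">1</span> &lt; 2']`.

Every closing tag is the same `</span>`. So the sketch only has to guarantee two orderings: at a given position, closing tags come before opening tags, and outer nodes open before inner ones. That gives correct nesting when node ranges are nested or disjoint, as a syntax tree normally yields. Partially overlapping ranges would need extra splitting that the sketch does not attempt.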

Unit tests include tests for the basic functionality of those steps, as
well as a set of sample files in common languages so we can compare
full-file highlighting results. These tests may require updating when we
move to new versions of tree-sitter-language-pack or update the query
files from nvim-treesitter or the grammars. The recompute-test-data.py
script takes the sample files and generates the expected output.
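The compare-or-regenerate workflow for the sample files might look roughly like the following. This is a hypothetical sketch, not the contents of recompute-test-data.py. `find_mismatches` and the `render` callable (a stand-in for the full highlighting pipeline) are made up, and the `<sample>.expected` naming next to each sample file is an assumption.

```python
from pathlib import Path
from typing import Callable


def find_mismatches(testdata: Path,
                    render: Callable[[str], str],
                    regenerate: bool = False) -> list[str]:
    """Compare (or rewrite) <sample>.expected files for each sample."""
    mismatches = []

    # sorted() materializes the listing first, so .expected files written
    # during regeneration are not picked up as new samples.
    for sample in sorted(testdata.iterdir()):
        if not sample.is_file() or sample.suffix == '.expected':
            continue

        expected_path = sample.with_name(sample.name + '.expected')
        actual = render(sample.read_text(encoding='utf-8'))

        if regenerate:
            # The "recompute" mode: overwrite the expected output.
            expected_path.write_text(actual, encoding='utf-8')
        elif (not expected_path.exists() or
              expected_path.read_text(encoding='utf-8') != actual):
            mismatches.append(sample.name)

    return mismatches
```

Keeping generation and comparison in one code path means the expected files can only drift when the renderer's output actually changes. That matches the note above that the data needs recomputing after upgrading grammars or queries.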

This commit includes the source files for all the full-file highlighting
tests, but for reviewability does not include the *.expected files
that they get compared to. Those are in a separate commit, since they're
not really suitable for review.

Testing Done


- Ran unit tests.
- Verified the appearance of syntax highlighting across a range of file
  types, including ones with various injected languages.

Commits

Summary	ID
Tree Sitter: Add highlighting implementation.	qxkmonknlypwrsrsulutpwoxlqvxtnrv

Issues

Description	From	Last Updated
A few of the test data files have trailing whitespaces, idk if we care about that.	maubin	April 2, 2026, 8:24 a.m.
line too long (85 > 79 characters) Column: 80 Error code: E501	reviewbot	July 28, 2025, 8:20 a.m.
'DEFAULT_TIMEOUT' is defined but never used. Column: 7 Error code: W098	reviewbot	July 28, 2025, 8:19 a.m.
'os' imported but unused Column: 1 Error code: F401	reviewbot	July 28, 2025, 8:20 a.m.
'DEFAULT_TIMEOUT' is defined but never used. Column: 7 Error code: W098	reviewbot	Aug. 1, 2025, 11:13 a.m.
'os' imported but unused Column: 1 Error code: F401	reviewbot	Aug. 1, 2025, 11:13 a.m.
'functools.lru_cache' imported but unused Column: 1 Error code: F401	reviewbot	Sept. 17, 2025, 3:18 p.m.
line break before binary operator Column: 5 Error code: W503	reviewbot	Sept. 17, 2025, 3:18 p.m.
'DEFAULT_TIMEOUT' is defined but never used. Column: 7 Error code: W098	reviewbot	Sept. 17, 2025, 3:18 p.m.
'os' imported but unused Column: 1 Error code: F401	reviewbot	Sept. 17, 2025, 3:18 p.m.
'DEFAULT_TIMEOUT' is defined but never used. Column: 7 Error code: W098	reviewbot	Sept. 17, 2025, 3:19 p.m.
'os' imported but unused Column: 1 Error code: F401	reviewbot	Sept. 17, 2025, 3:19 p.m.
'DEFAULT_TIMEOUT' is defined but never used. Column: 7 Error code: W098	reviewbot	Sept. 17, 2025, 4:23 p.m.
'os' imported but unused Column: 1 Error code: F401	reviewbot	Sept. 17, 2025, 3:20 p.m.
These are missing trailing periods.	maubin	April 2, 2026, 8:09 a.m.
Why do we sort by name length?	maubin	April 2, 2026, 8:14 a.m.
Should we log that this is an unsupported language?	maubin	April 2, 2026, 8:16 a.m.
Moreso a nit but you could use the NODE_NAME, NODE_START and NODE_END constants in the tests and get rid of …	maubin	April 2, 2026, 8:17 a.m.
Same here with EVENT_TAG and EVENT_POSITION.	maubin	April 2, 2026, 8:18 a.m.
Can you add an empty line between these.	maubin	April 2, 2026, 8:18 a.m.
'DEFAULT_TIMEOUT' is defined but never used. Column: 7 Error code: W098	reviewbot	March 12, 2026, 9:33 a.m.
'os' imported but unused Column: 1 Error code: F401	reviewbot	March 12, 2026, 9:33 a.m.
'DEFAULT_TIMEOUT' is defined but never used. Column: 7 Error code: W098	reviewbot	April 2, 2026, 8:28 a.m.
'os' imported but unused Column: 1 Error code: F401	reviewbot	April 2, 2026, 8:28 a.m.

flake8 failed.

JSHint failed.

flake8

reviewboard/treesitter/highlight.py (Diff revision 1)
The issue has been resolved.
```
line too long (85 > 79 characters)

Column: 80
Error code: E501
```
reviewboard/treesitter/tests/testdata/sample.py (Diff revision 1)
The issue has been dropped.
```
'os' imported but unused

Column: 1
Error code: F401
```

JSHint

reviewboard/treesitter/tests/testdata/sample.js (Diff revision 1)
The issue has been dropped.
```
'DEFAULT_TIMEOUT' is defined but never used.

Column: 7
Error code: W098
```

Change Summary:

Use tuples for pytest parametrize arg names.
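pytest's `parametrize` accepts the argument names either as a single comma-separated string or as a list or tuple of strings. This revision switches the tests to the tuple form. The hypothetical test below shows that form; its name and cases are illustrative, not taken from test_highlight.py.

```python
import html

import pytest


@pytest.mark.parametrize(
    # Tuple of argument names, rather than the string 'text,expected'.
    ('text', 'expected'),
    [
        ('a < b', 'a &lt; b'),
        ('x & y', 'x &amp; y'),
    ],
)
def test_escape(text: str, expected: str) -> None:
    assert html.escape(text) == expected
```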

Commits:

	Summary	ID
	Tree Sitter: Add highlighting implementation.	22677924f64bc345de4094d290b766e9ec0311fe
	Tree Sitter: Add highlighting implementation.	aea443cd841183b369d76b54005cc3cf91e6d9fb

Diff:

Revision 2 (+7674)

	contrib/internal/treesitter/recompute-test-data.py
	reviewboard/treesitter/highlight.py
	reviewboard/treesitter/tests/test_highlight.py
	reviewboard/treesitter/tests/testdata/sample.go
	reviewboard/treesitter/tests/testdata/sample.html
	reviewboard/treesitter/tests/testdata/sample.java
	reviewboard/treesitter/tests/testdata/sample.js
	reviewboard/treesitter/tests/testdata/sample.json
	9 more

Checks run (2 failed)

flake8 failed.

JSHint failed.

flake8

reviewboard/treesitter/tests/testdata/sample.py (Diff revision 2)
The issue has been dropped.
```
'os' imported but unused

Column: 1
Error code: F401
```

JSHint

reviewboard/treesitter/tests/testdata/sample.js (Diff revision 2)
The issue has been dropped.
```
'DEFAULT_TIMEOUT' is defined but never used.

Column: 7
Error code: W098
```

Change Summary:


- Update to use reviewboard.treesitter.core utilities.
- Update some type hints for Python 3.10+.

Commits:

	Summary	ID
	Tree Sitter: Add highlighting implementation.	aea443cd841183b369d76b54005cc3cf91e6d9fb
	Tree Sitter: Add highlighting implementation.	qxkmonknlypwrsrsulutpwoxlqvxtnrv

Diff:

Revision 3 (+7582)

	pyproject.toml
	contrib/internal/treesitter/recompute-test-data.py
	reviewboard/treesitter/highlight.py
	reviewboard/treesitter/tests/test_highlight.py
	reviewboard/treesitter/tests/testdata/sample.go
	reviewboard/treesitter/tests/testdata/sample.html
	reviewboard/treesitter/tests/testdata/sample.java
	reviewboard/treesitter/tests/testdata/sample.js
	10 more

Checks run (2 failed)

flake8 failed.

JSHint failed.

flake8

reviewboard/treesitter/highlight.py (Diff revision 3)
The issue has been dropped.
```
'functools.lru_cache' imported but unused

Column: 1
Error code: F401
```
reviewboard/treesitter/highlight.py (Diff revision 3)
The issue has been dropped.
```
line break before binary operator

Column: 5
Error code: W503
```
reviewboard/treesitter/tests/testdata/sample.py (Diff revision 3)
The issue has been dropped.
```
'os' imported but unused

Column: 1
Error code: F401
```

JSHint

reviewboard/treesitter/tests/testdata/sample.js (Diff revision 3)
The issue has been dropped.
```
'DEFAULT_TIMEOUT' is defined but never used.

Column: 7
Error code: W098
```

Commits:

	Summary	ID
	Tree Sitter: Add highlighting implementation.	qxkmonknlypwrsrsulutpwoxlqvxtnrv
	Tree Sitter: Add highlighting implementation.	qxkmonknlypwrsrsulutpwoxlqvxtnrv

Diff:

Revision 4 (+7578)

	pyproject.toml
	setup.cfg
	contrib/internal/treesitter/recompute-test-data.py
	reviewboard/treesitter/highlight.py
	reviewboard/treesitter/tests/test_highlight.py
	reviewboard/treesitter/tests/testdata/sample.go
	reviewboard/treesitter/tests/testdata/sample.html
	reviewboard/treesitter/tests/testdata/sample.java
	11 more

Checks run (2 failed)

flake8 failed.

JSHint failed.

flake8

reviewboard/treesitter/tests/testdata/sample.py (Diff revision 4)
The issue has been dropped.
```
'os' imported but unused

Column: 1
Error code: F401
```

JSHint

reviewboard/treesitter/tests/testdata/sample.js (Diff revision 4)
The issue has been dropped.
```
'DEFAULT_TIMEOUT' is defined but never used.

Column: 7
Error code: W098
```

Commits:

	Summary	ID
	Tree Sitter: Add highlighting implementation.	qxkmonknlypwrsrsulutpwoxlqvxtnrv
	Tree Sitter: Add highlighting implementation.	qxkmonknlypwrsrsulutpwoxlqvxtnrv

Diff:

Revision 5 (+7578)

	pyproject.toml
	setup.cfg
	contrib/internal/treesitter/recompute-test-data.py
	reviewboard/treesitter/highlight.py
	reviewboard/treesitter/tests/test_highlight.py
	reviewboard/treesitter/tests/testdata/sample.go
	reviewboard/treesitter/tests/testdata/sample.html
	reviewboard/treesitter/tests/testdata/sample.java
	11 more

Checks run (2 failed)

flake8 failed.

JSHint failed.

flake8

reviewboard/treesitter/tests/testdata/sample.py (Diff revision 5)
The issue has been dropped.
```
'os' imported but unused

Column: 1
Error code: F401
```

JSHint

reviewboard/treesitter/tests/testdata/sample.js (Diff revision 5)
The issue has been dropped.
```
'DEFAULT_TIMEOUT' is defined but never used.

Column: 7
Error code: W098
```

Change Summary:

Add to coderef.

Commits:

	Summary	ID
	Tree Sitter: Add highlighting implementation.	qxkmonknlypwrsrsulutpwoxlqvxtnrv

This change adds the main implementation for syntax highlighting using Tree Sitter. The top-level `highlight` function takes four inputs: the file content as a UTF-8-encoded bytestring, the same content as a decoded list of lines, the parsed Tree Sitter `Tree` object, and the name of the language to highlight for.

The basic procedure for highlighting involves several steps:

1. Get highlighted nodes for the parsed tree with the given language.
2. Get injections for the parsed tree for the given language. If there are any injections present, reparse the file with the injected language and get highlighted nodes for those injections. This currently only handles one level of injections (so JS inside HTML inside Markdown would not be highlighted).
3. Process the highlighted nodes into a new list organized by line. Before this step, nodes can span multiple lines (for example, an entire Python docstring results in one `comment.documentation` node). Afterward, there are separate nodes covering each line of the docstring.
4. Turn the highlighted nodes into a sequence of events. Each event has a starting position (within the line) and an HTML opening or closing tag to insert at that position.
5. Scan through each line character by character, applying the events and escaping any HTML special characters. The result is a list of HTML strings, one for each line in the file.

Unit tests cover the basic functionality of each of those steps. They also include a set of sample files in common languages so we can compare full-file highlighting results. These tests may need updating when we move to new versions of tree-sitter-language-pack or update query files from nvim-treesitter or grammars. The `recompute-test-data.py` script takes the sample files and generates the expected output.

This commit includes the source files for all the full-file highlighting tests. To keep it reviewable, it does not include the `*.expected` files they are compared against. Those will be in the next change.

Testing Done:

- Ran unit tests.
- Verified the appearance of syntax highlighting across a range of file types, including ones with various injected languages.
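Steps 3 through 5 above can be sketched in plain Python. This is not the code in `reviewboard/treesitter/highlight.py`. The helper names (`split_nodes_by_line`, `spans_to_events`, `render_line`, `highlight_lines`) are hypothetical, and the sketch rests on several assumptions:

- Nodes are `(capture_name, (row, col), (row, col))` tuples.
- Columns are character offsets into the decoded lines. Real tree-sitter points are byte offsets, which the actual implementation has to reconcile with the UTF-8 bytestring.
- CSS classes are derived by replacing dots with dashes.
- Spans on a line are properly nested.

```python
from collections import defaultdict

# Single-pass escaping table (same entities as html.escape(quote=True)).
_ESCAPE_TABLE = str.maketrans({
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#x27;',
})


def split_nodes_by_line(nodes, lines):
    """Step 3: split (name, start, end) nodes into per-line spans.

    Returns a dict mapping row -> list of (name, start_col, end_col).
    """
    by_line = defaultdict(list)

    for name, (start_row, start_col), (end_row, end_col) in nodes:
        for row in range(start_row, end_row + 1):
            col_start = start_col if row == start_row else 0
            col_end = end_col if row == end_row else len(lines[row])

            # Skip empty pieces, e.g. a node that ends at column 0.
            if col_end > col_start:
                by_line[row].append((name, col_start, col_end))

    return by_line


def spans_to_events(spans):
    """Step 4: turn per-line spans into sorted (position, tag) events."""
    events = []

    for name, start, end in spans:
        css_class = name.replace('.', '-')
        # Opening tags: at equal positions, the wider span opens first.
        events.append((start, 1, -end, f'<span class="{css_class}">'))
        # Closing tags: at equal positions, the inner span closes first.
        events.append((end, 0, -start, '</span>'))

    # Closing tags (kind 0) sort before opening tags (kind 1) at the same
    # position, so adjacent spans don't end up nested inside each other.
    events.sort()
    return [(position, tag) for position, _kind, _order, tag in events]


def render_line(line, events):
    """Step 5: walk the line character by character, emitting tags and
    escaped characters."""
    parts = []
    index = 0

    for i, char in enumerate(line):
        # Emit every tag positioned at (or before) this character.
        while index < len(events) and events[index][0] <= i:
            parts.append(events[index][1])
            index += 1

        parts.append(char.translate(_ESCAPE_TABLE))

    # Emit any remaining tags positioned at the end of the line.
    parts.extend(tag for _position, tag in events[index:])
    return ''.join(parts)


def highlight_lines(nodes, lines):
    """Run steps 3-5 over already-collected highlight nodes."""
    by_line = split_nodes_by_line(nodes, lines)

    return [
        render_line(line, spans_to_events(by_line.get(row, [])))
        for row, line in enumerate(lines)
    ]
```

For example, `highlight_lines([('string', (0, 4), (0, 9))], ['x = "a<b"'])` returns `['x = <span class="string">&quot;a&lt;b&quot;</span>']`. A docstring node spanning two lines comes back as two separately opened and closed spans, which keeps each line's HTML self-contained.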

Diff:

Revision 6 (+7576)


	pyproject.toml
	contrib/internal/treesitter/recompute-test-data.py
	docs/manual/extending/coderef/index.rst
	reviewboard/treesitter/highlight.py
	reviewboard/treesitter/tests/test_highlight.py
	reviewboard/treesitter/tests/testdata/sample.go
	reviewboard/treesitter/tests/testdata/sample.html
	reviewboard/treesitter/tests/testdata/sample.java
	(and 11 more files)

Checks run (2 failed)

flake8 failed.

JSHint failed.

flake8

reviewboard/treesitter/tests/testdata/sample.py (Diff revision 6)
The issue has been dropped.
```
'os' imported but unused

Column: 1
Error code: F401
```

JSHint

reviewboard/treesitter/tests/testdata/sample.js (Diff revision 6)
The issue has been dropped.
```
'DEFAULT_TIMEOUT' is defined but never used.

Column: 7
Error code: W098
```

The issue has been resolved.

A few of the test data files have trailing whitespace. I don't know if we care about that.

reviewboard/treesitter/highlight.py (Diff revision 6)
The issue has been resolved.
```
These are missing trailing periods.
```

reviewboard/treesitter/highlight.py (Diff revision 6)

Why do we do our own HTML escaping with `str.translate()` instead of using `html.escape()`? I'm assuming it's for performance?

david

April 2, 2026, 8:27 a.m.

Yeah. `html.escape()` calls `str.replace()` five times.
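A minimal sketch of the single-pass alternative, using a hypothetical `escape_html()` helper. It produces the same entities as `html.escape(quote=True)`. Whether it actually beats five `str.replace()` calls depends on the input, so it is worth confirming with `timeit` on real file content.

```python
import html

# Map each HTML special character to the entity html.escape() would use.
_HTML_ESCAPE_TABLE = str.maketrans({
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#x27;',
})


def escape_html(text: str) -> str:
    """Escape HTML special characters in a single pass over the string."""
    return text.translate(_HTML_ESCAPE_TABLE)


sample = '<a href="x">Tom & Jerry\'s</a>'

# Single-pass translation never re-escapes the '&' it inserts, so the
# result matches html.escape(), which replaces '&' first.
assert escape_html(sample) == html.escape(sample)
```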

reviewboard/treesitter/highlight.py (Diff revision 6)

The issue has been resolved.

Why do we sort by name length?

david

April 2, 2026, 8:27 a.m.

This ensures that more specific names (`constant.builtin`) take precedence over less specific ones (`constant`). I'll add a comment.
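One way sorting by length could give that precedence is sketched below. It assumes, hypothetically, that captures for the same range are applied in order, so later ones overwrite earlier ones. `resolve_captures()` is an illustrative helper, not the function in `highlight.py`.

```python
def resolve_captures(captures):
    """Map each (start, end) range to a single capture name.

    ``captures`` is a list of (name, start, end) tuples. Sorting by name
    length applies 'constant' before 'constant.builtin', so the longer,
    more specific name overwrites the shorter one for the same range.
    """
    resolved = {}

    for name, start, end in sorted(captures, key=lambda c: len(c[0])):
        resolved[(start, end)] = name

    return resolved
```

Because `sorted()` is stable, captures whose names have equal length keep their original relative order.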

reviewboard/treesitter/highlight.py (Diff revision 6)
The issue has been resolved.
```
Should we log that this is an unsupported language?
```

reviewboard/treesitter/tests/test_highlight.py (Diff revision 6)

The issue has been resolved.

More of a nit, but you could use the `NODE_NAME`, `NODE_START`, and `NODE_END` constants in the tests and get rid of the trailing comments.

reviewboard/treesitter/tests/test_highlight.py (Diff revision 6)

The issue has been resolved.

Same here with `EVENT_TAG` and `EVENT_POSITION`.
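A small illustration of the suggestion in these two comments. It assumes, hypothetically, that nodes are `(name, start, end)` tuples and events are `(position, tag)` tuples. The constant values here are illustrative; the actual ones in `highlight.py` may differ.

```python
# Hypothetical index constants; the real values in highlight.py may differ.
NODE_NAME = 0
NODE_START = 1
NODE_END = 2

EVENT_POSITION = 0
EVENT_TAG = 1

node = ('keyword', (0, 0), (0, 3))
event = (4, '<span class="keyword">')

# Named indices make trailing "# name" / "# start" comments unnecessary.
assert node[NODE_NAME] == 'keyword'
assert node[NODE_START] == (0, 0)
assert node[NODE_END] == (0, 3)
assert event[EVENT_POSITION] == 4
assert event[EVENT_TAG] == '<span class="keyword">'
```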

reviewboard/treesitter/tests/test_highlight.py (Diff revision 6)
The issue has been resolved.
```
Can you add an empty line between these?
```


Diff:

Revision 7 (+7608)


	pyproject.toml
	contrib/internal/treesitter/recompute-test-data.py
	docs/manual/extending/coderef/index.rst
	reviewboard/treesitter/highlight.py
	reviewboard/treesitter/tests/test_highlight.py
	reviewboard/treesitter/tests/testdata/sample.go
	reviewboard/treesitter/tests/testdata/sample.html
	reviewboard/treesitter/tests/testdata/sample.java
	(and 11 more files)

Checks run (2 failed)

flake8 failed.

JSHint failed.

flake8

reviewboard/treesitter/tests/testdata/sample.py (Diff revision 7)
The issue has been dropped.
```
'os' imported but unused

Column: 1
Error code: F401
```

JSHint

reviewboard/treesitter/tests/testdata/sample.js (Diff revision 7)
The issue has been dropped.
```
'DEFAULT_TIMEOUT' is defined but never used.

Column: 7
Error code: W098
```

Ship it!

```
Ship It!
```

Status: Completed
Change Summary: Pushed to master (4bbc90e)