
    Tree Sitter: Add update-language-info script and resulting data.

    Review Request #14514 — Created July 26, 2025 and submitted

    Information

    Review Board
    master

    Reviewers

    This change adds a script that loads information about supported
    languages from a tree-sitter-language-pack checkout. The script
    generates a file, reviewboard/treesitter/_languages.py, containing two
    mappings, MIME_TYPE_TO_LANGUAGE and FILE_SUFFIX_TO_LANGUAGES, which are
    used to determine the tree-sitter language name for a given file.
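
    As a rough illustration of how two such mappings might be consulted, here
    is a minimal sketch. The sample entries, the value types (a single name
    per MIME type, a list of names per suffix), and the guess_language()
    helper are all assumptions made for this example and are not taken from
    the change itself.

```python
from __future__ import annotations

import os

# Hypothetical sample data. The real generated file is much larger, and the
# exact value types are an assumption made for this sketch.
MIME_TYPE_TO_LANGUAGE: dict[str, str] = {
    'text/x-python': 'python',
    'text/x-c': 'c',
}

FILE_SUFFIX_TO_LANGUAGES: dict[str, list[str]] = {
    'py': ['python'],
    'h': ['c', 'cpp'],
}


def guess_language(
    filename: str,
    mime_type: str | None = None,
) -> str | None:
    """Return a candidate tree-sitter language name, or None.

    This is a hypothetical helper: it prefers an exact MIME type match and
    falls back to the first language registered for the file suffix.
    """
    if mime_type and mime_type in MIME_TYPE_TO_LANGUAGE:
        return MIME_TYPE_TO_LANGUAGE[mime_type]

    # Normalize ".PY" to "py" before looking up the suffix.
    suffix = os.path.splitext(filename)[1].lstrip('.').lower()
    candidates = FILE_SUFFIX_TO_LANGUAGES.get(suffix, [])

    return candidates[0] if candidates else None
```

    A suffix can map to more than one language (".h" is shared by C and C++
    in the sample data), which is one plausible reason for the plural name
    FILE_SUFFIX_TO_LANGUAGES. Picking the first candidate is only a
    placeholder policy for the sketch.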

    The compiled grammars shipped in tree-sitter-language-pack don't
    include any of the grammar metadata, so this script goes into a
    checkout and grabs the filename suffixes. It also fetches the
    "first_line_regex" key, although that is not currently used for
    detection.
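
    The general idea of turning per-grammar metadata into a suffix lookup
    table can be sketched as follows. The shape of the metadata dictionary,
    the build_suffix_map() name, and the validation rules are assumptions for
    illustration only; the key names file_suffixes and first_line_regex come
    from this review, but how the actual script reads them from a checkout is
    not shown here.

```python
from __future__ import annotations

import logging
import re

logger = logging.getLogger(__name__)


def build_suffix_map(
    metadata: dict[str, dict[str, object]],
) -> dict[str, list[str]]:
    """Invert per-language metadata into a suffix -> languages mapping.

    ``metadata`` is a hypothetical structure keyed by language name. Entries
    with invalid data are skipped, and the offending value is logged.
    """
    result: dict[str, list[str]] = {}

    for language, info in sorted(metadata.items()):
        suffixes = info.get('file_suffixes', [])

        if (not isinstance(suffixes, list) or
                not all(isinstance(s, str) for s in suffixes)):
            logger.warning('Invalid file_suffixes for %s: %r',
                           language, suffixes)
            continue

        # The regex is only validated here; it is not used for detection.
        regex = info.get('first_line_regex')

        if regex is not None:
            try:
                re.compile(regex)
            except (re.error, TypeError):
                logger.warning('Invalid first_line_regex for %s: %r',
                               language, regex)

        for suffix in suffixes:
            result.setdefault(suffix.lstrip('.'), []).append(language)

    return result
```

    Iterating over the languages in sorted order keeps the generated output
    stable between runs, which makes diffs of a generated file easier to
    review.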

    Ran the script to create the _languages.py file.

    Summary ID
    Tree Sitter: Add update-language-info script and resulting data.
    nnsksqorxrnnmnuxklvrymslmwqmpxxt
    Description From Last Updated

    continuation line over-indented for visual indent Column: 10 Error code: E127

    reviewbot

    continuation line under-indented for visual indent Column: 33 Error code: E128

    reviewbot

    line too long (83 > 79 characters) Column: 80 Error code: E501

    reviewbot

    This and the rest of the variables/classes/functions in this file are missing Version Addeds in their docstrings.

    maubin

    We should show what the invalid file_suffixes data is in this warning.

    maubin

    Same comment here to show what the invalid first_line_regex is.

    maubin

    Should we log some warning/debug statement when we encounter OSErrors in this function?

    maubin

    line too long (83 > 79 characters) Column: 80 Error code: E501

    reviewbot

    line too long (83 > 79 characters) Column: 80 Error code: E501

    reviewbot
    Checks run (1 failed, 1 succeeded)
    flake8 failed.
    JSHint passed.

    david
    Review request changed
    Commits:
    Summary ID
    Tree Sitter: Add update-language-info script and resulting data.
    43fd40d0beb3c46b3edf86d5aa7a87e8cdc58486
    Tree Sitter: Add update-language-info script and resulting data.
    d79179711d6e9d6845a3bbf7b28b15c7e5b2ef56

    Checks run (1 failed, 1 succeeded)

    flake8 failed.
    JSHint passed.

    maubin
    1. This and the rest of the variables/classes/functions in this file are missing Version Addeds in their docstrings.

    2. We should show what the invalid file_suffixes data is in this warning.

    3. Same comment here to show what the invalid first_line_regex is.

    4. Should we log some warning/debug statement when we encounter OSErrors in this function?

       1. No, it's expected that one or both of these will fail. I'll add comments.

    david
    Review request changed
    Commits:
    Summary ID
    Tree Sitter: Add update-language-info script and resulting data.
    d79179711d6e9d6845a3bbf7b28b15c7e5b2ef56
    Tree Sitter: Add update-language-info script and resulting data.
    nnsksqorxrnnmnuxklvrymslmwqmpxxt

    Checks run (1 failed, 1 succeeded)

    flake8 failed.
    JSHint passed.

    maubin
    Ship It!
    david
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to master (13610bb)