Skip to content

Python: Update tree-sitter dependency #19929

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

tausbn
Copy link
Contributor

@tausbn tausbn commented Jun 30, 2025

Updates the Python extractor to depend on version 0.24.7 of tree-sitter (and 0.12.0 of tree-sitter-graph).

A few changes were needed in order to make the code build and run after updating the dependencies:

  • In main.rs, the Language parameter is now passed as a reference.
  • In python.tsg, many queries had captures that were not actually used in the body of the stanza. This is no longer allowed (unless the captures start with an underscore), as it may indicate an error. To fix this, I added underscores in the appropriate places (and verified that none of these unused captures were in fact bugs).

Finally, something in the newer version of tree-sitter/tree-sitter-graph made it so that we were no longer able to detect syntax errors correctly. We previously did this by matching (in tree-sitter-graph) against all nodes, and then checking if their node type was either ERROR or MISSING.

As this no longer works, we now instead traverse the tree in a separate pass (but only if errors are present) and record all nodes that correspond to syntax errors, and then manually add these to the graph output.

Finally, A new option --dump-errors was added to tsg-python to make it easy to dump a list of the syntax errors encountered.

Updates the Python extractor to depend on version 0.24.7 of tree-sitter
(and 0.12.0 of tree-sitter-graph).

A few changes were needed in order to make the code build and run after
updating the dependencies:

- In `main.rs`, the `Language` parameter is now passed as a reference.
- In `python.tsg`, many queries had captures that were not actually used
in the body of the stanza. This is no longer allowed (unless the
captures start with an underscore), as it may indicate an error. To fix
this, I added underscores in the appropriate places (and verified that
none of these unused captures were in fact bugs).
@tausbn tausbn added the no-change-note-required This PR does not need a change note label Jun 30, 2025
tausbn added 4 commits June 30, 2025 14:57
It seems that with a newer version of tree-sitter, we no longer parse
the (not actually valid!) syntax `Spam[**P2]` as if the `**` is an
exponentiation operation (with a missing left operand).
Rather than relying on matching arbitrary nodes inside tree-sitter-graph
and then checking whether they are of type ERROR or MISSING (which seems
to have stopped working in later versions of tree-sitter), we now
explicitly go through the tree-sitter tree, locating all of the error
and missing nodes along the way. We then add these on to the graph
output in the same format as was previously produced by
tree-sitter-graph.

Note that it's very likely that some of the syntax errors will move
around a bit as a consequence of this change. In general, we don't
expect syntax errors to have stable locations, as small changes in the
grammar can cause an error to appear in a different position, even if
the underlying (erroneous) code has not changed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-change-note-required This PR does not need a change note Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant