Tree-Sitter.  How your favourite text editor knows how to highlight your code!

I did all of this (kind of by accident) for the Helix Editor which doesn’t, by default, have documentation-comment highlighting for most languages.

Why?

Someone in the Helix Matrix channel asked about documentation comment highlighting in Python, and I very confidently told them that because Python doc comments and multi-line strings look like this:

"""
	This is a documentation comment
"""
my_variable = """
	And this is a multi-line string
"""

they probably can’t be told apart all that easily.
I was really very wrong.

The next day I ended up going down a very deep rabbit-hole learning about tree-sitter, node trees, and how syntax highlighting works.

So how does all this work?

Tree-Sitter uses grammar files to break this Rust code:
/// I am a documentation comment
fn test_func() {
	// I am a regular comment
    a + b
}

into this node tree:

source_file [0, 0] - [5, 0]
  line_comment [0, 0] - [1, 0]
    outer: outer_doc_comment_marker [0, 2] - [0, 3]
    doc: doc_comment [0, 3] - [1, 0]
  function_item [1, 0] - [4, 1]
    name: identifier [1, 3] - [1, 12]
    parameters: parameters [1, 12] - [1, 14]
    body: block [1, 15] - [4, 1]
      line_comment [2, 1] - [2, 7]
      binary_expression [3, 4] - [3, 9]
        left: identifier [3, 4] - [3, 5]
        right: identifier [3, 8] - [3, 9]

To select the single-line documentation comment, you use the tree-sitter query below:

(line_comment (doc_comment)) @test

This assigns the capture name @test to every line doc comment, which can then be targeted for syntax highlighting!

You can test all of this for yourself in the tree-sitter playground.
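If you don’t have the playground handy, here is a rough Python sketch of what a query like (line_comment (doc_comment)) is doing: find every line_comment node that has a doc_comment as a direct child. The Node class and match_parent_child function are made up for this illustration and are not part of tree-sitter; the tree is a hand-built copy of the one above with the positions left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy stand-in for a syntax node: a kind plus child nodes."""
    kind: str
    children: list["Node"] = field(default_factory=list)


def walk(node):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def match_parent_child(root, parent_kind, child_kind):
    """Return every parent_kind node that has a direct child_kind child."""
    return [
        node
        for node in walk(root)
        if node.kind == parent_kind
        and any(child.kind == child_kind for child in node.children)
    ]


# Hand-built copy of the node tree shown above (hypothetical model).
tree = Node("source_file", [
    Node("line_comment", [
        Node("outer_doc_comment_marker"),
        Node("doc_comment"),
    ]),
    Node("function_item", [
        Node("identifier"),
        Node("parameters"),
        Node("block", [
            Node("line_comment"),  # the regular comment: no doc_comment child
            Node("binary_expression", [Node("identifier"), Node("identifier")]),
        ]),
    ]),
])

matches = match_parent_child(tree, "line_comment", "doc_comment")
print(len(matches))  # prints 1: the doc comment matches, the regular one doesn't
```

The regular comment inside the function body is also a line_comment, but it has no doc_comment child, so it is skipped.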

Python Block Comments

So going back to where we started.

"""
	This is a documentation comment
"""
my_variable = """
	And this is a multi-line string	
"""

We have a documentation comment and a multi-line string. How do they look in the node tree?

Well, I did some experimenting in the tree-sitter playground, and it turns out that in Python, string doc comments sit directly inside an expression_statement node: expression_statement > string.  Strings assigned to a variable (multi-line or single-line), however, take the form expression_statement > assignment > string.
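You can see the same structural difference without tree-sitter at all. This sketch uses Python’s built-in ast module, which is a different parser with different node names (Expr and Assign rather than expression_statement and assignment), so treat it as an analogy and not as what Helix does. The is_bare_string helper is a name invented for this example.

```python
import ast

source = '''"""
    This is a documentation comment
"""
my_variable = """
    And this is a multi-line string
"""
'''

module = ast.parse(source)


def is_bare_string(stmt):
    """True for a statement that is nothing but a string literal."""
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and isinstance(stmt.value.value, str)
    )


kinds = [type(stmt).__name__ for stmt in module.body]
bare = [is_bare_string(stmt) for stmt in module.body]

print(kinds)  # prints ['Expr', 'Assign']
print(bare)   # prints [True, False]
```

The doc comment is a statement made of a string and nothing else, while the assigned string is wrapped in an assignment node. That is the same distinction the tree-sitter node tree exposes.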

So to match the documentation comment we need the query:

(expression_statement (string) @content
	(#match? @content "^\"\"\"")
) @comment.block.documentation

So we get an expression_statement containing a string, and then make sure it matches the (simplified here) regex ^""", which means it’s a triple-quoted multi-line comment and not a boring single-quoted one.

And we’ve just matched ourselves a Python documentation comment!
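To get a feel for what the ^""" check lets through, here is a small sketch using Python’s re module. Using re as a stand-in for the regex engine behind #match? is an assumption on my part, though a pattern this simple should behave the same way. The sample strings are made up for illustration.

```python
import re

# The same pattern the query hands to #match?, written as a Python regex.
TRIPLE_QUOTED = re.compile(r'^"""')

samples = {
    "triple double quotes": '"""\n\tThis is a documentation comment\n"""',
    "single double quote": '"just a plain string"',
    "triple single quotes": "'''also a docstring in Python'''",
}

results = {name: bool(TRIPLE_QUOTED.search(text)) for name, text in samples.items()}

for name, matched in results.items():
    print(f"{name}: {matched}")
```

Only the first sample matches. Note that a doc comment written with ''' is rejected too, which is one of the things the simplified regex leaves out.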

Rust Block Comments

I use Rust a bunch, and if we can target Python documentation comments… What’s stopping us from targeting Rust comments?

Nothing.

The reason I used Rust for the example above is that the Helix Rust tree-sitter grammar identifies the documentation comments for us!

Now, we could do:

(doc_comment) @comment.documentation

but that doesn’t actually match the whole comment, so the slashes at the start (///) are left out.
Instead, we search for line (and block) comments that contain a doc_comment node, and that includes the slashes!

(line_comment (doc_comment)) @comment.documentation
(block_comment (doc_comment)) @comment.block.documentation
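The difference between the two captures is easy to see if you slice the source line using the column ranges from the node tree earlier. This is only a sketch: the ranges are copied by hand from row 0 of that tree, and the covered helper is a name invented for the example.

```python
line = "/// I am a documentation comment\n"

# Column ranges copied from row 0 of the node tree shown earlier.
# An end position of [1, 0] means "up to the start of the next row",
# so both nodes run to the end of this line, newline included.
LINE_COMMENT = (0, len(line))  # line_comment [0, 0] - [1, 0]
DOC_COMMENT = (3, len(line))   # doc_comment  [0, 3] - [1, 0]


def covered(text, span):
    """Return the part of text that a (start, end) column span covers."""
    start, end = span
    return text[start:end]


print(repr(covered(line, DOC_COMMENT)))   # text only, no slashes
print(repr(covered(line, LINE_COMMENT)))  # the whole comment, slashes included
```

Capturing the parent line_comment is what pulls the slashes into the highlight.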

Once you’ve gotten the hang of it, it’s actually pretty easy to do!

Helix’s command :tree-sitter-scopes helps a lot too.

Title Tree Sitter Highlighting Queries
Created 2024-11-04
Published 2024-11-17
Draft false

Tags: tree-sitter, Python, Rust, Guide