Tree-Sitter.  How your favourite text editor knows how to highlight your code!
I did all of this (kind of by accident) for the Helix editor, which doesn’t, by default, have documentation-comment highlighting for most languages.
Why?
Someone in the Helix Matrix channel asked about documentation comment highlighting in Python, and I very confidently told them that because Python doc comments and multi-line strings look like this:
"""
This is a documentation comment
"""
my_variable = """
And this is a multi-line string
"""
They probably can’t be distinguished all that easily.
I was really very wrong.
The next day I ended up going down a very deep rabbit-hole learning about tree-sitter, node trees, and how syntax highlighting works.
So how does all this work?
Tree-sitter parses your source code into a tree of nodes. For example, it turns this Rust code
/// I am a documentation comment
fn test_func() {
// I am a regular comment
  a + b
}
into this node tree:
source_file [0, 0] - [5, 0]
  line_comment [0, 0] - [1, 0]
    outer: outer_doc_comment_marker [0, 2] - [0, 3]
    doc: doc_comment [0, 3] - [1, 0]
  function_item [1, 0] - [4, 1]
    name: identifier [1, 3] - [1, 12]
    parameters: parameters [1, 12] - [1, 14]
    body: block [1, 15] - [4, 1]
      line_comment [2, 1] - [2, 7]
      binary_expression [3, 4] - [3, 9]
        left: identifier [3, 4] - [3, 5]
        right: identifier [3, 8] - [3, 9]
To select the single-line documentation comment, you use the tree-sitter query below:
(line_comment (doc_comment)) @test
This assigns the capture name @test to all line doc comments, which can then be targeted for syntax highlighting!
You can test all of this for yourself in the tree-sitter playground.
Python Block Comments
So going back to where we started.
"""
This is a documentation comment
"""
my_variable = """
And this is a multi-line string
"""
We have a documentation comment and a multi-line string. How do they look in the node tree?
Well, I did some experimenting in the tree-sitter playground, and it turns out that in Python, string doc comments sit directly inside an expression_statement node: expression_statement > string. However, strings assigned to a variable (multi-line or single-line) take the form expression_statement > assignment > string.
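The same split shows up outside tree-sitter. As a rough analogy only, here is a minimal sketch using Python's standard-library ast module, which is not tree-sitter and uses different node names (Expr and Assign instead of expression_statement and assignment). The helper describe_top_level is a hypothetical name made up for this illustration.

```python
import ast

SOURCE = '''"""
This is a documentation comment
"""
my_variable = """
And this is a multi-line string
"""
'''


def describe_top_level(source):
    """Return 'Statement > Child' for each top-level statement in source."""
    shapes = []
    for stmt in ast.parse(source).body:
        # Expr and Assign both keep their payload in .value;
        # other statement kinds may not have one.
        inner = getattr(stmt, "value", None)
        inner_name = type(inner).__name__ if inner is not None else "-"
        shapes.append(f"{type(stmt).__name__} > {inner_name}")
    return shapes


print(describe_top_level(SOURCE))
```

The bare string is wrapped in an Expr statement, while the assigned string hangs off an Assign statement, which mirrors the difference the tree-sitter playground shows.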
So to match the documentation comment we need the query:
(expression_statement (string) @content
(#match? @content "^\"\"\"")
) @comment.block.documentation
So we get an expression_statement containing a string, and then make sure it matches the (simplified here) regex ^""" which means it’s a triple-quoted string being used as a documentation comment and not a boring single-quoted string.
And we’ve just matched ourselves a Python documentation comment!
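To make the two steps of that query concrete, here is a hedged sketch of the same filtering logic written with Python's standard-library ast and re modules instead of tree-sitter. It is not how Helix or tree-sitter evaluate queries. The function name find_doc_strings and the SAMPLE text are assumptions made up for this example.

```python
import ast
import re

# Same (simplified) pattern as the #match? predicate in the query.
TRIPLE_QUOTE = re.compile(r'^"""')

SAMPLE = '''"""
This is a documentation comment
"""
my_variable = """
And this is a multi-line string
"""
"a boring single quote string"
'''


def find_doc_strings(source):
    """Return the source text of bare string statements that start with three quotes."""
    matches = []
    for node in ast.walk(ast.parse(source)):
        # Step 1: a statement that is nothing but a string,
        # roughly (expression_statement (string) @content).
        is_bare_string = (
            isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        )
        if not is_bare_string:
            continue
        # Step 2: check the raw source text, roughly (#match? @content "^\"\"\"").
        text = ast.get_source_segment(source, node.value)
        if text and TRIPLE_QUOTE.match(text):
            matches.append(text)
    return matches


for doc in find_doc_strings(SAMPLE):
    print(doc)
```

Only the first string survives both steps: the assigned string fails step 1, and the single-quoted string fails step 2.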
Rust Block Comments
I use Rust a bunch, and if we can target Python documentation comments… What’s stopping us from targeting Rust comments?
Nothing.
The reason I used Rust for the example above is that the Rust tree-sitter grammar Helix uses identifies the documentation comments for us!
Now, we could do:
(doc_comment) @comment.documentation
but that doesn’t actually match the whole comment, so the slashes at the start (///) are left out.
Instead, we search for line (and block) comments that contain a doc_comment node, and that includes the slashes!
(line_comment (doc_comment)) @comment.documentation
(block_comment (doc_comment)) @comment.block.documentation
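To see why the outer node is the one to capture, here is a small sketch that slices the example comment using the start columns from the node tree earlier in this post (line_comment starts at column 0, doc_comment at column 3). The helper captured_text is a hypothetical name, and this only imitates what each capture would cover.

```python
# The doc comment line from the Rust example, including its newline,
# since both nodes end at [1, 0] in the node tree.
LINE = "/// I am a documentation comment\n"

LINE_COMMENT_START = 0  # line_comment [0, 0] - [1, 0]
DOC_COMMENT_START = 3   # doc_comment [0, 3] - [1, 0]


def captured_text(line, start_col):
    """Return the part of the line a node starting at start_col would cover."""
    return line[start_col:]


print(repr(captured_text(LINE, DOC_COMMENT_START)))   # text without the slashes
print(repr(captured_text(LINE, LINE_COMMENT_START)))  # whole comment, slashes included
```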
Once you’ve gotten the hang of it, it’s actually pretty easy to do!
Helix’s command :tree-sitter-scopes helps a lot too.