pm39 tree sitter parser
Notes and such about the Tree-Sitter ConTeXt parser.
Features
Version 0.6 of the tree-sitter-context_en parser supports the following features:
Document Areas
If document start commands (\starttext, \startcomponent) and stop commands (\stoptext, \stopcomponent) exist in the document, the parser builds a tree with preamble, main, and postamble nodes. Dividing the document this way helps tools that want to treat the areas differently, for example by ignoring the postamble.
If no start- or stop- commands exist in the document, all content is contained in a main node.
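To make the three areas concrete, here is a minimal Python sketch that imitates the split with plain string searching. It is not how the tree-sitter grammar works internally; the helper name `split_areas` is hypothetical, and ending the main area at the last stop command (or at the end of the file if there is none) is an assumption.

```python
import re

# Hypothetical illustration only: the real parser builds syntax-tree nodes,
# it does not slice strings like this.
START = re.compile(r"\\start(?:text|component)\b")
STOP = re.compile(r"\\stop(?:text|component)\b")


def split_areas(source: str) -> dict:
    """Split a ConTeXt source string into preamble, main, and postamble."""
    start = START.search(source)
    if start is None:
        # No start command: everything belongs to the main area.
        return {"preamble": "", "main": source, "postamble": ""}
    # Assumption: the main area ends at the last stop command, or at the
    # end of the source if no stop command follows the start command.
    stops = list(STOP.finditer(source, start.end()))
    end = stops[-1].end() if stops else len(source)
    return {
        "preamble": source[:start.start()],
        "main": source[start.start():end],
        "postamble": source[end:],
    }
```

A tool that wants to ignore the postamble could then read only the `"preamble"` and `"main"` entries.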
Commands
The parser tokenizes commands into:
- name
- zero or more option blocks (square brackets with keywords)
- zero or more settings blocks (square brackets with key=val pairs)
- zero or more scopes (curly braces after the command)
Settings are further tokenized into keys and values, with values able to contain other tokens (more commands, etc.).
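The breakdown above can be imitated in a few lines of Python. This is only a sketch of the idea, not the grammar itself: the function names are hypothetical, and deciding between an option block and a settings block by looking for an `=` sign is an assumption made for the example.

```python
import re

NAME = re.compile(r"\\([A-Za-z]+)")


def find_close(text: str, start: int, opener: str, closer: str) -> int:
    """Return the index of the delimiter that closes the one at `start`."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == opener:
            depth += 1
        elif text[i] == closer:
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced delimiters")


def parse_settings(body: str) -> dict:
    """Split key=val pairs on top-level commas; values keep nested content."""
    settings, depth, current = {}, 0, ""
    for ch in body + ",":
        if ch == "," and depth == 0:
            if "=" in current:  # entries without "=" are dropped in this sketch
                key, value = current.split("=", 1)
                settings[key.strip()] = value.strip()
            current = ""
            continue
        if ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
        current += ch
    return settings


def split_command(text: str) -> dict:
    """Break one command into name, option blocks, settings blocks, scopes."""
    match = NAME.match(text)
    if match is None:
        raise ValueError("text does not start with a command")
    result = {"name": match.group(1), "options": [], "settings": [], "scopes": []}
    pos = match.end()
    while pos < len(text) and text[pos] in "[{":
        opener = text[pos]
        end = find_close(text, pos, opener, "]" if opener == "[" else "}")
        body = text[pos + 1:end]
        if opener == "{":
            result["scopes"].append(body)
        elif "=" in body:  # assumption: "=" marks a settings block
            result["settings"].append(parse_settings(body))
        else:
            result["options"].append([w.strip() for w in body.split(",") if w.strip()])
        pos = end + 1
    return result
```

For example, `split_command(r"\framed[width=3cm,align=middle]{Hello}")` yields the name `framed`, one settings block with the keys `width` and `align`, and one scope containing `Hello`.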
Groups
The parser understands the following types of groupings:
- Brace groups (starting with "{" or "\bgroup", and ending with "}" or "\egroup")
- "Command" groups (starting with "\start" and ending with "\stop")
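As a rough illustration of the brace-group rule, the Python sketch below checks whether brace groups are balanced while treating "{" and "\bgroup" as openers and "}" and "\egroup" as closers. The function name `groups_balanced` is hypothetical, and treating the two spellings as interchangeable is an assumption; the parser's own matching rules may be stricter.

```python
import re

# Order matters: the group commands are tried before the generic
# "backslash plus one character" pattern, which swallows escapes like \{.
TOKEN = re.compile(r"\\bgroup\b|\\egroup\b|\\.|[{}]")


def groups_balanced(source: str) -> bool:
    """Return True if every brace group that opens is also closed."""
    depth = 0
    for match in TOKEN.finditer(source):
        token = match.group(0)
        if token in ("{", "\\bgroup"):
            depth += 1
        elif token in ("}", "\\egroup"):
            depth -= 1
            if depth < 0:  # a closer appeared before any opener
                return False
        # Any other token is an escaped character or the start of another
        # command and does not affect grouping.
    return depth == 0
```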
Inline Math
The parser supports minimal handling of inline math.
(Future work: more math support!)
Inclusions
Code Inclusions
The parser supports marking the following inclusions for inlined code:
- luacode
- tikzcode
- MPinclusions
- useMPgraphic
- reusableMPgraphic
- MPcode
- MPpage
- staticMPfigure
Note that the parser only marks these areas for external parsing; nothing happens if the external parser isn't available.
(As of this writing, an external parser exists for Lua, but not for MetaPost or TikZ.)
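The fallback behaviour can be pictured with a small Python sketch. The mapping covers only a few of the environments listed above, and the language names on the right (`lua`, `tikz`, `metapost`) as well as the function name are assumptions for illustration, not identifiers taken from the parser.

```python
# Hypothetical mapping from inclusion environment to external language.
INCLUSION_LANGUAGES = {
    "luacode": "lua",
    "tikzcode": "tikz",
    "MPcode": "metapost",
    "MPpage": "metapost",
}


def injected_language(environment: str, available: set):
    """Return the external language for an inclusion, or None.

    None means the area is still marked, but no external parser handles
    it, so its content is simply left unparsed.
    """
    language = INCLUSION_LANGUAGES.get(environment)
    return language if language in available else None
```

With only a Lua parser installed, `injected_language("luacode", {"lua"})` returns `"lua"`, while `injected_language("MPcode", {"lua"})` returns `None`.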
Typing Environment Inclusions
The parser supports marking the following typing environments:
- MetaPost
- Lua
- HTML
- CSS
- XML
- PARSEDXML
...and a generic typing inclusion.
Other Things
The parser marks commands relating to project structure.
The parser marks escaped characters, and complains about unescaped characters that should be escaped (except in special circumstances).
The parser should be line-ending agnostic.
Future Directions
- Parse and include more of the document structure in the syntax tree? (Reflect chapters, sections, etc. in the syntax tree? What to do about user-defined headings?)
- Table support for the parser? (Which model or models?)
- Better math support?
- Better programming support? (Explicitly tag things like loop and branch commands?)
- More inclusions? (Markdown?)
- Other ConTeXt interface languages?
- Should the parser be more strict about what's allowed in the preamble?