Skip to main content

A lexer and tokenizer for grammar files as defined by TextMate and used in VSCode, implemented in Python.

Project description

PyPI - Version PyPI - License Ruff Checked with mypy pre-commit Python versions CI/CD readthedocs

textmate-grammar-python

A lexer and tokenizer for grammar files as defined by TextMate and used in VSCode, implemented in Python.

Textmate grammars are made for vscode-texmate, allowing for syntax highlighting in VSCode after tokenization. This presents textmate-grammar-python with a large list of potentially supported languages.

Usage

Install the module with:

pip install textmate-grammar-python

Before tokenization is possible, a LanguageParser needs to be initialized using a loaded grammar.

from textmate_grammar.language import LanguageParser
from textmate_grammar.grammars import matlab
parser = LanguageParser(matlab.GRAMMAR)

After this, one can either choose to call parser.parsing_string to parse a input string directly, or call parser.parse_file with the path to the appropiate source file as the first argument, such as in the example example.py.

The parsed element object can be displayed directly by calling the print method. By default the element is printed as an element tree in a dictionary format.

>>> element = parser.parse_string("value = num2str(10);")
>>> element.print()

{'token': 'source.matlab',
 'children': [{'token': 'meta.assignment.variable.single.matlab', 
               'children': [{'token': 'variable.other.readwrite.matlab', 'content': 'value'}]},
              {'token': 'keyword.operator.assignment.matlab', 'content': '='},
              {'token': 'meta.function-call.parens.matlab',
               'begin': [{'token': 'entity.name.function.matlab', 'content': 'num2str'},
                         {'token': 'punctuation.section.parens.begin.matlab', 'content': '('}],
               'end': [{'token': 'punctuation.section.parens.end.matlab', 'content': ')'}],
               'children': [{'token': 'constant.numeric.decimal.matlab', 'content': '10'}]},
              {'token': 'punctuation.terminator.semicolon.matlab', 'content': ';'}]}

Alternatively, with the keyword argument flatten the element is displayed as a list per unique token. Here the first item in the list is the starting position (line, column) of the unique tokenized element.

>>> element.print(flatten=True)

[[(0, 0), 'value', ['source.matlab', 'meta.assignment.variable.single.matlab', 'variable.other.readwrite.matlab']],
 [(0, 5), ' ', ['source.matlab']],
 [(0, 6), '=', ['source.matlab', 'keyword.operator.assignment.matlab']],
 [(0, 7), ' ', ['source.matlab']],
 [(0, 8), 'num2str', ['source.matlab', 'meta.function-call.parens.matlab', 'entity.name.function.matlab']],
 [(0, 15), '(', ['source.matlab', 'meta.function-call.parens.matlab', 'punctuation.section.parens.begin.matlab']],
 [(0, 16), '10', ['source.matlab', 'meta.function-call.parens.matlab', 'constant.numeric.decimal.matlab']],
 [(0, 18), ')', ['source.matlab', 'meta.function-call.parens.matlab', 'punctuation.section.parens.end.matlab']],
 [(0, 19), ';', ['source.matlab', 'punctuation.terminator.semicolon.matlab']]]

Information

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

textmate_grammar_python-0.4.0.tar.gz (34.1 kB view hashes)

Uploaded Source

Built Distribution

textmate_grammar_python-0.4.0-py3-none-any.whl (36.7 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page