Parse VBA grammar using ANTLR4 and python

antlr4-vba-parser

Navigate antlr VBA Parse Trees in python.

This python package provides an interface to the ANTLR4 tooling and allows parsing and lexing of VBA grammar.

>>> from antlr4_vba_parser.vba_parser import Antlr4VbaParser

>>> parsed = Antlr4VbaParser("""
... SUB square(x)
...   DIM y: REM Some comment
...   y = x * x  ' same as x**2
... END SUB
... """)  # also accepts a filepath

>>> from pprint import pprint
>>> pprint(parsed)
('(startRule (module (endOfLine \\n) (moduleBody (moduleBodyElement (subStmt '
 'SUB   (ambiguousIdentifier square) (argList ( (arg (ambiguousIdentifier x)) '
 ')) (endOfStatement (endOfLine \\n   )) (block (blockStmt (variableStmt DIM   '
 '(variableListStmt (variableSubStmt (ambiguousIdentifier y))))) '
 '(endOfStatement :   (endOfLine (remComment REM Some comment)) (endOfLine '
 '\\n   )) (blockStmt (letStmt (implicitCallStmt_InStmt '
 '(iCS_S_VariableOrProcedureCall (ambiguousIdentifier y)))   =   (valueStmt '
 '(valueStmt (implicitCallStmt_InStmt (iCS_S_VariableOrProcedureCall '
 '(ambiguousIdentifier x))))   *   (valueStmt (implicitCallStmt_InStmt '
 '(iCS_S_VariableOrProcedureCall (ambiguousIdentifier x))))))) (endOfStatement '
 "(endOfLine    (comment ' same as x**2)) (endOfLine \\n))) END SUB)) "
 '(endOfLine \\n))) <EOF>)')
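
The printed tree is long, so a quick summary of which grammar rules it contains can help when exploring it. The following is a minimal sketch that works only on the text form shown above, not on any API of this package: `rule_counts` and `tree_text` are hypothetical names, and the excerpt is copied from the output above with whitespace condensed. It is a heuristic, as it assumes that a rule name is the word directly following an opening parenthesis, so token text that happens to look like `(word` would be miscounted.

```python
import re
from collections import Counter

# Excerpt of the printed tree above (whitespace condensed for readability).
tree_text = (
    "(blockStmt (variableStmt DIM "
    "(variableListStmt (variableSubStmt (ambiguousIdentifier y)))))"
)


def rule_counts(text):
    """Count rule names, taken as the word right after each '('."""
    return Counter(re.findall(r"\((\w+)", text))


counts = rule_counts(tree_text)
for name, count in sorted(counts.items()):
    print(f"{name}: {count}")
```

Applied to the full tree text, the same function would give a rough picture of how often each rule (for example `ambiguousIdentifier`) occurs.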

Installation

antlr4_vba_parser itself is a pure python package, but it depends on a Java runtime in order to run. The ANTLR4 jar needed to perform the parsing/lexing is included in the package distribution; it is bundled from third-party sources at packaging time by setup.py build.

To install, run:

pip install antlr4_vba_parser
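
Since the package needs a Java runtime, it can be worth checking that one is reachable before parsing anything. The following is a minimal sketch using only the python standard library; `find_java` is a hypothetical helper and not part of this package, and it assumes that the runtime is exposed as a `java` executable on the PATH.

```python
import shutil


def find_java():
    """Return the full path of a `java` executable on PATH, or None."""
    return shutil.which("java")


java_path = find_java()
if java_path is None:
    print("No java executable found on PATH; install a Java runtime first.")
else:
    print(f"Found java at {java_path}")
```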

Development

To set up a development environment, create and activate a new virtual or conda environment, then run the following:

git clone https://github.com/Liam-Deacon/antlr4-vba-parser
cd antlr4-vba-parser
pip install -r requirements-dev.txt -r requirements-test.txt -r requirements.txt
python setup.py build_antlr4  # needed to generate python bindings
pip install -e .

This will install the package in development mode. Note that if you have forked the repo, you should change the URL as appropriate.

Documentation

Documentation can be found within the docs/ directory. This project uses sphinx to autogenerate API documentation by scraping python docstrings.

To generate the HTML documentation, simply do the following:

cd docs
make html

Contribution Guidelines

Contributions are extremely welcome and highly encouraged. To help with consistency, please consider the following before submitting a PR for review:

  • Run autopep8 -a -a -i -r . over any modified files to ensure basic PEP 8 conformance, so that the code reads in the style expected of most python projects.
  • New or changed functionality should be tested, and running pytest should pass.
  • Try to document any new or changed functionality. Note: this project uses numpydoc for its docstring documentation style.

License

Released under the BSD license.

TODO

This package is mostly a proof of concept and as such there are a number of areas to add to, fix and improve.

  • Create listener(s) capable of capturing contextual information and creating a JSON-friendly dictionary output.
  • Produce a simple script that turns the above into a command line tool.
  • Contribute to oletools.vba to hopefully extend capabilities using this package.
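
As a rough illustration of the first item, a parse tree can be walked recursively and turned into nested dictionaries that `json.dumps` accepts. This is only a sketch under stated assumptions and not an implementation from this package: it assumes tree nodes expose the ANTLR4 python runtime's tree methods (`getChildCount`, `getChild`, `getText`, and `getRuleIndex` on rule nodes), and `tree_to_dict`, `FakeToken`, `FakeRule` and `RULE_NAMES` are hypothetical names. The stand-in classes exist purely so that the example runs without a Java runtime or generated parser.

```python
import json


def tree_to_dict(node, rule_names):
    """Convert an ANTLR-style parse tree into JSON-friendly dicts and strings."""
    # Assumption: only rule nodes provide getRuleIndex; leaves are tokens.
    if not hasattr(node, "getRuleIndex"):
        return node.getText()
    children = [tree_to_dict(node.getChild(i), rule_names)
                for i in range(node.getChildCount())]
    return {"rule": rule_names[node.getRuleIndex()], "children": children}


class FakeToken:
    """Hypothetical stand-in for a terminal (token) node."""

    def __init__(self, text):
        self._text = text

    def getText(self):
        return self._text

    def getChildCount(self):
        return 0


class FakeRule:
    """Hypothetical stand-in for a parser rule context."""

    def __init__(self, index, *children):
        self._index = index
        self._children = children

    def getRuleIndex(self):
        return self._index

    def getChildCount(self):
        return len(self._children)

    def getChild(self, i):
        return self._children[i]

    def getText(self):
        return "".join(child.getText() for child in self._children)


# Illustrative rule table; a generated parser would supply its own names.
RULE_NAMES = ["letStmt", "ambiguousIdentifier"]
tree = FakeRule(0, FakeRule(1, FakeToken("y")), FakeToken("="), FakeToken("x"))

as_dict = tree_to_dict(tree, RULE_NAMES)
print(json.dumps(as_dict, indent=2))
```

A listener-based version would collect the same information during a tree walk instead of recursing directly; the recursive form is used here only because it is shorter.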

Acknowledgements

  • Andrew Lockhart for the initial idea of combining ANTLR4 and python to handle VBA grammar
