chomsky

Another language grammar parser. Inspired by modgrammar and pyparsing

Development Status
- 1 - Planning
Environment
- Console
Intended Audience
- Developers
- End Users/Desktop
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic

Project description

=======
chomsky
=======

I needed a language grammar parser for the plywood_ project, and modgrammar_
looked like it would be perfect, except I couldn't get the simplest of grammars
to work. pyparsing_ is excellent, but it doesn't give me objects back, only
lists and strings, and I need more than that. I would recommend pyparsing_ for
*your* project, unless you really want objects or you are building a language
(``chomsky`` has lots of built-in stuff for making programming language
grammars).

Besides, I like writing parsers, and I know how I want this one to work, so
screw it, I'll do it myself!

------------
INSTALLATION
------------

::

    $ pip install chomsky

-----
USAGE
-----

Matchers
~~~~~~~~

``Matcher`` objects are the most basic building blocks. They are not smart,
they return only strings and lists, and they make no assumptions about what you
might be trying to build. For instance, the ``Word`` Matcher does not assume
that you want to consume whitespace.

``Matcher`` objects are great for building a small parsing language for
consistent data, where ``Grammar`` objects are not needed. But for building a
language parser, you will probably use the more heavy-duty Grammar building
blocks.

Letter
~~~~~~

Matches a single letter from a string of accepted letters. There are lots of
built-in strings in the `string module`_.

::

    # test/matchers/test_letter_matcher.py

    matcher = Letter('abcde')
    matcher('a') => 'a'
    matcher('bcd') => 'b'
    matcher('f') => ParseException
    # shorthand:
    matcher = A('abcde')

    # Python 3 name: string.ascii_letters (string.letters was Python 2 only)
    import string
    matcher = A(string.ascii_letters + string.digits + '_')
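
The alphabet passed to ``Letter`` is an ordinary string, so it can be built
from the standard ``string`` constants. The sketch below uses only the standard
library; ``first_accepted`` is a hypothetical stand-in written for this
illustration (it is not chomsky's implementation, and it raises ``ValueError``
where chomsky would raise ``ParseException``).

```python
import string

# Characters accepted in a typical identifier: letters, digits, underscore.
IDENTIFIER_CHARS = string.ascii_letters + string.digits + '_'


def first_accepted(accepted, text):
    """Hypothetical stand-in for the Letter behaviour described above.

    Returns the first character of ``text`` if it is in ``accepted``,
    otherwise raises ValueError (assumed analogue of ParseException).
    """
    if text and text[0] in accepted:
        return text[0]
    raise ValueError('no accepted letter at the start of %r' % (text,))


print(first_accepted('abcde', 'bcd'))
print(first_accepted(IDENTIFIER_CHARS, '_name'))
```

Only the first character is consumed; the rest of the input is left for
whatever matcher comes next.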

Word
~~~~

Matches one or more letters from a string of accepted letters.

Two options, ``min`` and ``max``, bound the length of the match. If fewer than
``min`` characters match, a ``ParseException`` is raised; the default is ``1``.
Matching stops once ``max`` characters have been consumed.

::

    # test/matchers/test_word_matcher.py
    matcher = Word('abcde')
    matcher('a') => 'a'
    matcher('bcd') => 'bcd'
    matcher('defg') => 'de'
    matcher('fghi') => ParseException

    # max
    matcher = Word('abcde', max=2)
    matcher('bcd') => 'bc'

    # min
    matcher = Word('abcde', min=3)
    matcher('ab') => ParseException
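
The length rules can be sketched in plain Python. ``match_word`` and its
``min_len``/``max_len`` parameters are hypothetical names chosen for this
illustration, assuming the semantics described in this section; the real
``Word`` matcher takes ``min`` and ``max`` and raises ``ParseException`` rather
than ``ValueError``.

```python
def match_word(accepted, text, min_len=1, max_len=None):
    """Consume leading characters of ``text`` that appear in ``accepted``.

    Hypothetical helper, not part of chomsky. Stops after ``max_len``
    characters and raises ValueError if fewer than ``min_len`` matched.
    """
    count = 0
    while count < len(text) and text[count] in accepted:
        if max_len is not None and count == max_len:
            break  # upper bound reached; leave the rest unconsumed
        count += 1
    if count < min_len:
        raise ValueError('expected at least %d accepted characters' % min_len)
    return text[:count]


print(match_word('abcde', 'bcd'))
print(match_word('abcde', 'bcd', max_len=2))
```

Note the asymmetry: reaching the upper bound simply ends the match, while
missing the lower bound is an error.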

Literal
~~~~~~~

Matches a literal string.

::

    # test/matchers/test_literal_matcher.py
    matcher = Literal('abcde')
    matcher('abcde') => 'abcde'
    matcher('abcdefg') => 'abcde'
    matcher('a') => ParseException
    matcher('fghi') => ParseException

Whitespace
~~~~~~~~~~

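Matches a run of whitespace characters from an accepted set. The default set
is space and tab (``" \t"``).

::

    # test/matchers/test_whitespace_matcher.py
    matcher = Whitespace()  # default is " \t"
    matcher(" ") => " "
    matcher(" \t\n ") => " \t"
    matcher = Whitespace(" \t\n")
    matcher(" \t\n ") => " \t\n "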

Regex
~~~~~

``Regex`` matchers take two options: ``group`` and ``advance``.

``group`` says which group or groups to return. Default is ``0`` (the entire
match). A list or tuple of groups will return a list of results. ``advance``
indicates which group to advance *past*. Default is ``0`` (the entire match).
Together they are a quick way to build a matching system for consistently
formatted data.

::

    # test/matchers/test_regex_matcher.py
    matcher = Regex("([a-zA-Z_][0-9])")
    matcher('a1') => 'a1'

    # group
    matcher = Regex("([a-zA-Z_])[0-9]", group=1)
    matcher('a1') => 'a'

    # to demonstrate `advance`, I have to add two Regex Matchers together,
    # which returns a list
    matcher = Regex("([a-zA-Z_])[0-9]", group=1, advance=1) + Regex("([0-9])", group=1)
    matcher('a1') => ['a', '1']
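
The difference between "which group to return" and "which group to advance
past" can be seen with the standard ``re`` module alone. This is only a sketch
of the idea, assuming ``group`` and ``advance`` correspond to
``Match.group()`` and ``Match.end()``; it does not show how chomsky implements
them.

```python
import re

source = 'a1'
pattern = re.compile(r'([a-zA-Z_])[0-9]')
match = pattern.match(source)

whole = match.group(0)          # the entire match
letter = match.group(1)         # only the first capture group
end_of_match = match.end(0)     # offset just past the entire match
end_of_letter = match.end(1)    # offset just past group 1

# Advancing past group 1 only leaves the digit for the next pattern.
rest = source[end_of_letter:]
digit = re.match(r'([0-9])', rest).group(1)

print(whole, letter, end_of_match, end_of_letter)
print([letter, digit])
```

Advancing past group ``0`` instead would consume the digit as well, and the
second pattern would have nothing left to match.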

Sequence
~~~~~~~~

There are two flavors of ``Sequence``: one you declare yourself, called
``Sequence``, and one that is created automatically when you add or multiply
``Matcher`` objects. Don't worry about the automatic one; it "just works" (we
saw it above in the ``Regex`` example).

::

    # test/matchers/test_sequence_matcher.py
    matcher = Sequence(Literal('Hello '), Literal('World'), Letter('!.'))
    matcher('Hello World!') => 'Hello World!'
    matcher('Hello World.') => 'Hello World.'
    matcher('Hello, World.') => ParseException


**arithmetic**::

    Matcher + Matcher + Matcher  # tested
    Matcher * 3  # tested
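
One common way to get sequences out of ``+`` and ``*`` is operator
overloading. The classes below, ``Lit`` and ``Seq``, are hypothetical and
written only for this sketch; they assume nothing about chomsky's internals
beyond the behaviour described above, and they raise ``ValueError`` in place of
``ParseException``.

```python
class Seq:
    """Hypothetical sequence: runs each part in order on the remaining input."""

    def __init__(self, parts):
        self.parts = list(parts)

    def parse(self, source):
        results = []
        for part in self.parts:
            value, source = part.parse(source)
            results.append(value)
        return results, source

    def __add__(self, other):
        # Appending to an existing sequence keeps the result flat.
        return Seq(self.parts + [other])


class Lit:
    """Hypothetical literal matcher returning (matched_text, remaining_input)."""

    def __init__(self, text):
        self.text = text

    def parse(self, source):
        if not source.startswith(self.text):
            raise ValueError('expected %r' % (self.text,))
        return self.text, source[len(self.text):]

    def __add__(self, other):
        return Seq([self, other])

    def __mul__(self, count):
        return Seq([self] * count)


greeting = Lit('Hello ') + Lit('World') + Lit('!')
print(greeting.parse('Hello World!'))
print((Lit('ab') * 3).parse('ababab!'))
```

Each ``parse`` call hands the unconsumed input to the next part, which is why
the second result still carries the trailing ``'!'``.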

**repetition**::

    NMatches  # tested
    ZeroOrMore  # tested
    OneOrMore  # tested
    Optional  # tested
    Any  # tested

    NextIs, NextIsNot

**language building blocks**::

    QuotedString, Number, Integer, Float, Hexadecimal, Octal, Binary
    LineComment, BlockComment, Block, IndentedBlock

**location**::

    NextIs, PreviousWas, NextIsNot, PreviousWasNot
    WordStart, WordEnd, LineStart, LineEnd,
    StringStart, StringEnd

----
TEST
----

::

    $ pip install pytest
    $ py.test

-------
LICENSE
-------

:Author: Colin Thomas-Arnold
:Copyright: 2012 Colin Thomas-Arnold <http://colinta.com/>

Copyright (c) 2012, Colin Thomas-Arnold
All rights reserved.

See LICENSE_ for more details (it's a simplified BSD license).

.. _LICENSE: https://github.com/colinta/chomsky/blob/master/LICENSE
.. _modgrammar: http://pypi.python.org/pypi/modgrammar
.. _pyparsing: http://pyparsing.wikispaces.com/
.. _plywood: http://github.com/colinta/plywood
.. _string module: http://docs.python.org/library/string.html#string-constants

Release history

2.0.0

Jun 5, 2019

1.0.2

Apr 25, 2013

1.0.1

Jan 14, 2013

1.0.0

Jan 14, 2013

v0.0.22

Jan 14, 2013

v0.0.21

Jan 7, 2013

v0.0.20

Jan 7, 2013

v0.0.19

Jan 7, 2013

v0.0.18

Jan 6, 2013

v0.0.17

Jan 6, 2013

v0.0.16

Jan 5, 2013

v0.0.14

Jan 5, 2013

v0.0.13

Jan 5, 2013

v0.0.12

Dec 29, 2012

v0.0.11

Dec 29, 2012

v0.0.10

Dec 29, 2012

v0.0.9

Dec 29, 2012

v0.0.8

Aug 17, 2012

v0.0.7

Aug 17, 2012

v0.0.6

Jul 31, 2012

v0.0.5

Jul 3, 2012

v0.0.4

Jul 3, 2012

v0.0.3

Jul 3, 2012

This version

v0.0.2

Jul 2, 2012

v0.0.1

Jun 21, 2012

v0.0.0

Jun 21, 2012

Download files

Source Distribution

chomsky-v0.0.2.tar.gz (7.6 kB view hashes)

Uploaded Jul 2, 2012 Source

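Hashes for chomsky-v0.0.2.tar.gz

===========  ====================================================================
Algorithm    Hash digest
===========  ====================================================================
SHA256       ``ed419b8c73e2b0e55ab272d37abbe9190864768b11bcd7e02550f309d667f7d4``
MD5          ``b83cf80cbc6f71495d121da75d23729c``
BLAKE2b-256  ``b24381787667fe922dca65dbf10c302e478277a80f338d455a35ed39288bf9c0``
===========  ====================================================================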