reflex 0.1
A lightweight regex-based lexical scanner library.
Reflex: A lightweight lexical scanner library.
Reflex supports regular expressions, rule actions, multiple scanner states,
tracking of line/column numbers, and customizable token classes.
Reflex is not a "scanner generator" in the sense of generating source code.
Instead, it builds a scanner object dynamically from the set of input rules
specified. The rules themselves are ordinary Python regular expressions,
combined with rule actions, which are simply Python functions.
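One common way to turn an ordered list of regex rules into a scanner is to join them into a single alternation with one capturing group per rule. After each match, `match.lastindex` identifies the rule that fired, and that rule's action is called. The 99-rule note near the end of this page suggests reflex works along these lines, but the sketch below is written from scratch for Python 3 with the standard `re` module and is not reflex's code. `RULES`, `number`, `master` and `tokens` are hypothetical names. Because `|` tries alternatives left to right, the first matching rule wins rather than the longest match.

```python
import re

def number(text):
    # Hex literals start with "0x"; everything else is decimal.
    return int(text, 16) if text.startswith("0x") else int(text)

# Each rule is (pattern, action); an action of None means "skip the match".
RULES = [
    (r"\s+", None),
    (r"[a-zA-Z_]\w*", lambda text: ("IDENT", text)),
    (r"0x[\da-fA-F]+|\d+", lambda text: ("NUMBER", number(text))),
]

# One capturing group per rule. Rules must use only (?:...) internally,
# so that group i+1 belongs to RULES[i].
master = re.compile("|".join("(%s)" % pattern for pattern, _ in RULES))

def tokens(text):
    pos = 0
    while pos < len(text):
        m = master.match(text, pos)
        if m is None:
            raise SyntaxError("no rule matches at offset %d" % pos)
        action = RULES[m.lastindex - 1][1]   # lastindex = the rule's group number
        if action is not None:
            yield action(m.group())
        pos = m.end()

print(list(tokens("count 0x1F 42")))
# [('IDENT', 'count'), ('NUMBER', 31), ('NUMBER', 42)]
```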
Example use:
```python
import reflex

# Create a scanner. The "start" parameter specifies the name of the
# starting state. Note: the state argument can be any hashable Python
# type.
scanner = reflex.scanner( "start" )

# Add some rules.
# The whitespace rule has no action and no token, so whitespace is skipped.
scanner.rule( r"\s+" )

# Rules for identifiers and numbers.
TOKEN_IDENT = 1
TOKEN_NUMBER = 2
scanner.rule( r"[a-zA-Z_]\w*", token=TOKEN_IDENT )
scanner.rule( r"0x[\da-fA-F]+|\d+", token=TOKEN_NUMBER )

# The opening quotation mark switches the scanner into the "string" state.
TOKEN_STRING = 3
scanner.rule( r'"', tostate="string" )

# Define the string state. "string_escape" and "string_text" are
# action functions (which must be defined before these rules are added)
# that handle the parsed characters and escape sequences and append
# them to a buffer. Once the closing quotation mark is encountered,
# we set the token type to TOKEN_STRING and return to the start state.
scanner.state( "string" )
scanner.rule( r'"', tostate="start", token=TOKEN_STRING )
scanner.rule( r"\\.", string_escape )
scanner.rule( r'[^"\\]+', string_text )
```
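To make the moving parts of the example concrete, here is a self-contained Python 3 sketch of a tiny multi-state scanner that accepts the same rule set. It is not reflex: the `MiniScanner` class, its `scan` method, the `buffer` and `value` attributes, and the `end_string` action that replaces the closing quote with the collected string contents are all assumptions made for illustration. Each state keeps its own ordered rule list, the first rule that matches wins, `tostate=` switches the active state, and `token=` emits a token.

```python
import re

TOKEN_IDENT, TOKEN_NUMBER, TOKEN_STRING = 1, 2, 3

class MiniScanner:
    """Sketch of a multi-state, first-match rule scanner (not reflex itself)."""

    def __init__(self, start):
        self.start = start
        self.states = {}
        self.state(start)

    def state(self, name):
        # Rules added after this call belong to state `name`.
        self.states.setdefault(name, [])
        self.current = name

    def rule(self, pattern, action=None, tostate=None, token=None):
        self.states[self.current].append(
            (re.compile(pattern), action, tostate, token))

    def scan(self, text):
        self.buffer = []                  # scratch space for actions
        current_state, pos = self.start, 0
        while pos < len(text):
            for regex, action, tostate, token in self.states[current_state]:
                m = regex.match(text, pos)
                if m and m.end() > pos:   # ignore empty matches
                    break
            else:
                raise SyntaxError("no rule matches at offset %d" % pos)
            self.value = m.group()
            if action is not None:
                action(self)              # actions may read or replace self.value
            if token is not None:
                yield token, self.value
            if tostate is not None:
                current_state = tostate
            pos = m.end()

def string_text(sc):
    sc.buffer.append(sc.value)

def string_escape(sc):
    # Keep the character after the backslash (no \n-style decoding here).
    sc.buffer.append(sc.value[1])

def end_string(sc):
    # Replace the closing quote with the accumulated string contents.
    sc.value = "".join(sc.buffer)
    sc.buffer = []

scanner = MiniScanner("start")
scanner.rule(r"\s+")
scanner.rule(r"[a-zA-Z_]\w*", token=TOKEN_IDENT)
scanner.rule(r"0x[\da-fA-F]+|\d+", token=TOKEN_NUMBER)
scanner.rule(r'"', tostate="string")
scanner.state("string")
scanner.rule(r'"', end_string, tostate="start", token=TOKEN_STRING)
scanner.rule(r"\\.", string_escape)
scanner.rule(r'[^"\\]+', string_text)

print(list(scanner.scan('name 0x1F "a\\"b" 42')))
# [(1, 'name'), (2, '0x1F'), (3, 'a"b'), (2, '42')]
```

Letting an action overwrite the matched value is one simple way to get the collected string contents into the emitted `TOKEN_STRING` token; how reflex itself does this is not described here.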
Invoking the scanner: the scanner can be called as a function which
takes a stream (such as a file object) that iterates over input lines,
plus a "context" argument that is reserved for application use. The
result is an iterator which produces a series of tokens.
The same scanner can be used to parse multiple input files by
creating a new stream for each file.
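```python
# Calling the scanner returns an iterator over tokens.
# istream: any iterable of input lines (e.g. an open file);
# context: any application-defined object.
token_iter = scanner( istream, context )
```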
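The scanner only needs an object that yields input lines, so an open file, an `io.StringIO`, or even a list of strings will do, and scanning a second file just means passing a second stream. The Python 3 sketch below shows that line-at-a-time contract with a hypothetical `scan_lines` generator that recognizes words only. Reporting 1-based line and column numbers is an assumption, since the text above does not say where reflex starts counting.

```python
import io
import re

WORD = re.compile(r"\s*(\w+)")   # skip leading blanks, capture one word

def scan_lines(stream):
    """Lazily yield (value, line, column) from any iterable of text lines.

    Line and column numbers are 1-based here (an assumption).
    """
    for lineno, text in enumerate(stream, start=1):
        pos = 0
        while True:
            m = WORD.match(text, pos)
            if m is None:
                break
            yield m.group(1), lineno, m.start(1) + 1
            pos = m.end()
        if text[pos:].strip():     # leftover non-blank text matched no rule
            raise SyntaxError("unexpected input on line %d" % lineno)

# The same function scans two independent inputs; each gets its own stream.
first = list(scan_lines(io.StringIO("alpha beta\n  gamma\n")))
second = list(scan_lines(["one two\n"]))
print(first)   # [('alpha', 1, 1), ('beta', 1, 7), ('gamma', 2, 3)]
print(second)  # [('one', 1, 1), ('two', 1, 5)]
```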
Getting the tokens: here is a simple example of looping through the
input tokens. A real-world use would most likely compare each token's
type against the expected token types.
```python
# token.id is the token type (the same as the token= argument in the rule).
# token.value is the actual characters that make up the token.
# token.line is the line number on which the token was encountered.
# token.pos is the column number of the first character of the token.
for token in token_iter:
    print token.id, token.value, token.line, token.pos
```
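A real consumer usually branches on `token.id`. The Python 3 sketch below uses a hypothetical `Token` namedtuple with the four documented attributes and a hand-built token list in place of real scanner output; `parse_pairs` reads `name value` pairs and uses `line` and `pos` in its error messages.

```python
from collections import namedtuple

TOKEN_IDENT, TOKEN_NUMBER, TOKEN_STRING = 1, 2, 3

# Hypothetical stand-in for scanner tokens, with the attributes listed above.
Token = namedtuple("Token", "id value line pos")

def parse_pairs(tokens):
    """Read 'name value' pairs, where value is a number or a string token."""
    pairs = {}
    it = iter(tokens)
    for name in it:
        if name.id != TOKEN_IDENT:
            raise SyntaxError("line %d, column %d: expected a name, got %r"
                              % (name.line, name.pos, name.value))
        value = next(it, None)     # the token following the name, if any
        if value is None or value.id not in (TOKEN_NUMBER, TOKEN_STRING):
            raise SyntaxError("missing value for %r on line %d"
                              % (name.value, name.line))
        pairs[name.value] = value.value
    return pairs

# Sample tokens standing in for real scanner output.
sample = [Token(TOKEN_IDENT, "width", 1, 0), Token(TOKEN_NUMBER, "80", 1, 6),
          Token(TOKEN_IDENT, "title", 2, 0), Token(TOKEN_STRING, "hi", 2, 6)]
print(parse_pairs(sample))  # {'width': '80', 'title': 'hi'}
```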
Action functions are Python functions which take a single argument: the
token stream instance.

```python
# Action function to handle string text.
# Appends the value of the current token to an application-defined
# buffer (string_data is illustrative).
string_data = []

def string_text( token_stream ):
    string_data.append( token_stream.token.value )
```
The token_stream object has the following attributes:
- states: dictionary of scanner states
- state: the current state
- stream: the input line stream
- context: the context object that was passed to the scanner
- token: the current token
- line: the line number of the current parse position
- pos: the column number of the current parse position
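Because an action function only touches the token-stream object it is given, it can be tested without running a scanner by passing in any object with the same attribute names. This Python 3 sketch uses `types.SimpleNamespace` as a hypothetical stand-in exposing only `token` and `context`. Keeping the string buffer on `context`, the `fake_stream` helper, and the `ESCAPES` table are illustrative assumptions, not part of reflex.

```python
from types import SimpleNamespace

# Illustrative escape table: \n and \t are decoded, anything else is kept as-is.
ESCAPES = {"n": "\n", "t": "\t"}

def string_text(token_stream):
    # Plain characters are copied into the application's buffer.
    token_stream.context.buffer.append(token_stream.token.value)

def string_escape(token_stream):
    # token.value is a two-character sequence such as backslash + 'n'.
    ch = token_stream.token.value[1]
    token_stream.context.buffer.append(ESCAPES.get(ch, ch))

def fake_stream(value, context):
    """Hypothetical stand-in with just the attributes the actions read."""
    return SimpleNamespace(token=SimpleNamespace(value=value), context=context)

ctx = SimpleNamespace(buffer=[])
for piece, action in [("say ", string_text), ('\\"', string_escape),
                      ("hi", string_text), ("\\n", string_escape)]:
    action(fake_stream(piece, ctx))
print(repr("".join(ctx.buffer)))  # 'say "hi\n'
```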
Note: reflex currently has a limit of 99 rules per state, because that is
the maximum number of capturing groups allowed in a Python regular expression.
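The note implies that every rule uses one capturing group. A rule that needs grouping of its own can use non-capturing `(?:...)` groups, and `re.compile(pattern).groups` reports how many capturing groups a pattern defines, so a rule set can be checked before it is combined. The hypothetical `check_rule` helper below (Python 3) does exactly that. Python 3.5 and later no longer cap the number of capturing groups at 100, so the limit chiefly matters on the older interpreters reflex was written for.

```python
import re

def check_rule(pattern):
    """Hypothetical helper: reject rules that define their own capturing groups.

    A combined scanner that maps group i to rule i would be thrown off by
    extra groups inside a rule, and each extra group uses up the group budget.
    """
    extra = re.compile(pattern).groups
    if extra:
        raise ValueError("%r defines %d capturing group(s); use (?:...) instead"
                         % (pattern, extra))
    return pattern

rules = [r"\s+", r"[a-zA-Z_]\w*", r"(?:\d+\.)?\d+"]   # (?:...) is allowed
combined = re.compile("|".join("(%s)" % check_rule(p) for p in rules))
print(combined.groups)  # 3: exactly one group per rule

try:
    check_rule(r"(\d+)\.(\d+)")
except ValueError as exc:
    print(exc)
```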
| File | Type | Py Version | Uploaded on | Size | # downloads |
|---|---|---|---|---|---|
| reflex-0.1.tar.gz | Source | | 2005-12-26 00:52:46 | 4KB | 341 |
| reflex-0.1-py2.4.egg | Python Egg | 2.4 | 2005-12-26 00:52:47 | 9KB | 502 |
- Author: Talin <viridia at gmail com>
- Home Page: http://viridia.org/python-projects/
- License: Choice of GPL or Python license
- Platform: Any
Categories
- Development Status :: 4 - Beta
- Intended Audience :: Developers
- License :: OSI Approved :: GNU General Public License (GPL)
- License :: OSI Approved :: Python Software Foundation License
- Operating System :: OS Independent
- Programming Language :: Python
- Topic :: Software Development :: Compilers
- Topic :: Software Development :: Libraries :: Python Modules
- Package Index Owner: Talin
