shipyard 0.02
process data in a format inspired by email headers
What is shipyard?
Shipyard is a module to process data in a format inspired by email headers (RFC 2822).
The goal of shipyard is to provide a simple, human-readable and human-writable replacement for CSV that copes better with long values and many rows and doesn't need complicated escaping rules for special characters.
It's called shipyard because that word contains py and doesn't seem to be taken yet.
File format
Character encoding
A character encoding can be specified, as in PEP 0263, using:
# -*- coding: <encoding name> -*-
in the first line. # is replaced with the actual comment mark.
More precisely, the first line must match the regular expression:
^#.*coding[:=]\s*([-\w.]+)
Again # is replaced by the actual comment mark. The first group of this expression is then interpreted as encoding name.
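The matching rule above can be sketched in a few lines of Python. This is only an illustration of the regular expression, not shipyard's own code: the function name detect_encoding and its comment_mark parameter are made up for this example, and the sketch is written for Python 3.

```python
import re

def detect_encoding(first_line, comment_mark="#"):
    """Return the encoding named in a coding line, or None if there is none."""
    # Same pattern as above, with '#' replaced by the (escaped) comment mark.
    pattern = r"^" + re.escape(comment_mark) + r".*coding[:=]\s*([-\w.]+)"
    match = re.match(pattern, first_line)
    # The first group holds the encoding name.
    return match.group(1) if match else None

print(detect_encoding("# -*- coding: utf-8 -*-"))
print(detect_encoding("; coding=latin-1", comment_mark=";"))
print(detect_encoding("name: Martin Chalfie"))
```

The first two calls return the encoding names utf-8 and latin-1; the last line is an ordinary field, so no encoding is found.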
Comment
Lines starting with the comment mark (default: #) are ignored. Comments can be used in or between records.
Field
A field is a line that has the form:
key: value
- key is a string that
- doesn't contain a colon
- doesn't start with the comment mark
- doesn't start with the continuation mark
- value is an arbitrary string. It can span multiple lines using continuation marks.
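As a rough illustration of these rules, a field line can be split at its first colon, so the value itself may contain further colons. The helper parse_field below is hypothetical and not part of shipyard; stripping whitespace around the value is an assumption of this sketch.

```python
def parse_field(line, comment_mark="#", continuation_mark=" "):
    """Split 'key: value' into (key, value); return None if not a field line."""
    # A key must not start with the comment mark or the continuation mark.
    if line.startswith(comment_mark) or line.startswith(continuation_mark):
        return None
    # Split at the first colon only: the key cannot contain one, the value can.
    key, sep, value = line.partition(":")
    if not sep:
        return None
    # Assumption of this sketch: surrounding whitespace is not significant.
    return key, value.strip()

print(parse_field("name: Roger Y. Tsien"))
print(parse_field("url: http://example.org/"))
print(parse_field("# just a comment: ignored"))
```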
Continuation
If a line starts with the continuation mark (default: " " [one blank]) it gets appended to the preceding line, with the continuation mark removed.
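The continuation rule can be sketched as a small unfolding step that runs before fields are split. The function unfold is hypothetical and only mirrors the description above (comments dropped, continuation mark removed, remainder appended); it is not shipyard's implementation.

```python
def unfold(lines, comment_mark="#", continuation_mark=" "):
    """Merge continuation lines into the line they continue."""
    logical = []
    for line in lines:
        if line.startswith(comment_mark):
            continue  # comments are ignored, even in the middle of a value
        if line.startswith(continuation_mark) and logical:
            # Remove the mark and append the rest to the preceding line.
            logical[-1] += line[len(continuation_mark):]
        else:
            logical.append(line)
    return logical

lines = [
    "rationale: for the discovery",
    "  and development",      # two blanks: one is the mark, one is kept
    "# a comment inside a record",
    " of GFP",                # one blank: only the mark, nothing is kept
    "year: 2008",
]
print(unfold(lines))
```

In this sketch only the mark itself is removed, so a continuation line needs its own leading blank if the words should not run together: the example yields "for the discovery and developmentof GFP".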
Usage
- Obviously we need to import shipyard:
>>> import shipyard
- First we open the file:
>>> input = open('nobel.sy')
- Then we create a parser object:
>>> reader = shipyard.Parser(keep_linebreaks=False,
...                          keys=['id', 'discipline', 'year',
...                                'name', 'country', 'rationale'])
For every record the given keys are initialized with None.
- Now we can iterate through the records:
>>> for record in reader.parse(input): # doctest:+ELLIPSIS
...     print record['country']
United States
Japan
United States
...
- Instead of iterating we may want to get a list of dicts:
>>> input.seek(0)
>>> lod = reader.get_list(input)
>>> print lod # doctest:+ELLIPSIS
[{u'discipline': u'Chemistry', u'name': u'Martin Chalfie', ...}, {u'discipline': u'Chemistry', u'name': u'Osamu Shimomura', ...}, ...]
- Sometimes we need a dict of dicts (using the 'id' field as key):
>>> input.seek(0)
>>> dod = reader.get_dict(input, key='id')
>>> print dod.keys()
[u'11', u'10', u'1', u'0', u'3', u'2', u'5', u'4', u'7', u'6', u'9', u'8']
>>> print dod[u'5'][u'rationale']
for the discovery of the mechanism of spontaneous brokensymmetry in subatomic physics
- If we don't want dicts we can use the 'factory' parameter:
>>> input.seek(0)
>>> los = reader.get_list(input, factory = lambda **keys: ', '.join(keys.values()))
>>> print los[0]
Chemistry, Martin Chalfie, United States, for the discovery and development of the green fluorescentprotein, GFP, 2008, 0
- Of course a class works as a factory, too:
>>> input.seek(0)
>>> class Laureate(object):
...     def __init__(self, id, discipline, year, name, country, rationale):
...         self.name = name
>>> doo = reader.get_dict(input, key='id', factory = Laureate)
>>> print doo[u'2'] # doctest:+ELLIPSIS
<Laureate object at ...>
>>> print doo[u'2'].name
Roger Y. Tsien
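The factory examples suggest that each record's fields are passed to the factory as keyword arguments. A minimal, shipyard-independent sketch of that idea follows; build_list and build_dict are hypothetical helpers written for Python 3, and the way shipyard actually calls the factory is an assumption here.

```python
def build_list(records, factory=dict):
    """Turn each record (a dict) into an object by calling factory(**record)."""
    return [factory(**record) for record in records]

def build_dict(records, key, factory=dict):
    """Like build_list, but index the objects by the value of one field."""
    return {record[key]: factory(**record) for record in records}

class Laureate(object):
    # Accepts every field as a keyword argument and keeps the ones it needs.
    def __init__(self, id, name, **other_fields):
        self.id = id
        self.name = name

records = [
    {"id": "0", "name": "Martin Chalfie", "discipline": "Chemistry"},
    {"id": "2", "name": "Roger Y. Tsien", "discipline": "Chemistry"},
]

print(build_list(records)[0]["name"])
print(build_dict(records, key="id", factory=Laureate)["2"].name)
```

With the default factory the result is a plain dict per record; any callable that accepts the field names as keyword arguments, such as a class or a lambda, can take its place.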
Now let's write a Shipyard file.
- First we create a StringIO (any other file-like object will do, too):
>>> import StringIO
>>> output = StringIO.StringIO()
- Next we need a Writer object:
>>> writer = shipyard.Writer(keys=('foo', 'bar'), coding='utf-8')
- Now we can use write() to write a single record:
>>> writer.write(output, {'foo': 1, 'bar': 2})
>>> print output.getvalue()
foo: 1
bar: 2
<BLANKLINE>
<BLANKLINE>
- Using write_many() we can write a list of records:
>>> output = StringIO.StringIO()
>>> d = [dict((('foo', i), ('bar', 2*i))) for i in range(3)]
>>> writer.write_many(output, d)
>>> print output.getvalue()
foo: 0
bar: 0
<BLANKLINE>
foo: 1
bar: 2
<BLANKLINE>
foo: 2
bar: 4
<BLANKLINE>
<BLANKLINE>
- To get an encoding line we use write_coding():
>>> output = StringIO.StringIO()
>>> writer.write_coding(output)
>>> print output.getvalue()
#-*- coding: utf-8 -*-
<BLANKLINE>
<BLANKLINE>
- Now let's do everything at once using write_full():
>>> output = StringIO.StringIO()
>>> writer.write_full(output, d)
>>> print output.getvalue()
#-*- coding: utf-8 -*-
<BLANKLINE>
foo: 0
bar: 0
<BLANKLINE>
foo: 1
bar: 2
<BLANKLINE>
foo: 2
bar: 4
<BLANKLINE>
<BLANKLINE>
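To make the on-disk layout explicit, the output shown above can be reproduced without shipyard by a few lines of Python 3. The helpers write_coding_line and write_record are hypothetical names for this sketch; the layout (one "key: value" line per key, one blank line after each record and after the coding line) is inferred from the example output, and multi-line values are not handled.

```python
import io

def write_coding_line(out, coding, comment_mark="#"):
    """Write the coding line followed by a blank line."""
    out.write("%s-*- coding: %s -*-\n\n" % (comment_mark, coding))

def write_record(out, record, keys):
    """Write one 'key: value' line per key, then a blank line."""
    for key in keys:
        out.write("%s: %s\n" % (key, record[key]))
    out.write("\n")  # the blank line closes the record

out = io.StringIO()
write_coding_line(out, "utf-8")
for i in range(3):
    write_record(out, {"foo": i, "bar": 2 * i}, keys=("foo", "bar"))
print(out.getvalue())
```

The keys tuple fixes the order of the fields, which is why foo always precedes bar regardless of how the dict was built.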
- Author: Florian Diesch <devel at florian-diesch de>
- Home Page: http://www.florian-diesch.de/software/shipyard/
- Keywords: data format storage human readable RFC-2822 CSV
- License: GPL
Categories
- Development Status :: 3 - Alpha
- Intended Audience :: Developers
- License :: OSI Approved :: GNU General Public License (GPL)
- Natural Language :: English
- Natural Language :: German
- Operating System :: OS Independent
- Programming Language :: Python
- Topic :: Other/Nonlisted Topic
- Topic :: Software Development :: Libraries
- Package Index Owner: fdiesch
- DOAP record: shipyard-0.02.xml
