shipyard 0.02

process data in a format inspired by email headers

What is shipyard?

Shipyard is a module to process data in a format inspired by email headers (RFC 2822).

The goal of shipyard is to provide a simple, human-readable and human-writable replacement for CSV that copes better with long values and many rows and doesn't need complicated escaping rules for special characters.

It's called shipyard because that word contains py and doesn't seem to be taken yet.

File format

Character encoding

A character encoding can be specified, similar to PEP 0263, using:

# -*- coding: <encoding name> -*-

in the first line, where # stands for the actual comment mark.

More precisely, the first line must match the regular expression:

^#.*coding[:=]\s*([-\w.]+)

Again, # stands for the actual comment mark. The first group of this expression is then interpreted as the encoding name.
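
To see the rule at work, here is a minimal sketch (Python 3) that applies the expression above to a first line. The function name detect_encoding and its comment_mark parameter are made up for this illustration and are not part of shipyard's API.

```python
import re


def detect_encoding(first_line, comment_mark="#"):
    """Return the encoding named on the first line, or None if there is none."""
    # Same expression as above, with '#' replaced by the escaped comment mark.
    pattern = "^" + re.escape(comment_mark) + r".*coding[:=]\s*([-\w.]+)"
    match = re.match(pattern, first_line)
    return match.group(1) if match else None


print(detect_encoding("# -*- coding: utf-8 -*-"))
print(detect_encoding("; -*- coding: latin-1 -*-", comment_mark=";"))
print(detect_encoding("name: value"))
```

The first two calls find an encoding name; the third line is an ordinary field, so nothing is detected.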

Data set

A data set consists of zero or more records separated by one or more empty lines.

Comment

Lines starting with the comment mark (default: #) are ignored. Comments can be used in or between records.

Record

A record consists of one or more fields.

Field

A field is a line that has the form:

key: value
key is a string that
  • doesn't contain a colon
  • doesn't start with the comment mark
  • doesn't start with the continuation mark

value is an arbitrary string. It can span multiple lines using continuation marks.
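
Because a key never contains a colon, the first colon on a field line always ends the key, and any further colons belong to the value. A minimal sketch (Python 3) of that split follows. split_field is a hypothetical helper, not shipyard's own code, and stripping the whitespace around the value is an assumption of this sketch.

```python
def split_field(line, comment_mark="#", continuation_mark=" "):
    """Split a 'key: value' line into (key, value)."""
    if line.startswith(comment_mark) or line.startswith(continuation_mark):
        raise ValueError("not a field line: %r" % line)
    # The key cannot contain a colon, so the first colon is the separator.
    key, sep, value = line.partition(":")
    if not sep:
        raise ValueError("missing colon: %r" % line)
    # Assumption: whitespace around the value is not significant.
    return key, value.strip()


print(split_field("name: Roger Y. Tsien"))
print(split_field("time: 12:30"))
```

In the second call the value keeps its own colon; no escaping is needed.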

Continuation

If a line starts with the continuation mark (default: " ", a single space), it is appended to the preceding line with the continuation mark removed.
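
Putting the rules of this section together, a reader for the format can be sketched in a few lines of Python 3. This is only an illustration of the description above, not shipyard's actual parser: parse_records is a hypothetical name, the sample data is invented, and trimming the whitespace after the colon is an assumption.

```python
def parse_records(lines, comment_mark="#", continuation_mark=" "):
    """Yield one dict per record from an iterable of text lines."""
    record, last_key = {}, None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(comment_mark):
            continue  # comments are ignored, in or between records
        if line == "":
            if record:  # one or more empty lines end the current record
                yield record
            record, last_key = {}, None
            continue
        if line.startswith(continuation_mark):
            if last_key is None:
                raise ValueError("continuation without a field: %r" % line)
            # Append to the preceding value, continuation mark removed.
            record[last_key] += line[len(continuation_mark):]
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("missing colon: %r" % line)
        # Assumption: whitespace after the colon is not part of the value.
        record[key] = value.lstrip()
        last_key = key
    if record:  # the last record need not be followed by an empty line
        yield record


sample = """\
# invented sample data
id: a
note: first line
# a comment inside a record
  and a second line

id: b
note: single
"""

for record in parse_records(sample.splitlines()):
    print(record)
```

Note that the continued line starts with two spaces: the first one is the continuation mark and is dropped, the second one stays in the value.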

Usage

Obviously we need to import shipyard:
>>> import shipyard
First we open the file:
>>> input = open('nobel.sy')
Then we create a parser object:
>>> reader = shipyard.Parser(keep_linebreaks=False,
...                          keys=['id', 'discipline', 'year',
...                                'name', 'country', 'rationale'])

For every record the given keys are initialized with None.
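
The effect of this initialization can be pictured with plain dicts (Python 3). This is a sketch of the behaviour described above, not the library's code; with_defaults and the sample values are made up.

```python
def with_defaults(keys, parsed):
    """Return a record in which every expected key exists, defaulting to None."""
    record = dict.fromkeys(keys)  # every given key starts out as None
    record.update(parsed)         # fields found in the file overwrite the defaults
    return record


record = with_defaults(["id", "discipline", "year"], {"id": "0", "year": "2008"})
print(record["discipline"])
```

A field that is missing from a record can therefore be looked up without a KeyError; it simply comes back as None.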

Now we can iterate through the records:

>>> for record in reader.parse(input):    # doctest:+ELLIPSIS
...     print record['country']
United States
Japan
United States
...
Instead of iterating we may want to get a list of dicts:
>>> input.seek(0)
>>> lod = reader.get_list(input)
>>> print lod     # doctest:+ELLIPSIS
[{u'discipline': u'Chemistry', u'name': u'Martin Chalfie', ...}, {u'discipline': u'Chemistry', u'name': u'Osamu Shimomura', ...}, ...]
Sometimes we need a dict of dicts (using the 'id' field as key):
>>> input.seek(0)
>>> dod = reader.get_dict(input, key='id')
>>> print dod.keys()
[u'11', u'10', u'1', u'0', u'3', u'2', u'5', u'4', u'7', u'6', u'9', u'8']
>>> print dod[u'5'][u'rationale']
for the discovery of the mechanism of spontaneous brokensymmetry in subatomic physics
If we don't want dicts we can use the 'factory' parameter:
>>> input.seek(0)
>>> los = reader.get_list(input, factory = lambda **keys: ', '.join(keys.values()))
>>> print los[0]
Chemistry, Martin Chalfie, United States, for the discovery and development of the green fluorescentprotein, GFP, 2008, 0
Of course a class works as a factory, too:
>>> input.seek(0)
>>> class Laureate(object):
...     def __init__(self, id, discipline, year, name, country, rationale):
...         self.name = name
>>> doo = reader.get_dict(input, key='id', factory = Laureate)
>>> print doo[u'2']      # doctest:+ELLIPSIS
<Laureate object at ...>
>>> print doo[u'2'].name
Roger Y. Tsien
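
Both examples suggest that the factory is called with the fields of a record as keyword arguments. Assuming that is how it works, the mechanism can be sketched in Python 3 like this. build_list and build_dict are hypothetical stand-ins for illustration, not shipyard functions.

```python
from collections import namedtuple


def build_list(records, factory=dict):
    """Call factory(**record) for every record and collect the results."""
    return [factory(**record) for record in records]


def build_dict(records, key, factory=dict):
    """Like build_list, but index the results by the value of one field."""
    return {record[key]: factory(**record) for record in records}


# Any callable that accepts the fields as keyword arguments can be a factory.
Laureate = namedtuple("Laureate", ["id", "name"])
records = [{"id": "2", "name": "Roger Y. Tsien"}]

print(build_list(records))
print(build_dict(records, key="id", factory=Laureate)["2"].name)
```

With the default factory the result is a plain dict per record; passing a class or a namedtuple gives objects instead.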

Now let's write a Shipyard file.

First we create a StringIO (any other file-like object will do, too):
>>> import StringIO
>>> output = StringIO.StringIO()
Next we need a Writer object:
>>> writer = shipyard.Writer(keys=('foo', 'bar'), coding='utf-8')
Now we can use write() to write a single record:
>>> writer.write(output, {'foo': 1, 'bar': 2})
>>> print output.getvalue()
foo: 1
bar: 2
<BLANKLINE>
<BLANKLINE>
Using write_many() we can write a list of records:
>>> output = StringIO.StringIO()
>>> d = [dict((('foo', i), ('bar', 2*i))) for i in range(3)]
>>> writer.write_many(output, d)
>>> print output.getvalue()
foo: 0
bar: 0
<BLANKLINE>
foo: 1
bar: 2
<BLANKLINE>
foo: 2
bar: 4
<BLANKLINE>
<BLANKLINE>
To get an encoding line we use write_coding():
>>> output = StringIO.StringIO()
>>> writer.write_coding(output)
>>> print output.getvalue()
#-*- coding: utf-8 -*-
<BLANKLINE>
<BLANKLINE>
Now let's do everything at once using write_full():
>>> output = StringIO.StringIO()
>>> writer.write_full(output, d)
>>> print output.getvalue()
#-*- coding: utf-8 -*-
<BLANKLINE>
foo: 0
bar: 0
<BLANKLINE>
foo: 1
bar: 2
<BLANKLINE>
foo: 2
bar: 4
<BLANKLINE>
<BLANKLINE>
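
For readers who want to see how little is needed to produce such a file, here is a minimal sketch in Python 3 that reproduces the output shown above. It is not the Writer class itself: write_records is a hypothetical function, and writing multi-line values with continuation marks is an assumption based on the format description.

```python
import io


def write_records(stream, records, keys, coding=None,
                  comment_mark="#", continuation_mark=" "):
    """Write an optional encoding line followed by the records."""
    if coding:
        stream.write("%s-*- coding: %s -*-\n\n" % (comment_mark, coding))
    for record in records:
        for key in keys:
            lines = str(record[key]).split("\n")
            stream.write("%s: %s\n" % (key, lines[0]))
            # Assumption: further lines of a value become continuation lines.
            for extra in lines[1:]:
                stream.write(continuation_mark + extra + "\n")
        stream.write("\n")  # an empty line ends every record


output = io.StringIO()
d = [{"foo": i, "bar": 2 * i} for i in range(3)]
write_records(output, d, keys=("foo", "bar"), coding="utf-8")
print(output.getvalue())
```

The keys argument fixes the order of the fields, so the file looks the same no matter how the dicts were built.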
 