pycldf

A python library to read and write CLDF datasets

These details have been verified by PyPI

Maintainers

bibiko chrzyki xflr6 xrotwang

Project description

pycldf
======

A Python package to read and write [CLDF](http://cldf.clld.org) datasets.

[![Build Status](https://travis-ci.org/glottobank/pycldf.svg?branch=master)](https://travis-ci.org/glottobank/pycldf)
[![codecov](https://codecov.io/gh/glottobank/pycldf/branch/master/graph/badge.svg)](https://codecov.io/gh/glottobank/pycldf)
[![Requirements Status](https://requires.io/github/glottobank/pycldf/requirements.svg?branch=master)](https://requires.io/github/glottobank/pycldf/requirements/?branch=master)
[![PyPI](https://img.shields.io/pypi/v/pycldf.svg)](https://pypi.python.org/pypi/pycldf)

Writing CLDF
------------

```python
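from pycldf.dataset import Dataset
from pycldf.sources import Source

# Create a dataset named 'mydb' and declare its columns.
dataset = Dataset('mydb')
dataset.fields = ('ID', 'Language_ID', 'Parameter_ID', 'Value', 'Source', 'Comment')

# Register the bibliographic source that the row's Source cell refers to.
dataset.sources.add(Source('book', 'Meier2005', author='Hans Meier', year='2005', title='The Book'))

# Add one row; the cells follow the order of dataset.fields.
dataset.add_row([
    '1',
    'http://glottolog.org/resource/languoid/id/stan1295',
    'http://concepticon.clld.org/parameters/1277',
    'hand',
    'Meier2005[3-7]',
    ''])

# Write the dataset files to the current directory.
dataset.write('.')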
```

results in

- `mydb.csv`
```
ID,Language_ID,Parameter_ID,Value,Source,Comment
1,http://glottolog.org/resource/languoid/id/stan1295,http://concepticon.clld.org/parameters/1277,hand,Meier2005[3-7],
```
- `mydb.bib`
```bibtex
@book{Meier2005,
  author = {Meier, Hans},
  title = {The Book},
  year = {2005}
}
```
- `mydb.csv-metadata.json`
```json
{
  "@context": [
    "http://www.w3.org/ns/csvw",
    {
      "@language": "en"
    }
  ],
  "dc:format": "cldf-1.0",
  "dialect": {
    "header": true,
    "delimiter": ",",
    "encoding": "utf-8"
  },
  "tables": [
    {
      "url": "",
      "dc:type": "cldf-values",
      "tableSchema": {
        "primaryKey": "ID",
        "columns": [
          {
            "datatype": "string",
            "name": "ID"
          },
          {
            "datatype": "string",
            "name": "Language_ID"
          },
          {
            "datatype": "string",
            "name": "Parameter_ID"
          },
          {
            "datatype": "string",
            "name": "Value"
          },
          {
            "datatype": "string",
            "name": "Source"
          },
          {
            "datatype": "string",
            "name": "Comment"
          }
        ]
      }
    }
  ]
}
```
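
Because the metadata file is plain JSON, it can also be inspected without `pycldf`. The following is a minimal Python 3 sketch using only the standard library; it embeds an abbreviated copy of the metadata above (two of the six columns), and the helper name `describe_table` is an illustrative assumption, not part of the `pycldf` API.

```python
import json

# Abbreviated copy of the metadata shown above (only two of the six columns).
METADATA = """
{
  "dc:format": "cldf-1.0",
  "dialect": {"header": true, "delimiter": ",", "encoding": "utf-8"},
  "tables": [
    {
      "url": "",
      "dc:type": "cldf-values",
      "tableSchema": {
        "primaryKey": "ID",
        "columns": [
          {"datatype": "string", "name": "ID"},
          {"datatype": "string", "name": "Value"}
        ]
      }
    }
  ]
}
"""


def describe_table(metadata):
    """Return (primary key, column names, delimiter) for the first table.

    Hypothetical helper for illustration; it is not a pycldf function.
    """
    schema = metadata["tables"][0]["tableSchema"]
    names = [column["name"] for column in schema["columns"]]
    return schema["primaryKey"], names, metadata["dialect"]["delimiter"]


primary_key, column_names, delimiter = describe_table(json.loads(METADATA))
print(primary_key, column_names, delimiter)
```

The same lookup should work on the full file after loading it with `json.load`, assuming the layout shown above.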

Reading CLDF
------------

```python
>>> from pycldf.dataset import Dataset
>>> dataset = Dataset.from_file('mydb.csv')
>>> dataset
<Dataset mydb>
>>> len(dataset)
1
>>> row = dataset.rows[0]
>>> row
Row([('ID', u'1'),
     ('Language_ID', 'http://glottolog.org/resource/languoid/id/stan1295'),
     ('Parameter_ID', 'http://concepticon.clld.org/parameters/1277'),
     ('Value', 'hand'),
     ('Source', 'Meier2005[3-7]'),
     ('Comment', '')])
>>> row['Value']
'hand'
>>> row.refs
[<Reference Meier2005[3-7]>]
>>> row.refs[0].source
<Source Meier2005>
>>> print(row.refs[0].source)
Meier, Hans. 2005. The Book.
>>> print(row.refs[0].source.bibtex())
@book{Meier2005,
  year = {2005},
  author = {Meier, Hans},
  title = {The Book}
}
```
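
The `Source` cell in this example, `Meier2005[3-7]`, combines a citation key with a bracketed suffix. As a rough illustration of how such a value can be split, here is a standard-library Python 3 sketch. The pattern and the function name `parse_reference` are assumptions made for this example and do not describe how `pycldf` parses references internally.

```python
import re

# Assumed shape: a citation key, optionally followed by "[...]".
REFERENCE = re.compile(r"^(?P<key>[^\[\]]+)(?:\[(?P<pages>[^\[\]]*)\])?$")


def parse_reference(text):
    """Split a reference such as 'Meier2005[3-7]' into (key, bracket content).

    Hypothetical helper; the second item is None when no brackets are given.
    """
    match = REFERENCE.match(text.strip())
    if match is None:
        raise ValueError("not a valid reference: %r" % text)
    return match.group("key"), match.group("pages")


key, pages = parse_reference("Meier2005[3-7]")
print(key, pages)
```

The key returned here is what has to match an entry in `mydb.bib` for the reference to resolve to a source.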

### Validating a data file

By default, data files are read in strict mode, i.e. an invalid row causes an exception to be
raised. To validate a data file, read it in validating mode by passing `skip_on_error=True` to
`Dataset.from_file`: invalid rows are then skipped and reported as warnings.

For example, the following output is generated

```python
>>> from pycldf.dataset import Dataset
>>> dataset = Dataset.from_file('mydb.csv', skip_on_error=True)
WARNING:pycldf.dataset:skipping row in line 3: wrong number of columns in row
WARNING:pycldf.dataset:skipping row in line 4: duplicate ID: 1
WARNING:pycldf.dataset:skipping row in line 5: missing citekey: Mei2005
```

when reading the file

```
ID,Language_ID,Parameter_ID,Value,Source,Comment
1,stan1295,1277,hand,Meier2005[3-7],
1,stan1295,1277,hand,Meier2005[3-7]
1,stan1295,1277,hand,Meier2005[3-7],
2,stan1295,1277,hand,Mei2005[3-7],
```
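
The three problems reported for this file (a wrong column count, a duplicate ID, and an unknown citation key) can be reproduced with a short standard-library Python 3 sketch. This is only an approximation of the checks under stated assumptions: the function `validate` and its `known_citekeys` parameter are hypothetical, and `pycldf` itself may perform these checks differently.

```python
import csv
import io

DATA = """\
ID,Language_ID,Parameter_ID,Value,Source,Comment
1,stan1295,1277,hand,Meier2005[3-7],
1,stan1295,1277,hand,Meier2005[3-7]
1,stan1295,1277,hand,Meier2005[3-7],
2,stan1295,1277,hand,Mei2005[3-7],
"""


def validate(text, known_citekeys):
    """Return (valid rows, [(line number, problem), ...]) for CSV text.

    Hypothetical helper: invalid rows are skipped rather than raising.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows, problems, seen_ids = [], [], set()
    for cells in reader:
        line_number = reader.line_num
        if len(cells) != len(header):
            problems.append((line_number, "wrong number of columns in row"))
            continue
        row = dict(zip(header, cells))
        if row["ID"] in seen_ids:
            problems.append((line_number, "duplicate ID: %s" % row["ID"]))
            continue
        # Assumption: the citation key is everything before the first "[".
        citekey = row["Source"].split("[", 1)[0]
        if citekey and citekey not in known_citekeys:
            problems.append((line_number, "missing citekey: %s" % citekey))
            continue
        seen_ids.add(row["ID"])
        rows.append(row)
    return rows, problems


rows, problems = validate(DATA, known_citekeys={"Meier2005"})
for line_number, problem in problems:
    print("skipping row in line %d: %s" % (line_number, problem))
```

Only the first data row survives; the rows on lines 3, 4, and 5 are reported in the same order as in the warnings above.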

Support for augmented metadata
------------------------------

`pycldf` provides some support for metadata properties as described in
[W3C's Metadata Vocabulary for Tabular Data](https://www.w3.org/TR/tabular-metadata/), in particular:

- On [column description level](https://www.w3.org/TR/tabular-metadata/#dfn-column-description),
  - `datatype` is interpreted to use appropriate Python objects internally,
  - a URI template provided as `valueUrl` can be expanded by calling `Row.valueUrl(<colname>)`.
- On [schema description level](https://www.w3.org/TR/tabular-metadata/#dfn-schema-description),
  - a URI template provided as `aboutUrl` is used to compute the URL available as `Row.url`.

So the example above could be rewritten more succinctly:

```python
from pycldf.dataset import Dataset
from pycldf.sources import Source

dataset = Dataset('mydb')
dataset.fields = ('ID', 'Language_ID', 'Parameter_ID', 'Value', 'Source', 'Comment')

# Read ID cells as integers instead of strings.
dataset.table.schema.columns['ID'].datatype = int

# Store bare identifiers and describe the full URLs with URI templates.
dataset.table.schema.columns['Language_ID'].valueUrl = 'http://glottolog.org/resource/languoid/id/{Language_ID}'
dataset.table.schema.columns['Parameter_ID'].valueUrl = 'http://concepticon.clld.org/parameters/{Parameter_ID}'

dataset.sources.add(Source('book', 'Meier2005', author='Hans Meier', year='2005', title='The Book'))
dataset.add_row(['1', 'stan1295', '1277', 'hand', 'Meier2005[3-7]', ''])
dataset.write('.')
```

And then accessed as follows:

```python
>>> from pycldf.dataset import Dataset
>>> dataset = Dataset.from_file('mydb.csv')
>>> row = dataset.rows[0]
>>> type(row['ID'])
<type 'int'>
>>> row.valueUrl('Language_ID')
'http://glottolog.org/resource/languoid/id/stan1295'
>>> row['Language_ID']
'stan1295'
```
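
To make the `valueUrl` expansion concrete, the sketch below fills `{name}` placeholders in a template from a row's cells, using only the Python 3 standard library. It handles just the simple `{name}` form of URI templates and percent-encodes the substituted values. The function name `expand` is an assumption for illustration and is not how `Row.valueUrl` is implemented.

```python
import re
from urllib.parse import quote


def expand(template, row):
    """Replace each {name} in template with the percent-encoded cell row[name].

    Hypothetical helper covering only simple {name} placeholders.
    """
    def substitute(match):
        return quote(str(row[match.group(1)]), safe="")

    return re.sub(r"\{(\w+)\}", substitute, template)


row = {"ID": 1, "Language_ID": "stan1295", "Parameter_ID": "1277"}
language_url = expand(
    "http://glottolog.org/resource/languoid/id/{Language_ID}", row)
print(language_url)
```

Keeping the bare identifier in the data file and the template in the metadata means the URL prefix is stated once rather than repeated in every row.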


Release history

1.38.0

Apr 26, 2024

1.37.1

Mar 18, 2024

1.37.0

Jan 22, 2024

1.36.0

Nov 14, 2023

1.35.1

Oct 23, 2023

1.35.0

Jul 10, 2023

1.34.1

Mar 15, 2023

1.34.0

Dec 5, 2022

1.33.0

Nov 24, 2022

1.32.0

Nov 23, 2022

1.31.0

Nov 22, 2022

1.30.0

Nov 22, 2022

1.29.0

Oct 28, 2022

1.28.0

Oct 11, 2022

1.27.0

Jul 7, 2022

1.26.1

May 23, 2022

1.26.0

May 19, 2022

1.25.1

Feb 6, 2022

1.25.0

Feb 5, 2022

1.24.0

Nov 24, 2021

1.23.0

Aug 15, 2021

1.22.0

Jun 4, 2021

1.21.2

May 28, 2021

1.21.1

May 26, 2021

1.21.0

May 10, 2021

1.20.2

May 3, 2021

1.20.1

Apr 30, 2021

1.20.0

Apr 28, 2021

1.19.0

Apr 3, 2021

1.18.1

Mar 9, 2021

1.18.0

Jan 13, 2021

1.17.0

Oct 31, 2020

1.16.0

Oct 13, 2020

1.15.2

Oct 12, 2020

1.15.1

Oct 7, 2020

1.15.0

Aug 19, 2020

1.14.1

Mar 7, 2020

1.14.0

Mar 7, 2020

1.13.0

Mar 4, 2020

1.12.1

Feb 14, 2020

1.12.0

Feb 13, 2020

1.11.0

Feb 12, 2020

1.10.0

Jan 10, 2020

1.9.0

Nov 26, 2019

1.8.2

Oct 24, 2019

1.8.1

Oct 14, 2019

1.8.0

Sep 17, 2019

1.7.0

Aug 16, 2019

1.6.4

Jun 12, 2019

1.6.3

Jun 3, 2019

1.6.2

May 9, 2019

1.6.1

May 6, 2019

1.6.0

May 2, 2019

1.5.3

Apr 1, 2019

1.5.2

Nov 16, 2018

1.5.1

Aug 2, 2018

1.5.0

Jul 31, 2018

1.4.1

May 2, 2018

1.4.0

May 2, 2018

1.3.0

Apr 24, 2018

1.2.0

Apr 18, 2018

1.1.1

Apr 18, 2018

1.1.0

Apr 18, 2018

1.0.10

Jan 13, 2018

1.0.9

Dec 20, 2017

1.0.8

Dec 1, 2017

1.0.7

Nov 29, 2017

1.0.6

Oct 19, 2017

1.0.5

Oct 16, 2017

1.0.4

Oct 12, 2017

1.0.3

Aug 16, 2017

1.0.2

Jul 28, 2017

1.0.1

Jul 27, 2017

1.0r2

Jul 17, 2017

1.0r1

Jul 14, 2017

1.0.0

Jul 27, 2017

1.0rc1 pre-release

Jul 24, 2017

1.0b2 pre-release

Jul 17, 2017

0.6.4

Dec 21, 2016

0.6.3

Dec 15, 2016

0.6.2

Sep 7, 2016

0.6.1

Sep 7, 2016

0.6.0

Jul 6, 2016

0.5.2

Jun 28, 2016

0.5.1

Jun 28, 2016

0.5.0

Jun 28, 2016

0.4.2

Jun 23, 2016

0.4.1

Jun 23, 2016

0.4.0

Jun 22, 2016

0.3.0

Jun 22, 2016

0.2.1

Jun 20, 2016

This version

0.2.0

Jun 20, 2016

0.1.0

Jun 16, 2016

Download files

Source Distribution

pycldf-0.2.0.tar.gz (16.9 kB)

Uploaded Jun 20, 2016 Source

Hashes for pycldf-0.2.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `6ac932e69b1195d3de30478fb5063dd3c3ca572234de7f390d48d523773b45f6` |
| MD5 | `489aeb76db74b4f7f0ca90f6a0deb02b` |
| BLAKE2b-256 | `3b067d0eaed2bdfca3d200aa2f94c35b7eba4600825fbfca3d4803927b47902a` |