Skip to main content

Turn a csv into a dictionary with a predefined schema.

Project description

Documentation

What is this?

This package defines base classes BaseCSVReader and BaseMultilineCSVReader. These can be used to iterate over a csv file and return its contents as a dictionary. Normally you should use the BaseCSVReader. The BaseMultilineCSVReader can be used when you run into problems with csv files that can have newline characters within a column, which could trip up the standard reader.

Example usage

You should write an own class that inherits from one of the base classes. The example.py file has an example. Basically it will be something like this:

from collective.csv2dict import BaseCSVReader, to_int, to_string

class ExampleCSVReader(BaseCSVReader):
    """Example csv reader class.

    We read three columns and skip one.
    """
    skip = [2]  # skip column index 2
    fields = [
        # The format is: (field name, filter method)
        ('id', to_int),
        ('fullname', to_string),
        ('email', to_string),
    ]

You can then use this class to read a csv file. The example.py file again has sample code to read the csv file and some options from the command line. Simply put, it boils down to this:

c = reader(open(filename, 'U'))
# Iterate over the entries and print them.
for entry in c:
    print entry
print '%d entries ignored due to errors.' % c.ignored
print '%d entries read without errors.' % c.success

It would turn this csv (contained in example.csv):

1,Maurits van Rees,ignored,maurits@example.org
2,Arthur Dent,ignored again,dentarthurdent@example.org

into this dictionary:

{'email': u'maurits@example.org',
 'fullname': u'Maurits van Rees',
 'id': 1}
{'email': u'dentarthurdent@example.org',
 'fullname': u'Arthur Dent',
 'id': 2}

Notes

  • It is recommented to always open a file in universal newline mode. This is usually the best way to avoid some potential problems with newlines within a single row.

  • The base reader tries to guess the encoding of the file in a simplistic way and will avoid breaking when no good encoding can be found.

  • The reader might ignore the first row of the csv file as it may be a header. We do a simple check for this: if none of the columns of the first row can be turned into an integer, then it is not a header line and it will be treated as data. If this logic does not work for you, then override the is_header method in your own class, simply like this:

    def is_header(self, items):
        return False

    That will make sure the first line is always treated as data. If you want it to always be treated as a header, just do return True.

  • You can override the prepare_iterable method if you need to do some fixes to some rows or the complete csv file before the reader starts to handle it. The BaseMultilineCSVReader has an example for this.

  • By default the excel csv dialect is used (or whatever your Python version has as default). If you want to use a specific dialect, you can override the dialect variable in your reader class. For example, you can use tabs as delimiter like this:

    import csv
    
    class MyDialect(csv.excel):
        delimiter = '\t'
    
    csv.register_dialect('mydialect', MyDialect)
    
    class ExampleCSVReader(BaseCSVReader):
        dialect = 'mydialect'
        fields = [...]

Compatibility

I have tried this on Python 2.6 and an earlier version on 2.4. It will likely work on all 2.x versions from 2.3 onwards.

Tested on Mac OS X so likely also working on any Unix-like system. Should work on Windows too, though I can imagine problems with newline characters in some corner cases.

Note for Plone users

I usually make packages for use in Plone, but this one can be used with plain Python. Nevertheless, a note for Plone users is probably good.

If you want to use it within your Plone buildout, just add it to the eggs in your buildout.cfg. You do not need to load zcml or install anything. You just need to write your own class definition, as in the example above. Then you probably want to write a browser view that uses this class to turn some uploaded csv file to a dictionary. Then you probably create a content item or a member for each item in this dictionary or do whatever you want with it.

Authors

  • Maurits van Rees (package creation, various improvements and generalizations)

  • Guido Wesdorp (initial code, written for a client way back in 2007)

Changelog

1.1 (2014-04-11)

  • Optionally allow ignoring extra columns. To use this: initialize the reader with ignore_extra_columns=True. [maurits]

  • Add formatting method to readers. It currently returns the delimiter, the dialect instance, the encoding and the expected number of columns. You can use this to give a hint in an upload form. [maurits]

1.0 (2012-06-21)

  • Initial release [maurits]

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

collective.csv2dict-1.1.zip (24.3 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page