
**************
Transmogrifier
**************

.. contents::

Transmogrifier provides support for building pipelines that turn one thing
into another. Specifically, transmogrifier pipelines are used to convert and
import legacy content into a Plone site. It provides the tools to construct
pipelines from multiple sections, where each section processes the data
flowing through the pipe.

A "transmogrifier pipeline" refers to a description of a set of pipe sections,
slotted together in a set order. The stated goal is for these sections to
transform data and ultimately add content to a Plone site based on this data.
Sections deal with tasks such as sourcing the data (from text files, databases,
etc.) and character set conversion, through to determining portal type,
location and workflow state.

Note that a transmogrifier pipeline can be used to process any number of
things, and is not specific to Plone content import. However, its original
intent is to provide a pluggable way to import legacy content.

Installation
************

See docs/INSTALL.txt for installation instructions.

Credits
*******

Development sponsored by
    Elkjøp Nordic AS

Design and development
    `Martijn Pieters`_ at Jarn_

Project name
    A transmogrifier_ is a fictional device used for transforming one object
    into another object. The term was coined by Bill Watterson of Calvin and
    Hobbes fame.

.. _Martijn Pieters: mailto:mj@jarn.com
.. _Jarn: http://www.jarn.com/
.. _Transmogrifier: http://en.wikipedia.org/wiki/Transmogrifier

Detailed Documentation
**********************

Pipelines
=========

To transmogrify, or import and convert non-Plone content, you simply define a
pipeline. Pipe sections, the equivalent of parts in a buildout_, are slotted
together into a processing pipe. To slot sections together, you define a
configuration file with named sections and a main pipeline definition that
names the sections in order (one section per line):

    >>> exampleconfig = """\
    ... [transmogrifier]
    ... pipeline =
    ...     section 1
    ...     section 2
    ...     section 3
    ...
    ... [section 1]
    ... blueprint = collective.transmogrifier.tests.examplesource
    ... size = 5
    ...
    ... [section 2]
    ... blueprint = collective.transmogrifier.tests.exampletransform
    ...
    ... [section 3]
    ... blueprint = collective.transmogrifier.tests.exampleconstructor
    ... """

As you can see, this is very similar to how you construct WSGI pipelines
using paster. The format of the configuration files is defined by the Python
ConfigParser module, with extensions that we'll describe later. At a minimum,
a ``[transmogrifier]`` section with a (possibly empty) ``pipeline`` option is
required:

    >>> minimalconfig = """\
    ... [transmogrifier]
    ... pipeline =
    ... """

Transmogrifier can load these configuration files either by looking them up
in a registry or by loading them from a python package.

You register transmogrifier configurations using the ``registerConfig``
directive in the http://namespaces.plone.org/transmogrifier namespace,
together with a name, and optionally a title and description::

  <configure
      xmlns="http://namespaces.zope.org/zope"
      xmlns:transmogrifier="http://namespaces.plone.org/transmogrifier"
      i18n_domain="collective.transmogrifier">

  <transmogrifier:registerConfig
      name="exampleconfig"
      title="Example pipeline configuration"
      description="This is an example pipeline configuration"
      configuration="example.cfg"
      />

  </configure>

You can then tell transmogrifier to load the 'exampleconfig' configuration. To
load configuration files directly from a python package, name the package and
the configuration file separated by a colon, such as
'collective.transmogrifier.tests:exampleconfig.cfg'.

Registering files with the transmogrifier registry allows other uses, such as
listing available configurations in a user interface, together with the
registered description. Loading files directly, though, lets you build
reusable libraries of configuration files more quickly.
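
For instance, once a ``transmogrifier`` instance has been constructed (as
shown later in this document), a package-relative configuration could be run
like this (a sketch; the named file has to exist in the package)::

    transmogrifier(u'collective.transmogrifier.tests:exampleconfig.cfg')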

In this document we'll use the shorthand *registerConfig* to register
example configurations:

    >>> registerConfig(u'collective.transmogrifier.tests.exampleconfig',
    ...                exampleconfig)

Pipeline sections
-----------------

Each section in the pipeline is created by a blueprint. Blueprints are looked
up as named utilities implementing the ISectionBlueprint interface. In the
transmogrifier configuration file, you refer to blueprints by the name under
which they are registered. Blueprints are factories; when called they produce
an ISection pipe section. ISections, in turn, are iterators implementing the
`iterator protocol`_.

Here is a simple blueprint, in the form of a class definition:

    >>> from zope.interface import classProvides, implements
    >>> from zope.component import provideUtility
    >>> from collective.transmogrifier.interfaces import ISectionBlueprint
    >>> from collective.transmogrifier.interfaces import ISection
    >>> class ExampleTransform(object):
    ...     classProvides(ISectionBlueprint)
    ...     implements(ISection)
    ...
    ...     def __init__(self, transmogrifier, name, options, previous):
    ...         self.previous = previous
    ...         self.name = name
    ...
    ...     def __iter__(self):
    ...         for item in self.previous:
    ...             item['exampletransformname'] = self.name
    ...             yield item
    ...
    >>> provideUtility(ExampleTransform,
    ...                name=u'collective.transmogrifier.tests.exampletransform')

Note that we register this class as a named utility, and that instances of
this class can be used as an iterator. When slotted together, items 'flow'
through the pipeline by iterating over the last section, which in turn
iterates over its preceding section (``self.previous`` in the example), and
so on.

By iterating over the source, then yielding the items again, each section
passes items on to the next section. During the iteration loop, sections can
manipulate the items. Note that items are Python dictionaries; sections simply
operate on the keys they care about. In our example we add a new key,
``exampletransformname``, which we set to the name of the section.

Sources
~~~~~~~

The items that flow through the pipe have to originate from somewhere though.
This is where special sections, sources, come in. A source is simply a pipe
section that inserts extra items into the pipeline. This is best illustrated
with another example:

    >>> class ExampleSource(object):
    ...     classProvides(ISectionBlueprint)
    ...     implements(ISection)
    ...
    ...     def __init__(self, transmogrifier, name, options, previous):
    ...         self.previous = previous
    ...         self.size = int(options['size'])
    ...
    ...     def __iter__(self):
    ...         for item in self.previous:
    ...             yield item
    ...
    ...         for i in range(self.size):
    ...             yield dict(id='item%02d' % i)
    ...
    >>> provideUtility(ExampleSource,
    ...                name=u'collective.transmogrifier.tests.examplesource')

In this example we use the ``options`` dictionary to read options from the
section configuration, which in the example configuration we gave earlier has
the option ``size`` defined as 5. Note that the configuration values are
always strings, so we need to convert the size option to an integer here.

The source first iterates over the previous section and yields all items
unchanged. Only when that loop is done does the source produce new items and
put them into the pipeline. This order is important: when you slot multiple
source sections together, you want items produced by earlier sections to be
processed first too.

There is always a previous section, even for the first section defined in the
pipeline. Transmogrifier passes in an empty iterator when it instantiates this
first section, expecting such a first section to be a source that produces
items for the pipeline to process.

Constructors
~~~~~~~~~~~~

As stated before, transmogrifier is intended for importing content into a
Plone site. However, transmogrifier itself only drives the pipeline, inserting
an empty iterator and discarding whatever it pulls out of the last section.

In order to create content, then, a constructor section is required. As with
source sections, it should be possible to use multiple constructors, so
constructors should always start with yielding the items passed in from the
previous section on to a possible next section.

So, a constructor section is an ISection that consumes items from the previous
section and affects the Plone site based on those items, usually by creating
content objects from them, then yields each item on for a possible next
section. For example purposes, we simply pretty-print the items instead:

    >>> import pprint
    >>> class ExampleConstructor(object):
    ...     classProvides(ISectionBlueprint)
    ...     implements(ISection)
    ...
    ...     def __init__(self, transmogrifier, name, options, previous):
    ...         self.previous = previous
    ...         self.pprint = pprint.PrettyPrinter().pprint
    ...
    ...     def __iter__(self):
    ...         for item in self.previous:
    ...             self.pprint(sorted(item.items()))
    ...             yield item
    ...
    >>> provideUtility(ExampleConstructor,
    ...                name=u'collective.transmogrifier.tests.exampleconstructor')

With this last section blueprint example completed, we can load the example
configuration we created earlier, and run our transmogrification:

    >>> from collective.transmogrifier.transmogrifier import Transmogrifier
    >>> transmogrifier = Transmogrifier(plone)
    >>> transmogrifier(u'collective.transmogrifier.tests.exampleconfig')
    [('exampletransformname', 'section 2'), ('id', 'item00')]
    [('exampletransformname', 'section 2'), ('id', 'item01')]
    [('exampletransformname', 'section 2'), ('id', 'item02')]
    [('exampletransformname', 'section 2'), ('id', 'item03')]
    [('exampletransformname', 'section 2'), ('id', 'item04')]

Developing blueprints
~~~~~~~~~~~~~~~~~~~~~

As we could see from the ISectionBlueprint examples above, a blueprint gets
called with several arguments: ``transmogrifier``, ``name``, ``options`` and
``previous``.

We discussed ``previous`` before: it is a reference to the previous pipe
section and must be looped over when the section itself is iterated. The
``name`` argument is simply the name of the section as given in the
configuration file.

The ``transmogrifier`` argument is a reference to the transmogrifier itself,
and it can be used to reach the context we are importing to through its
``context`` attribute. The transmogrifier also acts as a dictionary, mapping
from section names to a mapping of the options in each section.

Finally, as seen before, the ``options`` argument is a mapping of the current
section options. It is the same mapping as can be had through
``transmogrifier[name]``.

A short example shows each of these arguments in action:

    >>> class TitleExampleSection(object):
    ...     classProvides(ISectionBlueprint)
    ...     implements(ISection)
    ...
    ...     def __init__(self, transmogrifier, name, options, previous):
    ...         self.transmogrifier = transmogrifier
    ...         self.name = name
    ...         self.options = options
    ...         self.previous = previous
    ...
    ...         pipeline = transmogrifier['transmogrifier']['pipeline']
    ...         pipeline_size = len([s.strip() for s in pipeline.split('\n')
    ...                              if s.strip()])
    ...         self.size = options['pipeline-size'] = str(pipeline_size)
    ...         self.site_title = transmogrifier.context.Title()
    ...
    ...     def __iter__(self):
    ...         for item in self.previous:
    ...             item['pipeline-size'] = self.size
    ...             item['title'] = '%s - %s' % (self.site_title, item['id'])
    ...             yield item
    >>> provideUtility(TitleExampleSection,
    ...                name=u'collective.transmogrifier.tests.titleexample')
    >>> titlepipeline = """\
    ... [transmogrifier]
    ... pipeline =
    ...     section1
    ...     titlesection
    ...     section3
    ...
    ... [section1]
    ... blueprint = collective.transmogrifier.tests.examplesource
    ... size = 5
    ...
    ... [titlesection]
    ... blueprint = collective.transmogrifier.tests.titleexample
    ...
    ... [section3]
    ... blueprint = collective.transmogrifier.tests.exampleconstructor
    ... """
    >>> registerConfig(u'collective.transmogrifier.tests.titlepipeline',
    ...                titlepipeline)
    >>> plone.Title()
    u'Plone Test Site'
    >>> transmogrifier = Transmogrifier(plone)
    >>> transmogrifier(u'collective.transmogrifier.tests.titlepipeline')
    [('id', 'item00'),
     ('pipeline-size', '3'),
     ('title', u'Plone Test Site - item00')]
    [('id', 'item01'),
     ('pipeline-size', '3'),
     ('title', u'Plone Test Site - item01')]
    [('id', 'item02'),
     ('pipeline-size', '3'),
     ('title', u'Plone Test Site - item02')]
    [('id', 'item03'),
     ('pipeline-size', '3'),
     ('title', u'Plone Test Site - item03')]
    [('id', 'item04'),
     ('pipeline-size', '3'),
     ('title', u'Plone Test Site - item04')]

Configuration file syntax
-------------------------

As mentioned earlier, the configuration files use the format
defined by the Python ConfigParser module with extensions. The
extensions are based on the zc.buildout extensions and are:

- option names are case sensitive

- option values can use a substitution syntax, described below, to
  refer to option values in specific sections.

- you can include other configuration files, see `Including other
  configurations`_.

The ConfigParser syntax is very flexible. Section names can contain any
characters other than newlines and right square brackets ("]"). Option names
can contain any characters (within the ASCII character set) other than
newlines, colons, and equal signs, cannot start with a space, and do not
include trailing whitespace.

It is a good idea to keep section and option names simple, sticking to
alphanumeric characters, hyphens, and periods.

Variable substitution
~~~~~~~~~~~~~~~~~~~~~

Transmogrifier supports a string.Template-like syntax for variable
substitution, using both the section and the option name joined by a colon:

    >>> substitutionexample = """\
    ... [transmogrifier]
    ... pipeline =
    ...     section1
    ...     section2
    ...     section3
    ...
    ... [definitions]
    ... item_count = 3
    ...
    ... [section1]
    ... blueprint = collective.transmogrifier.tests.examplesource
    ... size = ${definitions:item_count}
    ...
    ... [section2]
    ... blueprint = collective.transmogrifier.tests.exampletransform
    ...
    ... [section3]
    ... blueprint = collective.transmogrifier.tests.exampleconstructor
    ... """
    >>> registerConfig(u'collective.transmogrifier.tests.substitutionexample',
    ...                substitutionexample)

Here we created an extra section called ``definitions``, and refer to the
``item_count`` option defined in that section to set the size of the
``section1`` pipeline section, so we only get 3 items when we execute this
pipeline:

    >>> transmogrifier = Transmogrifier(plone)
    >>> transmogrifier(u'collective.transmogrifier.tests.substitutionexample')
    [('exampletransformname', 'section2'), ('id', 'item00')]
    [('exampletransformname', 'section2'), ('id', 'item01')]
    [('exampletransformname', 'section2'), ('id', 'item02')]

Including other configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can include other transmogrifier configurations with the ``include``
option in the transmogrifier section. This option takes a list of
configuration ids, separated by whitespace. All sections and options from
those configuration files will be included provided the options weren't
already present. This works recursively; inclusions in the included
configuration files are honoured too:

    >>> inclusionexample = """\
    ... [transmogrifier]
    ... include =
    ...     collective.transmogrifier.tests.sources
    ...     collective.transmogrifier.tests.base
    ...
    ... [section1]
    ... size = 3
    ... """
    >>> registerConfig(u'collective.transmogrifier.tests.inclusionexample',
    ...                inclusionexample)
    >>> sources = """\
    ... [section1]
    ... blueprint = collective.transmogrifier.tests.examplesource
    ... size = 10
    ... """
    >>> registerConfig(u'collective.transmogrifier.tests.sources',
    ...                sources)
    >>> base = """\
    ... [transmogrifier]
    ... pipeline =
    ...     section1
    ...     section2
    ...     section3
    ... include = collective.transmogrifier.tests.constructor
    ...
    ... [section2]
    ... blueprint = collective.transmogrifier.tests.exampletransform
    ... """
    >>> registerConfig(u'collective.transmogrifier.tests.base',
    ...                base)
    >>> constructor = """\
    ... [section3]
    ... blueprint = collective.transmogrifier.tests.exampleconstructor
    ... """
    >>> registerConfig(u'collective.transmogrifier.tests.constructor',
    ...                constructor)
    >>> transmogrifier = Transmogrifier(plone)
    >>> transmogrifier(u'collective.transmogrifier.tests.inclusionexample')
    [('exampletransformname', 'section2'), ('id', 'item00')]
    [('exampletransformname', 'section2'), ('id', 'item01')]
    [('exampletransformname', 'section2'), ('id', 'item02')]

As in zc.buildout configurations, you can also add or remove lines from
included configuration options, by using the ``+=`` and ``-=`` syntax:

    >>> advancedinclusionexample = """\
    ... [transmogrifier]
    ... include =
    ...     collective.transmogrifier.tests.inclusionexample
    ... pipeline -=
    ...     section2
    ...     section3
    ... pipeline +=
    ...     section4
    ...     section3
    ...
    ... [section4]
    ... blueprint = collective.transmogrifier.tests.titleexample
    ... """
    >>> registerConfig(u'collective.transmogrifier.tests.advancedinclusionexample',
    ...                advancedinclusionexample)
    >>> transmogrifier = Transmogrifier(plone)
    >>> transmogrifier(u'collective.transmogrifier.tests.advancedinclusionexample')
    [('id', 'item00'),
     ('pipeline-size', '3'),
     ('title', u'Plone Test Site - item00')]
    [('id', 'item01'),
     ('pipeline-size', '3'),
     ('title', u'Plone Test Site - item01')]
    [('id', 'item02'),
     ('pipeline-size', '3'),
     ('title', u'Plone Test Site - item02')]

When calling transmogrifier, you can provide your own sections too: any extra
keyword argument is interpreted as a section dictionary. Do make sure you use
string values, though:

    >>> transmogrifier(u'collective.transmogrifier.tests.inclusionexample',
    ...               section1=dict(size='1'))
    [('exampletransformname', 'section2'), ('id', 'item00')]

Conventions
-----------

At its most basic level, transmogrifier pipelines are just iterators passing
'things' around. Transmogrifier doesn't expect anything more than being able
to iterate over the pipeline and doesn't dictate what happens within that
pipeline, what defines a 'thing' or what ultimately gets accomplished.

But as has been stated repeatedly, transmogrifier has been developed to
facilitate importing legacy content, processing data in incremental steps
until a final section constructs new content.

To reach this end, several conventions have been established that help the
various pipeline sections work together.

Items are mappings
~~~~~~~~~~~~~~~~~~

The first one is that the 'things' passed from section to section are
mappings; i.e. they are, or behave just like, Python dictionaries. Again,
transmogrifier doesn't produce these by itself; source sections (see Sources_)
produce them by injecting them into the stream.

Keys are fields
~~~~~~~~~~~~~~~

Secondly, *all* keys in such mappings that do not start with an underscore
will be used by constructor sections (see Constructors_) to construct Plone
content. Such keys are expected to map to Archetypes fields, Zope 3 schema
fields, or whatever else the constructor expects.

Paths are to the target object
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Many sections either create objects (constructors) or operate on
already-constructed or pre-existing objects. Such sections should interpret
paths as the complete path for the object. For constructors this means they'll
need to split the path into a container path and an id in order for them to
find the correct context for constructing the object.

Keys with a leading underscore are controllers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This leaves keys that do start with a leading underscore free to carry
special meaning for specific sections, allowing earlier pipeline sections to
inject 'control statements' into the item mapping for later sections. To
avoid name clashes, sections that expect such controller keys should use
prefixes based on the name under which their blueprint was registered, plus
optionally the name of the pipe section. This allows for precise targeting of
pipe sections when inserting such keys.

We'll illustrate this with an example. Let's say a source section loads news
items from a database, but the database tables for such items hold filenames
to point to binary image data. Rather than have this section load those
filenames directly and add them to the item for image creation, a generic
'file loader' section is used to do this. Let's suppose that this file loader
is registered as ``acme.transmogrifier.fileloader``. This section then could
be instructed to load files and store them in a named key by using 2
'controller' keys named ``_acme.transmogrifier.fileloader_filename`` and
``_acme.transmogrifier.fileloader_targetkey``. If the source section were to
create pipeline items with those keys, this later fileloader section would
then automatically load the named files and inject their contents into the
items at the right location.

If you need 2 such loaders, you can target them each individually by including
their section names; so to target just the ``imageloader1`` section you'd use
the keys ``_acme.transmogrifier.fileloader_imageloader1_filename`` and
``_acme.transmogrifier.fileloader_imageloader1_targetkey``. Sections that
support such targeting should prefer such section specific keys over those
only using the blueprint name.
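
For illustration, an item targeting just the ``imageloader1`` section might
look like this (hypothetical values, following the naming convention above)::

    {'_type': 'News Item',
     '_path': '/news/a-news-item',
     '_acme.transmogrifier.fileloader_imageloader1_filename': 'images/001.jpg',
     '_acme.transmogrifier.fileloader_imageloader1_targetkey': 'image'}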

The collective.transmogrifier.utils module has a handy utility function,
``defaultKeys``, that generates these keys for you for easy matching:

    >>> from collective.transmogrifier import utils
    >>> keys = utils.defaultKeys('acme.transmogrifier.fileloader',
    ...                          'imageloader1', 'filename')
    >>> pprint.pprint(keys)
    ('_acme.transmogrifier.fileloader_imageloader1_filename',
     '_acme.transmogrifier.fileloader_filename',
     '_imageloader1_filename',
     '_filename')
    >>> utils.Matcher(*keys)('_filename', '_imageloader1_filename')
    ('_imageloader1_filename', True)


Keep memory use to a minimum
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The above example is a little contrived of course; you'd generally configure a
file loader section with a key name to grab the filename from, and perhaps put
the loader *after* the constructor section and load the image data straight
into the already constructed content item instead. This lowers memory
requirements as image data can go directly into the ZODB this way, and the
content object can be deactivated after the binary data has been stored.

By operating on one item at a time, a transmogrifier pipeline can handle huge
numbers of content items without breaking memory limits; individual sections
should also avoid using memory unnecessarily.
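
As a minimal sketch of that last point, a section placed after the constructor
could explicitly evict freshly written objects from the ZODB cache (this
assumes persistent content objects, a ``_path`` key as described above, and
``self.context`` set to ``transmogrifier.context`` in ``__init__``)::

    def __iter__(self):
        for item in self.previous:
            yield item
            # Downstream sections have processed the item by the time the
            # generator resumes here; drop the object from the ZODB cache.
            obj = self.context.unrestrictedTraverse(
                item['_path'].lstrip('/'), None)
            if obj is not None:
                obj._p_deactivate()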

Previous sections go first
~~~~~~~~~~~~~~~~~~~~~~~~~~

As mentioned in the Sources_ section, when inserting new items into the
stream, items from previous pipe sections generally come first. This way,
someone constructing a pipeline knows which source's items will be processed
earlier (those of sources slotted earlier in the pipeline) and can adjust
expectations accordingly. This makes content construction more predictable
when dealing with multiple sources.

An exception would be a folder source, which inserts additional folder items
into the pipeline to ensure that the required container for any given content
item exists at construction time. Such a source injects its extra items as
needed, rather than strictly before or after those from the previous source
section.

Iterators have 3 stages
~~~~~~~~~~~~~~~~~~~~~~~

Some tasks have to happen before the pipeline runs, or after all content has
been created. In such cases it is handy to realise that iteration within a
section consists of three stages: before iteration, iteration itself, and
after iteration.

For example, a section creating references may have to wait for all content to
be created before it can insert the references. In this case it could build a
queue during iteration, and only when the previous pipe section has been
exhausted and the last item has been yielded would the section reach into the
portal and create all the references.

Sources following the `Previous sections go first`_ convention basically
inject the new items in the after iteration stage.

Here's a piece of pseudo-code to illustrate these 3 stages::

    def __iter__(self):
        # Before iteration
        # You can do initialisation here

        for item in self.previous:
            # Iteration itself
            # You could process the items, take notes, inject additional
            # items based on the current item in the pipe or manipulate portal
            # content created by previous items
            yield item

        # After iteration
        # The section still has control here and could inject additional
        # items, manipulate all portal content created by the pipeline,
        # or clean up after itself.

You can get quite creative with this. For example, the reference creator
could defer creating a reference until it knows the referenced object has
been created too, and create these references periodically during iteration.
This keeps memory requirements smaller, as not *all* references to create
have to be remembered.
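
Here's a sketch of such a deferred reference creator (the ``_references`` key
and the ``addReference`` call are hypothetical; assume ``self.context`` was
set to ``transmogrifier.context`` in ``__init__``)::

    def __iter__(self):
        # Before iteration: start with an empty queue
        queue = []

        for item in self.previous:
            # Iteration: remember reference instructions, pass the item on
            if '_references' in item:
                queue.append((item['_path'], item['_references']))
            yield item

        # After iteration: all content exists now, create the references
        for path, targets in queue:
            obj = self.context.unrestrictedTraverse(path.lstrip('/'))
            for target in targets:
                obj.addReference(target)  # hypothetical reference API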

Store pipeline-wide information in annotations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If, for some reason or other, you need to remember pipeline-wide state across
section instances (such as database connections, or data counters), such
information should be stored as annotations on the transmogrifier object::

    from zope.annotation.interfaces import IAnnotations

    MYKEY = 'foo.bar.baz'

    def __init__(self, transmogrifier, name, options, previous):
        self.storage = IAnnotations(transmogrifier).setdefault(MYKEY, {})
        self.storage.setdefault('spam', 0)
        ...

    def __iter__(self):
        ...
        self.storage['spam'] += 1
        ...

.. _buildout: http://pypi.python.org/pypi/zc.buildout
.. _iterator protocol: http://www.python.org/dev/peps/pep-0234/


GenericSetup import integration
===============================

To ease running a transmogrifier pipeline during site configuration, a generic
import step for GenericSetup is included.

The import step looks for a file named ``transmogrifier.txt`` and reads
pipeline configuration names from this file, one name per line. Empty lines
and lines starting with a # (hash mark) are skipped. These pipelines are then
executed in the same order as they are found in the file.

This means that if you want to run one or more pipelines as part of a
GenericSetup profile, all you have to do is name these pipelines in a file
named ``transmogrifier.txt`` in your profile directory.
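
For example, a profile's ``transmogrifier.txt`` could look like this (the
pipeline names are made up)::

    # content first, then a pipeline that fixes up workflow state
    my.package.pipelines.content
    my.package.pipelines.workflow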

The GenericSetup import context is stored on the transmogrifier as an
annotation::

    from collective.transmogrifier.genericsetup import IMPORT_CONTEXT
    from zope.annotation.interfaces import IAnnotations

    def __init__(self, transmogrifier, name, options, previous):
        self.import_context = IAnnotations(transmogrifier)[IMPORT_CONTEXT]

This will of course prevent your code from running outside the generic setup
import context.
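
If a section should also work outside a GenericSetup import, it can fall back
gracefully instead; ``IAnnotations`` returns a mapping, so a sketch could use
``get``::

    from collective.transmogrifier.genericsetup import IMPORT_CONTEXT
    from zope.annotation.interfaces import IAnnotations

    def __init__(self, transmogrifier, name, options, previous):
        # None when the pipeline runs outside a GenericSetup import
        self.import_context = IAnnotations(transmogrifier).get(IMPORT_CONTEXT)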


Default section blueprints
**************************

Constructor section
===================

A constructor pipeline section is the heart of a transmogrifier content import
pipeline. It constructs Plone content based on the items it processes. The
constructor section blueprint name is
``collective.transmogrifier.sections.constructor``. Constructor sections do
only one thing: they construct *new* content. No schema changes are made.
Also, constructors create content without restrictions; no security checks or
containment constraints are applied.

Construction needs 2 pieces of information: the path to the item (including
the id for the new item itself) and its portal type. To determine both of
these, the constructor section inspects each item and looks for 2 keys, as
described below. Any item missing either of these 2 pieces will be skipped.
Similarly, items with a path for a container or type that doesn't exist will
be skipped as well; make sure that these containers are constructed
beforehand. Because a constructor section will only construct new objects, if
an object with the same path already exists, the item will also be skipped.

For the object path, it'll look (in order) for
``_collective.transmogrifier.sections.constructor_[sectionname]_path``,
``_collective.transmogrifier.sections.constructor_path``,
``_[sectionname]_path``, and ``_path``, where ``[sectionname]`` is replaced
with the name given to the current section. This allows you to target the
right section precisely if needed. Alternatively, you can specify what key to
use for the path by specifying the ``path-key`` option, which should be a list
of keys to try (one key per line, use a ``re:`` or ``regexp:`` prefix to
specify regular expressions).
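
For example, a constructor section reading paths from a ``location`` key
first, falling back to the default ``_path`` key, might be configured like
this (a sketch)::

    [constructor]
    blueprint = collective.transmogrifier.sections.constructor
    path-key =
        location
        _path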

For the portal type, use the ``type-key`` option to specify a set of keys just
like ``path-key``. If omitted, the constructor will look for
``_collective.transmogrifier.sections.constructor_[sectionname]_type``,
``_collective.transmogrifier.sections.constructor_type``,
``_[sectionname]_type``, ``_type``, ``portal_type`` and ``Type`` (in that
order, with ``[sectionname]`` replaced).

Unicode paths will be encoded to ASCII. Using the path and type, a new object
will be constructed using ``invokeFactory``; nothing else is done. Paths are
always interpreted as relative to the context object, with the last path
segment being the id of the object to create.

By default the constructor section will log a warning if the container for
the item is missing and the item can't be constructed. However, if you add a
``required = True`` option to the constructor section, it will instead raise
a ``KeyError``.

    >>> import pprint
    >>> constructor = """
    ... [transmogrifier]
    ... pipeline =
    ...     contentsource
    ...     constructor
    ...     logger
    ...
    ... [contentsource]
    ... blueprint = collective.transmogrifier.sections.tests.contentsource
    ...
    ... [constructor]
    ... blueprint = collective.transmogrifier.sections.constructor
    ...
    ... [logger]
    ... blueprint = collective.transmogrifier.sections.logger
    ... name = logger
    ... level = INFO
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.constructor',
    ...                constructor)
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.constructor')
    >>> print handler
    logger INFO
      {'_path': '/eggs/foo', '_type': 'FooType'}
    logger INFO
      {'_path': '/spam/eggs/foo', '_type': 'FooType'}
    logger INFO
      {'_path': '/foo', '_type': 'FooType'}
    logger INFO
      {'_path': u'/unicode/encoded/to/ascii', '_type': 'FooType'}
    logger INFO
      {'_path': 'not/existing/bar',
       '_type': 'BarType',
       'title': 'Should not be constructed, not an existing path'}
    logger INFO
      {'_path': '/spam/eggs/existing',
       '_type': 'FooType',
       'title': 'Should not be constructed, an existing object'}
    logger INFO
      {'_path': '/spam/eggs/incomplete',
       'title': 'Should not be constructed, no type'}
    logger INFO
      {'_path': '/spam/eggs/nosuchtype',
       '_type': 'NonExisting',
       'title': 'Should not be constructed, not an existing type'}
    logger INFO
      {'_path': 'spam/eggs/changedByFactory',
       '_type': 'FooType',
       'title': 'Factories are allowed to change the id'}
    >>> pprint.pprint(plone.constructed)
    [('eggs', 'foo', 'FooType'),
     ('spam/eggs', 'foo', 'FooType'),
     ('', 'foo', 'FooType'),
     ('unicode/encoded/to', 'ascii', 'FooType'),
     ('spam/eggs', 'changedByFactory', 'FooType')]

    >>> constructor = """
    ... [transmogrifier]
    ... pipeline =
    ...     contentsource
    ...     constructor
    ...     logger
    ...
    ... [contentsource]
    ... blueprint = collective.transmogrifier.sections.tests.contentsource
    ...
    ... [constructor]
    ... blueprint = collective.transmogrifier.sections.constructor
    ... required = True
    ...
    ... [logger]
    ... blueprint = collective.transmogrifier.sections.logger
    ... name = logger
    ... level = INFO
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.constructor2',
    ...                constructor)
    >>> handler.clear()
    >>> try:
    ...     transmogrifier(u'collective.transmogrifier.sections.tests.constructor2')
    ...     raise AssertionError("Required constructor did not raise an error for missing folder")
    ... except KeyError:
    ...     pass
    >>> print handler
    logger INFO
      {'_path': '/eggs/foo', '_type': 'FooType'}
    logger INFO
      {'_path': '/spam/eggs/foo', '_type': 'FooType'}
    logger INFO
      {'_path': '/foo', '_type': 'FooType'}
    logger INFO
      {'_path': u'/unicode/encoded/to/ascii', '_type': 'FooType'}


Folders section
===============

The ``collective.transmogrifier.sections.constructor`` blueprint can construct
new content, based on a type (``_type`` key) and a path (``_path`` key).
However, it will bail if it is asked to create an item for which the parent
folder does not exist.

One way to work around this is to ensure that the folders already exist, for
example by sending the instruction to construct them through the pipeline
before any contents of that folder. This requires sorted input, of course.

Alternatively, you can use the ``collective.transmogrifier.sections.folders``
blueprint. This will look at the path of each incoming item and construct
parent folders if needed. This implies that all folders (that do not yet
exist) are of the same type. That type defaults to ``Folder``, although you
can supply an alternative type. The folder will be created with an id only,
but a subsequent schema-updating section for a later item may have the
opportunity to update it (but not change its type).

This blueprint can take the following options, all of them optional:

``path-key``
    The name of the key holding the path. This defaults to the same semantics
    as those used for the constructor section. Just use ``_path`` and you'll
    be OK.
``new-type-key``
    The type key to use when inserting a new item in the pipeline to create
    folders. The default is ``_type``. Change it if you need to target a
    specific constructor section.
``new-path-key``
    The path key to use when inserting a new item in the pipeline to create
    folders. The default is to use the same as the incoming path key. Change
    it if you need to target a specific constructor section.
``folder-type``
    The name of the portal type to use for new folders. Defaults to
    ``Folder``, which is the default folder type in CMF and Plone.
``cache``
    By default, the section will keep a cache in memory of each folder it has
    checked (and possibly created) to know whether it already exists. This
    saves a lot of traversal, especially if you have many items under a
    particular folder. This will use a small amount of memory. If you have
    millions of objects, you can trade memory for speed by setting this option
    to false.
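
For example, a folders section for a huge import with a non-standard folder
type might look like this (a sketch)::

    [folders]
    blueprint = collective.transmogrifier.sections.folders
    folder-type = My Folder
    cache = false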

Here is how it might look by default:

    >>> import pprint
    >>> constructor = """
    ... [transmogrifier]
    ... pipeline =
    ...     contentsource
    ...     folders
    ...     logger
    ...
    ... [contentsource]
    ... blueprint = collective.transmogrifier.sections.tests.folderssource
    ...
    ... [folders]
    ... blueprint = collective.transmogrifier.sections.folders
    ...
    ... [logger]
    ... blueprint = collective.transmogrifier.sections.logger
    ... name = logger
    ... level = INFO
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.folders',
    ...                constructor)
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.folders')
    >>> print handler
    logger INFO
        {'_path': '/foo', '_type': 'Document'}
    logger INFO
        {'_path': '/existing/foo', '_type': 'Document'}
    logger INFO
        {'_path': '/nonexisting', '_type': 'Folder'}
    logger INFO
        {'_path': '/nonexisting/alpha', '_type': 'Folder'}
    logger INFO
        {'_path': '/nonexisting/alpha/foo', '_type': 'Document'}
    logger INFO
        {'_path': '/nonexisting/beta', '_type': 'Folder'}
    logger INFO
        {'_path': '/nonexisting/beta/foo', '_type': 'Document'}
    logger INFO
        {'_type': 'Document'}
    logger INFO
        {'_folders_path': '/delta', '_type': 'Folder'}
    logger INFO
        {'_folders_path': '/delta/foo', '_type': 'Document'}

To specify alternate types and keys, we can do something like this:

    >>> import pprint
    >>> constructor = """
    ... [transmogrifier]
    ... pipeline =
    ...     contentsource
    ...     folders
    ...     logger
    ...
    ... [contentsource]
    ... blueprint = collective.transmogrifier.sections.tests.folderssource
    ...
    ... [folders]
    ... blueprint = collective.transmogrifier.sections.folders
    ... folder-type = My Folder
    ... new-type-key = _folderconstructor_type
    ... new-path-key = _folderconstructor_path
    ...
    ... [logger]
    ... blueprint = collective.transmogrifier.sections.logger
    ... name = logger
    ... level = INFO
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.folders2',
    ...                constructor)
    >>> handler.clear()
    >>> plone.exists.clear()
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.folders2')
    >>> print handler
    logger INFO
      {'_path': '/foo', '_type': 'Document'}
    logger INFO
      {'_path': '/existing/foo', '_type': 'Document'}
    logger INFO
      {'_folderconstructor_path': '/nonexisting',
       '_folderconstructor_type': 'My Folder'}
    logger INFO
      {'_folderconstructor_path': '/nonexisting/alpha',
       '_folderconstructor_type': 'My Folder'}
    logger INFO
      {'_path': '/nonexisting/alpha/foo', '_type': 'Document'}
    logger INFO
      {'_folderconstructor_path': '/nonexisting/beta',
       '_folderconstructor_type': 'My Folder'}
    logger INFO
      {'_path': '/nonexisting/beta/foo', '_type': 'Document'}
    logger INFO
      {'_type': 'Document'}
    logger INFO
      {'_folderconstructor_path': '/delta',
       '_folderconstructor_type': 'My Folder'}
    logger INFO
      {'_folders_path': '/delta/foo', '_type': 'Document'}


Codec section
=============

A codec pipeline section lets you alter the character encoding of item
values, allowing you to recode text from and to unicode and any of the
codecs supported by python. The codec section blueprint name is
``collective.transmogrifier.sections.codec``.

What values to recode is determined by the ``keys`` option, which takes a set
of newline-separated key names. If a key name starts with ``re:`` or
``regexp:`` it is treated as a regular expression instead.

The optional ``from`` and ``to`` options determine which codecs values are
recoded from and to. Both of these default to ``unicode``, meaning no
recoding. If either option is set to ``default``, the current default
encoding of the Plone site is used.

To deal with possible encoding errors, you can set the error handler of both
the ``from`` and ``to`` codecs separately with the ``from-error-handler`` and
``to-error-handler`` options, respectively. These default to ``strict``, but
can be set to any error handler supported by python, including ``replace`` and
``ignore``.

Also optional is the ``condition`` option, which lets you specify a TALES
expression that when evaluating to False will prevent any en- or decoding from
happening. The condition is evaluated for every matched key.

    >>> codecs = """
    ... [transmogrifier]
    ... pipeline =
    ...     source
    ...     decode-all
    ...     encode-id
    ...     encode-title
    ...     logger
    ...
    ... [source]
    ... blueprint = collective.transmogrifier.sections.tests.samplesource
    ... encoding = utf8
    ...
    ... [decode-all]
    ... blueprint = collective.transmogrifier.sections.codec
    ... keys = re:.*
    ... from = utf8
    ...
    ... [encode-id]
    ... blueprint = collective.transmogrifier.sections.codec
    ... keys = id
    ... to = ascii
    ...
    ... [encode-title]
    ... blueprint = collective.transmogrifier.sections.codec
    ... keys = title
    ... to = ascii
    ... to-error-handler = backslashreplace
    ... condition = python:'Brand' not in item['title']
    ...
    ... [logger]
    ... blueprint = collective.transmogrifier.sections.logger
    ... name = logger
    ... level = INFO
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.codecs',
    ...                codecs)
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.codecs')
    >>> print handler
    logger INFO
        {'id': 'foo', 'status': u'\u2117', 'title': 'The Foo Fighters \\u2117'}
    logger INFO
        {'id': 'bar', 'status': u'\u2122', 'title': u'Brand Chocolate Bar \u2122'}
    logger INFO
        {'id': 'monty-python', 'status': u'\xa9', 'title': "Monty Python's Flying Circus \\xa9"}

The ``condition`` expression has access to the following:

=================== ==========================================================
 ``item``            the current pipeline item
 ``key``             the name of the matched key
 ``match``           if the key was matched by a regular expression, the match
                     object, otherwise boolean True
 ``transmogrifier``  the transmogrifier
 ``name``            the name of the codec section
 ``options``         the codec options
 ``modules``         sys.modules
=================== ==========================================================


Inserter section
================

An inserter pipeline section lets you define a key and value to insert into
pipeline items. The inserter section blueprint name is
``collective.transmogrifier.sections.inserter``.

An inserter section takes a ``key`` and a ``value`` TALES expression. These
expressions are evaluated to generate the actual key-value pair that gets
inserted. You can also specify an optional ``condition`` option; if given, the
key only gets inserted when the condition, which is also a TALES expression,
is true.

Because the inserter ``value`` expression has access to the original item, it
could even be used to change existing item values. Just target an existing
key, pull out the original value in the value expression and return a modified
version.

    >>> inserter = """
    ... [transmogrifier]
    ... pipeline =
    ...     source
    ...     simple-insertion
    ...     expression-insertion
    ...     transform-id
    ...     logger
    ...
    ... [source]
    ... blueprint = collective.transmogrifier.sections.tests.rangesource
    ... size = 3
    ...
    ... [simple-insertion]
    ... blueprint = collective.transmogrifier.sections.inserter
    ... key = string:foo
    ... value = string:bar (inserted into "${item/id}" by the "$name" section)
    ...
    ... [expression-insertion]
    ... blueprint = collective.transmogrifier.sections.inserter
    ... key = python:'foo-%s' % item['id'][-2:]
    ... value = python:int(item['id'][-2:]) * 15
    ... condition = python:int(item['id'][-2:])
    ...
    ... [transform-id]
    ... blueprint = collective.transmogrifier.sections.inserter
    ... key = string:id
    ... value = string:foo-${item/id}
    ...
    ... [logger]
    ... blueprint = collective.transmogrifier.sections.logger
    ... name = logger
    ... level = INFO
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.inserter',
    ...                inserter)
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.inserter')
    >>> print handler
    logger INFO
        {'foo': 'bar (inserted into "item-00" by the "simple-insertion" section)',
         'id': 'foo-item-00'}
    logger INFO
        {'foo': 'bar (inserted into "item-01" by the "simple-insertion" section)',
         'foo-01': 15,
         'id': 'foo-item-01'}
    logger INFO
        {'foo': 'bar (inserted into "item-02" by the "simple-insertion" section)',
         'foo-02': 30,
         'id': 'foo-item-02'}

The ``key``, ``value`` and ``condition`` expressions have access to the
following:

=================== ==========================================================
 ``item``            the current pipeline item
 ``transmogrifier``  the transmogrifier
 ``name``            the name of the inserter section
 ``options``         the inserter options
 ``modules``         sys.modules
 ``key``             (only for the value and condition expressions) the key
                     being inserted
=================== ==========================================================


Condition section
=================

A condition pipeline section lets you selectively discard items from the
pipeline. The condition section blueprint name is
``collective.transmogrifier.sections.condition``.

A condition section takes a ``condition`` TALES expression. When this
expression, evaluated against the current item, is true, the item is yielded
to the next pipe section; otherwise it is not:

    >>> condition = """
    ... [transmogrifier]
    ... pipeline =
    ...     source
    ...     condition
    ...     logger
    ...
    ... [source]
    ... blueprint = collective.transmogrifier.sections.tests.rangesource
    ... size = 5
    ...
    ... [condition]
    ... blueprint = collective.transmogrifier.sections.condition
    ... condition = python:int(item['id'][-2:]) > 2
    ...
    ... [logger]
    ... blueprint = collective.transmogrifier.sections.logger
    ... name = logger
    ... level = INFO
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.condition',
    ...                condition)
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.condition')
    >>> print handler
    logger INFO
        {'id': 'item-03'}
    logger INFO
        {'id': 'item-04'}

The ``condition`` expression has access to the following:

=================== ==========================================================
 ``item``            the current pipeline item
 ``transmogrifier``  the transmogrifier
 ``name``            the name of the condition section
 ``options``         the condition options
 ``modules``         sys.modules
=================== ==========================================================

As condition sections skip items in the pipeline, they should not be used
inside a splitter section!


Manipulator section
===================

A manipulator pipeline section lets you copy, move or discard keys from the
pipeline. The manipulator section blueprint name is
``collective.transmogrifier.sections.manipulator``.

A manipulator section will copy keys when you specify a set of keys to copy,
and an expression to determine what to copy these to. These are the ``keys``
and ``destination`` options.

The ``keys`` option is a set of key names, one on each line; key names
starting with ``re:`` or ``regexp:`` are treated as regular expressions. The
``destination`` expression is a TALES expression that can access not only the
item, but also the matched key and, if a regular expression was used, the
match object.

If a ``delete`` option is specified, it is also interpreted as a set of keys,
like the ``keys`` option. These keys will be deleted from the item; if used
together with the ``keys`` and ``destination`` options, keys will be renamed
instead of copied.

Also optional is the ``condition`` option, which lets you specify a TALES
expression that when evaluating to False will prevent any manipulation from
happening. The condition is evaluated for every matched key.

    >>> manipulator = """
    ... [transmogrifier]
    ... pipeline =
    ...     source
    ...     copy
    ...     rename
    ...     delete
    ...     logger
    ...
    ... [source]
    ... blueprint = collective.transmogrifier.sections.tests.samplesource
    ...
    ... [copy]
    ... blueprint = collective.transmogrifier.sections.manipulator
    ... keys =
    ...     title
    ...     id
    ... destination = string:$key-copy
    ...
    ... [rename]
    ... blueprint = collective.transmogrifier.sections.manipulator
    ... keys = re:([^-]+)-copy$
    ... destination = python:'%s-duplicate' % match.group(1)
    ... delete = ${rename:keys}
    ...
    ... [delete]
    ... blueprint = collective.transmogrifier.sections.manipulator
    ... delete = status
    ...
    ... [logger]
    ... blueprint = collective.transmogrifier.sections.logger
    ... name = logger
    ... level = INFO
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.manipulator',
    ...                manipulator)
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.manipulator')
    >>> print handler
    logger INFO
        {'id': 'foo',
         'id-duplicate': 'foo',
         'title': u'The Foo Fighters \u2117',
         'title-duplicate': u'The Foo Fighters \u2117'}
    logger INFO
        {'id': 'bar',
         'id-duplicate': 'bar',
         'title': u'Brand Chocolate Bar \u2122',
         'title-duplicate': u'Brand Chocolate Bar \u2122'}
    logger INFO
        {'id': 'monty-python',
         'id-duplicate': 'monty-python',
         'title': u"Monty Python's Flying Circus \xa9",
         'title-duplicate': u"Monty Python's Flying Circus \xa9"}
    >>> handler.clear()

The ``destination`` expression has access to the following:

=================== ==========================================================
 ``item``            the current pipeline item
 ``key``             the name of the matched key
 ``match``           if the key was matched by a regular expression, the match
                     object, otherwise boolean True
 ``transmogrifier``  the transmogrifier
 ``name``            the name of the manipulator section
 ``options``         the manipulator options
 ``modules``         sys.modules
=================== ==========================================================


Splitter section
================

A splitter pipeline section lets you branch a pipeline into 2 or more
sub-pipelines. The splitter section blueprint name is
``collective.transmogrifier.sections.splitter``.

A splitter section takes 2 or more pipeline definitions, and sends the items
from the previous section through each of these sub-pipelines, each with its
own copy [*]_ of the items:

    >>> emptysplitter = """
    ... [transmogrifier]
    ... pipeline =
    ...     source
    ...     splitter
    ...     logger
    ...
    ... [source]
    ... blueprint = collective.transmogrifier.sections.tests.rangesource
    ... size = 3
    ...
    ... [splitter]
    ... blueprint = collective.transmogrifier.sections.splitter
    ... pipeline-1 =
    ... pipeline-2 =
    ...
    ... [logger]
    ... blueprint = collective.transmogrifier.sections.logger
    ... name = logger
    ... level = INFO
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.emptysplitter',
    ...                emptysplitter)
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.emptysplitter')
    >>> print handler
    logger INFO
        {'id': 'item-00'}
    logger INFO
        {'id': 'item-00'}
    logger INFO
        {'id': 'item-01'}
    logger INFO
        {'id': 'item-01'}
    logger INFO
        {'id': 'item-02'}
    logger INFO
        {'id': 'item-02'}

Although the pipeline definitions in the splitter are empty, we end up with 2
copies of every item in the pipeline as both splitter pipelines get to process
a copy. Splitter pipelines are defined by options starting with ``pipeline-``.

Normally you'll use conditions to identify items for each sub-pipe, making the
splitter the pipeline equivalent of an if/elif statement. Conditions are
optional and use the pipeline option name plus ``-condition``:

    >>> evenoddsplitter = """
    ... [transmogrifier]
    ... pipeline =
    ...     source
    ...     splitter
    ...     logger
    ...
    ... [source]
    ... blueprint = collective.transmogrifier.sections.tests.rangesource
    ... size = 3
    ...
    ... [splitter]
    ... blueprint = collective.transmogrifier.sections.splitter
    ... pipeline-even-condition = python:int(item['id'][-2:]) % 2 == 0
    ... pipeline-even = even-section
    ... pipeline-odd-condition = not:${splitter:pipeline-even-condition}
    ... pipeline-odd = odd-section
    ...
    ... [even-section]
    ... blueprint = collective.transmogrifier.sections.inserter
    ... key = string:even
    ... value = string:The even pipe
    ...
    ... [odd-section]
    ... blueprint = collective.transmogrifier.sections.inserter
    ... key = string:odd
    ... value = string:The odd pipe
    ...
    ... [logger]
    ... blueprint = collective.transmogrifier.sections.logger
    ... name = logger
    ... level = INFO
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.evenodd',
    ...                evenoddsplitter)
    >>> handler.clear()
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.evenodd')
    >>> print handler
    logger INFO
        {'even': 'The even pipe', 'id': 'item-00'}
    logger INFO
        {'id': 'item-01', 'odd': 'The odd pipe'}
    logger INFO
        {'even': 'The even pipe', 'id': 'item-02'}

Conditions are expressed as TALES statements, and have access to:

=================== ==========================================================
 ``item``            the current pipeline item
 ``transmogrifier``  the transmogrifier
 ``name``            the name of the splitter section
 ``pipeline``        the name of the splitter pipeline this condition belongs
                     to (including the ``pipeline-`` prefix)
 ``options``         the splitter options
 ``modules``         sys.modules
=================== ==========================================================


.. WARNING::
    Although the splitter section employs some techniques to avoid memory
    bloat, if any contained section swallows items (taking them from the
    previous section without passing them on), the splitter runs the risk of
    pulling all remaining items into its buffer while a next match for the
    contained pipeline is being sought.

    You can avoid this by not using sections that discard items within a
    splitter; place these before or after a splitter section. Better still,
    use a correct condition in the splitter configuration that won't include
    the items to discard in the first place.

.. [*] Note that copy.deepcopy is used on all items. This will fail on items
    containing file handles, modules or other non-copyable values. See the
    copy module documentation.


Savepoint section
=================

A savepoint pipeline section commits a savepoint every so often, which has a
side-effect of freeing up memory. The savepoint section blueprint name is
``collective.transmogrifier.sections.savepoint``.

A savepoint section takes an optional ``every`` option, which defaults to
1000; a savepoint is committed every ``every`` items passing through the pipe.
A savepoint section doesn't alter the items in any way:

    >>> savepoint = """
    ... [transmogrifier]
    ... pipeline =
    ...     source
    ...     savepoint
    ...
    ... [source]
    ... blueprint = collective.transmogrifier.sections.tests.rangesource
    ... size = 10
    ...
    ... [savepoint]
    ... blueprint = collective.transmogrifier.sections.savepoint
    ... every = 3
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.savepoint',
    ...                savepoint)

We'll show savepoints being committed by overriding ``transaction.savepoint``:

    >>> import transaction
    >>> original_savepoint = transaction.savepoint
    >>> counter = [0]
    >>> def test_savepoint(counter=counter, *args, **kw):
    ...     counter[0] += 1
    >>> transaction.savepoint = test_savepoint
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.savepoint')
    >>> transaction.savepoint = original_savepoint
    >>> counter[0]
    3
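
For reference, a savepoint blueprint can be written as an ``ISection`` along
these lines (a minimal sketch, not necessarily the shipped implementation)::

    import transaction

    from zope.interface import classProvides, implements
    from collective.transmogrifier.interfaces import ISection, ISectionBlueprint

    class SavepointSection(object):
        classProvides(ISectionBlueprint)
        implements(ISection)

        def __init__(self, transmogrifier, name, options, previous):
            self.previous = previous
            # Commit a savepoint every 1000 items unless configured otherwise
            self.every = int(options.get('every', 1000))

        def __iter__(self):
            count = 0
            for item in self.previous:
                count += 1
                if count % self.every == 0:
                    # Optimistic savepoints also free memory held by the
                    # current transaction
                    transaction.savepoint(optimistic=True)
                yield item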


CSV source section
==================

A CSV source pipeline section lets you create pipeline items from CSV files.
The CSV source section blueprint name is
``collective.transmogrifier.sections.csvsource``.

A CSV source section will load the CSV file named in the ``filename``
option, or the CSV file named in an item key given by the ``key`` option,
and will yield an item for each row in the CSV file. It'll use the first line
of the CSV file to determine what keys to use, or you can set the
``fieldnames`` option to name the keys explicitly.

The ``filename`` option may be an absolute path, or a package reference, e.g.
``my.package:foo/bar.csv``.

By default the CSV file is assumed to use the Excel CSV dialect, but any
dialect supported by the Python ``csv`` module can be selected with the
``dialect`` option. You can also set `fmtparams`_ using options that start
with ``fmtparam-``.
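
For instance, a semicolon-delimited file could be read like this
(``my.package:data/export.csv`` is a hypothetical package reference)::

    [csvsource]
    blueprint = collective.transmogrifier.sections.csvsource
    filename = my.package:data/export.csv
    dialect = excel
    fmtparam-delimiter = string:;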


    >>> import os
    >>> from collective.transmogrifier import tests
    >>> csvsource = """
    ... [transmogrifier]
    ... pipeline =
    ...     csvsource
    ...     logger
    ...
    ... [csvsource]
    ... blueprint = collective.transmogrifier.sections.csvsource
    ... filename = {}/csvsource.csv
    ...
    ... [logger]
    ... blueprint = collective.transmogrifier.sections.logger
    ... name = logger
    ... level = INFO
    ... """.format(os.path.dirname(tests.__file__))
    >>> registerConfig(u'collective.transmogrifier.sections.tests.csvsource.file',
    ...                csvsource)
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.csvsource.file')
    >>> print handler
    logger INFO
        {'bar': 'first-bar', 'baz': 'first-baz', 'foo': 'first-foo'}
    logger INFO
        {'bar': 'second-bar', 'baz': 'second-baz', 'foo': 'second-foo'}

The column field names can also be specified explicitly; the file's own
first line is then treated as data rather than as a header:

    >>> handler.clear()
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.csvsource.file',
    ...                csvsource=dict(fieldnames='monty spam eggs'))
    >>> print handler
    logger INFO
        {'eggs': 'baz', 'monty': 'foo', 'spam': 'bar'}
    logger INFO
        {'eggs': 'first-baz', 'monty': 'first-foo', 'spam': 'first-bar'}
    logger INFO
        {'eggs': 'second-baz', 'monty': 'second-foo', 'spam': 'second-bar'}

Here is the same example, loading a file from a package instead:

    >>> csvsource = """
    ... [transmogrifier]
    ... pipeline =
    ...     csvsource
    ...     logger
    ...
    ... [csvsource]
    ... blueprint = collective.transmogrifier.sections.csvsource
    ... filename = collective.transmogrifier.tests:sample.csv
    ...
    ... [logger]
    ... blueprint = collective.transmogrifier.sections.logger
    ... name = logger
    ... level = INFO
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.csvsource.package',
    ...                csvsource)
    >>> handler.clear()
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.csvsource.package')
    >>> print handler
    logger INFO
        {'bar': 'first-bar', 'baz': 'first-baz', 'foo': 'first-foo'}
    logger INFO
        {'_csvsource_rest': ['corge', 'grault'],
       'bar': 'second-bar',
       'baz': 'second-baz',
       'foo': 'second-foo'}

We can also load a file from a GenericSetup (GS) import context:

    >>> from collective.transmogrifier.transmogrifier import Transmogrifier
    >>> from collective.transmogrifier.genericsetup import IMPORT_CONTEXT
    >>> from zope.annotation.interfaces import IAnnotations
    >>> class FakeImportContext(object):
    ...  def __init__(self, subdir, filename, contents):
    ...      self.filename = filename
    ...      self.subdir = subdir
    ...      self.contents = contents
    ...  def readDataFile(self, filename, subdir=None):
    ...      if subdir is None and self.subdir is not None:
    ...          return None
    ...      if filename != self.filename:
    ...          return None
    ...      return self.contents
    >>> csvsource = """
    ... [transmogrifier]
    ... pipeline =
    ...     csvsource
    ...     logger
    ...
    ... [csvsource]
    ... blueprint = collective.transmogrifier.sections.csvsource
    ... filename = importcontext:sub/dir/somefile.csv
    ...
    ... [logger]
    ... blueprint = collective.transmogrifier.sections.logger
    ... name = logger
    ... level = INFO
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.csvsource.gs',
    ...                csvsource)
    >>> handler.clear()
    >>> t = Transmogrifier({})
    >>> IAnnotations(t)[IMPORT_CONTEXT] = FakeImportContext('sub/dir/', 'somefile.csv',
    ... """animal,name
    ... cow,daisy
    ... pig,george
    ... duck,archibald
    ... """)
    >>> t(u'collective.transmogrifier.sections.tests.csvsource.gs')
    >>> print handler
    logger INFO
        {'animal': 'cow', 'name': 'daisy'}
    logger INFO
        {'animal': 'pig', 'name': 'george'}
    logger INFO
        {'animal': 'duck', 'name': 'archibald'}

An import context may instead provide files through ``openDataFile``,
returning a file-like object to be read in chunks; that works too:

    >>> import StringIO
    >>> class FakeChunkedImportContext(object):
    ...  def __init__(self, subdir, filename, contents):
    ...      self.filename = filename
    ...      self.subdir = subdir
    ...      self.contents = contents
    ...  def openDataFile(self, filename, subdir=None):
    ...      if subdir is None and self.subdir is not None:
    ...          return None
    ...      if filename != self.filename:
    ...          return None
    ...      return StringIO.StringIO(self.contents)
    >>> handler.clear()
    >>> t = Transmogrifier({})
    >>> IAnnotations(t)[IMPORT_CONTEXT] = FakeChunkedImportContext(None, 'somefile.csv',
    ... """animal,name
    ... fish,wanda
    ... """)
    >>> t(u'collective.transmogrifier.sections.tests.csvsource.gs')
    >>> print handler
    logger INFO
        {'animal': 'fish', 'name': 'wanda'}

Attempting to load a nonexistent file won't do anything:

    >>> handler.clear()
    >>> t = Transmogrifier({})
    >>> IAnnotations(t)[IMPORT_CONTEXT] = FakeImportContext(None, 'someotherfile.csv',
    ... """animal,name
    ... cow,daisy
    ... pig,george
    ... duck,archibald
    ... """)
    >>> t(u'collective.transmogrifier.sections.tests.csvsource.gs')
    >>> print handler

Without an import context at all, nothing is found either:

    >>> handler.clear()
    >>> t = Transmogrifier({})
    >>> t(u'collective.transmogrifier.sections.tests.csvsource.gs')
    >>> print handler

The filename can also be taken from a key on the items flowing into the
section. A ``restkey`` option can likewise name the key that collects any
extra values from rows longer than the field names.

    >>> csvsource = """
    ... [transmogrifier]
    ... include = collective.transmogrifier.sections.tests.csvsource.package
    ... pipeline =
    ...     csvsource
    ...     filename
    ...     item-csvsource
    ...     logger
    ...
    ... [csvsource]
    ... blueprint = collective.transmogrifier.sections.csvsource
    ... filename = collective.transmogrifier.tests:keysource.csv
    ...
    ... [filename]
    ... blueprint = collective.transmogrifier.sections.inserter
    ... key = string:_item-csvsource
    ... condition = exists:item/_item-csvsource
    ... value = python:modules['os.path'].join(modules['os.path'].dirname(
    ...     modules['collective.transmogrifier.tests'].__file__),
    ...     item['_item-csvsource'])
    ...
    ... [item-csvsource]
    ... blueprint = collective.transmogrifier.sections.csvsource
    ... restkey = _args
    ... row-key = string:_csvsource
    ...
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.csvsource.key',
    ...                csvsource)

    >>> handler.clear()
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.csvsource.key')
    >>> print handler
    logger INFO
        {'_item-csvsource': '.../collective/transmogrifier/tests/sample.csv'}
    logger INFO
        {'_csvsource': '.../collective/transmogrifier/tests/sample.csv',
       'bar': 'first-bar',
       'baz': 'first-baz',
       'foo': 'first-foo'}
    logger INFO
        {'_args': ['corge', 'grault'],
       '_csvsource': '.../collective/transmogrifier/tests/sample.csv',
       'bar': 'second-bar',
       'baz': 'second-baz',
       'foo': 'second-foo'}

The ``fmtparam-`` expressions have access to the following:

=================== ==========================================================
 ``key``             the `fmtparam`_ name (the option name without the
                     ``fmtparam-`` prefix)
 ``transmogrifier``  the transmogrifier
 ``name``            the name of the csvsource section
 ``options``         the csvsource options
 ``modules``         sys.modules
=================== ==========================================================

The ``row-key`` and ``row-value`` expressions have access to the following:

=================== ==========================================================
 ``item``            the pipeline item to be yielded from this CSV row
 ``source_item``     the pipeline item the CSV filename was taken from
 ``transmogrifier``  the transmogrifier
 ``name``            the name of the csvsource section
 ``options``         the csvsource options
 ``modules``         sys.modules
 ``key``             (only for the value and condition expressions) the key
                     being inserted
=================== ==========================================================
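
For instance, a ``row-key``/``row-value`` pair can stamp each row item with a
value taken from the source item (a sketch continuing the example above; the
``_origin`` key is hypothetical)::

    [item-csvsource]
    blueprint = collective.transmogrifier.sections.csvsource
    row-key = string:_origin
    row-value = python:source_item['_item-csvsource']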


Logger section
==============

First we need to set up a logger for testing:

    >>> import logging, sys
    >>> logger = logging.getLogger()
    >>> handler = logging.StreamHandler(sys.stdout)
    >>> handler.setFormatter(logging.Formatter('%(name)s: %(message)s'))
    >>> logger.addHandler(handler)

A logger section lets you log a piece of data from the item together with a
name. You can set any logging level in the logger. The logger blueprint name
is ``collective.transmogrifier.sections.logger``.

    >>> infologger = """
    ... [transmogrifier]
    ... pipeline =
    ...     source
    ...     logger
    ...
    ... [source]
    ... blueprint = collective.transmogrifier.sections.tests.rangesource
    ... size = 3
    ...
    ... [logger]
    ... blueprint = collective.transmogrifier.sections.logger
    ... level = INFO
    ... name = Infologger test
    ... key = id
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.infologger',
    ...                infologger)
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.infologger')
    Infologger test: item-00
    Infologger test: item-01
    Infologger test: item-02


Levels can also be given numerically, and if the key is missing from an
item, a message to that effect is logged instead. A condition may also be
used to restrict the items logged.

    >>> debuglogger = """
    ... [transmogrifier]
    ... pipeline =
    ...     source
    ...     logger
    ...
    ... [source]
    ... blueprint = collective.transmogrifier.sections.tests.rangesource
    ... size = 3
    ...
    ... [logger]
    ... blueprint = collective.transmogrifier.sections.logger
    ... level = 10
    ... name = Infologger test
    ... key = foo
    ... condition = python:item['id'] != 'item-01'
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.debuglogger',
    ...                debuglogger)
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.debuglogger')
    Infologger test: -- Missing key --
    Infologger test: -- Missing key --

If no ``key`` option is given, the logger will render the whole item
in a readable format using Python's ``pprint`` module.  The ``delete``
option can be used to omit certain keys from the output, such as body
text fields which may be too large and make the output too noisy.

    >>> logger = """
    ... [transmogrifier]
    ... pipeline =
    ...     source
    ...     logger
    ...
    ... [source]
    ... blueprint = collective.transmogrifier.sections.tests.samplesource
    ...
    ... [logger]
    ... blueprint = collective.transmogrifier.sections.logger
    ... level = INFO
    ... delete =
    ...     title-duplicate
    ...     id-duplicate
    ...     nonexistent
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.logger',
    ...                logger)
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.logger')
    collective.transmogrifier.sections.tests.logger.logger:
      {'id': 'foo', 'status': u'\u2117', 'title': u'The Foo Fighters \u2117'}
    collective.transmogrifier.sections.tests.logger.logger:
      {'id': 'bar', 'status': u'\u2122', 'title': u'Brand Chocolate Bar \u2122'}
    collective.transmogrifier.sections.tests.logger.logger:
      {'id': 'monty-python',
       'status': u'\xa9',
       'title': u"Monty Python's Flying Circus \xa9"}


Breakpoint section
==================

A breakpoint section will stop and enter ``pdb`` when a specific condition is
met. This is useful for debugging: you can add a breakpoint section just
before a section that raises an error on a specific item.

The alternative is to add a conditional breakpoint in the section that fails,
but that can require finding the code in some egg somewhere, adding the
breakpoint and restarting the server. A breakpoint section speeds up the
process.

    >>> breaker = """
    ... [transmogrifier]
    ... pipeline =
    ...     source
    ...     breaker
    ...     logger
    ...
    ... [source]
    ... blueprint = collective.transmogrifier.sections.tests.rangesource
    ... size = 3
    ...
    ... [breaker]
    ... blueprint = collective.transmogrifier.sections.breakpoint
    ... condition = python: item['id'] == 'item-01'
    ...
    ... [logger]
    ... blueprint = collective.transmogrifier.sections.logger
    ... name = logger
    ... level = INFO
    ... """
    >>> registerConfig(u'collective.transmogrifier.sections.tests.breaker',
    ...                breaker)

Since pdb requires input, for this test we replace stdin with something that
supplies input (just a continue command):

    >>> oldstdin = make_stdin('c\n')
    >>> transmogrifier(u'collective.transmogrifier.sections.tests.breaker')
    > .../collective.transmogrifier/src/collective/transmogrifier/sections/logger.py(...)__iter__()
    -> ...
    (Pdb) c
    >>> print handler
    logger INFO
        {'id': 'item-00'}
    logger INFO
        {'id': 'item-01'}
    logger INFO
        {'id': 'item-02'}


And finally we reset stdin:

    >>> reset_stdin(oldstdin)



Change History
**************

(name of developer listed in brackets)

1.5 (2013-07-23)
================

- Allow csvsource to read files from GS import context
  [lentinj]

- Don't use traversal, to avoid problems with acquisition or views.
  [rpatterson]

- Add csvsource support for taking the filename from an item key.
  [rpatterson]

- Add csvsource restkey handling for rows with more keys than fieldnames.
  [rpatterson]

- Add a blueprint for opening and caching URLs with `urllib2`_.
  [rpatterson]

- Add a source for walking a directory with `os.walk`_.
  [rpatterson]

- Add support for arbitrary csvsource fmtparam options.
  [rpatterson]

- Add DEBUG logging for expressions, useful for tracking changes to
  items as they move through the pipeline.
  [rpatterson]

- Add an XML walker source section for walking a tree of elements.
  [rpatterson]

- Add a list source section for adding recursion and/or looping to pipelines.
  [rpatterson]

- Add pprint support to the logger section, moved from the pprint
  section used in tests to make it more useful and available in actual
  pipelines.
  [rpatterson]

1.4 (2013-04-07)
================

- Fix the import location of the pagetemplate engine for newer Zope versions.
  [leorochael]

- Bug fix to load ZCML for GS when Products.GenericSetup is installed.
  [aclark]

1.3 (2011-03-17)
================

- Added the GenericSetup import context as an annotation to the transmogrifier.
  [elro]

- Added a logger to log the value of a particular key for all items. Handy
  when debugging (you can see which path is failing) and for showing
  progress in a long import.
  [regebro]

- Added a breakpoint section to break on a particular expression, which is
  handy for debugging.
  [regebro]

1.2 (2010-03-30)
================

- Bug fix: the constructor promises to encode paths to ASCII, but failed to
  do so. Thanks to gyst for finding the discrepancy.
  [mj]

1.1 (2010-03-17)
================

- Allow the CSV source to load its file from a package as well as from an
  absolute or relative file path. To load from a package, pass
  ``package.name:filename.csv`` to the ``filename`` option.
  [optilude]

- Add CMF 2.2/Plone 4 compatibility for the content constructor
  [optilude]

- Use an explicit provides attribute to register the transmogrifier adapter.
  Fixes the "Missing 'provides' attribute" errors when loading with
  zope.annotation installed.
  [mj]

- Add a required flag to the content constructor, which causes it to raise
  a KeyError if the container in which to construct the new item doesn't
  exist.
  [regebro]

- Add an optional condition to the manipulator section.
  [regebro]

1.0 (2009-08-07)
================

- Initial transmogrifier architecture.
  [mj]

