A configurable pipeline aimed at transforming content for import and export


Transmogrifier provides support for building pipelines that turn one thing into another. Specifically, transmogrifier pipelines are used to convert and import legacy content into a Plone site. It provides the tools to construct pipelines from multiple sections, where each section processes the data flowing through the pipe.

A “transmogrifier pipeline” refers to a description of a set of pipe sections, slotted together in a set order. The stated goal is for these sections to transform data and ultimately add content to a Plone site based on this data. Sections deal with tasks ranging from sourcing the data (from text files, databases, etc.) and character set conversion, through to determining portal type, location and workflow state.

Note that a transmogrifier pipeline can be used to process any number of things, and is not specific to Plone content import. However, its original intent is to provide a pluggable way to import legacy content.

Installation

See docs/INSTALL.txt for installation instructions.

Credits

Development sponsored by

Elkjøp Nordic AS

Design and development

Martijn Pieters at Jarn

Project name

A transmogrifier is a fictional device used for transforming one object into another. The term was coined by Bill Watterson of Calvin and Hobbes fame.

Detailed Documentation

Pipelines

To transmogrify, or import and convert non-Plone content, you simply define a pipeline. Pipe sections, the equivalent of parts in a buildout, are slotted together into a processing pipe. To slot sections together, you create a configuration file that defines named sections and a main pipeline definition naming the sections in order (one section per line):

>>> exampleconfig = """\
... [transmogrifier]
... pipeline =
...     section 1
...     section 2
...     section 3
...
... [section 1]
... blueprint = collective.transmogrifier.tests.examplesource
... size = 5
...
... [section 2]
... blueprint = collective.transmogrifier.tests.exampletransform
...
... [section 3]
... blueprint = collective.transmogrifier.tests.exampleconstructor
... """

As you can see, this is very similar to how you construct WSGI pipelines using paster. The format of the configuration files is defined by the Python ConfigParser module, with extensions that we’ll describe later. At minimum, the transmogrifier section with an empty pipeline is required:

>>> minimalconfig = """\
... [transmogrifier]
... pipeline =
... """

Transmogrifier can load these configuration files either by looking them up in a registry or by loading them from a Python package.

You register transmogrifier configurations using the registerConfig directive in the http://namespaces.plone.org/transmogrifier namespace, together with a name, and optionally a title and description:

<configure
    xmlns="http://namespaces.zope.org/zope"
    xmlns:transmogrifier="http://namespaces.plone.org/transmogrifier"
    i18n_domain="collective.transmogrifier">

<transmogrifier:registerConfig
    name="exampleconfig"
    title="Example pipeline configuration"
    description="This is an example pipeline configuration"
    configuration="example.cfg"
    />

</configure>

You can then tell transmogrifier to load the ‘exampleconfig’ configuration. To load configuration files directly from a Python package, name the package and the configuration file separated by a colon, such as ‘collective.transmogrifier.tests:exampleconfig.cfg’.

Registering files with the transmogrifier registry allows other uses, such as listing available configurations in a user interface, together with the registered description. Loading files directly, though, lets you build reusable libraries of configuration files more quickly.
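
Either way, running a pipeline is the same call. A quick sketch (assuming a site object named site; the Transmogrifier class is introduced properly further below):

from collective.transmogrifier.transmogrifier import Transmogrifier

transmogrifier = Transmogrifier(site)
transmogrifier(u'exampleconfig')  # looked up in the registry
transmogrifier(u'collective.transmogrifier.tests:exampleconfig.cfg')  # loaded from the package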

In this document we’ll use the shorthand registerConfig to register example configurations:

>>> registerConfig(u'collective.transmogrifier.tests.exampleconfig',
...                exampleconfig)

Pipeline sections

Each section in the pipeline is created by a blueprint. Blueprints are looked up as named utilities implementing the ISectionBlueprint interface. In the transmogrifier configuration file, you refer to blueprints by the name under which they are registered. Blueprints are factories; when called they produce an ISection pipe section. ISections, in turn, implement the Python iterator protocol.

Here is a simple blueprint, in the form of a class definition:

>>> from collective.transmogrifier.interfaces import ISectionBlueprint, ISection
>>> from zope.interface import classProvides, implements
>>> from zope.component import provideUtility
>>> class ExampleTransform(object):
...     classProvides(ISectionBlueprint)
...     implements(ISection)
...
...     def __init__(self, transmogrifier, name, options, previous):
...         self.previous = previous
...         self.name = name
...
...     def __iter__(self):
...         for item in self.previous:
...             item['exampletransformname'] = self.name
...             yield item
...
>>> provideUtility(ExampleTransform,
...                name=u'collective.transmogrifier.tests.exampletransform')

Note that we register this class as a named utility, and that instances of this class can be used as an iterator. When slotted together, items ‘flow’ through the pipeline by iterating over the last section, which in turn iterates over its preceding section (self.previous in the example), and so on.

By iterating over the source, then yielding the items again, each section passes items on to the next section. During the iteration loop, sections can manipulate the items. Note that items are Python dictionaries; sections simply operate on the keys they care about. In our example we add a new key, exampletransformname, which we set to the name of the section.

Sources

The items that flow through the pipe have to originate from somewhere though. This is where special sections, sources, come in. A source is simply a pipe section that inserts extra items into the pipeline. This is best illustrated with another example:

>>> class ExampleSource(object):
...     classProvides(ISectionBlueprint)
...     implements(ISection)
...
...     def __init__(self, transmogrifier, name, options, previous):
...         self.previous = previous
...         self.size = int(options['size'])
...
...     def __iter__(self):
...         for item in self.previous:
...             yield item
...
...         for i in range(self.size):
...             yield dict(id='item%02d' % i)
...
>>> provideUtility(ExampleSource,
...                name=u'collective.transmogrifier.tests.examplesource')

In this example we use the options dictionary to read options from the section configuration, which in the example configuration we gave earlier has the option size defined as 5. Note that the configuration values are always strings, so we need to convert the size option to an integer here.

The source first iterates over the previous section and yields all items unchanged. Only when that loop is done does the source produce new items and put them into the pipeline. This order is important: when you slot multiple source sections together, you want items produced by earlier sections to be processed first too.

There is always a previous section, even for the first section defined in the pipeline. Transmogrifier passes in an empty iterator when it instantiates this first section, expecting such a first section to be a source that’ll produce items for the pipeline to process.

Constructors

As stated before, transmogrifier is intended for importing content into a Plone site. However, transmogrifier itself only drives the pipeline, inserting an empty iterator and discarding whatever it pulls out of the last section.

In order to create content then, a constructor section is required. Like source sections, you should be able to use multiple constructors, so constructors should always start with yielding the items passed in from the previous section on to a possible next section.

So a constructor section is an ISection that consumes items from the previous section, affects the Plone site based on those items (usually by creating content objects from them), and then yields each item on to a possible next section. For example purposes, we simply pretty-print the items instead:

>>> import pprint
>>> class ExampleConstructor(object):
...     classProvides(ISectionBlueprint)
...     implements(ISection)
...
...     def __init__(self, transmogrifier, name, options, previous):
...         self.previous = previous
...         self.pprint = pprint.PrettyPrinter().pprint
...
...     def __iter__(self):
...         for item in self.previous:
...             self.pprint(item)
...             yield item
...
>>> provideUtility(ExampleConstructor,
...                name=u'collective.transmogrifier.tests.exampleconstructor')

With this last section blueprint example completed, we can load the example configuration we created earlier, and run our transmogrification:

>>> from collective.transmogrifier.transmogrifier import Transmogrifier
>>> transmogrifier = Transmogrifier(plone)
>>> transmogrifier(u'collective.transmogrifier.tests.exampleconfig')
{'exampletransformname': 'section 2', 'id': 'item00'}
{'exampletransformname': 'section 2', 'id': 'item01'}
{'exampletransformname': 'section 2', 'id': 'item02'}
{'exampletransformname': 'section 2', 'id': 'item03'}
{'exampletransformname': 'section 2', 'id': 'item04'}
Developing blueprints

As we could see from the ISectionBlueprint examples above, a blueprint gets called with several arguments: transmogrifier, name, options and previous.

We discussed previous before; it is a reference to the previous pipe section and must be looped over when the section itself is iterated. The name argument is simply the name of the section as given in the configuration file.

The transmogrifier argument is a reference to the transmogrifier itself, and it can be used to reach the context we are importing to through its context attribute. The transmogrifier also acts as a dictionary, mapping from section names to a mapping of the options in each section.

Finally, as seen before, the options argument is a mapping of the current section options. It is the same mapping as can be had through transmogrifier[name].

A short example shows each of these arguments in action:

>>> class TitleExampleSection(object):
...     classProvides(ISectionBlueprint)
...     implements(ISection)
...
...     def __init__(self, transmogrifier, name, options, previous):
...         self.transmogrifier = transmogrifier
...         self.name = name
...         self.options = options
...         self.previous = previous
...
...         pipeline = transmogrifier['transmogrifier']['pipeline']
...         pipeline_size = len([s.strip() for s in pipeline.split('\n')
...                              if s.strip()])
...         self.size = options['pipeline-size'] = str(pipeline_size)
...         self.site_title = transmogrifier.context.Title()
...
...     def __iter__(self):
...         for item in self.previous:
...             item['pipeline-size'] = self.size
...             item['title'] = '%s - %s' % (self.site_title, item['id'])
...             yield item
>>> provideUtility(TitleExampleSection,
...                name=u'collective.transmogrifier.tests.titleexample')
>>> titlepipeline = """\
... [transmogrifier]
... pipeline =
...     section1
...     titlesection
...     section3
...
... [section1]
... blueprint = collective.transmogrifier.tests.examplesource
... size = 5
...
... [titlesection]
... blueprint = collective.transmogrifier.tests.titleexample
...
... [section3]
... blueprint = collective.transmogrifier.tests.exampleconstructor
... """
>>> registerConfig(u'collective.transmogrifier.tests.titlepipeline',
...                titlepipeline)
>>> plone.Title()
u'Plone Test Site'
>>> transmogrifier = Transmogrifier(plone)
>>> transmogrifier(u'collective.transmogrifier.tests.titlepipeline')
{'title': u'Plone Test Site - item00', 'id': 'item00', 'pipeline-size': '3'}
{'title': u'Plone Test Site - item01', 'id': 'item01', 'pipeline-size': '3'}
{'title': u'Plone Test Site - item02', 'id': 'item02', 'pipeline-size': '3'}
{'title': u'Plone Test Site - item03', 'id': 'item03', 'pipeline-size': '3'}
{'title': u'Plone Test Site - item04', 'id': 'item04', 'pipeline-size': '3'}

Configuration file syntax

As mentioned earlier, the configuration files use the format defined by the Python ConfigParser module with extensions. The extensions are based on the zc.buildout extensions and are:

  • option names are case sensitive

  • option values can use a substitution syntax, described below, to refer to option values in specific sections.

  • you can include other configuration files, see Including other configurations.

The ConfigParser syntax is very flexible. Section names can contain any characters other than newlines and right square brackets (“]”). Option names can contain any characters (within the ASCII character set) other than newlines, colons, and equal signs, cannot start with a space, and do not include trailing spaces.

It is a good idea to keep section and option names simple, sticking to alphanumeric characters, hyphens, and periods.

Variable substitution

Transmogrifier supports a string.Template-like syntax for variable substitution, using both the section and the option name joined by a colon:

>>> substitutionexample = """\
... [transmogrifier]
... pipeline =
...     section1
...     section2
...     section3
...
... [definitions]
... item_count = 3
...
... [section1]
... blueprint = collective.transmogrifier.tests.examplesource
... size = ${definitions:item_count}
...
... [section2]
... blueprint = collective.transmogrifier.tests.exampletransform
...
... [section3]
... blueprint = collective.transmogrifier.tests.exampleconstructor
... """
>>> registerConfig(u'collective.transmogrifier.tests.substitutionexample',
...                substitutionexample)

Here we created an extra section called definitions, and refer to the item_count option defined in that section to set the size of the section1 pipeline section, so we only get 3 items when we execute this pipeline:

>>> transmogrifier = Transmogrifier(plone)
>>> transmogrifier(u'collective.transmogrifier.tests.substitutionexample')
{'exampletransformname': 'section2', 'id': 'item00'}
{'exampletransformname': 'section2', 'id': 'item01'}
{'exampletransformname': 'section2', 'id': 'item02'}
Including other configurations

You can include other transmogrifier configurations with the include option in the transmogrifier section. This option takes a list of configuration ids, separated by whitespace. All sections and options from those configuration files will be included provided the options weren’t already present. This works recursively; inclusions in the included configuration files are honoured too:

>>> inclusionexample = """\
... [transmogrifier]
... include =
...     collective.transmogrifier.tests.sources
...     collective.transmogrifier.tests.base
...
... [section1]
... size = 3
... """
>>> registerConfig(u'collective.transmogrifier.tests.inclusionexample',
...                inclusionexample)
>>> sources = """\
... [section1]
... blueprint = collective.transmogrifier.tests.examplesource
... size = 10
... """
>>> registerConfig(u'collective.transmogrifier.tests.sources',
...                sources)
>>> base = """\
... [transmogrifier]
... pipeline =
...     section1
...     section2
...     section3
... include = collective.transmogrifier.tests.constructor
...
... [section2]
... blueprint = collective.transmogrifier.tests.exampletransform
... """
>>> registerConfig(u'collective.transmogrifier.tests.base',
...                base)
>>> constructor = """\
... [section3]
... blueprint = collective.transmogrifier.tests.exampleconstructor
... """
>>> registerConfig(u'collective.transmogrifier.tests.constructor',
...                constructor)
>>> transmogrifier = Transmogrifier(plone)
>>> transmogrifier(u'collective.transmogrifier.tests.inclusionexample')
{'exampletransformname': 'section2', 'id': 'item00'}
{'exampletransformname': 'section2', 'id': 'item01'}
{'exampletransformname': 'section2', 'id': 'item02'}

As with zc.buildout configurations, we can also add or remove lines from included configuration options by using the += and -= syntax:

>>> advancedinclusionexample = """\
... [transmogrifier]
... include =
...     collective.transmogrifier.tests.inclusionexample
... pipeline -=
...     section2
...     section3
... pipeline +=
...     section4
...     section3
...
... [section4]
... blueprint = collective.transmogrifier.tests.titleexample
... """
>>> registerConfig(u'collective.transmogrifier.tests.advancedinclusionexample',
...                advancedinclusionexample)
>>> transmogrifier = Transmogrifier(plone)
>>> transmogrifier(u'collective.transmogrifier.tests.advancedinclusionexample')
{'title': u'Plone Test Site - item00', 'id': 'item00', 'pipeline-size': '3'}
{'title': u'Plone Test Site - item01', 'id': 'item01', 'pipeline-size': '3'}
{'title': u'Plone Test Site - item02', 'id': 'item02', 'pipeline-size': '3'}

When calling transmogrifier, you can provide your own sections too: any extra keyword argument is interpreted as a section dictionary, with its options overriding those of the section of the same name. Do make sure you use string values though:

>>> transmogrifier(u'collective.transmogrifier.tests.inclusionexample',
...               section1=dict(size='1'))
{'exampletransformname': 'section2', 'id': 'item00'}

Conventions

At their most basic level, transmogrifier pipelines are just iterators passing ‘things’ around. Transmogrifier doesn’t expect anything more than being able to iterate over the pipeline and doesn’t dictate what happens within that pipeline, what defines a ‘thing’ or what ultimately gets accomplished.

But as has been stated repeatedly, transmogrifier has been developed to facilitate importing legacy content, processing data in incremental steps until a final section constructs new content.

To reach this end, several conventions have been established that help the various pipeline sections work together.

Items are mappings

The first one is that the ‘things’ passed from section to section are mappings; i.e. they are or behave just like Python dictionaries. Again, transmogrifier doesn’t produce these by itself; source sections (see Sources) produce them by injecting them into the stream.

Keys are fields

Secondly, all keys in such mappings that do not start with an underscore will be used by constructor sections (see Constructors) to construct Plone content. So keys that do not start with an underscore are expected to map to Archetypes fields or Zope3 schema fields or whatever the constructor expects.
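
For example, a pipeline item destined for a constructor might look like this (a hypothetical item; which field keys make sense depends on the content type being constructed):

item = {
    'id': 'front-page',               # the id for the new object
    'title': 'Welcome',               # maps to the Title field
    'description': 'The front page',  # maps to the Description field
    '_path': '/front-page',           # leading underscore: a controller key, not a field
}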

Paths are to the target object

Many sections either create objects (constructors) or operate on already-constructed or pre-existing objects. Such sections should interpret paths as the complete path for the object. For constructors this means they’ll need to split the path into a container path and an id in order to find the correct context for constructing the object.
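
In a rough sketch, assuming a _path key relative to the import context and Plone’s invokeFactory (the real constructor section described below also handles encoding and skips problematic items):

path = item['_path']                          # e.g. '/spam/eggs/foo'
container_path, new_id = path.rsplit('/', 1)  # '/spam/eggs' and 'foo'
container = context.unrestrictedTraverse(container_path.lstrip('/'))
container.invokeFactory(item['_type'], new_id)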

Keys with a leading underscore are controllers

That leaves the keys that do start with a leading underscore; these have special meaning to specific sections, allowing earlier pipeline sections to inject ‘control statements’ into the item mapping for later sections. To avoid name clashes, sections that expect such controller keys should use prefixes based on the name under which their blueprint was registered, plus optionally the name of the pipe section. This allows for precise targeting of pipe sections when inserting such keys.

We’ll illustrate this with an example. Let’s say a source section loads news items from a database, but the database tables for such items hold filenames to point to binary image data. Rather than have this section load those filenames directly and add them to the item for image creation, a generic ‘file loader’ section is used to do this. Let’s suppose that this file loader is registered as acme.transmogrifier.fileloader. This section then could be instructed to load files and store them in a named key by using 2 ‘controller’ keys named _acme.transmogrifier.fileloader_filename and _acme.transmogrifier.fileloader_targetkey. If the source section were to create pipeline items with those keys, this later fileloader section would then automatically load the filenames and inject them into the items in the right location.

If you need 2 such loaders, you can target them each individually by including their section names; so to target just the imageloader1 section you’d use the keys _acme.transmogrifier.fileloader_imageloader1_filename and _acme.transmogrifier.fileloader_imageloader1_targetkey. Sections that support such targeting should prefer such section specific keys over those only using the blueprint name.

The collective.transmogrifier.utils module has a handy utility method called defaultKeys that’ll generate these keys for you for easy matching:

>>> from collective.transmogrifier import utils
>>> keys = utils.defaultKeys('acme.transmogrifier.fileloader',
...                          'imageloader1', 'filename')
>>> pprint.pprint(keys)
('_acme.transmogrifier.fileloader_imageloader1_filename',
 '_acme.transmogrifier.fileloader_filename',
 '_imageloader1_filename',
 '_filename')
>>> utils.Matcher(*keys)('_filename', '_imageloader1_filename')
('_imageloader1_filename', True)
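
Inside a section, such a matcher could be used like this (a sketch only; the acme.transmogrifier.fileloader blueprint and the _data key are hypothetical, and when nothing matches, Matcher returns (None, False)):

from zope.interface import classProvides, implements
from collective.transmogrifier.interfaces import ISectionBlueprint, ISection
from collective.transmogrifier.utils import defaultKeys, Matcher

class FileLoaderSection(object):
    classProvides(ISectionBlueprint)
    implements(ISection)

    def __init__(self, transmogrifier, name, options, previous):
        self.previous = previous
        # match the blueprint-, section- and plain variants of _filename
        self.filenamekey = Matcher(*defaultKeys(
            'acme.transmogrifier.fileloader', name, 'filename'))

    def __iter__(self):
        for item in self.previous:
            key, match = self.filenamekey(*item.keys())
            if match:
                # load the named file and inject its data into the item
                item['_data'] = open(item[key], 'rb').read()
            yield item
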
Keep memory use to a minimum

The above example is a little contrived of course; you’d generally configure a file loader section with a key name to grab the filename from, and perhaps put the loader after the constructor section and load the image data straight into the already constructed content item instead. This lowers memory requirements as image data can go directly into the ZODB this way, and the content object can be deactivated after the binary data has been stored.

By operating on one item at a time, a transmogrifier pipeline can handle huge numbers of content without breaking memory limits; individual sections should also avoid using memory unnecessarily.

Previous sections go first

As mentioned in the Sources section, when inserting new items into the stream, items from previous pipe sections generally come first. This way someone constructing a pipeline knows which source sections will be processed first (those slotted earlier in the pipeline) and can adjust expectations accordingly. This makes content construction more predictable when dealing with multiple sources.

An exception would be a Folder Source, which inserts additional Folder items into the pipeline to ensure that the required container for any given content item exists at construction time. Such a source injects extra items as needed, interleaved with the flow, rather than before or after all items from the previous source section.

Iterators have 3 stages

Some tasks have to happen before the pipeline runs, or after all content has been created. In such cases it is handy to realise that iteration within a section consists of three stages: before iteration, iteration itself, and after iteration.

For example, a section creating references may have to wait for all content to be created before it can insert the references. In this case it could build a queue during iteration, and only when the previous pipe section has been exhausted and the last item has been yielded would the section reach into the portal and create all the references.

Sources following the Previous sections go first convention basically inject the new items in the after iteration stage.

Here’s a piece of pseudocode to illustrate these 3 stages:

def __iter__(self):
    # Before iteration
    # You can do initialisation here

    for item in self.previous:
        # Iteration itself
        # You could process the items, take notes, inject additional
        # items based on the current item in the pipe or manipulate portal
        # content created by previous items
        yield item

    # After iteration
    # The section still has control here and could inject additional
    # items, manipulate all portal content created by the pipeline,
    # or clean up after itself.

You can get quite creative with this. For example, the reference creator could defer creating each reference only until it knows the referenced object has been created, periodically flushing its queue. This would keep memory requirements smaller, as not all references to create have to be remembered.
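
Shaped as code, such a deferring section could look like this (a bare sketch; the _references key is hypothetical and the actual reference-creation call is left as a placeholder):

class ReferenceSection(object):
    # interface declarations omitted for brevity

    def __init__(self, transmogrifier, name, options, previous):
        self.previous = previous
        self.context = transmogrifier.context
        self.queue = []

    def __iter__(self):
        for item in self.previous:
            if '_references' in item:
                # the target may not exist yet; remember the work for later
                self.queue.append((item['_path'], item['_references']))
            yield item
        # after iteration: all content has been constructed
        for path, targets in self.queue:
            source = self.context.unrestrictedTraverse(path.lstrip('/'))
            for target in targets:
                pass  # create the reference here with your reference API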

Store pipeline-wide information in annotations

If, for some reason or other, you need to remember state across section instances that is pipeline-wide (such as database connections, or data counters), such information should be stored as annotations on the transmogrifier object:

from zope.annotation.interfaces import IAnnotations

MYKEY = 'foo.bar.baz'

def __init__(self, transmogrifier, name, options, previous):
    self.storage = IAnnotations(transmogrifier).setdefault(MYKEY, {})
    self.storage.setdefault('spam', 0)
    ...

def __iter__(self):
    ...
    self.storage['spam'] += 1
    ...

GenericSetup import integration

To ease running a transmogrifier pipeline during site configuration, a generic import step for GenericSetup is included.

The import step looks for a file named transmogrifier.txt and reads pipeline configuration names from this file, one name per line. Empty lines and lines starting with a # (hash mark) are skipped. These pipelines are then executed in the same order as they are found in the file.

This means that if you want to run one or more pipelines as part of a GenericSetup profile, all you have to do is name these pipelines in a file named transmogrifier.txt in your profile directory.
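
For example (with hypothetical pipeline names):

# transmogrifier.txt, shipped in a GenericSetup profile directory
# lines starting with a hash mark and empty lines are skipped
my.package.pipelines.folders
my.package.pipelines.content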

Default section blueprints

Constructor section

A constructor pipeline section is the heart of a transmogrifier content import pipeline. It constructs Plone content based on the items it processes. The constructor section blueprint name is collective.transmogrifier.sections.constructor. Constructor sections do only one thing: they construct new content. No schema changes are made. Also, constructors create content without restrictions; no security checks or containment constraints are applied.

Construction needs 2 pieces of information: the path to the item (including the id for the new item itself) and its portal type. To determine both of these, the constructor section inspects each item and looks for 2 keys, as described below. Any item missing either of these 2 pieces will be skipped. Similarly, items with a path for a container or a type that doesn’t exist will be skipped as well; make sure that these containers are constructed beforehand. Because a constructor section will only construct new objects, if an object with the same path already exists, the item will also be skipped.

For the object path, it’ll look (in order) for _collective.transmogrifier.sections.constructor_[sectionname]_path, _collective.transmogrifier.sections.constructor_path, _[sectionname]_path, and _path, where [sectionname] is replaced with the name given to the current section. This allows you to target the right section precisely if needed. Alternatively, you can specify what key to use for the path by specifying the path-key option, which should be a list of keys to try (one key per line, use a re: or regexp: prefix to specify regular expressions).

For the portal type, use the type-key option to specify a set of keys just like path-key. If omitted, the constructor will look for _collective.transmogrifier.sections.constructor_[sectionname]_type, _collective.transmogrifier.sections.constructor_type, _[sectionname]_type, _type, portal_type and Type (in that order, with [sectionname] replaced).

Unicode paths will be encoded to ASCII. Using the path and type, a new object will be constructed using invokeFactory; nothing else is done. Paths are always interpreted as relative to the context object, with the last path segment being the id of the object to create.
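
For instance, a constructor section reading the path and type from custom keys could be configured like this (a hypothetical snippet; the my_path and my_type keys are made up):

[constructor]
blueprint = collective.transmogrifier.sections.constructor
path-key =
    my_path
    re:.*_path$
type-key = my_type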

>>> import pprint
>>> constructor = """
... [transmogrifier]
... pipeline =
...     contentsource
...     constructor
...     printer
...
... [contentsource]
... blueprint = collective.transmogrifier.sections.tests.contentsource
...
... [constructor]
... blueprint = collective.transmogrifier.sections.constructor
...
... [printer]
... blueprint = collective.transmogrifier.sections.tests.pprinter
... """
>>> registerConfig(u'collective.transmogrifier.sections.tests.constructor',
...                constructor)
>>> transmogrifier(u'collective.transmogrifier.sections.tests.constructor')
{'_type': 'FooType', '_path': '/spam/eggs/foo'}
{'_type': 'FooType', '_path': '/foo'}
{'_path': 'not/existing/bar',
 '_type': 'BarType',
 'title': 'Should not be constructed, not an existing path'}
{'_path': '/spam/eggs/existing',
 '_type': 'FooType',
 'title': 'Should not be constructed, an existing object'}
{'_path': '/spam/eggs/incomplete',
 'title': 'Should not be constructed, no type'}
{'_path': '/spam/eggs/nosuchtype',
 '_type': 'NonExisting',
 'title': 'Should not be constructed, not an existing type'}
{'_path': 'spam/eggs/changedByFactory',
 '_type': 'FooType',
 'title': 'Factories are allowed to change the id'}
>>> pprint.pprint(plone.constructed)
(('spam/eggs', 'foo', 'FooType'),
 ('', 'foo', 'FooType'),
 ('spam/eggs', 'changedByFactory', 'FooType'))

Codec section

A codec pipeline section lets you alter the character encoding of item values, allowing you to recode text from and to unicode and any of the codecs supported by Python. The codec section blueprint name is collective.transmogrifier.sections.codec.

What values to recode is determined by the keys option, which takes a set of newline-separated key names. If a key name starts with re: or regexp: it is treated as a regular expression instead.

The optional from and to options determine what codecs values are recoded from and to. Both these values default to unicode, meaning no translation. If either option is set to default, the current default encoding of the Plone site is used.

To deal with possible encoding errors, you can set the error handler of both the from and to codecs separately with the from-error-handler and to-error-handler options, respectively. These default to strict, but can be set to any error handler supported by Python, including replace and ignore.

Also optional is the condition option, which lets you specify a TALES expression that, when it evaluates to False, prevents any en- or decoding from happening. The condition is evaluated for every matched key.

>>> codecs = """
... [transmogrifier]
... pipeline =
...     source
...     decode-all
...     encode-id
...     encode-title
...     printer
...
... [source]
... blueprint = collective.transmogrifier.sections.tests.samplesource
... encoding = utf8
...
... [decode-all]
... blueprint = collective.transmogrifier.sections.codec
... keys = re:.*
... from = utf8
...
... [encode-id]
... blueprint = collective.transmogrifier.sections.codec
... keys = id
... to = ascii
...
... [encode-title]
... blueprint = collective.transmogrifier.sections.codec
... keys = title
... to = ascii
... to-error-handler = backslashreplace
... condition = python:'Brand' not in item['title']
...
... [printer]
... blueprint = collective.transmogrifier.sections.tests.pprinter
... """
>>> registerConfig(u'collective.transmogrifier.sections.tests.codecs',
...                codecs)
>>> transmogrifier(u'collective.transmogrifier.sections.tests.codecs')
{'status': u'\u2117', 'id': 'foo', 'title': 'The Foo Fighters \\u2117'}
{'status': u'\u2122', 'id': 'bar', 'title': u'Brand Chocolate Bar \u2122'}
{'id': 'monty-python',
 'status': u'\xa9',
 'title': "Monty Python's Flying Circus \\xa9"}

The condition expression has access to the following:

item
    the current pipeline item
key
    the name of the matched key
match
    if the key was matched by a regular expression, the match object; otherwise boolean True
transmogrifier
    the transmogrifier
name
    the name of the section
options
    the section options
modules
    sys.modules

Inserter section

An inserter pipeline section lets you define a key and value to insert into pipeline items. The inserter section blueprint name is collective.transmogrifier.sections.inserter.

An inserter section takes a key and a value TALES expression. These expressions are evaluated to generate the actual key-value pair that gets inserted. You can also specify an optional condition option; if given, the key only gets inserted when the condition, which is also a TALES expression, is true.

Because the inserter value expression has access to the original item, it could even be used to change existing item values. Just target an existing key, pull out the original value in the value expression and return a modified version.

>>> inserter = """
... [transmogrifier]
... pipeline =
...     source
...     simple-insertion
...     expression-insertion
...     transform-id
...     printer
...
... [source]
... blueprint = collective.transmogrifier.sections.tests.rangesource
... size = 3
...
... [simple-insertion]
... blueprint = collective.transmogrifier.sections.inserter
... key = string:foo
... value = string:bar (inserted into "${item/id}" by the "$name" section)
...
... [expression-insertion]
... blueprint = collective.transmogrifier.sections.inserter
... key = python:'foo-%s' % item['id'][-2:]
... value = python:int(item['id'][-2:]) * 15
... condition = python:int(item['id'][-2:])
...
... [transform-id]
... blueprint = collective.transmogrifier.sections.inserter
... key = string:id
... value = string:foo-${item/id}
...
... [printer]
... blueprint = collective.transmogrifier.sections.tests.pprinter
... """
>>> registerConfig(u'collective.transmogrifier.sections.tests.inserter',
...                inserter)
>>> transmogrifier(u'collective.transmogrifier.sections.tests.inserter')
{'foo': 'bar (inserted into "item-00" by the "simple-insertion" section)',
 'id': 'foo-item-00'}
{'foo': 'bar (inserted into "item-01" by the "simple-insertion" section)',
 'foo-01': 15,
 'id': 'foo-item-01'}
{'foo': 'bar (inserted into "item-02" by the "simple-insertion" section)',
 'foo-02': 30,
 'id': 'foo-item-02'}

The key, value and condition expressions have access to the following:

item
    the current pipeline item
transmogrifier
    the transmogrifier
name
    the name of the section
options
    the section options
modules
    sys.modules
key
    (only for the value and condition expressions) the key being inserted

Condition section

A condition pipeline section lets you selectively discard items from the pipeline. The condition section blueprint name is collective.transmogrifier.sections.condition.

A condition section takes a condition TALES expression. When this expression, evaluated against the current item, is true, the item is yielded to the next pipe section; otherwise it is not:

>>> condition = """
... [transmogrifier]
... pipeline =
...     source
...     condition
...     printer
...
... [source]
... blueprint = collective.transmogrifier.sections.tests.rangesource
... size = 5
...
... [condition]
... blueprint = collective.transmogrifier.sections.condition
... condition = python:int(item['id'][-2:]) > 2
...
... [printer]
... blueprint = collective.transmogrifier.sections.tests.pprinter
... """
>>> registerConfig(u'collective.transmogrifier.sections.tests.condition',
...                condition)
>>> transmogrifier(u'collective.transmogrifier.sections.tests.condition')
{'id': 'item-03'}
{'id': 'item-04'}

The condition expression has access to the following:

item
    the current pipeline item
transmogrifier
    the transmogrifier
name
    the name of the section
options
    the section options
modules
    sys.modules

As condition sections skip items in the pipeline, they should not be used inside a splitter section!

Manipulator section

A manipulator pipeline section lets you copy, move or discard keys from the pipeline. The manipulator section blueprint name is collective.transmogrifier.sections.manipulator.

A manipulator section will copy keys when you specify a set of keys to copy, and an expression to determine what to copy these to. These are the keys and destination options.

The keys option is a set of key names, one on each line; key names starting with re: or regexp: are treated as regular expressions. The destination expression is a TALES expression that can access not only the item, but also the matched key and, if a regular expression was used, the match object.

If a delete option is specified, it is also interpreted as a set of keys, like the keys option. These keys will be deleted from the item; if used together with the keys and destination options, keys will be renamed instead of copied.

>>> manipulator = """
... [transmogrifier]
... pipeline =
...     source
...     copy
...     rename
...     delete
...     printer
...
... [source]
... blueprint = collective.transmogrifier.sections.tests.samplesource
...
... [copy]
... blueprint = collective.transmogrifier.sections.manipulator
... keys =
...     title
...     id
... destination = string:$key-copy
...
... [rename]
... blueprint = collective.transmogrifier.sections.manipulator
... keys = re:([^-]+)-copy$
... destination = python:'%s-duplicate' % match.group(1)
... delete = ${rename:keys}
...
... [delete]
... blueprint = collective.transmogrifier.sections.manipulator
... delete = status
...
... [printer]
... blueprint = collective.transmogrifier.sections.tests.pprinter
... """
>>> registerConfig(u'collective.transmogrifier.sections.tests.manipulator',
...                manipulator)
>>> transmogrifier(u'collective.transmogrifier.sections.tests.manipulator')
{'id': 'foo',
 'id-duplicate': 'foo',
 'title': u'The Foo Fighters \u2117',
 'title-duplicate': u'The Foo Fighters \u2117'}
{'id': 'bar',
 'id-duplicate': 'bar',
 'title': u'Brand Chocolate Bar \u2122',
 'title-duplicate': u'Brand Chocolate Bar \u2122'}
{'id': 'monty-python',
 'id-duplicate': 'monty-python',
 'title': u"Monty Python's Flying Circus \xa9",
 'title-duplicate': u"Monty Python's Flying Circus \xa9"}

The destination expression has access to the following:

item
    the current pipeline item
key
    the name of the matched key
match
    if the key was matched by a regular expression, the match object; otherwise boolean True
transmogrifier
    the transmogrifier
name
    the name of the section
options
    the section options
modules
    sys.modules

Splitter section

A splitter pipeline section lets you branch a pipeline into 2 or more sub-pipelines. The splitter section blueprint name is collective.transmogrifier.sections.splitter.

A splitter section takes 2 or more pipeline definitions, and sends the items from the previous section through each of these sub-pipelines, each with its own copy of the items:

>>> emptysplitter = """
... [transmogrifier]
... pipeline =
...     source
...     splitter
...     printer
...
... [source]
... blueprint = collective.transmogrifier.sections.tests.rangesource
... size = 3
...
... [splitter]
... blueprint = collective.transmogrifier.sections.splitter
... pipeline-1 =
... pipeline-2 =
...
... [printer]
... blueprint = collective.transmogrifier.sections.tests.pprinter
... """
>>> registerConfig(u'collective.transmogrifier.sections.tests.emptysplitter',
...                emptysplitter)
>>> transmogrifier(u'collective.transmogrifier.sections.tests.emptysplitter')
{'id': 'item-00'}
{'id': 'item-00'}
{'id': 'item-01'}
{'id': 'item-01'}
{'id': 'item-02'}
{'id': 'item-02'}

Although the pipeline definitions in the splitter are empty, we end up with 2 copies of every item in the pipeline as both splitter pipelines get to process a copy. Splitter pipelines are defined by options starting with pipeline-.

Normally you’ll use conditions to identify items for each sub-pipe, making the splitter the pipeline equivalent of an if/elif statement. Conditions are optional and use the pipeline option name plus -condition:

>>> evenoddsplitter = """
... [transmogrifier]
... pipeline =
...     source
...     splitter
...     printer
...
... [source]
... blueprint = collective.transmogrifier.sections.tests.rangesource
... size = 3
...
... [splitter]
... blueprint = collective.transmogrifier.sections.splitter
... pipeline-even-condition = python:int(item['id'][-2:]) % 2
... pipeline-even = even-section
... pipeline-odd-condition = not:${splitter:pipeline-even-condition}
... pipeline-odd = odd-section
...
... [odd-section]
... blueprint = collective.transmogrifier.sections.inserter
... key = string:even
... value = string:The even pipe
...
... [even-section]
... blueprint = collective.transmogrifier.sections.inserter
... key = string:odd
... value = string:The odd pipe
...
... [printer]
... blueprint = collective.transmogrifier.sections.tests.pprinter
... """
>>> registerConfig(u'collective.transmogrifier.sections.tests.evenodd',
...                evenoddsplitter)
>>> transmogrifier(u'collective.transmogrifier.sections.tests.evenodd')
{'even': 'The even pipe', 'id': 'item-00'}
{'odd': 'The odd pipe', 'id': 'item-01'}
{'even': 'The even pipe', 'id': 'item-02'}

Conditions are expressed as TALES expressions, and have access to:

item
    the current pipeline item
transmogrifier
    the transmogrifier
name
    the name of the splitter section
pipeline
    the name of the splitter pipeline this condition belongs to (including the pipeline- prefix)
options
    the splitter options
modules
    sys.modules

Savepoint section

A savepoint pipeline section commits a savepoint every so often, which has a side-effect of freeing up memory. The savepoint section blueprint name is collective.transmogrifier.sections.savepoint.

A savepoint section takes an optional every option, which defaults to 1000; a savepoint is committed each time ‘every’ items have passed through the pipe. A savepoint section doesn’t alter the items in any way:

>>> savepoint = """
... [transmogrifier]
... pipeline =
...     source
...     savepoint
...
... [source]
... blueprint = collective.transmogrifier.sections.tests.rangesource
... size = 10
...
... [savepoint]
... blueprint = collective.transmogrifier.sections.savepoint
... every = 3
... """
>>> registerConfig(u'collective.transmogrifier.sections.tests.savepoint',
...                savepoint)

We’ll show savepoints being committed by overriding transaction.savepoint:

>>> import transaction
>>> original_savepoint = transaction.savepoint
>>> counter = [0]
>>> def test_savepoint(counter=counter, *args, **kw):
...     counter[0] += 1
>>> transaction.savepoint = test_savepoint
>>> transmogrifier(u'collective.transmogrifier.sections.tests.savepoint')
>>> transaction.savepoint = original_savepoint
>>> counter[0]
3

CSV source section

A CSV source pipeline section lets you create pipeline items from CSV files. The CSV source section blueprint name is collective.transmogrifier.sections.csvsource.

A CSV source section will load the CSV file named in the filename option and will yield an item for each line in the CSV file. It’ll use the first line of the CSV file to determine what keys to use, or you can name the keys with the fieldnames option.

By default the CSV file is assumed to use the Excel CSV dialect, but you can specify any dialect supported by the Python csv module with the dialect option.
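
A csvsource reading a tab-delimited file with its own key names could thus look like this (a hypothetical snippet; excel-tab is one of the dialects the csv module registers by default):

[csvsource]
blueprint = collective.transmogrifier.sections.csvsource
filename = /path/to/data.tsv
dialect = excel-tab
fieldnames = id title description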

>>> import tempfile
>>> tmp = tempfile.NamedTemporaryFile('w+', suffix='.csv')
>>> tmp.write('\r\n'.join("""\
... foo,bar,baz
... first-foo,first-bar,first-baz
... second-foo,second-bar,second-baz
... """.splitlines()))
>>> tmp.flush()
>>> csvsource = """
... [transmogrifier]
... pipeline =
...     csvsource
...     printer
...
... [csvsource]
... blueprint = collective.transmogrifier.sections.csvsource
... filename = %s
...
... [printer]
... blueprint = collective.transmogrifier.sections.tests.pprinter
... """ % tmp.name
>>> registerConfig(u'collective.transmogrifier.sections.tests.csvsource',
...                csvsource)
>>> transmogrifier(u'collective.transmogrifier.sections.tests.csvsource')
{'baz': 'first-baz', 'foo': 'first-foo', 'bar': 'first-bar'}
{'baz': 'second-baz', 'foo': 'second-foo', 'bar': 'second-bar'}
>>> transmogrifier(u'collective.transmogrifier.sections.tests.csvsource',
...                csvsource=dict(fieldnames='monty spam eggs'))
{'eggs': 'baz', 'monty': 'foo', 'spam': 'bar'}
{'eggs': 'first-baz', 'monty': 'first-foo', 'spam': 'first-bar'}
{'eggs': 'second-baz', 'monty': 'second-foo', 'spam': 'second-bar'}

Change History

(name of developer listed in brackets)

1.0 (2009-08-07)

  • Initial transmogrifier architecture. [mj]
