parse2plone

Easily import static HTML files into Plone.

Parse2Plone is an lxml/soup parser, packaged as a Buildout recipe that creates a script for you, for getting content from static HTML files into Plone.

Warning

This is a Buildout recipe! By itself it does nothing. If you do not know what Buildout is, please see: http://www.buildout.org/.

Getting started

Because it always drives me nuts when you have to dig for a recipe’s options, here they are:

[import]
recipe = parse2plone
#path = Plone
#html_extensions = html
#image_extensions = gif jpg jpeg png
#target_tags = a div h1 h2 p
#illegal_chars = _

Everything but the recipe parameter is commented out; the commented parameters show their default values. Uncomment and edit them to change the default behavior; they are (hopefully) self-explanatory. You can cut and paste this to get started, or keep reading if you would like to know more.

Installation

You can install Parse2Plone by editing your buildout.cfg file like so:

First, add an import section:
```
[import]
recipe = parse2plone
```
Then, add the import section to the list of parts:
```
[buildout]
...
parts =
    ...
    import
```
Now run bin/buildout as usual.

Execution

You can run Parse2Plone like so:

$ bin/plone run bin/import /path/to/files

Demonstration

If you have a site in /var/www/html that contains the following:

/var/www/html/index.html
/var/www/html/about/index.html

You should run:

$ bin/plone run bin/import /var/www/html

And the following will be created:

http://localhost:8080/Plone/index.html

http://localhost:8080/Plone/about/index.html
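
The mapping from files to Plone URLs in this demonstration can be sketched in a few lines of Python. This is only an illustration of the idea, not the code parse2plone actually runs: the function name plan_import and its parameters are hypothetical, and the base URL is simply the one used in the example above.

```python
import os
import posixpath


def plan_import(root, base_url="http://localhost:8080/Plone",
                html_extensions=("html",)):
    """Return sorted (file_path, target_url) pairs for HTML files under root.

    Hypothetical helper: it mirrors the directory layout below `root`
    onto `base_url`, which is what the demonstration describes.
    """
    planned = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare the extension without its leading dot, e.g. "html".
            ext = os.path.splitext(name)[1].lstrip(".").lower()
            if ext not in html_extensions:
                continue
            full = os.path.join(dirpath, name)
            # Build the URL path with forward slashes on every platform.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            planned.append((full, posixpath.join(base_url, rel)))
    return sorted(planned)
```

Calling plan_import("/var/www/html") on the site above would pair each of the two index.html files with the corresponding URL under http://localhost:8080/Plone.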

Explanation

Why did you create Parse2Plone when the following packages (and probably many more) already exist?

http://pypi.python.org/pypi/collective.transmogrifier

http://pypi.python.org/pypi/transmogrify.filesystem

http://pypi.python.org/pypi/transmogrify.htmlcontentextractor

Here are a few reasons:

Parse2Plone aims to lower the bar for folks who don’t already know (or want to know) what a “transmogrifier blueprint” is, but who can update their buildout.cfg file, run Buildout, and then run a single import command to import static content from the file system, all without having to think very much.
collective.transmogrifier provides a framework for creating reusable pipes (whose definitions are called blueprints). Parse2Plone provides a single, non-reusable “pipe/blueprint”.
The author had an itch to scratch; it will be nice for him to be able to say “just go write a script” and then point to an example.

Consternation

Here are some troubleshooting comments/tips.

lxml

Parse2Plone requires lxml which in turn requires libxml2 and libxslt. If you do not have lxml installed “globally” (i.e. in your system Python’s site-packages directory) then Buildout will try to install it for you.

At this point lxml will look for the libxml2/libxslt development libraries to build against. If you don’t have them installed on your system already, your mileage may vary (i.e. Buildout will fail).
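
If you are not sure whether lxml is already usable, a quick check like the following may help. It is a minimal sketch, assuming you run it with the same Python interpreter that your Buildout uses; the function name lxml_status is made up for this example.

```python
def lxml_status():
    """Return a one-line report on whether lxml can be imported."""
    try:
        from lxml import etree
    except ImportError as error:
        # Either lxml is not installed, or it failed to load its C libraries.
        return "lxml is not importable: %s" % error
    return "lxml is importable (using libxml2 %s, libxslt %s)" % (
        ".".join(map(str, etree.LIBXML_VERSION)),
        ".".join(map(str, etree.LIBXSLT_VERSION)),
    )


print(lxml_status())
```

If the report says lxml is not importable, install the libxml2 and libxslt development packages for your system before running Buildout again.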

Database access

Before running parse2plone, you must either stop your Plone site or use ZEO. Otherwise parse2plone will not be able to access the database.

Modification

Modifying the default behavior of parse2plone is easy; use the command line options or add parameters to your buildout.cfg file.

Both approaches allow customization of the same set of options, but the command line arguments will trump any settings found in your buildout.cfg file.
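
The precedence described above (defaults, then buildout.cfg, then the command line) can be sketched as follows. This is an illustration under stated assumptions, not parse2plone's own code: resolve_options is a hypothetical name, argparse is used for brevity, and only the --path/-p flag is modelled because it is the only flag documented here. The default values are the ones listed under "Getting started".

```python
import argparse

# Defaults as listed in the "Getting started" section.
DEFAULTS = {
    "path": "Plone",
    "html_extensions": "html",
    "image_extensions": "gif jpg jpeg png",
    "target_tags": "a div h1 h2 p",
    "illegal_chars": "_",
}


def resolve_options(buildout_options, argv):
    """Merge settings so that the command line beats buildout.cfg,
    and buildout.cfg beats the defaults (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="import")
    parser.add_argument("files", help="directory of static HTML files")
    parser.add_argument("--path", "-p", help="path to the Plone site object")
    args = parser.parse_args(argv)

    settings = dict(DEFAULTS)
    settings.update(buildout_options)          # buildout.cfg overrides defaults
    for key, value in vars(args).items():
        if key in settings and value is not None:
            settings[key] = value              # command line overrides both
    return settings
```

For example, resolve_options({"path": "Plone2"}, ["/path/to/files", "-p", "MyPloneSite"]) yields a path of MyPloneSite, because the command line argument wins over the buildout.cfg value.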

Command line

The following parse2plone command line options are available.

Path (--path, -p)

You can specify an alternate path to the Plone site object located within the database (‘/Plone’ by default) with --path or -p:

$ bin/plone run bin/import /path/to/files --path=/path/to/Plone
$ bin/plone run bin/import /path/to/files -p MyPloneSite

Buildout

The following parse2plone recipe options are available.

Parameters

You can configure the following parameters in your buildout.cfg file:

path - Specify an alternate location for the Plone site object in the database.
html_extensions - Specify HTML file extensions. parse2plone will import HTML files with these extensions.
illegal_chars - Specify illegal characters. parse2plone will ignore files that contain these characters.
image_extensions - Specify image file extensions. parse2plone will import image files with these extensions.
target_tags - Specify target tags. parse2plone will parse the contents of HTML tags listed.
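
To make the effect of the extension and illegal character parameters concrete, here is a rough sketch of how a file might be classified. It assumes that "files that contain these characters" refers to the file name, which this document does not spell out; classify_file is a hypothetical helper, not part of parse2plone.

```python
import os


def classify_file(filename,
                  html_extensions=("html",),
                  image_extensions=("gif", "jpg", "jpeg", "png"),
                  illegal_chars=("_",)):
    """Return "html", "image", or None if the file would be skipped.

    Assumption: illegal characters are looked for in the file name only.
    """
    name = os.path.basename(filename)
    if any(char in name for char in illegal_chars):
        return None
    # Compare the extension without its leading dot, e.g. "png".
    ext = os.path.splitext(name)[1].lstrip(".").lower()
    if ext in html_extensions:
        return "html"
    if ext in image_extensions:
        return "image"
    return None
```

With the default values, index.html is treated as HTML, logo.png as an image, and a file such as _notes.html is skipped because its name contains an underscore.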

Example

Instead of accepting the default behavior, you may specify the following configuration in your buildout.cfg file:

[import]
recipe = parse2plone
path = Plone2
html_extensions = htm
image_extensions = png
target_tags = p

This will configure parse2plone to import only images ending in .png, and only content in p tags from files ending in .htm, into a Plone site object named Plone2.
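
What "content in p tags" means can be illustrated with a small extractor. Note that this sketch deliberately swaps in Python's standard html.parser module instead of lxml/soup so that it runs without extra dependencies; the names TargetTagText and extract_text are hypothetical, and parse2plone's real parsing may behave differently (for instance with unclosed tags).

```python
from html.parser import HTMLParser


class TargetTagText(HTMLParser):
    """Collect the text found inside any of the target tags."""

    def __init__(self, target_tags):
        super().__init__()
        self.target_tags = set(target_tags)
        self.depth = 0      # number of target tags currently open
        self.chunks = []    # text collected so far

    def handle_starttag(self, tag, attrs):
        if tag in self.target_tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.target_tags and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only while inside a target tag.
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, target_tags=("p",)):
    """Return the text chunks found inside target_tags, in document order."""
    parser = TargetTagText(target_tags)
    parser.feed(html)
    parser.close()
    return parser.chunks
```

With target_tags set to p, as in the example configuration, headings and div content are ignored and only paragraph text is kept.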

Communication

Questions, comments, or concerns? Email: aclark@aclark.net

History

0.6 - 10/25/2010

No really, revert ‘add Plone to install_requires’
Add configurable options for: path, illegal_chars, html_extensions, image_extensions, and target_tags
Allow user to set all configurable options via both buildout.cfg and command line arguments
Refactor utility functions
Add tests.py

0.5 - 10/22/2010

Revert ‘add Plone to install_requires’

0.4 - 10/22/2010

Add ‘Plone’ to install_requires

0.3 - 10/22/2010

Another setuptools fix

0.2 - 10/22/2010

Setuptools fix

0.1 - 10/21/2010

Initial release


