Run OpenOffice as web service.

ulif.openoffice

Convert office docs with LibreOffice/OpenOffice via Python, Commandline, or HTTP (including XMLRPC).

This package provides tools like WSGI apps, cache managers, and commandline converters to ease access to LibreOffice/OpenOffice installations for Python programmers. Besides basic conversion, it provides ‘document processors’ for further fine-tuning of generated docs (mainly HTML).

Out of the box these processors allow extracting CSS from HTML conversions, removal of LibreOffice-specific tags, zipping, unzipping, etc.

If the given processors are not enough for you, or you want some special handling of results (say, sign generated docs cryptographically, add watermarks, or whatever), you can define your own additional document processors in your own packages by using the Python entry-point API. ulif.openoffice will integrate them automatically during document processing and make them available in the web services, commandline clients, and Python API.
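
As a rough illustration of what such an additional processor might look like, here is a minimal sketch. The class name `ChecksumProcessor`, its `prefix` attribute, and the `process(path, metadata)` signature are assumptions made for this example and are not taken from the ulif.openoffice API; see the complete documentation for the real base class and the entry-point group to register under. Instead of a real cryptographic signature, the sketch only writes a SHA-256 checksum file next to the generated document.

```python
import hashlib
import os


class ChecksumProcessor(object):
    """Hypothetical document processor (not the real ulif.openoffice API).

    Writes a SHA-256 checksum file next to the processed document.
    """
    # Assumed: a short name under which the processor would be addressed.
    prefix = 'checksum'

    def process(self, path, metadata):
        # Read the generated document in chunks and hash its content.
        digest = hashlib.sha256()
        with open(path, 'rb') as fd:
            for chunk in iter(lambda: fd.read(65536), b''):
                digest.update(chunk)
        # Store the hex digest in a sidecar file, e.g. 'doc.html.sha256'.
        checksum_path = path + '.sha256'
        with open(checksum_path, 'w') as fd:
            fd.write(digest.hexdigest() + '\n')
        # Record the sidecar name; the document itself stays untouched.
        result = dict(metadata)
        result['checksum_file'] = os.path.basename(checksum_path)
        return path, result
```

The processor returns the unchanged document path together with a copy of the metadata, so a later step in a processing chain could pick up where it left off.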

Resources

ulif.openoffice sources are hosted on

https://github.com/ulif/ulif.openoffice

The complete documentation can be found at

https://ulif-openoffice.readthedocs.org/en/latest/

Examples

Conversion via Python

A .doc to .html conversion via the Python API can be done like this:

>>> from pprint import pprint
>>> from ulif.openoffice.client import Client
>>> client = Client()
>>> result = client.convert('document.doc')
>>> pprint(result)
('.../document.html.zip', None, {'error': False, 'oocp_status': 0})

The generated document is by default brushed up HTML with separate stylesheets and images all put into a single .zip document.
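
To work with such a result from Python, the archive can be unpacked with the standard library. The following is a minimal sketch; the helper name `unpack_result` is hypothetical, and it assumes only what the example above shows: the first tuple item is the path of the generated .zip file and the last item is a dict with an `error` flag.

```python
import os
import tempfile
import zipfile


def unpack_result(result, target_dir=None):
    """Extract a conversion result archive and return the extracted paths.

    `result` is assumed to be a tuple shaped like the one shown above:
    (path_to_zip, <second item>, metadata_dict).
    """
    zip_path, _, metadata = result
    if metadata.get('error'):
        raise RuntimeError('conversion failed: %r' % (metadata,))
    if target_dir is None:
        # Extract into a fresh directory to avoid clobbering other files.
        target_dir = tempfile.mkdtemp()
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        names = archive.namelist()
    return sorted(os.path.join(target_dir, name) for name in names)
```

Passing the tuple returned by `client.convert('document.doc')` to this helper should then yield the paths of the HTML file and its stylesheets and images.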

You can configure the document conversion via various options. These let you set the output type (at least PDF, HTML, XHTML and TXT are supported), choose whether separate CSS stylesheets should be extracted, select which PDF format should be generated (1.3 aka PDF/A or 1.4), and much more.

Conversion via Commandline

We also provide a handy commandline tool to perform conversions:

$ oooclient document.doc
RESULT in /tmp/.../document.html.zip

As you can see, the result is put in a freshly created directory.
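
When calling the tool from a script, the result path can be picked out of its output. A minimal sketch, assuming the `RESULT in <path>` line format shown above is written to standard output and stays stable; the function names are made up for this example.

```python
import subprocess

RESULT_PREFIX = 'RESULT in '


def parse_result_path(output):
    """Return the path from the first 'RESULT in <path>' line, or None."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(RESULT_PREFIX):
            return line[len(RESULT_PREFIX):]
    return None


def convert_with_oooclient(src_path):
    """Run `oooclient` on one file and return the reported result path."""
    # Assumption: the result line goes to stdout.
    # check_output raises CalledProcessError on a non-zero exit status.
    output = subprocess.check_output(['oooclient', src_path])
    return parse_result_path(output.decode('utf-8', 'replace'))
```

Keeping the parsing separate from the subprocess call makes it possible to check the parser without a LibreOffice installation.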

The commandline client also provides help to display all supported options, document processors, etc.:

$ oooclient --help

will give you the comprehensive list.

Conversion via Web (XMLRPC or RESTful)

ulif.openoffice comes with two WSGI applications that provide document conversion services to web clients. One is a RESTful document conversion service, the other is a WSGI based XMLRPC server. With one of these applications running you can send office documents to a server and will receive the converted document.

All WSGI document converters support (optional) local caching, which stores conversion results and delivers them without running a new conversion if the same document was already converted.
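
The caching idea can be sketched in a few lines. This is not the package's cache manager, just an illustration under simple assumptions: results are kept in an in-memory dict, the key is a SHA-256 digest of the source bytes plus the sorted options, and `convert` is any callable supplied by the caller.

```python
import hashlib


class ConversionCache(object):
    """Illustrative in-memory cache; not the ulif.openoffice cache manager."""

    def __init__(self, convert):
        self._convert = convert   # callable(source_bytes, options) -> result
        self._results = {}

    @staticmethod
    def make_key(source_bytes, options):
        # Same document with different options must not share a cache entry.
        digest = hashlib.sha256(source_bytes)
        for name, value in sorted(options.items()):
            digest.update(('%s=%s;' % (name, value)).encode('utf-8'))
        return digest.hexdigest()

    def get(self, source_bytes, options=None):
        options = options or {}
        key = self.make_key(source_bytes, options)
        if key not in self._results:
            # Cache miss: run the (expensive) conversion exactly once.
            self._results[key] = self._convert(source_bytes, options)
        return self._results[key]
```

A second request for the same document with the same options is answered from the dict, so the conversion callable is not invoked again.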

The package comes with prepared configuration files to set up and start such a web-based document converter in minutes.

See the extended docs under

https://ulif-openoffice.readthedocs.org/en/latest/

for details.

Install

User Install

ulif.openoffice can be installed via pip:

$ pip install ulif.openoffice

Afterwards all commandline tools should be available.

Developer Install

It is recommended to set up the sources in a virtual environment:

$ virtualenv py27      # Python 2.6, 2.7 are supported
$ source py27/bin/activate
(py27) $

Get the sources:

(py27) $ git clone https://github.com/ulif/ulif.openoffice.git
(py27) $ cd ulif.openoffice

Install packages for testing:

(py27) $ python setup.py dev

It is recommended to start the oooctl daemon before running tests:

(py27) $ oooctl start

This makes LibreOffice listen in the background and significantly reduces the runtime of the tests.

Running tests:

(py27) $ py.test

We also support tox to run tests for all supported Python versions:

(py27) $ pip install tox
(py27) $ tox

Of course you must have the respective Python versions installed (currently: Python 2.6, 2.7).

Running coverage detector:

(py27) $ py.test --cov=ulif.openoffice    # for cmdline results
(py27) $ py.test --cov=ulif.openoffice --cov-report=html

The latter will generate HTML coverage reports in a subdirectory.

Install packages for Sphinx-based documentation:

(py27) $ python setup.py docs
(py27) $ cd doc
(py27) $ make html

This will generate the documentation in a subdirectory.

License

ulif.openoffice is covered by the GPL version 2.

Author

By Uli Fouquet (uli at gnufix dot de). Please do not hesitate to contact me for wishes, requests, suggestions, or other questions.

CHANGES

1.1.1 (2015-07-23)

  • Close file handles properly.

  • The commandline client now only handles one input file and does not copy whole directory contents any more.

1.1 (2015-07-12)

  • Added a WSGI-based XMLRPC application to trigger conversion via XMLRPC.

  • Added get_cached method for client and XMLRPC client to retrieve docs stored in cache.

  • Added get_cached_file_by_source method for cachemanager. This method is expensive but allows finding cached files without a cache key.

  • Fixed bug: OOCP processor returned wrong result file path for XHTML output.

  • Fixed bug: Remove temporary dir if converting fails in client.

  • Modified tests to accept also docs generated on Ubuntu 14.04.

  • Fixed bug: Catch shutil.Error in copytree() [thanks to: sbywater]

  • Added new option: --css-cleaner-prettify-html prettifies generated HTML code. This was done automatically in previous releases and can lead to gaps in rendered output. This option, when set (disabled by default), enables the old behaviour. Fixes #3.

1.0 (2013-09-02)

Major rewrite of the whole package.

  • convert now uses the commandline tool unoconv. You need this tool to use the package.

  • As unoconv provides everything the package's convert script offered (and much more!), the convert script is no longer provided. Simply use unoconv instead.

  • oooctl is now a daemonizer for unoconv -l.

  • Apply PEP 8 rules to cachemanager.py.

  • Moved unittests to dedicated tests/ dir in package root.

  • Switched from Zope testing to py.test.

  • Removed pyuno server, clients and related components.

  • Removed find functionality as it is based on direct pyuno access.

  • Removed zc.buildout support.

  • Removed cherrypy-based restserver. The new WSGI app is the replacement.

  • Added WSGI based document converter.

  • Added simple htaccess WSGI filter for web authentication.

  • Replaced cachemanager with a more robust and lightweight version. Old caches do not work any more with this new implementation.

  • Introduced a new central Options component to manage supported options for all other components.

0.4 (2011-02-11)

  • Added functionality to find text in documents. Many thanks to sig at akasig.org for the patch!

0.3 (2010-11-17)

  • Added option to disable caching completely: set --cache-dir to empty string to disable caching [Thanks to Adam Groszer for patches!]

  • Removed unwanted output when running in foreground mode.

  • Cachemanager now supports listing all sources contained in cache dir.

  • Fixed bug in cachemanager: under rare circumstances (two different input files with the same MD5 hash digest and identical file stats), files were considered identical by the cachemanager, which led to inconsistencies in the cache. We now check thoroughly whether two such files differ.

  • Lots of test fixes [Thanks to Adam Groszer for patches!]

0.2.1 (2010-06-13)

  • Fixed fix to cope with pyuno monkey-patching standard __import__ function. More recent pyuno versions do not do that kind of stuff any more (which is an improvement).

  • Fixed foreground start of oooctl server. It didn’t work correctly with more recent OpenOffice.org/pyuno installs. You no longer have to press CTRL-C twice when trying to stop an oooctl server running in foreground.

0.2 (2010-05-20)

  • Added license and copyright file to comply with policy of major Linux distributors.

  • Added sphinx docs.

  • Fixed wrong result path when returning cached HTML results.

  • Added mode fg for oooctl. Using oooctl fg one can start oooctl in foreground now.

  • Added mode fg for pyunoctl. Using pyunoctl fg one can start pyunoctl in foreground now.

  • Added state check for oooctl: when OpenOffice.org server is down during runtime it is restarted automatically. The check happens every second.

  • Use standard lib doctest instead of zope.testing.doctest.

  • Changed PDF creation: by default now normal PDF (and not PDF/A) is created when converting to PDF. This is due to an endianness bug in many recent OpenOffice.org binaries running on 64-bit platforms.

0.1 (2010-03-02)

  • Initial implementation.
