
collatex 2.0.0pre7

CollateX is a collation tool.

Latest Version: 2.1.3rc0

CollateX is software to

  • read multiple (≥ 2) versions of a text, splitting each version into parts (tokens) to be compared,
  • identify similarities of and differences between the versions (including moved/transposed segments) by aligning tokens, and
  • output the alignment results in a variety of formats for further processing, for instance to support the production of a critical apparatus or the stemmatical analysis of a text’s genesis.
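The three steps above (tokenize, align, output) can be sketched with the Python standard library alone. This is only an illustration of the idea for two witnesses: it uses difflib as a stand-in aligner, not CollateX's own algorithm, and it does not detect transpositions. The helper names tokenize and align are hypothetical and are not part of the collatex API.

```python
import difflib


def tokenize(text):
    """Split a witness into tokens on whitespace (a deliberately naive tokenizer)."""
    return text.split()


def align(tokens_a, tokens_b):
    """Align two token lists.

    Returns a list of (tag, segment_a, segment_b) columns, where tag is one of
    difflib's opcode names: 'equal', 'delete', 'insert' or 'replace'.
    """
    matcher = difflib.SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    return [
        (tag, tokens_a[a1:a2], tokens_b[b1:b2])
        for tag, a1, a2, b1, b2 in matcher.get_opcodes()
    ]


witness_a = tokenize("The quick brown fox jumps over the dog.")
witness_b = tokenize("The brown fox jumps over the lazy dog.")

# Print one row per aligned segment; '-' marks a gap in that witness.
for tag, seg_a, seg_b in align(witness_a, witness_b):
    print("{:8} | {:26} | {}".format(tag, " ".join(seg_a) or "-", " ".join(seg_b) or "-"))
```

Each printed row corresponds to one column of a two-witness alignment table. Aligning more than two witnesses at once, as CollateX does, needs a different approach than this pairwise sketch.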


Features:

  • non-progressive multiple sequence alignment
  • multiple output formats: alignment table, variant graph
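To make the difference between the two output formats concrete, the following toy sketch converts a hand-written alignment table into a minimal variant graph: tokens become nodes labelled with the witnesses that contain them, and edges record which witnesses read one token after another. The data layout and the function name variant_graph are assumptions made for this illustration; they are not CollateX's internal data structures.

```python
from collections import defaultdict

# Hypothetical alignment table: one row per witness, None marks a gap.
table = {
    "A": ["The", "quick", "brown", "fox", "jumps", "over", "the", None, "dog."],
    "B": ["The", None, "brown", "fox", "jumps", "over", "the", "lazy", "dog."],
}


def variant_graph(table):
    """Build (nodes, edges) from an alignment table.

    nodes maps (column, token) to the set of witness sigla containing it.
    edges maps (source node, target node) to the sigla that follow that path.
    """
    nodes = defaultdict(set)
    edges = defaultdict(set)
    for siglum, row in table.items():
        previous = "start"
        for column, token in enumerate(row):
            if token is None:
                continue  # a gap contributes no node for this witness
            node = (column, token)
            nodes[node].add(siglum)
            edges[(previous, node)].add(siglum)
            previous = node
        edges[(previous, "end")].add(siglum)
    return dict(nodes), dict(edges)


nodes, edges = variant_graph(table)
for (source, target), sigla in sorted(edges.items(), key=str):
    print(source, "->", target, sorted(sigla))
```

In this sketch, tokens shared by both witnesses end up on a common path, while "quick" and "lazy" sit on branches taken by only one witness.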

How to install:

Mac/Linux: sudo pip install --pre collatex

If you don’t have pip installed, install it first with: sudo easy_install pip

Windows: There is no official Windows binary distribution of pygraphviz, which is needed for SVG rendering of the variant graph. To add SVG support in Windows, before doing the above, install an “unofficial” Windows pygraphviz binary from the link at, along with the main Graphviz file at the link provided there. Then add the path to the graphviz installation (specifically, to dot.exe) to the system path.
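A quick way to check whether the Graphviz dot executable is reachable through the system path is sketched below. It assumes Python 3.3 or later, where shutil.which is available; the function name find_dot is hypothetical and not part of collatex.

```python
import shutil


def find_dot():
    """Return the full path of the Graphviz `dot` executable, or None if it is not on PATH."""
    # On Windows, shutil.which also tries the extensions listed in PATHEXT,
    # so searching for "dot" will locate dot.exe as well.
    return shutil.which("dot")


dot_path = find_dot()
if dot_path is None:
    print("dot was not found on PATH; graph output will probably not render")
else:
    print("dot found at", dot_path)
```

If dot is not found, adding the directory that contains it to the system path and opening a new shell should make the check succeed.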

Simple example:

from collatex import *

collation = Collation()
collation.add_witness("A", "The quick brown fox jumps over the dog.")
collation.add_witness("B", "The brown fox jumps over the lazy dog.")

alignment_table = collate(collation)

When running from the command shell run the example script with:

python ./

When using IPython Notebook, a nice HTML representation of the alignment table is shown when the collate function is called. When using a textual Python prompt, add

print(alignment_table)

to show the results. Output can also be shown as a graph instead of a table when Graphviz and pygraphviz are installed:

collate(collation, output="graph")


2.0.0pre7 (2014-07-14)

  • Fixed handling of segmentation parameter in pretokenized JSON function.

2.0.0pre6 (2014-06-30)

  • Added Windows support. Thanks to David J. Birnbaum.
  • Fixed handling of IPython imports.

2.0.0pre5 (2014-06-11)

  • Added JSON output to collate method.
  • Added option to collate method to enable or disable parallel segmentation.
  • Added table output to collate_pretokenized_json method, next to the already existing JSON output.
  • Cached the suffix and LCP arrays to prevent unnecessary recalculation.
  • Fixed handling of empty cells in JSON output of pretokenized JSON.
  • Fixed compatibility issue when rendering HTML or SVG with IPython 2.1 instead of IPython 0.13.
  • Corrected RST syntax in package info description.

2.0.0pre4 (2014-06-11)

  • Added pretokenized JSON support.
  • Added JSON visualization for the alignment table.

2.0.0pre3 (2014-06-10)

  • Fixed imports in, “from collatex import *” now works correctly.
  • Added IPython HTML support for alignment table.
  • Added IPython SVG support for variant graph.
  • Added convenience constructors on Collation object.
  • Added horizontal layout for the alignment table visualization, next to vertical one.

2.0.0pre2 (2014-06-09)

  • Removed the limit of six witnesses in the aligner; any number of witnesses can now be aligned.
  • Added transposition detection.
  • Added alignment table plus plain text visualization.
  • Added collate convenience function.

2.0.0pre1 (2014-06-02)

  • First release on PyPI.
  • First pure Python development release of CollateX.
  • New collation algorithm, which performs non-progressive multiple witness alignment.

File                          Type        Py Version  Uploaded on  Size
collatex-2.0.0pre7-py2.7.egg  Python Egg  2.7         2014-07-14   63KB
collatex-2.0.0pre7.tar.gz     Source                  2014-07-14   53KB