skip to navigation
skip to content

Not Logged In

metapensiero.sphinx.patchdb 1.2.1

Extract scripts from a reST document and apply them in order.

Building and maintaining the schema of a database is always a challenge. It may quickly become a nightmare when dealing with even moderately complex databases, in a distribuited development environment. You have new features going in, and fixes here and there, that keeps accumulating in the development branch. You also have several already deployed instances of the database you wanna upgrade now and then.

In my experience, it’s very difficult to impossible to come up with a completely automated solution, for several reasons:

  • comparison between different releases of a database schema is tricky
  • actual contents of the database must be preserved
  • some changes require specific recipes to upgrade the data
  • any automated solution hide some detail, by definition: I need complete control, to be able to create temporary tables and/or procedures for example

I tried, and wrote myself, several different approaches to the problem, and this package is my latest and most satisfying effort: it builds on top of docutils and Sphinx, with the side advantage that you get a quite nice and good documentation of the whole architecture: literate database scheming!

How it works

The package contains two distinct pieces: a Sphinx extension and the patchdb command line tool.

The extension implements a new ReST directive able to embed a script in the document: when processed by the sphinx-build tool, all the scripts will be collected in an external file, configurable.

The patchdb tool takes that script collection and determines which scripts need to be applied to some database, and the right order.

It keeps and maintains a single very simple table within the database, where it records the last version of each script it successfully execute, so that it won’t reexecute the same script (actually, a particular revision of it) twice.

So, on the development side you simply write (and document!) each piece, and when it comes the time of deploying current state you distribute just the script collection (a single file, usually in YAML or JSON format, or a pickle archive) to the end points where the database instances live, and execute patchdb against each one.

Scripts

The basic building block is a script, an arbitrary sequence of statements written in some language (currently, either Python, SQL or Shell), augmented with some metadata such as the scriptid, possibly a longer description, its revision and so on.

As a complete example of the syntax, consider the following:

.. patchdb:script:: My first script
   :description: Full example of a script
   :revision: 2
   :depends: Other script@4
   :preceeds: Yet another
   :language: python
   :conditions: python_2_x

   print "Yeah!"

This will introduce a script globally identified by My first script, written in Python: this is its second release, and its execution must be constrained such that it happens after the execution of the fourth revision of Other script and before Yet another.

The example shows also an usage of the conditions, allowing more than one variant of a script like:

.. patchdb:script:: My first script (py3)
   :description: Full example of a script
   :revision: 2
   :depends: Other script@4
   :preceeds: Yet another
   :language: python
   :conditions: python_3_x

   print("Yeah!")

The dependencies may be a comma separated list of script ids, such as:

.. patchdb:script:: Create master table

   CREATE TABLE some_table (id INTEGER PRIMARY KEY, tt_id INTEGER)

.. patchdb:script:: Create target table

   CREATE TABLE target_table (id INTEGER PRIMARY KEY)

.. patchdb:script:: Add foreign key to some_table
   :depends: Create master table, Create target table

   ALTER TABLE some_table
         ADD CONSTRAINT fk_master_target
             FOREIGN KEY (tt_id) REFERENCES target_table (id)

Independently from the order these scripts appear in the documentation, the third script will execute only after the first two get successfully applied to the database. As you can notice, most of the options are optional: by default, :language: is sql, :revision: is 1, the :description: is taken from the title (that is, the script ID), while :depends: and :preceeds: are empty.

Just for illustration purposes, the same effect could be achieved with:

.. patchdb:script:: Create master table
   :preceeds: Add foreign key to some_table

   CREATE TABLE some_table (id INTEGER PRIMARY KEY, tt_id INTEGER)

.. patchdb:script:: Create target table

   CREATE TABLE target_table (id INTEGER PRIMARY KEY)

.. patchdb:script:: Add foreign key to some_table
   :depends: Create target table

   ALTER TABLE some_table
         ADD CONSTRAINT fk_master_target
             FOREIGN KEY (tt_id) REFERENCES target_table (id)

Patches

A patch is a particular flavour of script, one that specify a brings dependency list. Imagine that the example above was the first version of the database, and that the current version looks like the following:

.. patchdb:script:: Create master table
   :revision: 2

   CREATE TABLE some_table (
     id INTEGER PRIMARY KEY,
     description VARCHAR(80),
     tt_id INTEGER
   )

that is, some_table now contains one more field, description.

We need an upgrade path from the first revision of the table to the second:

.. patchdb:script:: Add a description to the master table
   :depends: Create master table@1
   :brings: Create master table@2

   ALTER TABLE some_table ADD COLUMN description VARCHAR(80)

When patchdb examines the database status, it will execute one or the other. If the script Create master table isn’t executed yet (for example when operating on a new database), it will take the former script (the one that creates the table from scratch). Otherwise, if the database “contains” revision 1 (and not higher than 1) of the script, it will execute the latter, bumping up the revision number.

Run-always scripts

Yet another variant of scripts, which gets applied always, every time patchdb is executed. This kind may be used to perform arbitrary operations, either at the start or at the end of the patchdb session:

.. patchdb:script:: Say hello
   :language: python
   :always: first

   print("Hello!")

.. patchdb:script:: Say goodbye
   :language: python
   :always: last

   print("Goodbye!")

Usage

Collecting patches

To use it, first of all you must register the extension within the Sphinx environment, adding the full name of the package to the extensions list in the file conf.py, for example:

# Add any Sphinx extension module names here, as strings.
extensions = ['metapensiero.sphinx.patchdb']

The other required bit of customization is the location of the on disk scripts storage, i.e. the path of the file that will contain the information about every found script: this is kept separated from the documentation itself because you will probably deploy on production servers just to update their database. If the filename ends with .json it will contain a JSON formatted array, if it ends with .yaml the information will be dumped in YAML, otherwise it will be a Python pickle. I usually prefer JSON or YAML, because those formats are more VCs friendly and open to human inspection.

The location may be set in the same conf.py as above, like:

# Location of the external storage
patchdb_storage = '…/dbname.json'

Otherwise, you can set it using the -D option of the sphinx-build command, so that you can easily share its definition with other rules in a Makefile. I usually put the following snippet at the beginning of the Makefile created by sphinx-quickstart:

TOPDIR ?= ..
STORAGE ?= $(TOPDIR)/database.json

SPHINXOPTS = -D patchdb_storage=$(STORAGE)

At this point, executing the usual make html will update the scripts archive: that file contains everything is needed to update the database either local or remote; in other words, running Sphinx (or even having it installed) is not required to update a database.

Updating the database

The other side of the coin is managed by the patchdb tool, that digests the scripts archive and is able to determine which of the scripts are not already applied and eventually does that, in the right order.

When your database does already exist and you are just starting using patchdb you may need to force the initial state with the following command:

patchdb --assume-already-applied --postgres "dbname=test" --patch-storage database.json

that will just update the patchdb table registering current revision of all the missing scripts, without executing them.

You can inspect what will be done, that is obtain the list of not already applied patches, with a command like:

patchdb --dry-run --postgres "dbname=test" -s database.json

The database.json archive can be sent to the production machines (in some cases I put it in a production branch of the repository and use the version control tool to update the remote machines, in other I simply used scp or rsync based solutions).

Example development Makefile snippet

The following is a snippet that I usually put in my outer Makefile:

export TOPDIR := $(CURDIR)
DBHOST := localhost
DBPORT := 5432
DBNAME := dbname
DROPDB := dropdb --host=$(DBHOST) --port=$(DBPORT)
CREATEDB := createdb --host=$(DBHOST) --port=$(DBPORT) --encoding=UTF8
STORAGE := $(TOPDIR)/$(DBNAME).json
DSN := host=$(DBHOST) port=$(DBPORT) dbname=$(DBNAME)
PUP := $(PATCHDB) --patch-storage=$(STORAGE) \
                  --postgres="$(DSN)" --log-file=$(DBNAME).log

# Build the Sphinx documentation
doc:
        $(MAKE) -C doc STORAGE=$(STORAGE) html

$(STORAGE): doc

# Show what is missing
missing-patches: $(STORAGE)
        $(PUP) --dry-run

# Upgrade the database to the latest revision
database: $(STORAGE)
        $(PUP)

# Remove current database and start from scratch
scratch-database:
        -$(DROPDB) $(DBNAME)
        $(CREATEDB) $(DBNAME)
        $(MAKE) database

Changes

1.2.1 (2014-07-02)

  • Add script’s “conditions” and “run-always” to the sphinx rendering
  • dbloady’s load_yaml() now returns a dictionary with loaded instances

1.2.0 (2014-06-19)

  • New “run-always” scripts
  • Poor man “CREATE DOMAIN” for MySQL
  • User defined assertions

1.1.2 (2014-06-05)

  • New –assume-already-applied option, useful when you start using patchdb on an already existing database

1.1.1 (2014-06-03)

  • Fix packaging, adding a MANIFEST.in

1.1.0 (2014-06-03)

  • Use setuptools instead of distribute
  • Use argparse instead of optparse
  • New mimetype property on scripts, to select the right Pygments highlighter
  • New MySQL specific context, using cymysql

1.0.7 (2013-08-23)

  • published on bitbucket

1.0.6 (2013-03-12)

  • dbloady: ability to load field values from external files

1.0.5 (2013-03-11)

  • dbloady: fix encoding error when printing messages coming from PostgresQL
  • dbloady: emit a progress bar on stderr

1.0.4 (2013-02-27)

  • dbloady, a new utility script, to load base data from a YAML stream.

1.0.3 (2012-11-07)

  • Fix :patchdb:script role

1.0.2 (2012-10-19)

  • Pickier way to split the multi-statements SQL scripts, now the ;; separator must be on a line by its own
  • More precise line number tracking when applying multi-statements SQL scripts
  • Dump and load script dependencies and conditions as lists, to avoid pointless repeated splits and joins

1.0.1 (2012-10-13)

  • Fix error loading JSON storage, simplejson already yields unicode strings
  • Possibly use the original title of the script as description, if not explicitly set
  • More precise error on unknown script reference
  • Minor corrections

1.0 (2012-10-10)

  • Added JSON support for the on disk scripts storage

  • Adapted to work with SQLAlchemy 0.7.x

  • Updated to work with docutils > 0.8

  • Refactored as a Sphinx domain

    Attention!

    This means that the directive names are now prefixed with patchdb: (that is, the old script directive is now patchdb:script). You can use the default-domain directive if that annoys you.

  • Renamed the status table from prst_applied_info to simply patchdb

    Attention!

    This is the main incompatible change with previous version: you should eventually rename the table manually, sorry for the inconvenience.

  • Renamed prst_patch_storage configuration setting to patchdb_storage

  • Each script ID is now lower case, to avoid ambiguities

0.3 (2010-11-14)

  • Updated to work with Sphinx 1.0
  • New :script: role for cross-references
  • New :file: option on script directive, to keep the actual text in an external file

0.2 (2010-03-03)

  • Compatibility with SQLAlchemy 0.6
  • New patchdb command line tool

0.1 (2009-10-28)

  • Replace home brew solution with SQLAlchemy topological sort
  • Use YAML for the persistent storage
  • Mostly working Sphinx adaptor
  • Rudimentary and mostly untested SQLAlchemy backend (basically only the direct PostgresQL backend has been battle tested in production…)
  • First standalone version

0.0

  • still a PylGAM side-product
  • simply a set of docutils directives
  • started with Firebird in mind, but grown up with PostgresQL
 
File Type Py Version Uploaded on Size
metapensiero.sphinx.patchdb-1.2.1.tar.gz (md5) Source 2014-07-02 30KB
  • Downloads (All Versions):
  • 1 downloads in the last day
  • 63 downloads in the last week
  • 266 downloads in the last month