metapensiero.sphinx.patchdb 2.5

Extract scripts from a reST document and apply them in order.

Building and maintaining the schema of a database is always a challenge. It may quickly become a nightmare when dealing with even moderately complex databases in a distributed development environment. You have new features going in, and fixes here and there, that keep accumulating in the development branch. You also have several already deployed instances of the database that you want to upgrade now and then.

In my experience, it’s very difficult to impossible to come up with a completely automated solution, for several reasons:

  • comparison between different releases of a database schema is tricky
  • actual contents of the database must be preserved
  • some changes require specific recipes to upgrade the data
  • any automated solution hides some detail, by definition: I need complete control, to be able to create temporary tables and/or procedures for example

I tried, and wrote myself, several different approaches to the problem[*], and this package is my latest and most satisfying effort: it builds on top of docutils and Sphinx, with the side advantage that you get quite nice documentation of the whole architecture: literate database scheming!

[*]

Just to mention a few alternatives:

Alembic
Written on top of SQLAlchemy by the same author: it does not help when you need to manage something outside SA knowledge (stored procedures, permissions, …)
Sqlibrist
Some similarities with PatchDB, very young, Django integration.
Sqitch
Quite good, although Perl based…

See the schema migration page on Wikipedia for further details.


How it works

The package contains two distinct pieces: a Sphinx extension and the patchdb command line tool.

The extension implements a new reST directive able to embed a script in the document: when the document is processed by the sphinx-build tool, all the scripts are collected in an external file, whose location is configurable.

The patchdb tool takes that script collection and determines which scripts need to be applied to some database, and the right order.

It creates and maintains a single, very simple table within the database (unsurprisingly named patchdb), where it records the last revision of each script it has successfully executed, so that it won’t execute the same script (actually, a particular revision of it) twice.

So, on the development side you simply write (and document!) each piece, and when the time comes to deploy the current state you distribute just the script collection (a single file, usually in AXON, JSON or YAML format, or a pickle archive, see storage formats below) to the end points where the database instances live, and execute patchdb against each one.

Scripts

The basic building block is a script, an arbitrary sequence of statements written in some language (currently, either Python, SQL or Shell), augmented with some metadata such as the scriptid, possibly a longer description, its revision and so on.

As a complete example of the syntax, consider the following:

.. patchdb:script:: My first script
   :description: Full example of a script
   :revision: 2
   :depends: Other script@4
   :preceeds: Yet another
   :language: python
   :conditions: python_2_x

   print "Yeah!"

This will introduce a script globally identified by My first script, written in Python: this is its second release, and its execution must be constrained such that it happens after the execution of the fourth revision of Other script and before Yet another.

Conditions

The example also shows a use of conditions, which allow more than one variant of a script, like:

.. patchdb:script:: My first script (py3)
   :description: Full example of a script
   :revision: 2
   :depends: Other script@4
   :preceeds: Yet another
   :language: python
   :conditions: python_3_x

   print("Yeah!")

As another use case of this feature, the following snippet declares the same table for two different databases:

.. patchdb:script:: Simple table (PostgreSQL)
  :language: sql
  :mimetype: text/x-postgresql
  :conditions: postgres
  :file: postgresql/simple.sql

.. patchdb:script:: Simple table (MySQL)
  :language: sql
  :mimetype: text/x-mysql
  :conditions: mysql
  :file: mysql/simple.sql

As you can see, the content of the script can be conveniently stored in an external file, and the particular dialect specified with the :mimetype: option, so it will be properly highlighted by Pygments.

Such conditions may also be arbitrarily defined on the command line, so you can have for example:

.. patchdb:script:: Configure for production
  :language: sql
  :conditions: PRODUCTION

  UPDATE configuration SET is_production = true

and then add the option --assert PRODUCTION when it is the case.
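The selection of variants can be pictured with a minimal sketch. The `Script` tuple and the `select_variants` helper are hypothetical names used only for illustration, and the rule assumed here (a script is kept when all of its conditions are asserted) may differ from what patchdb actually does:

```python
from collections import namedtuple

# Hypothetical in-memory representation of a collected script.
Script = namedtuple("Script", "scriptid conditions")

def select_variants(scripts, assertions):
    """Keep only the scripts whose conditions are all currently asserted."""
    active = set(assertions)
    return [s for s in scripts if set(s.conditions) <= active]

scripts = [
    Script("simple table (postgresql)", ["postgres"]),
    Script("simple table (mysql)", ["mysql"]),
    Script("configure for production", ["PRODUCTION"]),
]

# Assumption: "postgres" is asserted for a PostgreSQL target, while
# PRODUCTION comes from the --assert command line option.
selected = select_variants(scripts, ["postgres", "PRODUCTION"])
print([s.scriptid for s in selected])
```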

Variables

Another way to influence the effect of a script is by using variables: a script may contain one or more references to arbitrary variables using the syntax {{VARNAME}}, which must be defined at application time with the --define VARNAME=VALUE command line option. Alternatively, with the syntax {{VARNAME=default}} the reference sets a default value for the variable, which can be overridden from the command line.

As an example, you can have the following script:

.. patchdb:script:: Create table and give read-only rights to the web user
   :language: sql

   CREATE TABLE foo (id INTEGER)
   ;;
   GRANT SELECT ON TABLE foo TO {{WEB=www}}
   ;;
   GRANT ALL ON TABLE foo TO {{ADMIN}}

To apply it, you must specify the value for the ADMIN variable, with something like --define ADMIN=$USER.

The variable name must be an identifier (that is, an alphabetic letter possibly followed by alphanumerics), while its value may contain whitespace, letters or digits.
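As a rough illustration of the substitution rules just described, here is a small sketch built on a regular expression. The `expand_variables` function and the `REFERENCE` pattern are assumptions made for this example, not the code patchdb uses:

```python
import re

# {{NAME}} or {{NAME=default}}: NAME is a letter followed by alphanumerics.
REFERENCE = re.compile(r"\{\{([A-Za-z][A-Za-z0-9]*)(?:=([^}]*))?\}\}")

def expand_variables(text, defined):
    """Replace each reference with its defined value or its inline default."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in defined:
            return defined[name]
        if default is not None:
            return default
        raise KeyError("undefined variable: " + name)
    return REFERENCE.sub(replace, text)

sql = "GRANT SELECT ON TABLE foo TO {{WEB=www}}"
print(expand_variables(sql, {}))                 # falls back to the default
print(expand_variables(sql, {"WEB": "apache"}))  # explicit definition wins
```

A reference without a default, such as {{ADMIN}}, raises an error in this sketch unless a value is supplied.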

Dependencies

The dependencies may be a comma separated list of script ids, such as:

.. patchdb:script:: Create master table

   CREATE TABLE some_table (id INTEGER PRIMARY KEY, tt_id INTEGER)

.. patchdb:script:: Create target table

   CREATE TABLE target_table (id INTEGER PRIMARY KEY)

.. patchdb:script:: Add foreign key to some_table
   :depends: Create master table, Create target table

   ALTER TABLE some_table
         ADD CONSTRAINT fk_master_target
             FOREIGN KEY (tt_id) REFERENCES target_table (id)

Regardless of the order in which these scripts appear in the documentation, the third script will execute only after the first two have been successfully applied to the database. As you may notice, most of the options are optional: by default, :language: is sql, :revision: is 1, the :description: is taken from the title (that is, the script ID), while :depends: and :preceeds: are empty.

Just for illustration purposes, the same effect could be achieved with:

.. patchdb:script:: Create master table
   :preceeds: Add foreign key to some_table

   CREATE TABLE some_table (id INTEGER PRIMARY KEY, tt_id INTEGER)

.. patchdb:script:: Create target table

   CREATE TABLE target_table (id INTEGER PRIMARY KEY)

.. patchdb:script:: Add foreign key to some_table
   :depends: Create target table

   ALTER TABLE some_table
         ADD CONSTRAINT fk_master_target
             FOREIGN KEY (tt_id) REFERENCES target_table (id)
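The ordering constraints expressed by :depends: and :preceeds: amount to a topological sort. The following sketch uses the standard library graphlib module (Python 3.9 or later) rather than whatever patchdb uses internally; the `execution_order` helper and the dictionary layout are hypothetical:

```python
from graphlib import TopologicalSorter

def execution_order(scripts):
    """Order script ids so that every constraint is honoured.

    `scripts` maps a script id to a dict with optional 'depends' and
    'preceeds' lists of other script ids.
    """
    sorter = TopologicalSorter()
    for scriptid, options in scripts.items():
        # A script must run after everything it depends on...
        sorter.add(scriptid, *options.get("depends", ()))
        # ...and before everything it precedes.
        for follower in options.get("preceeds", ()):
            sorter.add(follower, scriptid)
    return list(sorter.static_order())

order = execution_order({
    "create master table": {"preceeds": ["add foreign key to some_table"]},
    "create target table": {},
    "add foreign key to some_table": {"depends": ["create target table"]},
})
print(order)
```

The relative order of the two CREATE TABLE scripts is not constrained, so only the position of the foreign key script (after both) is guaranteed.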

Error handling

By default patchdb stops when it fails to apply one script. Sometimes you may want to relax that rule, for example when operating on a database that was created with other methods, so that you cannot rely on the existence of a specific script to make the decision. In such cases, the option :onerror: may be used:

.. patchdb:script:: Remove obsoleted tables and functions
   :onerror: ignore

   DROP TABLE foo
   ;;
   DROP FUNCTION initialize_record_foo()

When :onerror: is set to ignore, each statement in the script is executed and if an error occurs it is ignored and patchdb proceeds with the next one. On good databases like PostgreSQL and SQLite where even DDL statements are transactional, each statement is executed in a nested subtransaction, so subsequent errors do not ruin the effect of correctly applied previous statements.

Another possible setting of this option is skip: in this case, whenever an error occurs the effect of the whole script is undone and it is considered as applied. For example, assuming that the old version of SomeProcedure accepted a single argument and the new one requires two of them, you could do something like the following:

.. patchdb:script:: Fix stored procedure signature
   :onerror: skip

   SELECT somecol FROM SomeProcedure(NULL, NULL)
   ;;
   ALTER PROCEDURE SomeProcedure(p_first INTEGER, p_second INTEGER)
   RETURNS (somecol INTEGER) AS
   BEGIN
     somecol = p_first * p_second;
     SUSPEND;
   END
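The "ignore" behaviour on a transactional database can be approximated with savepoints. This is only a sketch under stated assumptions: it targets SQLite through the standard sqlite3 module, and `apply_ignoring_errors` is a hypothetical helper, not a patchdb function:

```python
import sqlite3

def apply_ignoring_errors(conn, statements):
    """Run each statement inside its own savepoint, skipping the ones that fail."""
    failed = []
    for sql in statements:
        conn.execute("SAVEPOINT one_statement")
        try:
            conn.execute(sql)
        except sqlite3.Error:
            # Undo only the failing statement, keeping the previous ones.
            conn.execute("ROLLBACK TO SAVEPOINT one_statement")
            failed.append(sql)
        conn.execute("RELEASE SAVEPOINT one_statement")
    return failed

# isolation_level=None leaves transaction control to the explicit statements.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE bar (id INTEGER)")
failed = apply_ignoring_errors(conn, ["DROP TABLE foo", "DROP TABLE bar"])
print(failed)
```

Here the first DROP fails because the table does not exist, yet the second one is still applied.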

Patches

A patch is a particular flavour of script, one that specifies a :brings: dependency list. Imagine that the example above was the first version of the database, and that the current version looks like the following:

.. patchdb:script:: Create master table
   :revision: 2

   CREATE TABLE some_table (
     id INTEGER PRIMARY KEY,
     description VARCHAR(80),
     tt_id INTEGER
   )

that is, some_table now contains one more field, description.

We need an upgrade path from the first revision of the table to the second:

.. patchdb:script:: Add a description to the master table
   :depends: Create master table@1
   :brings: Create master table@2

   ALTER TABLE some_table ADD COLUMN description VARCHAR(80)

When patchdb examines the database status, it will execute one or the other. If the script Create master table has not been executed yet (for example when operating on a new database), it will take the former script (the one that creates the table from scratch). Otherwise, if the database “contains” revision 1 (and not higher than 1) of the script, it will execute the latter, bumping up the revision number.
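That decision can be summarised with a deliberately simplified sketch. The `choose_script` function, the `applied` mapping and the `patches` mapping are all hypothetical, and the sketch assumes a single patch per starting revision:

```python
def choose_script(applied, scriptid, latest_revision, patches):
    """Pick what to run to bring `scriptid` to `latest_revision`.

    `applied` maps script ids to the revision recorded in the database;
    `patches` maps a starting revision to the id of the patch upgrading it.
    """
    current = applied.get(scriptid)
    if current is None:
        return scriptid        # never applied: create from scratch
    if current >= latest_revision:
        return None            # already up to date, nothing to do
    return patches[current]    # follow the upgrade path

patches = {1: "add a description to the master table"}

print(choose_script({}, "create master table", 2, patches))
print(choose_script({"create master table": 1}, "create master table", 2, patches))
```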

Run-always scripts

Run-always scripts are yet another variant: they get applied every time patchdb is executed. This kind may be used to perform arbitrary operations, either at the start or at the end of the patchdb session:

.. patchdb:script:: Say hello
   :language: python
   :always: first

   print("Hello!")

.. patchdb:script:: Say goodbye
   :language: python
   :always: last

   print("Goodbye!")
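The resulting session layout can be sketched as follows; `session_plan` is a hypothetical helper and the pair-based input format is an assumption made for brevity:

```python
def session_plan(scripts):
    """Arrange script ids as: run-always first, regular ones, run-always last.

    `scripts` is a list of (scriptid, always) pairs, where `always` is
    'first', 'last' or None for ordinary scripts.
    """
    first = [sid for sid, always in scripts if always == "first"]
    regular = [sid for sid, always in scripts if always is None]
    last = [sid for sid, always in scripts if always == "last"]
    return first + regular + last

plan = session_plan([
    ("say goodbye", "last"),
    ("create master table", None),
    ("say hello", "first"),
])
print(plan)
```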

Fake data domains

As a special case that uses this kind of script, the following example illustrates an approximation of data domains with MySQL, which lacks them:

.. patchdb:script:: Define data domains (MySQL)
   :language: sql
   :mimetype: text/x-mysql
   :conditions: mysql
   :always: first

   CREATE DOMAIN bigint_t bigint
   ;;
   CREATE DOMAIN `Boolean_t` char(1)

.. patchdb:script:: Create some table (MySQL)
   :language: sql
   :mimetype: text/x-mysql
   :conditions: mysql
   :always: first

   CREATE TABLE `some_table` (
       `ID` bigint_t NOT NULL
     , `FLAG` `Boolean_t`

     , PRIMARY KEY (`ID`)
   )

Warning

This is just a dirty hack, based on relatively simple search and replace: don’t take it seriously, use a better database if you really need data domains!

Note

This works also with SQLite.

Placeholders

Another feature is that the definition of the database, that is the collection of the scripts that actually define its schema, may be split across multiple Sphinx environments: the use case is a complex application composed of multiple modules, each of them requiring its own set of DB objects.

A script is considered a placeholder when it has an empty body: it will never be applied, but instead its presence in the database will be asserted. In this way, one Sphinx environment could contain the following script:

.. patchdb:script:: Create table a

   CREATE TABLE a (
       id INTEGER NOT NULL PRIMARY KEY
     , value INTEGER
   )

and another documentation set could extend that with:

.. patchdb:script:: Create table a
   :description: Place holder

.. patchdb:script:: Create unique index on value
   :depends: Create table a

   CREATE UNIQUE INDEX on_value ON a (value)

The second set can be applied only after the former one is.
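The check implied by placeholders might look like the sketch below, where `missing_placeholders` is a hypothetical helper, `scripts` maps script ids to their bodies and `applied` is the set of ids already recorded in the database:

```python
def missing_placeholders(scripts, applied):
    """Return the ids of empty-bodied scripts not yet recorded as applied."""
    return [sid for sid, body in scripts.items()
            if not body.strip() and sid not in applied]

second_set = {
    "create table a": "",
    "create unique index on value": "CREATE UNIQUE INDEX on_value ON a (value)",
}

# On a fresh database the placeholder is unsatisfied...
print(missing_placeholders(second_set, set()))
# ...while it is satisfied once the first set has been applied.
print(missing_placeholders(second_set, {"create table a"}))
```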

Usage

Collecting patches

To use it, first of all you must register the extension within the Sphinx environment, adding the full name of the package to the extensions list in the file conf.py, for example:

# Add any Sphinx extension module names here, as strings.
extensions = ['metapensiero.sphinx.patchdb']

If you want to take advantage of the augmented DataDocumenter, also add metapensiero.sphinx.patchdb.autodoc_sa to that list.

The other required bit of customization is the location of the on-disk script storage, i.e. the path of the file that will contain the information about every script found: this is kept separate from the documentation itself because you will probably deploy it on production servers just to update their database.

Storage formats

If the filename ends with .json it will contain a JSON formatted array, if it ends with .yaml the information will be dumped in YAML, if it ends with .axon the dump will be formatted using AXON, otherwise it will be a Python pickle. I usually prefer AXON, JSON or YAML, because those formats are more VCS-friendly and open to human inspection. These days I tend to use AXON for this kind of thing, as it is slightly more readable and more VCS-friendly than JSON, while YAML is very slow.
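The extension-based choice boils down to a lookup with a fallback. In this sketch `storage_format` is a hypothetical helper that only names the format, without performing any actual dump:

```python
import os.path

def storage_format(filename):
    """Name the storage format implied by the file extension."""
    extension = os.path.splitext(filename)[1]
    known = {".json": "json", ".yaml": "yaml", ".axon": "axon"}
    # Anything else falls back to a Python pickle.
    return known.get(extension, "pickle")

for name in ("database.json", "database.yaml", "database.axon", "database.db"):
    print(name, storage_format(name))
```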

The location may be set in the same conf.py as above, like:

# Location of the external storage
patchdb_storage = '…/dbname.json'

Otherwise, you can set it using the -D option of the sphinx-build command, so that you can easily share its definition with other rules in a Makefile. I usually put the following snippet at the beginning of the Makefile created by sphinx-quickstart:

TOPDIR ?= ..
STORAGE ?= $(TOPDIR)/database.json

SPHINXOPTS = -D patchdb_storage=$(STORAGE)

At this point, executing the usual make html will update the scripts archive: that file contains everything needed to update the database, either local or remote; in other words, running Sphinx (or even having it installed) is not required to update a database.

Updating the database

The other side of the coin is managed by the patchdb tool, which digests the scripts archive, determines which scripts have not been applied yet, and applies them in the right order.

When your database already exists and you are just starting to use patchdb, you may need to force the initial state with the following command:

patchdb --assume-already-applied --postgresql "dbname=test" database.json

that will just update the patchdb table, registering the current revision of all the missing scripts, without executing them.

You can inspect what will be done, that is, obtain the list of patches not yet applied, with a command like:

patchdb --dry-run --postgresql "dbname=test" database.json

The database.json archive can be sent to the production machines (in some cases I put it in a production branch of the repository and use the version control tool to update the remote machines, in others I simply used scp or rsync based solutions). Another way is to include it in some package and then use the syntax some.package:path/database.json.

The scripts may even come from several different archives (see placeholders above):

patchdb --postgresql "dbname=test" app.db.base:pdb.json app.db.auth:pdb.json

Automatic backup

In particular in development mode, I find it useful to have a simple way of going back to a previous state and retry the upgrade, either to test different upgrade paths or to fix silly typos in the new patches.

Since version 2.3 patchdb has a new option, --backups-dir, that controls an automatic backup facility: by default, at each execution, before applying the missing patches and regardless of whether there are any, it takes a backup of the current database and keeps a simple index of these snapshots.

The option defaults to the system-wide temporary directory (usually /tmp on POSIX systems): if you don’t need the automatic backup (a reasonable production system should have a different approach to taking such snapshots), specify None as argument to the option.

With the patchdb-states tool you obtain a list of the available snapshots, or restore any previous one:

$ patchdb-states list
[lun 18 apr 2016 08:24:48 CEST] bc5c5527ece6f11da529858d5ac735a8 <create first table@1>
[lun 18 apr 2016 10:27:11 CEST] 693fd245ad9e5f4de0e79549255fbd6e <update first table@1>

$ patchdb-states restore --sqlite /tmp/quicktest.sqlite 693fd245ad9e5f4de0e79549255fbd6e
[I] Creating patchdb table
[I] Restored SQLite database /tmp/quicktest.sqlite from /tmp/693fd245ad9e5f4de0e79549255fbd6e

$ patchdb-states clean -k 1
Removed /tmp/bc5c5527ece6f11da529858d5ac735a8
Kept most recent 1 snapshot

Supported databases

As of version 2, patchdb can operate on the following databases:

  • Firebird (requires fdb)
  • MySQL (requires PyMySQL by default, see option --driver to select a different one)
  • PostgreSQL (requires psycopg2)
  • SQLite (uses the standard library sqlite3 module)
  • any database supported by SQLAlchemy (but some features don’t work)

Example development Makefile snippet

The following is a snippet that I usually put in my outer Makefile:

export TOPDIR := $(CURDIR)
DBHOST := localhost
DBPORT := 5432
DBNAME := dbname
DROPDB := dropdb --host=$(DBHOST) --port=$(DBPORT) --if-exists
CREATEDB := createdb --host=$(DBHOST) --port=$(DBPORT) --encoding=UTF8
STORAGE := $(TOPDIR)/$(DBNAME).json
DSN := host=$(DBHOST) port=$(DBPORT) dbname=$(DBNAME)
PUP := $(PATCHDB) --postgresql="$(DSN)" --log-file=$(DBNAME).log $(STORAGE)

# Build the Sphinx documentation
doc:
        $(MAKE) -C doc STORAGE=$(STORAGE) html

$(STORAGE): doc

# Show what is missing
missing-patches: $(STORAGE)
        $(PUP) --dry-run

# Upgrade the database to the latest revision
database: $(STORAGE)
        $(PUP)

# Remove current database and start from scratch
scratch-database:
        $(DROPDB) $(DBNAME)
        $(CREATEDB) $(DBNAME)
        $(MAKE) database

Changes

2.5 (2016-05-17)

  • Catch silly MySQL’s “There is no such grant defined” error on REVOKE ALL PRIVILEGES

2.4 (2016-05-13)

  • User defined variables

2.3 (2016-04-19)

  • Automatic backup functionality, with a new patchdb-states tool able to go back to a previous state of the database
  • Bring back Firebird support
  • Fix Python 2.7 compatibility

2.2 (2016-03-12)

  • Support loading from multiple archives in one shot, particularly handy with placeholder scripts

2.1 (2016-03-04)

  • Promote script problems to hard Sphinx errors

2.0 (2016-03-01)

  • Shiny new tests suite
  • New SQLite specific context
  • Generalized and somewhat better fake data domains for MySQL and SQLite. Warning: the syntax is not backward compatible with the previous implementation added in version 1.2.0.
  • New placeholder scripts, to allow splitting schema in several different Sphinx environments
  • Now two scripts cannot have the same title, even within the same document
  • Fix onerror handling, broken long ago by a typo

1.7 (2016-02-20)

  • Fix packaging issues

1.6 (2016-02-10)

  • Data files and preload/postload scripts may be specified also as package relative resources
  • Deprecate the --patch-storage option for patchdb, replaced by a single positional argument: it’s going to be removed in version 2.0, in the meantime it’s still recognized

1.5 (2016-01-07)

  • Repackage dbloady as a standalone tool, metapensiero.sqlalchemy.dbloady

1.4.2 (2015-10-22)

  • Allow using keyed values (e.g. PostgreSQL HSTORE) to lookup instances in dbloady

1.4.1 (2015-09-23)

  • Augmented Sphinx autodoc DataDocumenter able to pretty print SA queries

1.4.0 (2015-08-19)

  • New experimental dbloady feature, mainly intended for test fixtures: it is now able to take note of the instances it creates, writing a YAML file with the same input format, and to delete them from the database in a subsequent run

1.3.11 (2015-08-16)

  • dbloady now flushes changes after each entity to honor referential integrity checks

1.3.10 (2015-08-15)

  • Fix problem with the patchdb:script role, when the target gets split across two or more lines

1.3.9 (2015-08-08)

  • Fix problem with different MySQL drivers exceptions internals

1.3.8 (2015-08-08)

  • Allow longer patch ids, up to 100 characters

1.3.7 (2015-07-20)

  • Use PyMySQL by default, allow selection of a different driver with a command line option

1.3.6 (2015-07-06)

  • Do not decode patch id from UTF-8 but let the driver do that if needed

1.3.5 (2015-07-06)

  • Fix type of MySQL port number, must be an integer

1.3.4 (2015-07-06)

  • Accept also the port number to reach the MySQL server

1.3.3 (2015-06-24)

  • Some more tweaks to adapt dbloady to Python 3

1.3.2 (2015-06-23)

  • Flush the standard error stream to show the progress immediately
  • Do not encode statements in UTF-8 but let the driver do that if needed

1.3.1 (2015-06-23)

  • Fix “brown paper bag” syntax error

1.3.0 (2015-06-21)

  • Use fdb instead of kinterbasdb for Firebird
  • Support the AXON format for the on disk patch storage

1.2.1 (2014-07-02)

  • Add script’s “conditions” and “run-always” to the sphinx rendering
  • dbloady’s load_yaml() now returns a dictionary with loaded instances

1.2.0 (2014-06-19)

  • New “run-always” scripts
  • Poor man “CREATE DOMAIN” for MySQL
  • User defined assertions

1.1.2 (2014-06-05)

  • New --assume-already-applied option, useful when you start using patchdb on an already existing database

1.1.1 (2014-06-03)

  • Fix packaging, adding a MANIFEST.in

1.1.0 (2014-06-03)

  • Use setuptools instead of distribute
  • Use argparse instead of optparse
  • New mimetype property on scripts, to select the right Pygments highlighter
  • New MySQL specific context, using cymysql

1.0.7 (2013-08-23)

  • published on bitbucket

1.0.6 (2013-03-12)

  • dbloady: ability to load field values from external files

1.0.5 (2013-03-11)

  • dbloady: fix encoding error when printing messages coming from PostgreSQL
  • dbloady: emit a progress bar on stderr

1.0.4 (2013-02-27)

  • dbloady, a new utility script, to load base data from a YAML stream.

1.0.3 (2012-11-07)

  • Fix :patchdb:script role

1.0.2 (2012-10-19)

  • Pickier way to split the multi-statements SQL scripts, now the ;; separator must be on a line by its own
  • More precise line number tracking when applying multi-statements SQL scripts
  • Dump and load script dependencies and conditions as lists, to avoid pointless repeated splits and joins

1.0.1 (2012-10-13)

  • Fix error loading JSON storage, simplejson already yields unicode strings
  • Possibly use the original title of the script as description, if not explicitly set
  • More precise error on unknown script reference
  • Minor corrections

1.0 (2012-10-10)

  • Added JSON support for the on disk scripts storage

  • Adapted to work with SQLAlchemy 0.7.x

  • Updated to work with docutils > 0.8

  • Refactored as a Sphinx domain

    Attention!

    This means that the directive names are now prefixed with patchdb: (that is, the old script directive is now patchdb:script). You can use the default-domain directive if that annoys you.

  • Renamed the status table from prst_applied_info to simply patchdb

    Attention!

    This is the main incompatible change with previous version: you should eventually rename the table manually, sorry for the inconvenience.

  • Renamed prst_patch_storage configuration setting to patchdb_storage

  • Each script ID is now lower case, to avoid ambiguities

0.3 (2010-11-14)

  • Updated to work with Sphinx 1.0
  • New :script: role for cross-references
  • New :file: option on script directive, to keep the actual text in an external file

0.2 (2010-03-03)

  • Compatibility with SQLAlchemy 0.6
  • New patchdb command line tool

0.1 (2009-10-28)

  • Replace home brew solution with SQLAlchemy topological sort
  • Use YAML for the persistent storage
  • Mostly working Sphinx adaptor
  • Rudimentary and mostly untested SQLAlchemy backend (basically only the direct PostgreSQL backend has been battle tested in production…)
  • First standalone version

0.0

  • still a PylGAM side-product
  • simply a set of docutils directives
  • started with Firebird in mind, but grown up with PostgreSQL
 