mbdata

MusicBrainz Database Tools

MusicBrainz Database Replica

This repository now contains a collection of scripts for managing a replica of the MusicBrainz database. These used to be called “mbslave”, but have been moved to this repository.

The main motivation for these scripts is to let you customize your database. If you don’t need such customizations, it might be easier to use the replication tools provided by MusicBrainz itself.

Installation

Make sure you have Python and psycopg2 installed. On Debian and Ubuntu, that means installing these packages:
```
sudo apt install python python-pip python-psycopg2
pip install -U mbdata  # if you don't have it installed already
```
The pip command installs the mbslave script into $HOME/.local/bin.

Get an API token on the MetaBrainz website.

Create mbslave.conf by copying and editing mbslave.conf.default:
```
curl https://raw.githubusercontent.com/lalinsky/mbdata/master/mbslave.conf.default -o mbslave.conf
vim mbslave.conf
```
By default, the mbslave script looks for the config file in the current directory. If you want it to be found from anywhere, either save it to /etc/mbslave.conf or set the MBSLAVE_CONFIG environment variable. For example:
```
export MBSLAVE_CONFIG=/usr/local/etc/mbslave.conf
```
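If you are unsure which config file would be picked up, a small helper along these lines can list the candidates. This is only a sketch: the function name is hypothetical, and the precedence (MBSLAVE_CONFIG first, then the current directory, then /etc/mbslave.conf) is an assumption, not something confirmed by the mbslave source.

```python
import os


def find_mbslave_config(environ=None, cwd=None):
    """Return the first existing candidate config path, or None.

    Assumed precedence (not confirmed by mbslave itself):
    $MBSLAVE_CONFIG, then ./mbslave.conf, then /etc/mbslave.conf.
    """
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd

    candidates = []
    if environ.get("MBSLAVE_CONFIG"):
        candidates.append(environ["MBSLAVE_CONFIG"])
    candidates.append(os.path.join(cwd, "mbslave.conf"))
    candidates.append("/etc/mbslave.conf")

    # Pick the first candidate that actually exists on disk.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

Calling find_mbslave_config() with no arguments inspects the real environment and working directory.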

Set up the database. If you are starting completely from scratch, you can use the following commands to set up a clean database:

```
sudo su - postgres
createuser musicbrainz
createdb -l C -E UTF-8 -T template0 -O musicbrainz musicbrainz
# createlang was removed in PostgreSQL 10; PL/pgSQL is installed by default
# since PostgreSQL 9.0, so skip this line on current versions
createlang plpgsql musicbrainz
psql musicbrainz -c 'CREATE EXTENSION cube;'
psql musicbrainz -c 'CREATE EXTENSION earthdistance;'
```

Prepare empty schemas for the MusicBrainz database and create the table structure:

```
echo 'CREATE SCHEMA musicbrainz;' | mbslave psql -S
echo 'CREATE SCHEMA statistics;' | mbslave psql -S
echo 'CREATE SCHEMA cover_art_archive;' | mbslave psql -S
echo 'CREATE SCHEMA wikidocs;' | mbslave psql -S
echo 'CREATE SCHEMA documentation;' | mbslave psql -S
```

```
mbslave psql -f CreateTables.sql
mbslave psql -f statistics/CreateTables.sql
mbslave psql -f caa/CreateTables.sql
mbslave psql -f wikidocs/CreateTables.sql
mbslave psql -f documentation/CreateTables.sql
```

Download the MusicBrainz database dump files from http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/

Import the data dumps, for example:

```
mbslave import mbdump.tar.bz2 mbdump-derived.tar.bz2
```

Set up primary keys, indexes and views:

```
mbslave psql -f CreatePrimaryKeys.sql
mbslave psql -f statistics/CreatePrimaryKeys.sql
mbslave psql -f caa/CreatePrimaryKeys.sql
mbslave psql -f wikidocs/CreatePrimaryKeys.sql
mbslave psql -f documentation/CreatePrimaryKeys.sql
```

```
mbslave psql -f CreateIndexes.sql
mbslave psql -f CreateSlaveIndexes.sql
mbslave psql -f statistics/CreateIndexes.sql
mbslave psql -f caa/CreateIndexes.sql
```

```
mbslave psql -f CreateFunctions.sql
mbslave psql -f CreateViews.sql
```
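If you prefer to script the post-import steps rather than paste them one by one, a sketch like the following might help. It only wraps the commands listed above, in the same order; the function names and the dry_run flag are hypothetical, and it assumes mbslave is on your PATH and the SQL files resolve the same way as in the manual commands.

```python
import subprocess

# SQL files in the order given above: primary keys, indexes, functions, views.
POST_IMPORT_SQL = [
    "CreatePrimaryKeys.sql",
    "statistics/CreatePrimaryKeys.sql",
    "caa/CreatePrimaryKeys.sql",
    "wikidocs/CreatePrimaryKeys.sql",
    "documentation/CreatePrimaryKeys.sql",
    "CreateIndexes.sql",
    "CreateSlaveIndexes.sql",
    "statistics/CreateIndexes.sql",
    "caa/CreateIndexes.sql",
    "CreateFunctions.sql",
    "CreateViews.sql",
]


def post_import_commands(sql_files=POST_IMPORT_SQL):
    """Build one 'mbslave psql -f <file>' argv list per SQL file."""
    return [["mbslave", "psql", "-f", name] for name in sql_files]


def run_post_import(dry_run=True):
    """Print the commands, or run them and stop at the first failure."""
    for argv in post_import_commands():
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

Running run_post_import() with the default dry_run=True only prints the commands, which makes it easy to review them before executing anything.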

Vacuum the newly created database (optional):
```
echo 'VACUUM ANALYZE;' | mbslave psql
```

Replication

After the initial database setup, you might want to update the database with the latest data. The mbslave sync script will fetch updates from MusicBrainz and apply them to your local database:

```
mbslave sync
```

To update your database regularly, add a cron job such as this one, which runs at 15 minutes past every hour:

```
15 * * * * mbslave sync >>/var/log/mbslave.log
```
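An hourly job can overlap with a previous run that is still applying a large batch of updates. One way to guard against that, sketched here under assumptions, is a small wrapper that takes an exclusive file lock before starting the sync. The wrapper, its function name and the lock path are hypothetical and not part of mbdata; it relies on fcntl, so it works on Unix-like systems only.

```python
import fcntl
import subprocess


def run_exclusively(lock_path, argv):
    """Run argv while holding an exclusive lock on lock_path.

    Returns the command's exit code, or None if another process
    already holds the lock and the run was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking: fail immediately if the lock is already taken.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is released when lock_file is closed on leaving the block.
        return subprocess.run(argv).returncode


# Example call (hypothetical lock path):
# run_exclusively("/tmp/mbslave-sync.lock", ["mbslave", "sync"])
```

The cron entry could then invoke a script containing this call instead of running mbslave sync directly.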

Schema Upgrade

When the MusicBrainz database schema changes, the replication will stop working. This is usually announced on the MusicBrainz blog. When it happens, you need to upgrade the database.

Release 2019-05-14 (25)

Run the upgrade scripts:

```
mbslave psql -f updates/schema-change/25.slave.sql
echo 'UPDATE replication_control SET current_schema_sequence = 25;' | mbslave psql
```

Release 2017-05-25 (24)

Run the upgrade scripts:

```
mbslave psql -f updates/schema-change/24.slave.sql
echo 'UPDATE replication_control SET current_schema_sequence = 24;' | mbslave psql
```
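If a replica is more than one schema release behind, the two releases above suggest applying the upgrade scripts one sequence number at a time. The sketch below lists the steps between two sequence numbers. It assumes the file naming pattern and the UPDATE statement generalize from the two releases shown here, which is not confirmed for other releases; the function name is hypothetical.

```python
def upgrade_steps(current_sequence, target_sequence):
    """List (script path, SQL statement) pairs needed to go from
    current_sequence to target_sequence, one release at a time.

    The naming pattern is inferred from releases 24 and 25 only.
    """
    steps = []
    for n in range(current_sequence + 1, target_sequence + 1):
        script = "updates/schema-change/{}.slave.sql".format(n)
        update = (
            "UPDATE replication_control "
            "SET current_schema_sequence = {};".format(n)
        )
        steps.append((script, update))
    return steps


# Print the steps for a replica at schema sequence 23 moving to 25.
for script, update in upgrade_steps(23, 25):
    print("mbslave psql -f " + script)
    print("echo '" + update + "' | mbslave psql")
```

The printed lines match the commands listed for releases 24 and 25, in the order they would need to be applied.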

Tips and Tricks

Single Database Schema

MusicBrainz uses a number of schemas by default. If you are embedding the MusicBrainz database into an existing database for your application, it’s convenient to merge them all into a single schema. That can be done by changing your config like this:

```
[schemas]
musicbrainz=musicbrainz
statistics=musicbrainz
cover_art_archive=musicbrainz
wikidocs=musicbrainz
documentation=musicbrainz
```

After this, you only need to create the “musicbrainz” schema and import all the tables there.
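To double-check which schemas a given mapping requires, the [schemas] section can be read with Python's configparser. This is a sketch that assumes mbslave.conf is an INI-style file that configparser can parse, which the section syntax above suggests but does not guarantee; the function names are hypothetical.

```python
import configparser

SAMPLE_CONFIG = """
[schemas]
musicbrainz=musicbrainz
statistics=musicbrainz
cover_art_archive=musicbrainz
wikidocs=musicbrainz
documentation=musicbrainz
"""


def load_schema_mapping(text):
    """Return {original schema name: mapped schema name} from config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("schemas"):
        return {}
    return dict(parser.items("schemas"))


def schemas_to_create(mapping):
    """Distinct target schemas that must exist before importing."""
    return sorted(set(mapping.values()))


mapping = load_schema_mapping(SAMPLE_CONFIG)
print(schemas_to_create(mapping))
```

With the mapping above, the only schema left to create is musicbrainz.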

Full Import Schema Upgrade

You can use the schema mapping feature to do a zero-downtime upgrade of the database with a full data import. You can temporarily map all schemas to e.g. “musicbrainz_NEW”, import your new database there and then rename it:

```
echo 'BEGIN; ALTER SCHEMA musicbrainz RENAME TO musicbrainz_OLD; ALTER SCHEMA musicbrainz_NEW RENAME TO musicbrainz; COMMIT;' | mbslave psql -S
```

SQLAlchemy Models

If you are developing a Python application that needs access to the MusicBrainz data, you can use the mbdata.models module to get SQLAlchemy models mapped to the MusicBrainz database tables.

All tables from the MusicBrainz database are mapped, all foreign keys have one-way relationships set up and some models, where it’s essential to access their related models, have two-way relationships (collections) set up.

In order to work with the relationships efficiently, you should use the appropriate kind of eager loading.

Example usage of the models:

```
>>> from sqlalchemy import create_engine
>>> from sqlalchemy.orm import sessionmaker
>>> from mbdata.models import Artist
>>> engine = create_engine('postgresql://musicbrainz:musicbrainz@127.0.0.1/musicbrainz', echo=True)
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> artist = session.query(Artist).filter_by(gid='8970d868-0723-483b-a75b-51088913d3d4').first()
>>> print(artist.name)
```
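As a rough illustration of the eager loading mentioned above, the following sketch uses SQLAlchemy's joinedload so that an artist and its related area are fetched in one query instead of two. The helper name is hypothetical, and it assumes the model passed in exposes an `area` relationship, as the foreign-key relationships described above suggest for Artist; check mbdata.models for the actual attribute names.

```python
from sqlalchemy.orm import joinedload


def get_artist_with_area(session, artist_model, gid):
    """Fetch one artist by MBID, loading its related area in the same query.

    artist_model is expected to be mbdata.models.Artist (or any mapped
    class with `gid` and an `area` relationship, which is an assumption).
    """
    return (
        session.query(artist_model)
        .options(joinedload(artist_model.area))  # JOIN instead of a lazy load
        .filter_by(gid=gid)
        .first()
    )


# Hypothetical usage with the session from the example above:
# from mbdata.models import Artist
# artist = get_artist_with_area(session, Artist, '8970d868-0723-483b-a75b-51088913d3d4')
```

For collections (two-way relationships), SQLAlchemy's selectinload or subqueryload is often a better fit than joinedload, since joining a large collection multiplies the rows returned.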

If you use the models in your own application and want to define foreign keys from your own models to the MusicBrainz schema, you will need to let mbdata know which metadata object to add the MusicBrainz tables to:

```
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()

# this should be the first place where you import anything from mbdata
import mbdata.config
mbdata.config.configure(base_class=Base)

# now you can import and use the mbdata models
import mbdata.models
```

You can also use mbdata.config to re-map the MusicBrainz schema names, if your database doesn’t follow the original structure:

```
import mbdata.config
mbdata.config.configure(schema='my_own_mb_schema')
```

If you need sample MusicBrainz data for your tests, you can use mbdata.sample_data:

```
from mbdata.sample_data import create_sample_data
create_sample_data(session)
```

HTTP API

Note: This is very much a work in progress. It is not ready to use yet. Any help is welcome.

There is also an HTTP API, which you can use to access the MusicBrainz data in JSON or XML format over HTTP. This is useful if you want to abstract away the MusicBrainz PostgreSQL database.

Installation:

```
virtualenv --system-site-packages e
. e/bin/activate
pip install -r requirements.txt
python setup.py develop
```

Configuration:

```
cp settings.py.sample settings.py
vim settings.py
```

Start the development server:

```
MBDATA_API_SETTINGS=`pwd`/settings.py python -m mbdata.api.app
```

Query the API:

```
curl 'http://127.0.0.1:5000/v1/artist/get?id=b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d'
```
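The same request can be made from Python with the standard library. This is a sketch only: the function names are hypothetical, the base URL is the development server address used above, and it assumes the endpoint returns JSON by default, which the source does not state. Since the API is a work in progress, the response structure is left uninterpreted.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DEFAULT_BASE_URL = "http://127.0.0.1:5000"  # development server from above


def artist_get_url(mbid, base_url=DEFAULT_BASE_URL):
    """Build the /v1/artist/get URL for a given artist MBID."""
    query = urlencode({"id": mbid})
    return "{}/v1/artist/get?{}".format(base_url.rstrip("/"), query)


def fetch_artist(mbid, base_url=DEFAULT_BASE_URL, timeout=10):
    """Fetch an artist and decode the body as JSON (assumed format)."""
    with urlopen(artist_get_url(mbid, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(artist_get_url("b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d"))
```

Calling fetch_artist requires the development server to be running; artist_get_url can be used on its own to check the request URL.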

For production use, you should use server software like uWSGI and nginx to run the service.

Solr Index

Create a minimal Solr configuration:

```
./bin/create_solr_home.py -d /tmp/mbdata_solr
```

Start Solr:

```
cd /path/to/solr-4.6.1/example
java -Dsolr.solr.home=/tmp/mbdata_solr -jar start.jar
```

Development

Normally you should work against a regular PostgreSQL database with MusicBrainz data, but for testing purposes you can use an SQLite database with the small data subset used in the unit tests. You can create the database using:

```
./bin/create_sample_db.py sample.db
```

Then you can change your configuration:

```
DATABASE_URI = 'sqlite:///sample.db'
```

Running tests:

```
nosetests -v
```

If you want to see the SQL queries from a failed test, you can use the following:

```
MBDATA_DATABASE_ECHO=1 nosetests -v
```

A Jenkins task automatically runs the tests after each commit.

Release history

27.1.0 (Dec 2, 2022)
26.0.1 (Dec 26, 2021)
26.0.0 (Dec 26, 2021)
25.0.4 (May 16, 2019)
25.0.3 (May 16, 2019), this version
25.0.2 (May 16, 2019)
25.0.1 (May 16, 2019)
25.0.0 (May 15, 2019)

Download files

Source Distribution

mbdata-25.0.3.tar.gz (351.9 kB), uploaded May 16, 2019

Hashes for mbdata-25.0.3.tar.gz

Algorithm	Hash digest
SHA256	`8b4e9925879759b5564a79a1734104676a9fd3a409b6113d4571d90a5b0df280`
MD5	`e519b3930b2e88d2540c4981b642e7b4`
BLAKE2b-256	`0110dc20bf65a75bfbbf1744009ca4f0e12dbe025974fc59e9058adf5c9eeeb0`