backalaika

Backup utility for small offices

Backalaika is not a Russian musical instrument; it is a simple backup
solution for small offices.

http://sourceforge.net/projects/backalaika/

Features
========

* zip, tar.gz and tar.bz2 compression
* predefined backup profiles
* differential and incremental backups
* file filters
* file database with hashes
* media sizes
* multi-volume
* etc.

Requirements
============

Backalaika requires Python 2.6.x and these libraries:

* Unipath - http://pypi.python.org/pypi/Unipath
* SQLAlchemy - http://pypi.python.org/pypi/SQLAlchemy

Writing backup jobs
===================

You define backup jobs in small text files (actually small Python files).
They are simple. Here is what a backup job looks like:

job = dict(name='nandos-work',
    sources = [
        ('/home/nando/2010', 'zip,separate_dirs'),
        ("/home/nando/.mozilla-thunderbird/", "bz2"),
        # more sources here if you like...
    ]
)

The job name ('nandos-work') will be used as part of the
backup directory name.

*sources* is a list of tuples. Each tuple contains a directory
to be backed up and the corresponding options. Each directory will
end up in its own compressed file. What kind of compression?

* Zip if you say 'zip' in the options, or
* tar.gz if you say 'gz' in the options, or
* tar.bz2 if you say 'bz2' in the options.

You should prefer bz2 for very big archives.

Zip was originally limited to 2 GB; now it is limited to 4 GB (without
the ZIP64 extension), but many programs still only read zip files
smaller than 2 GB.

bz2 usually compresses better than the others, which is why it is the
default.

The directories are backed up recursively; in other words,
subfolders are included.

The *separate_dirs* option causes the *subfolders* of the specified
directory to be backed up -- each in its own compressed file.

Separate multiple options with commas inside the options string.
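A minimal sketch of how such an options string could be interpreted. The
helper name *parse_options*, its return value, and the error raised for unknown
words are assumptions made for illustration; this is not Backalaika's own code.

```python
# Hypothetical helper, not part of Backalaika: it only illustrates the
# option rules described above.
COMPRESSIONS = ('zip', 'gz', 'bz2')

def parse_options(options):
    """Split e.g. 'zip,separate_dirs' into (compression, separate_dirs)."""
    words = [w.strip() for w in options.split(',') if w.strip()]
    compression = 'bz2'      # the documented default
    separate_dirs = False
    for word in words:
        if word in COMPRESSIONS:
            compression = word
        elif word == 'separate_dirs':
            separate_dirs = True
        else:
            # Assumption: unknown words are rejected rather than ignored.
            raise ValueError('unknown option: %r' % word)
    return compression, separate_dirs

print(parse_options('zip,separate_dirs'))   # ('zip', True)
print(parse_options('bz2'))                 # ('bz2', False)
```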

If you are a Python programmer, note that the job file can do other
things too, in addition to defining the *job* global variable. For
instance, you could mount an smbfs share. If you define an
*after_backup()* function, it is called before the program exits.
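Here is a sketch of a job file that also defines the hook. Only the *job*
variable and the *after_backup* name come from the text above; the assumption
that the hook takes no arguments, and what it does, are illustrative.

```python
# Sketch of a job file. The hook signature (no arguments) is an assumption.
job = dict(
    name='nandos-work',
    sources=[
        ('/home/nando/2010', 'zip,separate_dirs'),
    ],
)

def after_backup():
    # Called by the program before it exits, for example to clean up
    # or to unmount a share that the job file mounted earlier.
    print('Backup finished, cleaning up.')
```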

To perform a backup, you specify one backup job file:

backalaika.py -j jobs/myjob.py

Backalaika then asks three questions on the console:

1. Only back up a file once
===========================

Backalaika can maintain a database containing information about all
your backup media and files.

This database is able to uniquely identify a file (by its hash), so
when making a backup, the database can be looked up to prevent inclusion
of repeated (not modified) files.

The first question the program asks is about this: should known files
be skipped or not?
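The idea can be sketched as follows. This is not Backalaika's implementation:
the function names are hypothetical, a plain set stands in for the database,
and SHA-1 is an assumption, since the text does not say which hash is used.

```python
import hashlib

def file_hash(path, chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha1()  # assumption: the real algorithm is not documented
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

def files_to_back_up(paths, known_hashes):
    """Yield only the paths whose content hash is not already known."""
    for path in paths:
        if file_hash(path) not in known_hashes:
            yield path
```

Because the lookup is by content rather than by name, a file that was renamed
or moved but not modified would still be recognized as known.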

2. Minimum file modification date
=================================

Backalaika then asks for the minimum file modification date.
This is another mechanism useful for incremental or differential backups.
For instance, suppose the last backup you did was on December 25th, 2008.
You might then answer:

2008-12-25

...causing only files modified later than that to be backed up.
If you want a *full* backup, just hit Enter for that question.
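A sketch of such a date filter, under stated assumptions: the function names
are hypothetical, the date is read as local midnight, and a file qualifies
only if it was modified strictly after that moment.

```python
import os
import time
from datetime import datetime

def parse_min_date(answer):
    """Return a timestamp for 'YYYY-MM-DD', or None for an empty answer."""
    answer = answer.strip()
    if not answer:
        return None  # empty answer: full backup, no date filter
    return time.mktime(datetime.strptime(answer, '%Y-%m-%d').timetuple())

def modified_since(path, min_timestamp):
    """True if the file passes the date filter."""
    if min_timestamp is None:
        return True
    return os.path.getmtime(path) > min_timestamp
```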

3. A brief comment
==================

Next, the program asks for a brief comment. You can type one or two
words; these will be added to the name of the directory containing
the backup.

The backup process
==================

Finally, the program presents information about the job and asks you
to press Enter to start.
(To interrupt the program, press Ctrl-C.)

During backup, the program keeps writing the names of the files on
the screen. If some files cannot be accessed for some reason,
the error is shown.

Errors are collected and shown again *after* the backup is complete.
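The report-now-and-again-later pattern might look like this. It is a sketch
only: *back_up_files* and the *back_up_one* callback are hypothetical names,
and the message wording is made up.

```python
def back_up_files(paths, back_up_one):
    """Process every path; report errors at once and again at the end."""
    errors = []
    for path in paths:
        print(path)
        try:
            back_up_one(path)
        except (IOError, OSError) as e:
            # Show the problem right away, but keep going with the next file.
            print('ERROR: %s: %s' % (path, e))
            errors.append((path, e))
    if errors:
        print('%d file(s) could not be backed up:' % len(errors))
        for path, e in errors:
            print('  %s: %s' % (path, e))
    return errors
```

Repeating the errors at the end matters because the file names scrolling by
during a long backup would otherwise bury them.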

Main configuration
==================

You may write configuration in two places:

1) a ".backalaika" file in your home directory, and
2) each job file itself.

Settings in the job file take precedence over those in ".backalaika".
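One way to picture the precedence is a chain of dictionary updates. This is a
sketch assuming the job file has the last word; *merge_config* and the example
values (such as the /mnt/usb path) are hypothetical.

```python
DEFAULTS = dict(dir_backups='~/backups', db_uri='sqlite:///backalaika.sqlite')

def merge_config(defaults, home_config, job_config):
    """Later sources override earlier ones: defaults, ~/.backalaika, job file."""
    merged = dict(defaults)
    merged.update(home_config)
    merged.update(job_config)
    return merged

home = dict(db_uri='postgresql://user:password@localhost/database_name')
job_settings = dict(dir_backups='/mnt/usb/backups')  # hypothetical path
config = merge_config(DEFAULTS, home, job_settings)
print(config['dir_backups'])   # /mnt/usb/backups
print(config['db_uri'])        # postgresql://user:password@localhost/database_name
```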

Here is the default configuration:

config = dict(
    dir_backups = '~/backups',

    # File extensions of files that should be stored, not recompressed:
    store_only = 'pdf|ods|odt|jpg|jpeg|png|gif|'
                 'zip|rar|gz|tgz|bz2|jar|deb|rpm|cab'
                 '|mpg|mpeg|avi|mov|flv|wmv|wma|mp3|ogg',

    # Worthless files that shall NOT be backed up:
    skip_files = '.DS_Store|.localized|Thumbs.db|*.pyc',

    # How long can each archive volume be?
    # max_size = 1024 * 1024 * 699,  # CD size (untested)
    max_size = 1024 * 1024 * 4350,  # DVD size (untested)

    # Database connection string
    db_uri = 'sqlite:///backalaika.sqlite',
)

As you saw above, configuration consists of declaring a dictionary
containing the following keys:

* *dir_backups* is the directory where the backup files will be written.

* *store_only* applies to zip archives, which can store files as they
are instead of compressing them. This is useful for files that are
already compressed. In this variable, list the extensions of files
that cannot be compressed further, separated by a pipe character (|).

* *skip_files* lets you specify file names that should not be backed up.
In this variable you may use the asterisk for wildcards.

* *max_size* is the maximum size of each archive volume, in bytes, for
multiple-volume backups. For instance, you could set it to the size of
a DVD. When performing a backup job, the program starts the next volume
whenever a file might not fit in the current one.

* *db_uri* is a database connection string as supported by SQLAlchemy -
http://www.sqlalchemy.org
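To make the two pipe-separated settings concrete, here is a sketch of how file
names could be matched against them. The helper names are hypothetical, the
lists are shortened, and matching extensions case-insensitively is an
assumption rather than documented behavior.

```python
import fnmatch
import os

store_only = 'pdf|jpg|zip|mp3'              # shortened from the default
skip_files = '.DS_Store|Thumbs.db|*.pyc'    # shortened from the default

def should_skip(filename):
    """True if the bare file name matches any skip_files pattern."""
    return any(fnmatch.fnmatch(filename, pattern)
               for pattern in skip_files.split('|'))

def should_store_only(filename):
    """True if the extension is listed in store_only."""
    extension = os.path.splitext(filename)[1].lstrip('.').lower()
    return extension in store_only.split('|')
```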

There are a couple more keys:

Volume sizes
============

You can add to the config dict a list of media sizes (for instance,
to fill some multisession CDs), in megabytes:

volume_sizes = [596, 396, 640, 2700, 4350, 700] # partially used CDs

The program first consumes all volume_sizes, then applies max_size.
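That order can be sketched with a generator. The name *volume_capacities* is
hypothetical, and one megabyte is taken to be 1024 * 1024 bytes, matching the
way max_size is computed in the default configuration.

```python
import itertools

MB = 1024 * 1024

def volume_capacities(volume_sizes, max_size):
    """Yield capacities in bytes: each listed size once, then max_size forever."""
    for megabytes in volume_sizes:
        yield megabytes * MB
    while True:
        yield max_size

capacities = volume_capacities([596, 396], max_size=4350 * MB)
first_four = list(itertools.islice(capacities, 4))
# first_four holds 596 MB, 396 MB, then 4350 MB twice (all in bytes).
```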

Database URI
============

Backalaika uses SQLAlchemy to support various database backends such as
PostgreSQL, MySQL and SQLite. The default is ~/backups/backalaika.sqlite.
To change this, create the database, then configure Backalaika.

Like any configuration, it can be done in the job file. But usually you
want to use a single database, so you create a ~/.backalaika file and
add to it your connection string like this:

config = dict(
    db_uri = 'postgresql://user:password@localhost/database_name',
)

For more help creating a db_uri, consult the SQLAlchemy documentation.
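A few common forms of connection string, as a sketch. The URL syntax is
SQLAlchemy's; the paths, user names and passwords are placeholders. Note that
an absolute SQLite path needs four slashes, while three slashes mean a path
relative to the current directory.

```python
# Example ~/.backalaika file: pick ONE db_uri. All values are placeholders.
config = dict(
    # SQLite, path relative to the current directory (three slashes):
    # db_uri = 'sqlite:///backalaika.sqlite',

    # SQLite, absolute path (four slashes):
    db_uri = 'sqlite:////home/nando/backups/backalaika.sqlite',

    # MySQL:
    # db_uri = 'mysql://user:password@localhost/database_name',
)
```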

Command examples
================

Try this command to list the command-line switches:

backalaika.py -h

After you burn your archives to DVDs, you can insert the media and have
Backalaika add all its files to the database:

backalaika.py -a 'MyBackupMedia001' -d 'Description here'

The program descends into zip, tar.gz and tar.bz2 files as if they were
directories, adding their contents to the database.
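Looking inside archives can be done with the standard library alone. The
sketch below is written for a current Python rather than 2.6, and
*archive_members* is a hypothetical name, not a Backalaika function.

```python
import tarfile
import zipfile

def archive_members(path):
    """Return the entry names inside a zip or tar archive, else None."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    if tarfile.is_tarfile(path):
        # tarfile.open detects gzip and bzip2 compression by itself.
        with tarfile.open(path) as archive:
            return [m.name for m in archive.getmembers() if m.isfile()]
    return None  # not an archive: treat it as a regular file
```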

It hashes files by default. You can turn hashing off with -n:

backalaika.py -n -a 'AnotherMedia002' -d 'Videos'

This lists all volume titles in the database:

backalaika.py -v

You can view the files of a specific volume:

backalaika.py -f 'AnotherMedia002'

...or all volumes:

backalaika.py -f ''

Use -r to remove a volume and its contents from the database:

backalaika.py -r 'AnotherMedia002'

Please give feedback
====================

Backalaika is a work in progress. Do send us feature requests etc.