
An implementation of Roger Sayle's SmiZip algorithm for compressing short strings

Project description

SmiZip is a compression method for short strings. It was developed by Roger Sayle in 1998 while at Metaphorics LLC to compress SMILES strings.

This repository is a Python implementation of the SmiZip algorithm, as described by Roger in a presentation in 2001.

Quick start

Install as follows:

pip install smizip

Let’s compress a .smi file that originated with RDKit:

python3 scripts/compress.py

SMILES strings must be encoded and decoded with the same set of n-grams, which is stored in a JSON file. Several JSON files are included by default, but you can create your own by training on a dataset (find_best_ngrams.py) or by modifying an existing one (add_char_to_json.py).
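Why the same n-grams are needed on both sides can be seen in a short sketch of n-gram substitution coding. It assumes a hypothetical table format (a Python list of at most 256 strings, where each string's index is its one-byte code), and the names `encode`, `decode` and `NGRAMS` are illustrative only, not part of the smizip package or its JSON format. Encoding here uses dynamic programming to pick the fewest codes; this sketch does not claim that is exactly how smizip chooses codes. Decoding is a plain table lookup, so it only recovers the original string if the table matches the one used for encoding.

```python
"""A minimal sketch of n-gram substitution coding in the style of SmiZip.

Assumptions (illustrative only, not the smizip package's API or data format):
- the n-gram table is a list of at most 256 strings, and the index of an
  n-gram in the list is the byte value used to encode it;
- every character of the input is covered by some n-gram in the table.
"""
from typing import List, Sequence

# Hypothetical toy table; a real table would be trained on a dataset.
NGRAMS: List[str] = ["c", "1", "O", "C", "(", ")", "=", "N", "cc", "c1", "ccc", "C(=O)"]


def encode(text: str, table: Sequence[str]) -> bytes:
    """Encode text using the fewest possible codes (dynamic programming)."""
    if len(table) > 256:
        raise ValueError("a one-byte code table holds at most 256 n-grams")
    n = len(text)
    inf = float("inf")
    # cost[i] is the minimum number of codes needed to encode text[i:]
    cost = [inf] * (n + 1)
    cost[n] = 0
    # choice[i] is the code used at position i on an optimal path
    choice = [-1] * (n + 1)
    for i in range(n - 1, -1, -1):
        for code, ngram in enumerate(table):
            if ngram and text.startswith(ngram, i):
                candidate = cost[i + len(ngram)] + 1
                if candidate < cost[i]:
                    cost[i] = candidate
                    choice[i] = code
    if cost[0] == inf:
        raise ValueError("text cannot be encoded with this n-gram table")
    # Follow the recorded choices from the start to emit the codes
    out = bytearray()
    i = 0
    while i < n:
        code = choice[i]
        out.append(code)
        i += len(table[code])
    return bytes(out)


def decode(data: bytes, table: Sequence[str]) -> str:
    """Decode by looking up each byte in the same table used for encoding."""
    return "".join(table[b] for b in data)


if __name__ == "__main__":
    smiles = "c1ccccc1O"  # phenol
    packed = encode(smiles, NGRAMS)
    print(len(smiles), "characters ->", len(packed), "bytes")  # 9 characters -> 5 bytes
    assert decode(packed, NGRAMS) == smiles
```

With this toy table, phenol (`c1ccccc1O`) packs into 5 bytes instead of 9 characters. Decoding those bytes with a different table would give a different string or fail, which is why encoder and decoder must share one n-gram file.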

Let’s

This codebase was developed by Noel O’Boyle based on the information in that presentation.



Download files


Source Distribution

smizip-1.0.tar.gz (7.6 kB)

Built Distribution

smizip-1.0-py3-none-any.whl (9.0 kB)
