Tools for working with Redshift Spectrum.
Spectrify
A simple yet powerful tool to move your data from Redshift to Redshift Spectrum.
Free software: MIT license
Documentation: https://spectrify.readthedocs.io.
Features
One-liners to:
Export a Redshift table to S3 (CSV)
Convert exported CSVs to Parquet files in parallel
Create the Spectrum table on your Redshift cluster
Perform all 3 steps in sequence, essentially “copying” a Redshift table to Spectrum in one command.
S3 credentials are specified using boto3. See http://boto3.readthedocs.io/en/latest/guide/configuration.html
Redshift credentials are supplied via environment variables, command-line parameters, or interactive prompt.
Install
$ pip install spectrify
Command-line Usage
Export Redshift table my_table to a folder of CSV files on S3:
$ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb export my_table \
's3://example-bucket/my_table'
Convert exported CSVs to Parquet:
$ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb convert my_table \
's3://example-bucket/my_table'
Create Spectrum table from S3 folder:
$ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb create_table \
's3://example-bucket/my_table' my_table my_spectrum_table
Transform Redshift table by performing all 3 steps in sequence:
$ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb transform my_table \
's3://example-bucket/my_table'
Python Usage
Currently, you’ll have to supply your own SQLAlchemy engine to each of the commands below (pull requests welcome to make this easier).
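One way to obtain such an engine is SQLAlchemy's create_engine with a PostgreSQL-style URL, since Redshift speaks the PostgreSQL wire protocol. A minimal sketch of building that URL (the host, user, password, and database values are placeholders, not real credentials):

```python
from urllib.parse import quote_plus

# Hypothetical connection details -- substitute your own cluster values.
host = "example-url.redshift.aws.com"
port = 5439  # Redshift's default port
user = "myuser"
password = "s3cret/pass"  # quoted below so special characters survive
database = "mydb"

# Redshift speaks the PostgreSQL wire protocol, so a standard
# postgresql:// URL works with SQLAlchemy's create_engine().
url = "postgresql://{user}:{pw}@{host}:{port}/{db}".format(
    user=user,
    pw=quote_plus(password),
    host=host,
    port=port,
    db=database,
)

# With SQLAlchemy installed, the engine is then created with:
#   from sqlalchemy import create_engine
#   sa_engine = create_engine(url)
print(url)
```

The resulting sa_engine is what the spectrify functions below expect as their first argument.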
Export to S3:
from spectrify.export import export_to_csv
export_to_csv(sa_engine, table_name, s3_csv_dir)
Convert exported CSVs to Parquet:
from spectrify.convert import convert_redshift_manifest_to_parquet
from spectrify.utils.schema import get_table_schema
sa_table = get_table_schema(sa_engine, source_table_name)
convert_redshift_manifest_to_parquet(s3_csv_manifest_path, sa_table, s3_spectrum_dir)
Create Spectrum table from S3 parquet folder:
from spectrify.create import create_external_table
from spectrify.utils.schema import get_table_schema
sa_table = get_table_schema(sa_engine, source_table_name)
create_external_table(sa_engine, dest_schema, dest_table_name, sa_table, s3_spectrum_path)
Transform Redshift table by performing all 3 steps in sequence:
from spectrify.transform import transform_table
transform_table(sa_engine, table_name, s3_base_path, dest_schema, dest_table, num_workers)
Contribute
Contributions are always welcome! Read our guide on contributing here: http://spectrify.readthedocs.io/en/latest/contributing.html
License
MIT License. Copyright (c) 2017, The Narrativ Company, Inc.
History
0.3.0 (2017-10-30)
Support 16- and 32-bit integers
Packaging updates
0.2.1 (2017-09-27)
Fix Readme
0.2.0 (2017-09-27)
First release on PyPI.
0.1.0 (2017-09-13)
Didn’t even make it to PyPI.