
Cryptographically secure file compression.


PZip

PZip is an encrypted file format (with optional gzip compression), a command-line tool, and a Python file-like interface.

Installation

pip install pzip

Command Line Usage

For a full list of options, run pzip -h. Basic usage is summarized below:

# Encrypt sensitive_data.csv to sensitive_data.csv.pz
pzip --key keyfile sensitive_data.csv
# Decrypt sensitive_data.csv.pz
pzip --key keyfile sensitive_data.csv.pz

Reading from stdin and writing to stdout are also supported. For example, to archive and encrypt a directory, then decrypt and extract it again:

tar cf - somedir | pzip -z --key keyfile -o somedir.pz
pzip --key keyfile -c somedir.pz | tar xf -

PZip can also generate a random password for you (-a). Pass the same password with -p to decrypt:

pzip -a sensitive_data.csv
encrypting with password: 7xRLoyHgK6J2-4mUkT3JoklSyfSYxHb1EkMABjasnUc

pzip -p 7xRLoyHgK6J2-4mUkT3JoklSyfSYxHb1EkMABjasnUc sensitive_data.csv.pz
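The sample password above is 43 characters from the URL-safe Base64 alphabet, which is exactly what 32 random bytes look like once Base64-encoded without padding. Assuming (the text does not confirm this) that `pzip -a` works along those lines, a comparable password could be produced with Python's standard `secrets` module; `generate_password` is a hypothetical helper, not part of PZip.

```python
import secrets


def generate_password(nbytes: int = 32) -> str:
    # Hypothetical helper: nbytes from the OS CSPRNG, URL-safe Base64-encoded
    # with '=' padding stripped, so 32 bytes -> 43 characters of [A-Za-z0-9_-].
    return secrets.token_urlsafe(nbytes)


password = generate_password()
print(len(password))  # 43
```

Whatever generates it, the password is only as strong as its randomness, which is why a CSPRNG such as `secrets` is the right source rather than `random`.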

Python Usage

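import os
from pzip import PZip

# 32 random bytes of key material; keep it safe, since the data cannot be recovered without it.
key = os.urandom(32)

# Encrypt: bytes written to f are encrypted into myfile.pz.
with PZip("myfile.pz", PZip.Mode.ENCRYPT, key) as f:
    f.write(b"sensitive data")

# Decrypt: reading f returns the decrypted plaintext.
with PZip("myfile.pz", PZip.Mode.DECRYPT, key) as f:
    print(f.read())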

Encryption

PZip uses AES-GCM with 128-, 192-, or 256-bit (default) keys. The AES key is derived from the supplied key or password using PBKDF2-SHA256 with a configurable iteration count (currently 200,000) and a random per-file salt. By default, a random 128-bit nonce (the GCM IV) is generated for each file; the Python interface also accepts a caller-supplied nonce for systems that can guarantee uniqueness more strongly. The key size, iteration count, salt, nonce, and GCM authentication tag are all stored in the PZip file header.

Additionally, the nonce is prepended to the plaintext before encryption, so the first 16 bytes of the ciphertext decrypt back to the nonce. During streaming decryption, those bytes can be compared with the nonce in the header to fail fast on a mismatch (for example, a wrong key) instead of processing the whole file first. The full plaintext is still authenticated against the GCM tag once the end of the stream is reached, but failing early matters when dealing with large files.
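A minimal sketch of the two steps described above, using only the Python standard library: stretching key material into an AES key with PBKDF2-SHA256, and the fail-fast comparison of the first 16 decrypted bytes against the header nonce. `derive_key` and `check_nonce_prefix` are hypothetical names, not PZip's API, and the AES-GCM decryption itself (which needs a third-party library) is deliberately left out.

```python
import hashlib
import hmac
import os

DEFAULT_ITERATIONS = 200_000  # the count stated above; each file records its own in the header


def derive_key(secret: bytes, salt: bytes, key_size: int = 32,
               iterations: int = DEFAULT_ITERATIONS) -> bytes:
    """Hypothetical helper: stretch a key or password into an AES key with PBKDF2-SHA256."""
    if key_size not in (16, 24, 32):
        raise ValueError("AES key size must be 16, 24, or 32 bytes")
    return hashlib.pbkdf2_hmac("sha256", secret, salt, iterations, dklen=key_size)


def check_nonce_prefix(first_decrypted: bytes, header_nonce: bytes) -> None:
    """Hypothetical helper: the first 16 decrypted bytes should repeat the header nonce."""
    if not hmac.compare_digest(first_decrypted[:16], header_nonce):
        # A mismatch means a wrong key/password or a damaged file, so there is
        # no point decrypting the rest of a large stream before reporting it.
        raise ValueError("nonce check failed")


salt = os.urandom(16)   # random per-file salt, stored in the header
nonce = os.urandom(16)  # random 128-bit GCM nonce, stored in the header
key = derive_key(b"my secret password", salt)
# Stand-in for the first decrypted bytes of a correctly keyed stream:
check_nonce_prefix(nonce + b"file contents...", nonce)
```

With a real cipher in place, `first_decrypted` would be the result of decrypting the first 16 ciphertext bytes; the GCM tag check at the end of the stream remains the actual integrity guarantee.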

Compression

PZip can optionally compress data with gzip at the default compression level. Nothing in the file format precludes adding an option in the future to configure the compression level, or even the compression algorithm.
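Because ciphertext is indistinguishable from random bytes, compression only helps if it happens before encryption. The sketch below, with hypothetical helper names and the flag bit taken from the header description, gzips the plaintext before it would be handed to the cipher and records that choice in a flags value. Note that Python's `gzip.compress` defaults to `compresslevel=9`, which may not be the level PZip uses.

```python
import gzip
import os
from typing import Tuple

FLAG_GZIP = 1 << 0  # bit 0 of the header's flags field: data is gzip-compressed


def prepare_plaintext(data: bytes, compress: bool) -> Tuple[bytes, int]:
    """Hypothetical helper: return (bytes to encrypt, flags value for the header)."""
    if compress:
        # gzip.compress defaults to compresslevel=9; PZip's default level may differ.
        return gzip.compress(data), FLAG_GZIP
    return data, 0


def restore_plaintext(decrypted: bytes, flags: int) -> bytes:
    """Hypothetical helper: undo prepare_plaintext after decryption, based on the flags."""
    return gzip.decompress(decrypted) if flags & FLAG_GZIP else decrypted


text = b"id,name\n" * 1000      # repetitive plaintext, 8000 bytes
packed, flags = prepare_plaintext(text, compress=True)
noise = os.urandom(len(text))    # random bytes stand in for ciphertext
print(len(text), len(packed), len(gzip.compress(noise)))
```

The repetitive plaintext shrinks dramatically, while the random stand-in for ciphertext grows slightly under gzip, which is why the order has to be compress first, then encrypt.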

File Format

The PZip file format consists of a 68-byte header followed by the encrypted file data, whose first 16 bytes are the encrypted copy of the nonce. The header is big-endian (network byte order), with the following fields and sizes:

  • File identification (magic), 4 bytes - PZIP
  • File format version, 1 byte - currently \x01
  • Flags, 2 bytes (unsigned short bitfield) - only bit 0 is currently used; it is set when the file data is gzip-compressed
  • AES key size (in bytes), 1 byte - must be 16, 24, or 32
  • Plaintext size, 8 bytes (unsigned long long) - unencrypted/decompressed file size
  • PBKDF2 iterations, 4 bytes (unsigned int/long)
  • PBKDF2 salt, 16 bytes
  • GCM nonce/IV, 16 bytes
  • GCM authentication tag, 16 bytes
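The header layout maps directly onto a Python `struct` format string. The sketch below packs and parses it, assuming the fields appear on disk in the order listed; `Header`, `pack_header`, and `unpack_header` are hypothetical names, not PZip's API.

```python
import struct
from typing import NamedTuple

# '>' = big-endian with no padding. Fields, in the order listed above:
# magic, version, flags, key size, plaintext size, iterations, salt, nonce, tag.
HEADER_FORMAT = ">4sBHBQI16s16s16s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4+1+2+1+8+4+16+16+16 = 68
MAGIC = b"PZIP"
VERSION = 1


class Header(NamedTuple):
    flags: int
    key_size: int
    plaintext_size: int
    iterations: int
    salt: bytes
    nonce: bytes
    tag: bytes


def pack_header(h: Header) -> bytes:
    if h.key_size not in (16, 24, 32):
        raise ValueError("AES key size must be 16, 24, or 32 bytes")
    for name in ("salt", "nonce", "tag"):
        # struct silently pads or truncates 's' fields, so check lengths explicitly.
        if len(getattr(h, name)) != 16:
            raise ValueError(f"{name} must be exactly 16 bytes")
    return struct.pack(HEADER_FORMAT, MAGIC, VERSION, h.flags, h.key_size,
                       h.plaintext_size, h.iterations, h.salt, h.nonce, h.tag)


def unpack_header(data: bytes) -> Header:
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated header")
    magic, version, *fields = struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE])
    if magic != MAGIC:
        raise ValueError("not a PZip file")
    if version != VERSION:
        raise ValueError(f"unsupported format version {version}")
    return Header(*fields)


hdr = Header(flags=1, key_size=32, plaintext_size=14, iterations=200_000,
             salt=bytes(16), nonce=bytes(16), tag=bytes(16))
raw = pack_header(hdr)
print(len(raw), raw[:4])  # 68 b'PZIP'
```

Under this reading, the encrypted data begins at offset `HEADER_SIZE`, and its first 16 bytes should decrypt to `Header.nonce`.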

FAQ

Why does this exist?

Nothing PZip does couldn't be done by chaining together existing tools - compressing with gzip, deriving a key and encrypting with openssl, generating a MAC (if not using GCM), etc. But at that point, you're probably writing a script to automate the process, tacking on bits of data here and there (or writing multiple files). PZip simply wraps that in a nice package and documents a file format. Plus having a Python interface you can pretty much treat as a file is super nice.

Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

pzip-0.9.3-py3-none-any.whl (8.7 kB, Python 3)
