Cryptographically secure file compression.

PZip

PZip is an encrypted file format (with optional gzip compression), a command-line tool, and a Python file-like interface.

Installation

pip install pzip

Command Line Usage

For a full list of options, run pzip -h. Basic usage is summarized below:

pzip --key keyfile sensitive_data.csv
pzip --key keyfile sensitive_data.csv.pz

Piping and outputting to stdout is also supported:

tar cf - somedir | pzip -z --key keyfile -o somedir.pz
pzip --key keyfile -c somedir.pz | tar xf -

PZip can also generate a password for you with -a. Pass the printed password with -p to decrypt later:

pzip -a sensitive_data.csv
encrypting with password: 7xRLoyHgK6J2-4mUkT3JoklSyfSYxHb1EkMABjasnUc

pzip -p 7xRLoyHgK6J2-4mUkT3JoklSyfSYxHb1EkMABjasnUc sensitive_data.csv.pz

Python Usage

import os
from pzip import PZip

key = os.urandom(32)

with PZip("myfile.pz", PZip.Mode.ENCRYPT, key) as f:
    f.write(b"sensitive data")

with PZip("myfile.pz", PZip.Mode.DECRYPT, key) as f:
    print(f.read())

For on-the-fly/streaming encryption, or when writing to non-seekable files, you may pass in the length of the plaintext up front so that it can be written to the PZip header. Alternatively, if you don't wish to store the plaintext length in the header for privacy reasons, you can pass size=0.

plaintext = b"hello world"
with PZip(streaming_response, "wb", key, size=len(plaintext)) as f:
    f.write(plaintext)

Encryption

PZip uses AES-GCM with 128-, 192-, or 256-bit (default) keys. Keys are derived using PBKDF2-SHA256 with a configurable iteration count (currently 200,000) and a random salt per file. A random 96-bit nonce (GCM IV) is generated by default for each file, but may also be supplied via the Python interface for systems that can more strongly guarantee uniqueness.

The key size, nonce size, iteration count, and salt are stored in the PZip file header, and the nonce/IV follows the header in plaintext. Additionally, the nonce is encrypted and prepended to the file contents as a way to fail fast when doing streaming decryption. The decrypted plaintext is still authenticated via the tag at the end, but a fail-fast mechanism is important when dealing with large files. The authentication tag is appended after the ciphertext in order to make this format suitable for on-the-fly streaming encryption.
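The key derivation described above can be sketched with the standard library alone. This is a minimal illustration, not PZip's actual code: the function name derive_key and its defaults are hypothetical, and it assumes the password bytes are passed to PBKDF2 unchanged.

```python
import hashlib
import os


def derive_key(password: bytes, salt: bytes, iterations: int = 200_000, key_size: int = 32) -> bytes:
    """Derive an AES key with PBKDF2-SHA256 (hypothetical helper, not PZip's API)."""
    if key_size not in (16, 24, 32):
        raise ValueError("AES key size must be 16, 24, or 32 bytes")
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=key_size)


salt = os.urandom(16)   # random 16-byte salt, generated per file
nonce = os.urandom(12)  # random 96-bit GCM nonce, generated per file
key = derive_key(b"pzip", salt)  # 32 bytes, i.e. a 256-bit AES key
```

The same password and salt always produce the same key, which is why the salt and iteration count have to travel with the file.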

Compression

PZip optionally compresses data using gzip at the default compression level. Nothing about the file format precludes adding an option in the future to allow configuration of the compression level, or even the compression algorithm.
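As a rough illustration of why compression has to happen before encryption, the sketch below gzips a repetitive plaintext and an equally long run of random bytes, which stand in for ciphertext. It uses Python's gzip.compress with its own default level, which is an assumption and may not match the level PZip uses.

```python
import gzip
import os

plaintext = b"sensitive,data,row\n" * 500
random_bytes = os.urandom(len(plaintext))  # stand-in for encrypted data

compressed_plaintext = gzip.compress(plaintext)
compressed_random = gzip.compress(random_bytes)

# Repetitive plaintext shrinks; random-looking bytes do not.
print(len(plaintext), len(compressed_plaintext), len(compressed_random))

# Compression is lossless, so the original bytes come back intact.
restored = gzip.decompress(compressed_plaintext)
```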

File Format

The PZip file format consists of a 36-byte header, followed by a variable-size nonce in plaintext, immediately followed by the same nonce encrypted. The remainder of the file is encrypted data, except for the last 16 bytes, which are the AES-GCM authentication tag data. The header is big/network endian, with the following fields/sizes:

  • File identification (magic), 4 bytes - PZIP
  • File format version, 1 byte - currently \x01
  • Flags, 1 byte - currently only bit 0 is used; it is set when the file data is gzip-compressed
  • AES key size (in bytes), 1 byte - must be 16, 24, or 32
  • GCM nonce size (in bytes), 1 byte - 12 by default, may be larger
  • PBKDF2 iterations, 4 bytes - unsigned int/long
  • PBKDF2 salt, 16 bytes
  • Plaintext length, 8 bytes - unsigned long long; optional, may be set to 0

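The header fields listed above map directly onto a struct format string. The following is a sketch of packing and unpacking such a header. The names HEADER, FLAG_COMPRESSED, pack_header, and unpack_header are made up for illustration and are not part of the pzip package.

```python
import os
import struct

# Big-endian layout from the field list: 4 + 1 + 1 + 1 + 1 + 4 + 16 + 8 = 36 bytes.
HEADER = struct.Struct(">4sBBBBI16sQ")
FLAG_COMPRESSED = 0x01  # bit 0: file data is gzip-compressed


def pack_header(salt, *, key_size=32, nonce_size=12, iterations=200_000,
                compressed=False, length=0):
    """Build a 36-byte header (hypothetical helper)."""
    flags = FLAG_COMPRESSED if compressed else 0
    return HEADER.pack(b"PZIP", 1, flags, key_size, nonce_size, iterations, salt, length)


def unpack_header(raw):
    """Parse a 36-byte header into a dict, rejecting obviously invalid values."""
    magic, version, flags, key_size, nonce_size, iterations, salt, length = HEADER.unpack(raw)
    if magic != b"PZIP":
        raise ValueError("not a PZip file")
    if key_size not in (16, 24, 32):
        raise ValueError("AES key size must be 16, 24, or 32 bytes")
    return {
        "version": version,
        "compressed": bool(flags & FLAG_COMPRESSED),
        "key_size": key_size,
        "nonce_size": nonce_size,
        "iterations": iterations,
        "salt": salt,
        "length": length,
    }


raw = pack_header(os.urandom(16), compressed=True, length=11)
fields = unpack_header(raw)
```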
Below is an example of a PZip file containing the plaintext "hello world", encrypted with a key derived from the string "pzip", with no compression (for readability). The portion sectioned off in double bars (===) is encrypted.

+-------------------------------------------------+------+-------------+------------------------+
| Bytes                                           | Size | Value       | Description            |
+-------------------------------------------------+------+-------------+------------------------+
| 50 5A 49 50                                     | 4    | PZIP        | File identification    |
| 01                                              | 1    | 1           | Version                |
| 00                                              | 1    | 0           | Flags                  |
| 20                                              | 1    | 32          | AES key size in bytes  |
| 0C                                              | 1    | 12          | Nonce size in bytes    |
| 00 03 0D 40                                     | 4    | 200000      | PBKDF2 iterations      |
| AD 46 72 0C 70 00 FF CC 20 97 10 5B 10 D4 0B B8 | 16   | <salt>      | PBKDF2 salt            |
| 00 00 00 00 00 00 00 0B                         | 8    | 11          | Plaintext length       |
+-------------------------------------------------+------+-------------+------------------------+
| B2 4F DD E3 FF 21 A8 09 3E 0C 1C 3E             | 12   | <nonce>     | Nonce (unencrypted)    |
+=================================================+======+=============+========================+
| 8B EB 12 D4 81 AD 6B 47 B0 0F 74 70             | 12   | <nonce>     | Nonce (encrypted)      |
| 8E A1 96 74 A9 51 31 47 B9 5C A2                | 11   | hello world | Ciphertext             |
+=================================================+======+=============+========================+
| 12 58 A6 8B ED F1 A9 08 47 3A 10 BC B6 1E 28 24 | 16   | <tag>       | GCM authentication tag |
+-------------------------------------------------+------+-------------+------------------------+

You can verify the above example in Python:

>>> import binascii, io, pzip
>>> data = binascii.unhexlify(
...     '505A49500100200C00030d40AD46720C7000FFCC2097105B10D40BB8000000000000000BB24FDDE3FF21A80'
...     '93E0C1C3E8BEB12D481AD6B47B00F74708EA19674A9513147B95CA21258A68BEDF1A908473A10BCB61E2824'
... )
>>> pzip.PZip(io.BytesIO(data), "rb", b"pzip").read()
b'hello world'
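To see how the example bytes line up with the file format without installing anything, the file can be sliced by hand using only the standard library. This sketch only splits the sections and does not decrypt. The variable names are illustrative, and the offsets follow the 36-byte header layout described earlier.

```python
import binascii
import struct

data = binascii.unhexlify(
    '505A49500100200C00030d40AD46720C7000FFCC2097105B10D40BB8000000000000000BB24FDDE3FF21A80'
    '93E0C1C3E8BEB12D481AD6B47B00F74708EA19674A9513147B95CA21258A68BEDF1A908473A10BCB61E2824'
)

header, rest = data[:36], data[36:]
nonce_size = header[7]                              # GCM nonce size in bytes
(iterations,) = struct.unpack(">I", header[8:12])   # PBKDF2 iterations
(length,) = struct.unpack(">Q", header[28:36])      # plaintext length

nonce = rest[:nonce_size]                 # stored unencrypted
tag = rest[-16:]                          # AES-GCM authentication tag
encrypted = rest[nonce_size:-16]          # everything between nonce and tag
encrypted_nonce = encrypted[:nonce_size]  # fail-fast check value
ciphertext = encrypted[nonce_size:]       # encrypted file data

print(nonce.hex(), len(ciphertext), tag.hex())
```

With no compression, the ciphertext is the same length as the plaintext, so it matches the length stored in the header.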

FAQ

Why does this exist?

Nothing PZip does couldn't be done by chaining together existing tools - compressing with gzip, deriving a key and encrypting with openssl, generating a MAC (if not using GCM), etc. But at that point, you're probably writing a script to automate the process, tacking on bits of data here and there (or writing multiple files). PZip simply wraps that in a nice package and documents a file format. Plus having a Python interface you can pretty much treat as a file is super nice.

Why not store filename?

Storing the original filename has a number of security implications, both technical and otherwise. At a technical level, PZip would need to ensure safe filename handling across all platforms with regard to path delimiters, encodings, etc. Additionally, PZip was designed for a system where user-generated file attachments may contain sensitive information in the filenames themselves. In reality, having a stored filename is of minimal use anyway, since the default behavior is to append and remove a .pz suffix when encrypting/decrypting. If a .pz file were renamed, you would have a conflict that would likely be resolved by using the actual filename (not the stored filename) anyway.
