

stream-zip

Python function to construct a ZIP archive on the fly - without having to store the entire ZIP in memory or disk. This is useful in memory-constrained environments, or when you would like to start returning compressed data before you've even retrieved all the uncompressed data. Generating ZIPs on-demand in a web server is a typical use case for stream-zip.

Offers similar functionality to zipfly, but with a different API, and does not use Python's zipfile module under the hood.

To unZIP files on the fly try stream-unzip.

Installation

pip install stream-zip

Usage

from datetime import datetime
from stream_zip import ZIP64, ZIP, NO_COMPRESSION, stream_zip

def unzipped_files():
    modified_at = datetime.now()
    perms = 0o600

    def file_1_data():
        yield b'Some bytes'

    def file_2_data():
        yield b'Some bytes'

    def file_3_data():
        yield b'Some bytes'

    # ZIP64 mode
    yield 'my-file-1.txt', modified_at, perms, ZIP64, file_1_data()

    # ZIP mode
    yield 'my-file-2.txt', modified_at, perms, ZIP, file_2_data()

    # No compression
    yield 'my-file-3.txt', modified_at, perms, NO_COMPRESSION, file_3_data()

for zipped_chunk in stream_zip(unzipped_files()):
    print(zipped_chunk)
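
For the web server use case, the loop above can be replaced by handing the iterable of chunks to the server. The following is a minimal sketch using a plain WSGI application; `make_zip_download_app`, `get_zipped_chunks` and the default filename are hypothetical names introduced for illustration and are not part of stream-zip.

```python
def make_zip_download_app(get_zipped_chunks, filename='download.zip'):
    """Return a WSGI app that streams a ZIP as an attachment.

    get_zipped_chunks is a hypothetical zero-argument callable that returns
    an iterable of bytes, for example: lambda: stream_zip(unzipped_files())
    """
    def app(environ, start_response):
        headers = [
            ('Content-Type', 'application/zip'),
            ('Content-Disposition', f'attachment; filename="{filename}"'),
        ]
        # No Content-Length header: the total size is not known up front.
        start_response('200 OK', headers)
        # The server pulls chunks one at a time as it writes the response.
        return get_zipped_chunks()
    return app
```

With the names from the usage example, this might be wired up as `app = make_zip_download_app(lambda: stream_zip(unzipped_files()))`. Whether chunks reach the client as they are produced also depends on the WSGI server and any proxies in front of it.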

Limitations

ZIP files cannot be written in a fully streaming fashion. A small amount of metadata for each member file, such as its name, must be placed at the end of the ZIP. To do this, stream-zip buffers that metadata in memory until it can be output.
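
The buffering pattern described above can be illustrated with a toy generator. This is only a sketch of the idea: the output is not the ZIP format, the trailing records are plain text, and `stream_with_trailing_index` is a hypothetical name rather than anything stream-zip exposes.

```python
import zlib

def stream_with_trailing_index(members):
    """Yield each member's data as it arrives, then one record per member.

    members is an iterable of (name, chunks) pairs. This is NOT the ZIP
    layout, only the "data first, small metadata last" pattern.
    """
    index = []  # grows by one small record per member, not with file size
    for name, chunks in members:
        size = 0
        crc = 0
        for chunk in chunks:
            size += len(chunk)
            crc = zlib.crc32(chunk, crc)
            yield chunk  # the data itself is never held back
        index.append((name, size, crc))
    # Only once all data has been output are the buffered records emitted.
    for name, size, crc in index:
        yield f'{name},{size},{crc:08x}\n'.encode()
```

Memory use here grows with the number of members rather than with their sizes, which is the trade-off the paragraph above describes.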

Uncompressed members are supported via the NO_COMPRESSION constant, as in the example above. However, the entire contents of each such member are buffered in memory, so this should not be used for large files. This is because the size and CRC32 of uncompressed data must come before the data itself in the ZIP file.
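
To see why buffering is needed in this case, consider what must be known before the first byte of an uncompressed member can be output. The sketch below uses only the standard library; `buffer_uncompressed` is a hypothetical helper for illustration and not how stream-zip is necessarily implemented.

```python
import zlib

def buffer_uncompressed(chunks):
    """Consume every chunk, then return (size, crc32, data).

    The size and CRC32 depend on all of the data, so nothing can be
    output until the last chunk has been read.
    """
    data = b''.join(chunks)  # the whole member is held in memory here
    return len(data), zlib.crc32(data), data
```

For example, `buffer_uncompressed(file_3_data())` has to exhaust the generator before it can return anything, which is why large files are a poor fit for this mode.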

When streaming, it doesn't seem possible to choose ZIP64 automatically based on file sizes: whether a member uses ZIP or ZIP64 must be specified before that member's compressed data in the final stream, and so before its sizes are known. Hence the onus is on client code to choose. ZIP is more widely supported but is limited to 4GiB (gibibyte), while ZIP64 is less widely supported but has a much greater limit of 16EiB (exbibyte). These limits apply to the compressed size of each member file, the uncompressed size of each member file, and to the size of the entire archive.
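
If client code knows an upper bound on the sizes involved before streaming starts, the choice can be made with a simple comparison. This is a minimal sketch under that assumption; `needs_zip64` and `ZIP_LIMIT_BYTES` are hypothetical names, and the threshold is simply the 4GiB figure quoted above, treated conservatively.

```python
ZIP_LIMIT_BYTES = 4 * 1024 ** 3  # the 4GiB limit quoted above

def needs_zip64(upper_bound_bytes, limit=ZIP_LIMIT_BYTES):
    """Return True if a size bound reaches or exceeds the plain ZIP limit.

    upper_bound_bytes is an assumption supplied by the caller: it should
    cover the compressed size, the uncompressed size, and the archive total.
    """
    return upper_bound_bytes >= limit
```

With the constants from the usage example, a member's mode could then be picked as `ZIP64 if needs_zip64(bound) else ZIP`. Since incompressible data can come out slightly larger after compression, leaving some margin in the bound is prudent.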

