Copy a file in the most efficient way possible while generating a SHA256 hash of the data

hashcopy

This module contains one class, HashCopier, which will copy data from an input file to an output file with minimal memory copying, while computing a SHA256 hash of the data. It can also be used without an output file if you just want the hash.

This module works by mapping the entire source file into memory using mmap, then using madvise to tell the system that the mapping will be read sequentially. Each call to .update() hashes a chunk of data (default: 4MB) directly from the memory mapping, then, if an output file descriptor was passed, calls write to write that chunk to the output file. Finally, it calls madvise(..., MADV_DONTNEED) to tell the system that the chunk is no longer needed, reducing the number of resident pages.
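The steps above can be approximated in pure Python with the standard mmap and hashlib modules. This is only a sketch of the technique, not the module's actual implementation: the function name sha256_copy_sketch and its parameters are hypothetical, and the madvise calls are skipped on platforms where Python does not expose them.

```python
import hashlib
import mmap
import os

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB; keep this a multiple of the page size


def sha256_copy_sketch(src_fd, dst_fd=None, chunk_size=CHUNK_SIZE):
    """Return the SHA256 digest of src_fd, copying to dst_fd if one is given.

    Hypothetical helper for illustration; it is not part of hashcopy.
    """
    digest = hashlib.sha256()
    size = os.fstat(src_fd).st_size
    if size == 0:
        return digest.digest()  # an empty file cannot be mapped

    with mmap.mmap(src_fd, size, access=mmap.ACCESS_READ) as mapping:
        # madvise and its constants are only present on some platforms.
        can_advise = hasattr(mapping, "madvise")
        if can_advise and hasattr(mmap, "MADV_SEQUENTIAL"):
            mapping.madvise(mmap.MADV_SEQUENTIAL)

        with memoryview(mapping) as view:
            for offset in range(0, size, chunk_size):
                # Slicing a memoryview does not copy the mapped bytes.
                with view[offset:offset + chunk_size] as chunk:
                    length = len(chunk)
                    digest.update(chunk)
                    written = 0
                    # os.write may write fewer bytes than requested.
                    while dst_fd is not None and written < length:
                        written += os.write(dst_fd, chunk[written:])
                if can_advise and hasattr(mmap, "MADV_DONTNEED"):
                    # Hint that the pages just processed are no longer needed.
                    mapping.madvise(mmap.MADV_DONTNEED, offset, length)
    return digest.digest()
```

Calling sha256_copy_sketch(inputfp.fileno()) with no second argument only hashes the file. The chunk size should stay a multiple of the page size, because madvise expects a page-aligned start offset.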

Because this module depends on mapping the entire source file, it will likely fail on 32-bit systems if the size of the file exceeds the usable address space.
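A caller that needs to run on 32-bit systems could check the file size before attempting the copy. The helper below is a hypothetical pre-check, not something the module provides, and it assumes sys.maxsize is a reasonable upper bound on what can be mapped. The usable address space is usually smaller than that, so passing the check does not guarantee that the mapping will succeed.

```python
import os
import sys


def may_exceed_address_space(fd, max_size=sys.maxsize):
    """Return True if the file behind fd is larger than max_size bytes.

    Hypothetical helper: sys.maxsize is 2**31 - 1 on 32-bit builds of Python,
    which is only a rough upper bound on the mappable size.
    """
    return os.fstat(fd).st_size > max_size
```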

Installation

Example

from pathlib import Path
from hashcopy import HashCopier

# HashCopier works on file descriptors, so pass fileno() rather than the file objects.
with Path('hashcopy.c').open('rb') as inputfp, Path('output.c').open('wb') as outputfp:
    with HashCopier(inputfp.fileno(), outputfp.fileno()) as hasher:
        # Each update() call processes one chunk; stop once it reports 0 bytes.
        while (bytes_copied := hasher.update()) > 0:
            print(f'hashed {bytes_copied} bytes')
        print(f'hash result = {hasher.finalize().hex()}')
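
To sanity-check a copy made this way, the output file can be hashed independently with the standard hashlib module and compared with the result above. The helper name sha256_of_file is hypothetical, and the comparison assumes that finalize() returns the raw digest bytes, as the .hex() call in the example suggests.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, block_size=1024 * 1024):
    """Return the SHA256 digest of a file, reading it in fixed-size blocks."""
    digest = hashlib.sha256()
    with Path(path).open('rb') as fp:
        # read() returns b'' at end of file, which ends the loop.
        while block := fp.read(block_size):
            digest.update(block)
    return digest.digest()
```

Under that assumption, sha256_of_file('output.c').hex() should match the hash printed by the example.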

Download files

Source Distribution

hashcopy-1.0.4.tar.gz (5.3 kB)
