Efficiently download new HIBP pwned password data by hash-prefix for a local copy
Project description
hibp-downloader
This is a CLI tool that efficiently downloads a local copy of the pwned password hash data from the very awesome HIBP pwned passwords API endpoint. It uses multiprocessing, async processes, local caching, content ETags and HTTP/2 connection pooling to make things as fast as (seemingly) Pythonly possible.
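To illustrate how content ETags and a local cache can avoid re-downloading unchanged data, here is a minimal sketch of the standard HTTP conditional-request pattern. It is not taken from the hibp-downloader source; the helper names `conditional_headers` and `resolve_body` and the sample values are hypothetical, and the tool's real internals may differ.

```python
from typing import Optional


def conditional_headers(cached_etag: Optional[str]) -> dict:
    """Build request headers (hypothetical helper, not the tool's API).

    When an ETag from an earlier download is known, send it back in
    If-None-Match so the server can reply 304 instead of the full body.
    """
    headers = {"Accept": "text/plain"}
    if cached_etag:
        headers["If-None-Match"] = cached_etag
    return headers


def resolve_body(status: int, response_body: bytes, cached_body: Optional[bytes]) -> bytes:
    """Pick the content to keep after a conditional request."""
    if status == 304:
        # Not Modified: the cached copy is still current.
        if cached_body is None:
            raise ValueError("got 304 but nothing is cached")
        return cached_body
    if status == 200:
        # Content is new or changed: keep the fresh body.
        return response_body
    raise RuntimeError(f"unexpected HTTP status {status}")


# Made-up values, for illustration only.
print(conditional_headers('W/"abc123"'))
print(resolve_body(304, b"", b"cached block"))
```

A 304 response carries no body, so an unchanged prefix block costs only a small request and response rather than a full transfer.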
Features
- Only download a hash-prefix content block when its content has changed.
- Start, stop and re-start the data-collection process without loss of data already collected.
- Ability to query clear text values and return results from the pwned password data set.
- Generate a single text file with pwned password hash values in-order, similar to PwnedPasswordsDownloader from the HIBP team.
- Per prefix file metadata in JSON format for easy data reuse.
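As a rough sketch of what "by hash-prefix" means when querying a clear text value: the HIBP pwned passwords range API groups SHA-1 hashes by their first five hex characters, and each block lists the remaining 35 characters with a count, as `SUFFIX:COUNT` lines. The function names below are hypothetical and the sample block content is made up; this is not the tool's own query code.

```python
import hashlib


def split_sha1(password: str) -> tuple[str, str]:
    """Return (5-char prefix, 35-char suffix) of the upper-case SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_block(block_text: str, suffix: str) -> int:
    """Look up a hash suffix in a prefix block of 'SUFFIX:COUNT' lines."""
    for line in block_text.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0  # suffix not present in this block


prefix, suffix = split_sha1("password")

# Made-up block content for illustration; real counts come from the data set.
sample_block = f"0018A45C4D1DEF81644B54AB7F969B88D65:3\n{suffix}:42\n"

print(prefix)
print(count_in_block(sample_block, suffix))
```

Only the block for one prefix needs to be read to answer a query, which is why a per-prefix local copy is convenient for lookups.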
Install
pip install --upgrade hibp-downloader
Usage
Performance
Sample download activity log; host with 12 cores on 45Mbit/s DSL connection.
2023-07-31T03:22:45+1000 | INFO | hibp-downloader | prefix=e585f source=[lc:265201 et:0 rc:722148 ro:3 xx:0] runtime_rate=[11.2MBit/s 86req/s ~71005H/s] runtime=2.33hr download=11748.0MB
2023-07-31T03:22:48+1000 | INFO | hibp-downloader | prefix=e5877 source=[lc:265201 et:0 rc:722268 ro:3 xx:0] runtime_rate=[11.2MBit/s 86req/s ~70998H/s] runtime=2.33hr download=11750.0MB
2023-07-31T03:22:50+1000 | INFO | hibp-downloader | prefix=f5837 source=[lc:265201 et:0 rc:722388 ro:3 xx:0] runtime_rate=[11.2MBit/s 86req/s ~70992H/s] runtime=2.33hr download=11751.9MB
- 86 requests per second to api.pwnedpasswords.com
- 265,201 prefix files from (lc) local-cache; 722,388 from (rc) remote-cache; 3 from (ro) remote-origin; 0 failed (xx) download
- estimated ~70k hash values downloaded per second
- 11.5GB (11,751MB) downloaded in 2.3 hours (full dataset is ~3.5 hours)
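The figures above can be cross-checked against the last sample log line. The sketch below assumes the field layout seen in the sample lines (`source=[...]`, `runtime=...hr`, `download=...MB`); the function name `parse_log_line` is hypothetical, and treating `rc` plus `ro` as the number of remote requests is an assumption, not something the log states.

```python
import re

# Last line of the sample activity log shown above.
LINE = (
    "2023-07-31T03:22:50+1000 | INFO | hibp-downloader | prefix=f5837 "
    "source=[lc:265201 et:0 rc:722388 ro:3 xx:0] "
    "runtime_rate=[11.2MBit/s 86req/s ~70992H/s] runtime=2.33hr download=11751.9MB"
)


def parse_log_line(line: str) -> dict:
    """Extract source counters, runtime (seconds) and download size (MB)."""
    source = re.search(r"source=\[([^\]]*)\]", line).group(1)
    counters = {key: int(value) for key, value in re.findall(r"(\w+):(\d+)", source)}
    runtime_hours = float(re.search(r"runtime=([\d.]+)hr", line).group(1))
    download_mb = float(re.search(r"download=([\d.]+)MB", line).group(1))
    return {
        "counters": counters,
        "runtime_s": runtime_hours * 3600,
        "download_mb": download_mb,
    }


stats = parse_log_line(LINE)

# Average throughput: megabytes * 8 bits, divided by elapsed seconds.
mbit_per_s = stats["download_mb"] * 8 / stats["runtime_s"]

# Assumption: only remote-cache (rc) and remote-origin (ro) hits are HTTP requests.
remote_requests = stats["counters"]["rc"] + stats["counters"]["ro"]
req_per_s = remote_requests / stats["runtime_s"]

print(f"{mbit_per_s:.1f} Mbit/s, {req_per_s:.0f} req/s")
```

Under these assumptions the averages come out close to the 11.2MBit/s and 86req/s reported in the log's `runtime_rate` field.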
Project
- GitHub - github.com/threatpatrols/hibp-downloader
- PyPI - pypi.org/project/hibp-downloader/
- ReadTheDocs - hibp-downloader.readthedocs.io
Copyright
- Copyright © 2023 Threat Patrols Pty Ltd
- Copyright © 2023 Nicholas de Jong
All rights reserved.
License
- BSD-3-Clause - see LICENSE file for details.
Download files
Source Distribution
hibp_downloader-0.1.4.tar.gz (18.5 kB)
Built Distribution
Hashes for hibp_downloader-0.1.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 64a511bbebb31f846963d5c66d381d951924b4d5e137e48803603d17796dd15f
MD5 | df5e40e993e64df2b8a5ec9989f54393
BLAKE2b-256 | 0f0aeb2b5155304e12f391993e1eeb89c2eec0874d390efc718fb9e0618596ac