A python-facing API for creating and interacting with ZIM files
python-libzim
The libzim module allows you to read and write ZIM files in Python. It provides a shallow Python interface on top of the C++ libzim library.
It is primarily used in openZIM scrapers like sotoki or youtube2zim.
Installation
pip install libzim
Our PyPI wheels bundle a recent release of the C++ libzim and are available for the following platforms:
- macOS for x86_64 and arm64
- GNU/Linux for x86_64, armhf and aarch64
- Linux+musl for x86_64 and aarch64
Wheels are available for both CPython and PyPy.
Users on other platforms can install the source distribution (see Building below).
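To check ahead of time whether pip is likely to find a prebuilt wheel, the platform list above can be sketched as a small lookup. This is only an illustration: the helper name `has_prebuilt_wheel` and the `glibc`/`musl` labels are hypothetical, and the table transcribes the list above rather than querying PyPI.

```python
import platform

# Transcribed from the platform list above; not queried from PyPI.
WHEEL_PLATFORMS = {
    ("Darwin", None): {"x86_64", "arm64"},
    ("Linux", "glibc"): {"x86_64", "armhf", "aarch64"},
    ("Linux", "musl"): {"x86_64", "aarch64"},
}


def has_prebuilt_wheel(system, machine, libc=None):
    """Return True if (system, machine, libc) appears in the list above."""
    return machine in WHEEL_PLATFORMS.get((system, libc), set())


system = platform.system()
machine = platform.machine()
libc = None
if system == "Linux":
    # platform.libc_ver() reports "glibc" on GNU/Linux; treating every other
    # answer as musl is an assumption made for this sketch.
    libc = "glibc" if platform.libc_ver()[0] == "glibc" else "musl"

# Note: platform.machine() may report names such as "armv7l" or "AMD64" that
# differ from the labels above, so a False here is a hint, not a verdict.
print(has_prebuilt_wheel(system, machine, libc))
```

When the result is False, installing from the source distribution as described under Building is the likely fallback.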
Contributions
git clone git@github.com:openzim/python-libzim.git && cd python-libzim
# python -m venv env && source env/bin/activate
pip install -U setuptools invoke
invoke download-libzim install-dev build-ext test
# invoke --list for available development helpers
See CONTRIBUTING.md for additional details, then open a ticket or submit a Pull Request on GitHub 🤗!
Usage
Read a ZIM file
from libzim.reader import Archive
from libzim.search import Query, Searcher
from libzim.suggestion import SuggestionSearcher
zim = Archive("test.zim")
print(f"Main entry is at {zim.main_entry.get_item().path}")
entry = zim.get_entry_by_path("home/fr")
print(f"Entry {entry.title} at {entry.path} is {entry.get_item().size}b.")
print(bytes(entry.get_item().content).decode("UTF-8"))
# searching using full-text index
search_string = "Welcome"
query = Query().set_query(search_string)
searcher = Searcher(zim)
search = searcher.search(query)
search_count = search.getEstimatedMatches()
print(f"there are {search_count} matches for {search_string}")
print(list(search.getResults(0, search_count)))
# accessing suggestions
search_string = "kiwix"
suggestion_searcher = SuggestionSearcher(zim)
suggestion = suggestion_searcher.suggest(search_string)
suggestion_count = suggestion.getEstimatedMatches()
print(f"there are {suggestion_count} matches for {search_string}")
print(list(suggestion.getResults(0, suggestion_count)))
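The example above loads every result in a single `getResults(0, count)` call. For large archives it may be preferable to fetch results in pages. The following is a minimal sketch under that assumption: `iter_results` and `page_size` are hypothetical names, and the helper relies only on the two methods used above, `getEstimatedMatches()` and `getResults(start, count)`.

```python
def iter_results(search, page_size=25):
    """Yield results page by page instead of materialising them all at once.

    `search` is any object exposing getEstimatedMatches() and
    getResults(start, count), as used in the example above.
    """
    total = search.getEstimatedMatches()
    for start in range(0, total, page_size):
        # Never ask for more than what remains according to the estimate.
        count = min(page_size, total - start)
        yield from search.getResults(start, count)
```

It could be used as `for result in iter_results(searcher.search(query)): print(result)`. Because the match count is an estimate, the number of results actually yielded may differ from it.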
Write a ZIM file
from libzim.writer import Creator, Item, StringProvider, FileProvider, Hint
class MyItem(Item):
    def __init__(self, title, path, content="", fpath=None):
        super().__init__()
        self.path = path
        self.title = title
        self.content = content
        self.fpath = fpath

    def get_path(self):
        return self.path

    def get_title(self):
        return self.title

    def get_mimetype(self):
        return "text/html"

    def get_contentprovider(self):
        # serve the content from a file when a file path was given
        if self.fpath is not None:
            return FileProvider(self.fpath)
        return StringProvider(self.content)

    def get_hints(self):
        return {Hint.FRONT_ARTICLE: True}

content = """<html><head><meta charset="UTF-8"><title>Web Page Title</title></head>
<body><h1>Welcome to this ZIM</h1><p>Kiwix</p></body></html>"""
item = MyItem("Hello Kiwix", "home", content)
item2 = MyItem("Bonjour Kiwix", "home/fr", None, "home-fr.html")
with Creator("test.zim").config_indexing(True, "eng") as creator:
    creator.set_mainpath("home")
    creator.add_item(item)
    creator.add_item(item2)
    for name, value in {
        "creator": "python-libzim",
        "description": "Created in python",
        "name": "my-zim",
        "publisher": "You",
        "title": "Test ZIM",
    }.items():
        creator.add_metadata(name.title(), value)
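The loop above passes `name.title()` rather than the lowercase dictionary key to `add_metadata`. As a small illustration using only the standard `str.title()` method, these are the names that end up being passed:

```python
metadata = {
    "creator": "python-libzim",
    "description": "Created in python",
    "name": "my-zim",
    "publisher": "You",
    "title": "Test ZIM",
}

# str.title() capitalises the first letter of each word in the key.
names = [name.title() for name in metadata]
print(names)  # ['Creator', 'Description', 'Name', 'Publisher', 'Title']
```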
Building
The libzim package build offers different behaviors via environment variables:
Variable | Example | Use case |
---|---|---|
LIBZIM_DL_VERSION | 8.1.1 or 2023-04-14 | Specify the C++ libzim binary version to download and bundle. Either a release version string or a date, in which case a nightly build is downloaded. |
USE_SYSTEM_LIBZIM | 1 | Uses LDFLAGS and CFLAGS to find the libzim to link against. The resulting wheel won't bundle the C++ libzim. |
DONT_DOWNLOAD_LIBZIM | 1 | Disable downloading of the C++ libzim. Place headers in include/ and the libzim dylib/so in libzim/ if not using the system libzim. It will be bundled in the wheel. |
PROFILE | 1 | Enable profile tracing in the Cython extension. Required for Cython code coverage reporting. |
SIGN_APPLE | 1 | Set to sign and notarize the extension for macOS. Requires the following information. |
APPLE_SIGNING_IDENTITY | Developer ID Application: OrgName (ID) | Required for signing on macOS. |
APPLE_SIGNING_KEYCHAIN_PATH | /tmp/build.keychain | Path to the Keychain containing the certificate to sign for macOS with. |
APPLE_SIGNING_KEYCHAIN_PROFILE | build | Name of the profile in the specified Keychain. |
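The LIBZIM_DL_VERSION row distinguishes a release version string from a date. A minimal sketch of how such a value could be told apart is shown below; `classify_dl_version` is a hypothetical helper written for illustration and is not taken from the project's build script.

```python
import re
from datetime import date


def classify_dl_version(value):
    """Return ("nightly", date) for a YYYY-MM-DD value, else ("release", value)."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        # date.fromisoformat raises ValueError for impossible dates.
        return "nightly", date.fromisoformat(value)
    return "release", value


print(classify_dl_version("8.1.1"))
print(classify_dl_version("2023-04-14"))
```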
Examples
Default: downloading and bundling most appropriate libzim release binary
python3 -m build
Using system libzim (brew, debian or manually installed) - not bundled
# using system-installed C++ libzim
brew install libzim # macOS
apt-get install libzim-dev # debian
dnf install libzim-devel # fedora
USE_SYSTEM_LIBZIM=1 python3 -m build --wheel
# using a specific C++ libzim
USE_SYSTEM_LIBZIM=1 \
CFLAGS="-I/usr/local/include" \
LDFLAGS="-L/usr/local/lib" \
DYLD_LIBRARY_PATH="/usr/local/lib" \
LD_LIBRARY_PATH="/usr/local/lib" \
python3 -m build --wheel
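The same build can be driven from Python, for instance in a CI script. The sketch below only assembles the environment shown in the shell example above; `system_libzim_env` and its `prefix` parameter are hypothetical names introduced for illustration.

```python
import os


def system_libzim_env(prefix, base_env=None):
    """Return a copy of the environment set up to link against libzim in `prefix`."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(
        {
            "USE_SYSTEM_LIBZIM": "1",
            "CFLAGS": f"-I{prefix}/include",
            "LDFLAGS": f"-L{prefix}/lib",
            "DYLD_LIBRARY_PATH": f"{prefix}/lib",  # macOS
            "LD_LIBRARY_PATH": f"{prefix}/lib",  # Linux
        }
    )
    return env


# To run the build itself (requires the `build` package to be installed):
# import subprocess, sys
# subprocess.run([sys.executable, "-m", "build", "--wheel"],
#                env=system_libzim_env("/usr/local"), check=True)
```

Note that this sketch overwrites any existing CFLAGS or LDFLAGS instead of appending to them, which matches the shell example but may not suit every setup.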
Other platforms
On platforms for which no official binary is available, you have to compile the C++ libzim from source first, then use either DONT_DOWNLOAD_LIBZIM or USE_SYSTEM_LIBZIM.
License