A python-facing API for creating and interacting with ZIM files
python-libzim
The libzim module allows you to read and write ZIM files in Python. It provides a shallow Python interface on top of the C++ libzim library.
It is primarily used in openZIM scrapers like sotoki or youtube2zim.
Installation
pip install libzim
Our PyPI wheels bundle a recent release of the C++ libzim and are available for the following platforms:
- macOS for x86_64 and arm64
- GNU/Linux for x86_64, armhf and aarch64
- Linux+musl for x86_64 and aarch64
Wheels are available for both CPython and PyPy.
Users on other platforms can install the source distribution (see Building below).
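To check quickly whether a given machine falls into the list above, a small lookup can help. The following is only a sketch: the `WHEEL_PLATFORMS` table and the `has_prebuilt_wheel` helper are hypothetical names that restate the list above and are not part of libzim. Note that `platform.machine()` may report 32-bit ARM as `armv7l` rather than `armhf`.

```python
import platform

# Platform/architecture pairs listed above as having PyPI wheels.
# The keys are labels chosen for this sketch, not official identifiers.
WHEEL_PLATFORMS = {
    "Darwin": {"x86_64", "arm64"},
    "Linux-glibc": {"x86_64", "armhf", "aarch64"},
    "Linux-musl": {"x86_64", "aarch64"},
}


def has_prebuilt_wheel(system, machine, libc="glibc"):
    """Return True if the (system, machine, libc) combination is in the list above."""
    key = system if system == "Darwin" else f"{system}-{libc}"
    return machine in WHEEL_PLATFORMS.get(key, set())


if __name__ == "__main__":
    # platform.libc_ver() reports "glibc" on GNU/Linux; anything else is
    # treated as musl here, which is a simplification made for this sketch.
    libc = "glibc" if platform.libc_ver()[0] == "glibc" else "musl"
    print(has_prebuilt_wheel(platform.system(), platform.machine(), libc))
```

If the check fails, pip falls back to the source distribution, which is where the Building section applies.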
Contributions
git clone git@github.com:openzim/python-libzim.git && cd python-libzim
# python -m venv env && source env/bin/activate
pip install -U setuptools invoke
invoke download-libzim install-dev build-ext test
# invoke --list for available development helpers
See CONTRIBUTING.md for additional details, then open a ticket or submit a Pull Request on GitHub 🤗!
Usage
Read a ZIM file
from libzim.reader import Archive
from libzim.search import Query, Searcher
from libzim.suggestion import SuggestionSearcher
zim = Archive("test.zim")
print(f"Main entry is at {zim.main_entry.get_item().path}")
entry = zim.get_entry_by_path("home/fr")
print(f"Entry {entry.title} at {entry.path} is {entry.get_item().size}b.")
print(bytes(entry.get_item().content).decode("UTF-8"))
# searching using full-text index
search_string = "Welcome"
query = Query().set_query(search_string)
searcher = Searcher(zim)
search = searcher.search(query)
search_count = search.getEstimatedMatches()
print(f"there are {search_count} matches for {search_string}")
print(list(search.getResults(0, search_count)))
# accessing suggestions
search_string = "kiwix"
suggestion_searcher = SuggestionSearcher(zim)
suggestion = suggestion_searcher.suggest(search_string)
suggestion_count = suggestion.getEstimatedMatches()
print(f"there are {suggestion_count} matches for {search_string}")
print(list(suggestion.getResults(0, suggestion_count)))
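The read example above assumes that the path exists in the archive. A minimal sketch of a more defensive lookup follows; `read_text` is a hypothetical helper, not part of libzim. It expects an object that behaves like `libzim.reader.Archive` and relies on its `has_entry_by_path` method, which python-libzim exposes as far as I know (check this against your installed version).

```python
def read_text(archive, path, encoding="utf-8"):
    """Return the decoded content stored at `path`, or None if the path is absent.

    `archive` is expected to behave like libzim.reader.Archive.
    """
    if not archive.has_entry_by_path(path):
        return None
    item = archive.get_entry_by_path(path).get_item()
    # The item content is a buffer-like object; bytes() copies it,
    # exactly as the example above does before decoding.
    return bytes(item.content).decode(encoding)


# Usage with a real archive (requires libzim and an existing test.zim):
#   from libzim.reader import Archive
#   print(read_text(Archive("test.zim"), "home/fr"))
```

Returning `None` rather than raising is a design choice for this sketch; callers that prefer exceptions can call `get_entry_by_path` directly.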
Write a ZIM file
from libzim.writer import Creator, Item, StringProvider, FileProvider, Hint
class MyItem(Item):
    def __init__(self, title, path, content="", fpath=None):
        super().__init__()
        self.path = path
        self.title = title
        self.content = content
        self.fpath = fpath

    def get_path(self):
        return self.path

    def get_title(self):
        return self.title

    def get_mimetype(self):
        return "text/html"

    def get_contentprovider(self):
        # serve content from a file when a file path was given, else from the string
        if self.fpath is not None:
            return FileProvider(self.fpath)
        return StringProvider(self.content)

    def get_hints(self):
        return {Hint.FRONT_ARTICLE: True}


content = """<html><head><meta charset="UTF-8"><title>Web Page Title</title></head>
<body><h1>Welcome to this ZIM</h1><p>Kiwix</p></body></html>"""

item = MyItem("Hello Kiwix", "home", content)
item2 = MyItem("Bonjour Kiwix", "home/fr", None, "home-fr.html")

with Creator("test.zim").config_indexing(True, "eng") as creator:
    creator.set_mainpath("home")
    creator.add_item(item)
    creator.add_item(item2)
    for name, value in {
        "creator": "python-libzim",
        "description": "Created in python",
        "name": "my-zim",
        "publisher": "You",
        "title": "Test ZIM",
    }.items():
        creator.add_metadata(name.title(), value)
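The `MyItem` class above always reports `text/html`. When items are backed by files of different types, the MIME type could be derived from the file name instead. This is a sketch using only the standard library; `guess_mimetype` is a hypothetical helper, not part of libzim.

```python
import mimetypes


def guess_mimetype(path, default="application/octet-stream"):
    """Guess a MIME type from a file name, falling back to `default`."""
    mimetype, _encoding = mimetypes.guess_type(path)
    return mimetype or default
```

In an `Item` subclass, `get_mimetype` could then return `guess_mimetype(self.fpath or self.path, default="text/html")`, which keeps the behavior of the example above for extension-less paths such as `home`.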
Building
The libzim package build offers different behaviors via environment variables:
Variable | Example | Use case |
---|---|---|
LIBZIM_DL_VERSION | 8.1.1 or 2023-04-14 | Specify the C++ libzim binary version to download and bundle. Either a release version string or a date, in which case a nightly build is downloaded. |
USE_SYSTEM_LIBZIM | 1 | Uses LDFLAGS and CFLAGS to find the libzim to link against. The resulting wheel won't bundle the C++ libzim. |
DONT_DOWNLOAD_LIBZIM | 1 | Disable downloading of the C++ libzim. Place headers in include/ and the libzim dylib/so in libzim/ if not using the system libzim. It will be bundled in the wheel. |
PROFILE | 1 | Enable profile tracing in the Cython extension. Required for Cython code coverage reporting. |
SIGN_APPLE | 1 | Set to sign and notarize the extension for macOS. Requires the following information. |
APPLE_SIGNING_IDENTITY | Developer ID Application: OrgName (ID) | Required for signing on macOS. |
APPLE_SIGNING_KEYCHAIN_PATH | /tmp/build.keychain | Path to the Keychain containing the certificate used to sign for macOS. |
APPLE_SIGNING_KEYCHAIN_PROFILE | build | Name of the profile in the specified Keychain. |
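One way to read the first three rows of the table is as a decision about where the C++ libzim comes from. The sketch below is an interpretation, not the project's build script: `describe_libzim_source` is a hypothetical helper, and it assumes that any non-empty value enables a flag and that USE_SYSTEM_LIBZIM takes precedence over DONT_DOWNLOAD_LIBZIM. Neither assumption is stated in the table.

```python
import re

# A LIBZIM_DL_VERSION shaped like YYYY-MM-DD is treated as a nightly date.
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def describe_libzim_source(env):
    """Summarize where the C++ libzim would come from for an environment mapping."""

    def enabled(name):
        # Assumption: any non-empty value counts as "set".
        return bool(env.get(name))

    # Assumption: the system library wins when both flags are set.
    if enabled("USE_SYSTEM_LIBZIM"):
        return "system libzim (not bundled)"
    if enabled("DONT_DOWNLOAD_LIBZIM"):
        return "local include/ and libzim/ (bundled)"
    version = env.get("LIBZIM_DL_VERSION")
    if not version:
        return "download default release (bundled)"
    kind = "nightly" if _DATE_RE.fullmatch(version) else "release"
    return f"download {kind} {version} (bundled)"
```

Calling `describe_libzim_source(os.environ)` before a build gives a quick sanity check of the current shell configuration.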
Examples
Default: downloading and bundling most appropriate libzim release binary
python3 -m build
Using system libzim (brew, debian or manually installed) - not bundled
# using system-installed C++ libzim
brew install libzim # macOS
apt-get install libzim-dev # debian
dnf install libzim-devel # fedora
USE_SYSTEM_LIBZIM=1 python3 -m build --wheel
# using a specific C++ libzim
USE_SYSTEM_LIBZIM=1 \
CFLAGS="-I/usr/local/include" \
LDFLAGS="-L/usr/local/lib" \
DYLD_LIBRARY_PATH="/usr/local/lib" \
LD_LIBRARY_PATH="/usr/local/lib" \
python3 -m build --wheel
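When the build is driven from a Python script rather than a shell, the same variables can be prepared as a mapping and handed to `subprocess`. This is a sketch only; `system_libzim_env` is a hypothetical helper, and the `include`/`lib` layout under the prefix mirrors the `/usr/local` example above.

```python
import os


def system_libzim_env(prefix, base_env=None):
    """Return an environment mapping for building against libzim under `prefix`."""
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = f"{prefix}/lib"
    env.update(
        {
            "USE_SYSTEM_LIBZIM": "1",
            "CFLAGS": f"-I{prefix}/include",
            "LDFLAGS": f"-L{lib_dir}",
            "DYLD_LIBRARY_PATH": lib_dir,  # used by the macOS loader
            "LD_LIBRARY_PATH": lib_dir,  # used by the Linux loader
        }
    )
    return env


# Usage (runs the same build as the shell example above):
#   import subprocess, sys
#   subprocess.run([sys.executable, "-m", "build", "--wheel"],
#                  env=system_libzim_env("/usr/local"), check=True)
```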
Other platforms
On platforms for which no official binary is available, you have to compile the C++ libzim from source first, then use either DONT_DOWNLOAD_LIBZIM or USE_SYSTEM_LIBZIM.
License