A python-facing API for creating and interacting with ZIM files
python-libzim
The libzim module allows you to read and write ZIM files in Python. It provides a shallow python interface on top of the C++ libzim library.
It is primarily used in openZIM scrapers like sotoki or youtube2zim.
Installation
pip install libzim
Our PyPI wheels bundle a recent release of the C++ libzim and are available for the following platforms:
- macOS for x86_64 and arm64
- GNU/Linux for x86_64, armhf and aarch64
- Linux+musl for x86_64 and aarch64
Wheels are available for both CPython and PyPy.
Users on other platforms can install the source distribution (see Building below).
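As a rough illustration of the platform list above, the following sketch checks whether a given system and architecture pair is covered by a prebuilt wheel. The function name, the `PREBUILT` table and the `libc` argument are assumptions made for this example; they are not part of libzim or pip, and pip's real wheel selection relies on platform tags rather than this logic.

```python
import platform

# Platform/architecture pairs copied from the list above.
# The family labels ("macos", "linux-glibc", "linux-musl") are invented
# for this sketch and are not official platform tags.
PREBUILT = {
    "macos": {"x86_64", "arm64"},
    "linux-glibc": {"x86_64", "armhf", "aarch64"},
    "linux-musl": {"x86_64", "aarch64"},
}


def has_prebuilt_wheel(system: str, machine: str, libc: str = "") -> bool:
    """Return True if the (system, machine, libc) triple appears in the list above."""
    system = system.lower()
    if system == "darwin":
        family = "macos"
    elif system == "linux":
        # The caller must say when the C library is musl; anything else
        # is treated as GNU/Linux in this sketch.
        family = "linux-musl" if libc == "musl" else "linux-glibc"
    else:
        return False
    return machine in PREBUILT[family]


if __name__ == "__main__":
    # platform.machine() reports kernel names such as "armv7l", which would
    # need mapping to "armhf" before calling this helper.
    print(has_prebuilt_wheel(platform.system(), platform.machine()))
```

If the check returns False, installing from the source distribution is the route to take.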
Contributions
git clone git@github.com:openzim/python-libzim.git && cd python-libzim
# python -m venv env && source env/bin/activate
pip install -U setuptools invoke
invoke download-libzim install-dev build-ext test
# invoke --list for available development helpers
See CONTRIBUTING.md for additional details, then open a ticket or submit a Pull Request on GitHub 🤗!
Usage
Read a ZIM file
from libzim.reader import Archive
from libzim.search import Query, Searcher
from libzim.suggestion import SuggestionSearcher
zim = Archive("test.zim")
print(f"Main entry is at {zim.main_entry.get_item().path}")
entry = zim.get_entry_by_path("home/fr")
print(f"Entry {entry.title} at {entry.path} is {entry.get_item().size}b.")
print(bytes(entry.get_item().content).decode("UTF-8"))
# searching using full-text index
search_string = "Welcome"
query = Query().set_query(search_string)
searcher = Searcher(zim)
search = searcher.search(query)
search_count = search.getEstimatedMatches()
print(f"there are {search_count} matches for {search_string}")
print(list(search.getResults(0, search_count)))
# accessing suggestions
search_string = "kiwix"
suggestion_searcher = SuggestionSearcher(zim)
suggestion = suggestion_searcher.suggest(search_string)
suggestion_count = suggestion.getEstimatedMatches()
print(f"there are {suggestion_count} matches for {search_string}")
print(list(suggestion.getResults(0, suggestion_count)))
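The example above fetches every match in a single getResults call. For large archives it may be preferable to page through results instead. The sketch below is one possible way to do that; `iter_results` is a hypothetical helper, not part of libzim, and it only assumes the two methods used above (getEstimatedMatches and getResults(start, count)).

```python
from typing import Iterator


def iter_results(search, page_size: int = 50) -> Iterator[str]:
    """Yield results from a search or suggestion object, one page at a time.

    `search` is any object exposing getEstimatedMatches() and
    getResults(start, count), as in the example above.
    """
    total = search.getEstimatedMatches()
    start = 0
    while start < total:
        page = list(search.getResults(start, page_size))
        if not page:
            # The match count is only an estimate, so stop as soon as
            # a page comes back empty.
            break
        yield from page
        start += page_size
```

With the objects created above, usage could look like `for path in iter_results(search): print(path)`, and the same helper should work for the suggestion object since it exposes the same two methods.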
Write a ZIM file
from libzim.writer import Creator, Item, StringProvider, FileProvider, Hint

class MyItem(Item):
    def __init__(self, title, path, content="", fpath=None):
        super().__init__()
        self.path = path
        self.title = title
        self.content = content
        self.fpath = fpath

    def get_path(self):
        return self.path

    def get_title(self):
        return self.title

    def get_mimetype(self):
        return "text/html"

    def get_contentprovider(self):
        # serve content from a file when one is given, else from the string
        if self.fpath is not None:
            return FileProvider(self.fpath)
        return StringProvider(self.content)

    def get_hints(self):
        return {Hint.FRONT_ARTICLE: True}

content = """<html><head><meta charset="UTF-8"><title>Web Page Title</title></head>
<body><h1>Welcome to this ZIM</h1><p>Kiwix</p></body></html>"""

item = MyItem("Hello Kiwix", "home", content)
item2 = MyItem("Bonjour Kiwix", "home/fr", None, "home-fr.html")

with Creator("test.zim").config_indexing(True, "eng") as creator:
    creator.set_mainpath("home")
    creator.add_item(item)
    creator.add_item(item2)
    for name, value in {
        "creator": "python-libzim",
        "description": "Created in python",
        "name": "my-zim",
        "publisher": "You",
        "title": "Test ZIM",
    }.items():
        creator.add_metadata(name.title(), value)
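MyItem above always reports text/html, which only suits HTML entries. As a minimal sketch, assuming items backed by files of various types, the mimetype could be guessed from the file name with the standard library. `guess_item_mimetype` and its fallback value are hypothetical choices for this example, not something libzim provides.

```python
import mimetypes


def guess_item_mimetype(fpath: str, default: str = "application/octet-stream") -> str:
    """Guess a mimetype from a file name, falling back to `default`."""
    # guess_type() returns (type, encoding); type is None when unknown.
    mimetype, _encoding = mimetypes.guess_type(fpath)
    return mimetype or default
```

One way to use it would be to have get_mimetype return `guess_item_mimetype(self.fpath)` when `self.fpath` is set and keep "text/html" for string content.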
Building
libzim package building offers different behaviors via environment variables:
Variable | Example | Use case |
---|---|---|
LIBZIM_DL_VERSION | 8.1.1 or 2023-04-14 | Specify the C++ libzim binary version to download and bundle. Either a release version string or a date, in which case a nightly build is downloaded. |
USE_SYSTEM_LIBZIM | 1 | Uses LDFLAGS and CFLAGS to find the libzim to link against. The resulting wheel won't bundle C++ libzim. |
DONT_DOWNLOAD_LIBZIM | 1 | Disable downloading of C++ libzim. Place headers in include/ and the libzim dylib/so in libzim/ if not using system libzim. It will be bundled in the wheel. |
PROFILE | 1 | Enable profile tracing in the Cython extension. Required for Cython code coverage reporting. |
SIGN_APPLE | 1 | Set to sign and notarize the extension for macOS. Requires the APPLE_SIGNING_* variables below. |
APPLE_SIGNING_IDENTITY | Developer ID Application: OrgName (ID) | Required for signing on macOS. |
APPLE_SIGNING_KEYCHAIN_PATH | /tmp/build.keychain | Path to the Keychain containing the certificate used to sign for macOS. |
APPLE_SIGNING_KEYCHAIN_PROFILE | build | Name of the profile in the specified Keychain. |
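To make the interaction between the first three variables easier to follow, here is a small sketch that summarizes which build behavior a given environment would select. It is not the project's actual build script: the function name, the precedence order (system libzim first, then local files, then download) and the rule that any non-empty value enables a flag are all assumptions made for illustration.

```python
import re
from typing import Mapping

# A LIBZIM_DL_VERSION shaped like 2023-04-14 is treated as a nightly date.
_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def describe_build(env: Mapping[str, str]) -> str:
    """Summarize the build behavior implied by the variables in the table above."""
    # Assumption: any non-empty value enables a flag, and flags are
    # checked in this order.
    if env.get("USE_SYSTEM_LIBZIM"):
        return "link against system libzim (not bundled)"
    if env.get("DONT_DOWNLOAD_LIBZIM"):
        return "bundle libzim found in include/ and libzim/"
    version = env.get("LIBZIM_DL_VERSION")
    if version is None:
        return "download and bundle default libzim release"
    kind = "nightly" if _DATE.fullmatch(version) else "release"
    return f"download and bundle libzim {kind} {version}"


if __name__ == "__main__":
    import os

    print(describe_build(os.environ))
```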
Examples
Default: downloading and bundling most appropriate libzim release binary
python3 -m build
Using system libzim (brew, debian or manually installed) - not bundled
# using system-installed C++ libzim
brew install libzim # macOS
apt-get install libzim-dev # debian
dnf install libzim-devel # fedora
USE_SYSTEM_LIBZIM=1 python3 -m build --wheel
# using a specific C++ libzim
USE_SYSTEM_LIBZIM=1 \
CFLAGS="-I/usr/local/include" \
LDFLAGS="-L/usr/local/lib" \
DYLD_LIBRARY_PATH="/usr/local/lib" \
LD_LIBRARY_PATH="/usr/local/lib" \
python3 -m build --wheel
Other platforms
On platforms for which there is no official binary available, you'd have to compile C++ libzim from source first, then use either DONT_DOWNLOAD_LIBZIM or USE_SYSTEM_LIBZIM.
License