A python-facing API for creating and interacting with ZIM files
python-libzim
The libzim module allows you to read and write ZIM files in Python. It provides a shallow Python interface on top of the C++ libzim library.
It is primarily used in openZIM scrapers such as sotoki and youtube2zim.
Installation
```sh
pip install libzim
```
The PyPI package is available for x86_64 macOS and GNU/Linux only. It bundles a recent release of the C++ libzim.
On other platforms, you have to compile C++ libzim from source first, then build this package, adjusting LD_LIBRARY_PATH so that the compiled library can be found at runtime.
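Whether the wheel came from PyPI or was built against a locally compiled C++ libzim, a quick way to confirm that the bindings actually load is to import one of their modules. A minimal sketch, assuming nothing beyond the libzim.reader module used in the examples further down; the function name libzim_import_status is hypothetical.

```python
import importlib
from typing import Tuple


def libzim_import_status() -> Tuple[bool, str]:
    """Try importing the compiled reader module and report the outcome."""
    try:
        importlib.import_module("libzim.reader")
    except ImportError as exc:
        # Raised when the package is not installed, or when the extension
        # module cannot load the C++ libzim shared library (for example
        # because LD_LIBRARY_PATH does not point at a locally built copy).
        return False, f"cannot import libzim.reader: {exc}"
    return True, "libzim.reader imported successfully"


ok, message = libzim_import_status()
print(message)
```

This only exercises the import; it does not open any ZIM file.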
Contributions
```sh
git clone git@github.com:openzim/python-libzim.git && cd python-libzim
# python -m venv env && source env/bin/activate
pip install -U setuptools invoke
invoke download-libzim install-dev build-ext test
# invoke --list for available development helpers
```
See CONTRIBUTING.md for additional details, then open a ticket or submit a Pull Request on GitHub 🤗!
Usage
Read a ZIM file
```python
from libzim.reader import Archive
from libzim.search import Query, Searcher
from libzim.suggestion import SuggestionSearcher

# test.zim is the archive produced by the "Write a ZIM file" example
zim = Archive("test.zim")
print(f"Main entry is at {zim.main_entry.get_item().path}")
entry = zim.get_entry_by_path("home/fr")
print(f"Entry {entry.title} at {entry.path} is {entry.get_item().size}b.")
# item content is a buffer: convert it to bytes before decoding
print(bytes(entry.get_item().content).decode("UTF-8"))
```
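The last line above decodes the content as UTF-8 unconditionally, which only makes sense for text entries; an image or font would fail to decode. A minimal sketch that decodes text entries only, assuming reader items expose a mimetype string alongside content. The helper item_text and its rule of treating only text/* types as text are illustrative assumptions (JSON, JavaScript or SVG entries would need extra cases), and the SimpleNamespace object stands in for a real libzim item.

```python
from types import SimpleNamespace
from typing import Optional


def item_text(item, encoding: str = "utf-8") -> Optional[str]:
    """Return the item's content as text, or None for non-text items.

    Assumes `item` has a `mimetype` string and a buffer-like `content`,
    as reader items do in the example above.
    """
    # Drop parameters such as "; charset=utf-8" before checking the type.
    mimetype = item.mimetype.split(";", 1)[0].strip()
    if not mimetype.startswith("text/"):
        return None  # binary entries are left for the caller to handle
    return bytes(item.content).decode(encoding)


# Stand-in object so the helper can be tried without a ZIM file.
fake = SimpleNamespace(mimetype="text/html", content=memoryview(b"<p>Kiwix</p>"))
print(item_text(fake))
```

With a real archive, item_text(entry.get_item()) would replace the unconditional decode.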
```python
# searching using full-text index (the archive must be created with indexing enabled)
search_string = "Welcome"
query = Query().set_query(search_string)
searcher = Searcher(zim)
search = searcher.search(query)
search_count = search.getEstimatedMatches()
print(f"there are {search_count} matches for {search_string}")
print(list(search.getResults(0, search_count)))

# accessing suggestions
search_string = "kiwix"
suggestion_searcher = SuggestionSearcher(zim)
suggestion = suggestion_searcher.suggest(search_string)
suggestion_count = suggestion.getEstimatedMatches()
print(f"there are {suggestion_count} matches for {search_string}")
print(list(suggestion.getResults(0, suggestion_count)))
```
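Both the search and the suggestion results above are fetched in a single call sized by getEstimatedMatches(), which is only an estimate. For large archives, the same getResults(start, count) method can be called page by page instead. A sketch of a hypothetical generator, iter_result_paths, that works with any object exposing those two methods; FakeResults is a stand-in so the helper can be tried without a ZIM file.

```python
from typing import Iterator, List


def iter_result_paths(results, page_size: int = 25) -> Iterator[str]:
    """Yield results page by page via getResults(start, count).

    `results` is assumed to behave like the search and suggestion
    objects above: getEstimatedMatches() plus getResults(start, count).
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    total = results.getEstimatedMatches()
    start = 0
    while start < total:
        page = list(results.getResults(start, page_size))
        if not page:
            # The estimate may exceed the number of results actually available.
            break
        yield from page
        start += len(page)


class FakeResults:
    """Stand-in exposing the same two methods, backed by a plain list."""

    def __init__(self, paths: List[str], estimate: int) -> None:
        self.paths = paths
        self.estimate = estimate

    def getEstimatedMatches(self) -> int:
        return self.estimate

    def getResults(self, start: int, count: int) -> Iterator[str]:
        return iter(self.paths[start:start + count])


print(list(iter_result_paths(FakeResults(["home", "home/fr"], estimate=2), page_size=1)))
# ['home', 'home/fr']
```

With a real archive, `for path in iter_result_paths(search, page_size=10): print(path)` would walk the full-text hits, and the suggestion object can be passed the same way.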
Write a ZIM file
```python
from libzim.writer import Creator, Item, StringProvider, FileProvider, Hint


class MyItem(Item):
    def __init__(self, title, path, content="", fpath=None):
        super().__init__()
        self.path = path
        self.title = title
        self.content = content
        self.fpath = fpath

    def get_path(self):
        return self.path

    def get_title(self):
        return self.title

    def get_mimetype(self):
        return "text/html"

    def get_contentprovider(self):
        # serve from a file on disk when a path was given, else from the string
        if self.fpath is not None:
            return FileProvider(self.fpath)
        return StringProvider(self.content)

    def get_hints(self):
        # FRONT_ARTICLE marks the entry as a user-facing article
        return {Hint.FRONT_ARTICLE: True}
```
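MyItem always reports text/html, which suits the two HTML pages in this example. If items also wrap other files through FileProvider (stylesheets, images), get_mimetype could derive the type instead. A minimal sketch using the standard mimetypes module; the helper name guess_item_mimetype and its fallback to text/html are assumptions, not part of libzim.

```python
import mimetypes
from typing import Optional


def guess_item_mimetype(path: str, fpath: Optional[str] = None,
                        default: str = "text/html") -> str:
    """Guess a MIME type from the on-disk file name first, then the ZIM path."""
    for candidate in (fpath, path):
        if candidate:
            mimetype, _encoding = mimetypes.guess_type(candidate)
            if mimetype:
                return mimetype
    # Extension-less ZIM paths such as "home" fall back to the default.
    return default


print(guess_item_mimetype("home/fr", "home-fr.html"))
print(guess_item_mimetype("assets/style.css"))
```

Inside MyItem, get_mimetype would then become `return guess_item_mimetype(self.path, self.fpath)`.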
```python
content = """<html><head><meta charset="UTF-8"><title>Web Page Title</title></head>
<body><h1>Welcome to this ZIM</h1><p>Kiwix</p></body></html>"""

item = MyItem("Hello Kiwix", "home", content)
# this item's content is read from home-fr.html, which must exist on disk
item2 = MyItem("Bonjour Kiwix", "home/fr", None, "home-fr.html")

# enable full-text indexing, with English ("eng") as the indexing language
with Creator("test.zim").config_indexing(True, "eng") as creator:
    creator.set_mainpath("home")
    creator.add_item(item)
    creator.add_item(item2)
    # metadata names are capitalized: "creator" becomes "Creator", etc.
    for name, value in {
        "creator": "python-libzim",
        "description": "Created in python",
        "name": "my-zim",
        "publisher": "You",
        "title": "Test ZIM",
    }.items():
        creator.add_metadata(name.title(), value)
```
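The metadata loop relies on str.title() to turn lowercase keys into the capitalized names ZIM metadata uses (Creator, Description, Name, Publisher, Title). That works for single words, but title() keeps underscores, so a key such as long_description would become Long_Description rather than LongDescription, the name used in the openZIM metadata conventions. A sketch of a hypothetical helper that joins the parts instead; it assumes snake_case input and is not meant for names such as Illustration_48x48@1, which should be passed verbatim.

```python
def metadata_name(key: str) -> str:
    """Turn a snake_case key into a CamelCase metadata name (hypothetical helper)."""
    return "".join(part.capitalize() for part in key.split("_"))


print("long_description".title())         # Long_Description
print(metadata_name("long_description"))  # LongDescription
print(metadata_name("publisher"))         # Publisher
```

For the five keys in the example, metadata_name and str.title() give the same result.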
License
Download files
Source Distribution
libzim-2.0.0.tar.gz (187.2 kB)
Built Distributions
File | SHA256 | MD5 | BLAKE2b-256
---|---|---|---
libzim-2.0.0-cp310-cp310-manylinux1_x86_64.whl | 4ec9e2a20ae05c620bb788289f8a3c147587d48f9b795e2d24965c74243fc91e | b4412b3fd52109787f7fab68fa91280b | c11ac0ce9d2e05d5e6888e05fa7cb5d660398ef1bd31787189f0536cff324624
libzim-2.0.0-cp310-cp310-macosx_10_9_x86_64.whl | b9d0b867d6fc4ab41aaf2b7a6e347125c66beb24b9e889eaba6caf1c704becb7 | 2e9954c3fccccf49b246ff0a928d546a | a45f1644063406a49f17f26d9b4183f7d85fb5ecf1b7a1e257f4ac96e3497bb3
libzim-2.0.0-cp39-cp39-manylinux1_x86_64.whl | 64ad727ef4d15dc2ecbdab62b3cc39b50a4df1766a4b67d03fe23860581d90fc | 410342a093ec6e92a59f9a4b34465c43 | 3f5ab90a0198cd94fbcb4b894fb1487a63c581dc548e5859bec20ddfe865e436
libzim-2.0.0-cp39-cp39-macosx_10_9_x86_64.whl | ce87cd9331e32324c66ac778725fc32782cf3c422d464739617610b26c9f414e | 4987195b513baa49dad5adac3b8210bc | a09a5c60b2b12390c713ba05889c49cb50c958188ac48ba4abc2e7f440b6af44
libzim-2.0.0-cp38-cp38-manylinux1_x86_64.whl | fb31ecf7da7bdd492f999b3b0fdc73a6bf82c48dca170663d97fee55e933700f | 97edbadc7352b7388286b044c0d27066 | 0485f5bd2f9a8338dda1be993d4f9b6c2c966465521369c61f5e8b1a651e0007
libzim-2.0.0-cp38-cp38-macosx_10_9_x86_64.whl | 4bba76f25d3fdfb6ccc174b5bf8701090cd212d160e297c99ffc79e69da0e1ea | 84b7a701bb702971148419dffd90cae9 | 8e2aaaceaf7a5488c5772b80c3a4194f5242c60117b18b4294a2e5b562a0a851
libzim-2.0.0-cp37-cp37m-manylinux1_x86_64.whl | 706589087507b1849bc09accaf0400f4d4042cb0e8ebf804bef974f10035bafb | 09981613ffecd9ed1ce3a6852f769127 | 410ea5f657a820ca5122ac69cb41c856d52e96cc226ce84b3f621db4038687a4
libzim-2.0.0-cp37-cp37m-macosx_10_9_x86_64.whl | d1981b20b853b3f889582bb1ae7f70ce08f73faeb69c2bd0cc2efac7b90a7c5f | 8a2d474d4555127fd742650ad9fc0ac2 | de662987fe8367c3cf4f3b09397feb80629e7ede90c312d7b1931fe047b14602
libzim-2.0.0-cp36-cp36m-manylinux1_x86_64.whl | 12877c12ebc79429418434df362d9cf0a4f9a0c4c4ccd85449128db755abf4cd | 3fd703250e472107ce17b63df492c9a3 | 7debe8c1db605e447ce25b8df37d3d3cf2e81b95dbbbe4a6a23a5007b7396214
libzim-2.0.0-cp36-cp36m-macosx_10_9_x86_64.whl | ad6ec9c901284465b275b4d7c02e89549fa5b49b98cf1db37e378b78a5bd07be | c2c4c6313a61285be56b8ff5fba769dc | 78c1b25e9efa6869541231ea764b8ed2ae46e3e04e7bef36d6839e17d9ad4088