Whispool
Fast in-memory word frequency counting, implemented in Rust
Installation
pip install whispool
Basic usage
We assume we have n sentences, each already split into words, and that each sentence is paired with one info list describing it.
# sentences may contain different numbers of words
sentence = [
    ['This', 'is', 'a', 'sentence'],
    ['This', 'is', 'another', 'sentence'],
    ['This', 'is', 'a', 'third', 'sentence', 'here'],
    ...
]
# every info list must have THE SAME LENGTH
info = [
    ['id1', 'category1', 'sci-fi', 'movie', 'Jan.'],
    ['id2', 'category1', 'romance', 'movie', 'Jan.'],
    ['id3', 'category2', 'sci-fi', 'tv', 'Feb.'],
    ...
]
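Because the info lists must all be the same length, it can help to validate the inputs before handing them over. The following is a minimal sketch in plain Python; `check_inputs` is a hypothetical helper, not part of Whispool, and it assumes there is exactly one info list per sentence.

```python
def check_inputs(sentences, infos):
    """Raise ValueError if sentences and info lists are not aligned.

    Hypothetical helper, not part of Whispool. Assumes one info list
    per sentence and a single shared length for all info lists.
    """
    if len(sentences) != len(infos):
        raise ValueError(
            f"{len(sentences)} sentences but {len(infos)} info lists")
    # Collect the distinct info-list lengths; there should be only one.
    widths = {len(row) for row in infos}
    if len(widths) > 1:
        raise ValueError(f"info lists have mixed lengths: {sorted(widths)}")


sentence = [
    ['This', 'is', 'a', 'sentence'],
    ['This', 'is', 'another', 'sentence'],
    ['This', 'is', 'a', 'third', 'sentence', 'here'],
]
info = [
    ['id1', 'category1', 'sci-fi', 'movie', 'Jan.'],
    ['id2', 'category1', 'romance', 'movie', 'Jan.'],
    ['id3', 'category2', 'sci-fi', 'tv', 'Feb.'],
]

check_inputs(sentence, info)  # passes silently for well-formed input
```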
Put in data
Create a Whispool instance and feed it the data, one batch at a time:
from whispool import Whispool
from pathlib import Path
whisper_multi = Whispool(
    directory=Path("test_data"), threads=2, capacity=200)
whisper_multi.consume(sentence, info)
whisper_multi.consume(sentence_batch2, info_batch2)
...
whisper_multi.consume(sentence_batchN, info_batchN)
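If the whole corpus is already in memory, the batches can be produced with a small generator. This is only a sketch: `batches` is a hypothetical helper, not part of Whispool, and the batch size is an arbitrary choice.

```python
from itertools import islice


def batches(sentences, infos, size):
    """Yield (sentence_batch, info_batch) pairs of at most `size` rows.

    Hypothetical helper, not part of Whispool.
    """
    rows = iter(zip(sentences, infos))
    while True:
        chunk = list(islice(rows, size))
        if not chunk:
            return
        # Split the paired rows back into two parallel lists.
        sentence_batch, info_batch = zip(*chunk)
        yield list(sentence_batch), list(info_batch)


# Assumed usage with the instance created above:
# for sentence_batch, info_batch in batches(sentence, info, size=1000):
#     whisper_multi.consume(sentence_batch, info_batch)
```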
Fetch the result data
Now we want to see the hot keywords of sci-fi movies in January.
>>> df = whisper_multi.final_stats(
...     [None, None, 'sci-fi', 'movie', 'Jan.'],
...     top_n=2)
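To make the filter list concrete, here is a plain-Python sketch of the kind of query shown above. It is not Whispool's implementation and says nothing about the shape of the returned `df`; it assumes that `None` in the filter means "match any value" at that position and that words are ranked by raw count. `top_words` is a hypothetical name.

```python
from collections import Counter


def top_words(sentences, infos, pattern, top_n):
    """Count words in sentences whose info list matches `pattern`.

    Hypothetical reference sketch, not Whispool's API. A None entry in
    `pattern` is assumed to act as a wildcard for that position.
    """
    counts = Counter()
    for words, fields in zip(sentences, infos):
        if all(p is None or p == f for p, f in zip(pattern, fields)):
            counts.update(words)
    return counts.most_common(top_n)


sentence = [
    ['This', 'is', 'a', 'sentence'],
    ['This', 'is', 'another', 'sentence'],
    ['This', 'is', 'a', 'third', 'sentence', 'here'],
]
info = [
    ['id1', 'category1', 'sci-fi', 'movie', 'Jan.'],
    ['id2', 'category1', 'romance', 'movie', 'Jan.'],
    ['id3', 'category2', 'sci-fi', 'tv', 'Feb.'],
]

# Only the first sentence is a sci-fi movie from January.
result = top_words(sentence, info,
                   [None, None, 'sci-fi', 'movie', 'Jan.'], top_n=2)
```

With this sample data only the first row matches the filter, so every word in the result has a count of 1.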
Build from Rust source
maturin build --release