pyrefo: fast regular expressions for objects
This project is based on refo and the paper Regular Expression Matching: the Virtual Machine Approach. It uses cffi to extend Python with C in order to speed up matching.
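To make the virtual machine approach concrete, here is a minimal pure-Python sketch of a thread-list simulation over a sequence of arbitrary objects. It is not pyrefo's C implementation: the instruction names (`pred`, `split`, `jmp`, `accept`) and the helpers `add_thread`, `match_prefix`, and `search` are assumptions made for illustration, and the sketch tracks only match boundaries, not named groups.

```python
# Instructions (hypothetical encoding chosen for this sketch):
#   ("pred", f)      consume one item if f(item) is true
#   ("split", x, y)  continue at both x and y
#   ("jmp", x)       continue at x
#   ("accept",)      report a match


def add_thread(prog, threads, seen, pc):
    """Follow jmp/split so the thread list holds only pred/accept."""
    if pc in seen:
        return
    seen.add(pc)
    op = prog[pc]
    if op[0] == "jmp":
        add_thread(prog, threads, seen, op[1])
    elif op[0] == "split":
        add_thread(prog, threads, seen, op[1])
        add_thread(prog, threads, seen, op[2])
    else:
        threads.append(pc)


def match_prefix(prog, items, start=0):
    """Return the end index of the longest match beginning at start, or None."""
    current = []
    add_thread(prog, current, set(), 0)
    best = None
    pos = start
    while current:
        nxt, seen = [], set()
        for pc in current:
            op = prog[pc]
            if op[0] == "accept":
                best = pos
            elif pos < len(items) and op[1](items[pos]):
                add_thread(prog, nxt, seen, pc + 1)
        current = nxt
        pos += 1
    return best


def search(prog, items):
    """Return (start, end) of the leftmost match, or None."""
    for start in range(len(items) + 1):
        end = match_prefix(prog, items, start)
        if end is not None:
            return start, end
    return None


# Hand-compiled program for: Literal('shipping') + Star(Any()) + Literal('fast')
prog = [
    ("pred", lambda x: x == "shipping"),
    ("split", 2, 4),
    ("pred", lambda x: True),
    ("jmp", 1),
    ("pred", lambda x: x == "fast"),
    ("accept",),
]
tokens = ["the", "shipping", "was", "very", "fast", "indeed"]
print(search(prog, tokens))  # (1, 5)
```

Because every thread advances one item at a time and duplicate program counters are dropped, the work per item is bounded by the program length, which is the property that makes this style of matcher a good fit for a C extension.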
This project has done the following work:
- fully compatible with the refo API, supporting all patterns and the match, search, and finditer methods;
- fixes a bug in the C source code included in the paper;
- uses cffi to extend Python with C;
- adds a new feature that supports partial matching;
- adds a new Phrase pattern, which allows 'ab' to match the list ['a', 'b', 'c'].
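The idea behind the Phrase pattern can be sketched in plain Python: a phrase matches when some run of consecutive tokens concatenates to it. The helper `find_phrase` below is hypothetical and only illustrates one reading of the feature; it does not use pyrefo, and the library's actual Phrase semantics may differ.

```python
def find_phrase(phrase, tokens):
    """Return (start, end) of the first run of tokens whose concatenation
    equals phrase, or None if there is no such run."""
    for start in range(len(tokens)):
        joined = ""
        for end in range(start, len(tokens)):
            joined += tokens[end]
            if joined == phrase:
                return start, end + 1
            if not phrase.startswith(joined):
                break  # this run can no longer grow into the phrase
    return None


print(find_phrase("ab", ["a", "b", "c"]))  # (0, 2)
print(find_phrase("bc", ["a", "b", "c"]))  # (1, 3)
print(find_phrase("ac", ["a", "b", "c"]))  # None
```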
Performance test
Prerequisites
import jieba
text = '为什么在本店买东西?因为物流迅速+品质保证。为什么我购买的每件商品评价都一样呢?因为我买的东西太多了,积累了很多未评价的订单,所以我统一用这段话作为评价内容。如果我用了这段话作为评价,那就说明这款产品非常赞,非常好!'
tokens = list(jieba.cut(text))
CPython
- pyrefo
from pyrefo import search, Group, Star, Any, Literal
%timeit search(Group(Literal('物流') + Star(Any()) + Literal('迅速'), 'a'), tokens)
95.9 µs ± 472 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
- refo
import refo
%timeit refo.search(refo.Group(refo.Literal('物流') + refo.Star(refo.Any()) + refo.Literal('迅速'), 'a'), tokens)
1.03 ms ± 7.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
- re
import re
%timeit re.search('(物流.*速度)', text)
989 ns ± 4.69 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
PyPy
- pyrefo
from pyrefo import search, Group, Star, Any, Literal
%timeit search(Group(Literal('物流') + Star(Any()) + Literal('迅速'), 'a'), tokens)
53.4 µs ± 28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
- refo
import refo
%timeit refo.search(refo.Group(refo.Literal('物流') + refo.Star(refo.Any()) + refo.Literal('迅速'), 'a'), tokens)
78 µs ± 35.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
- re
import re
%timeit re.search('(物流.*速度)', text)
347 ns ± 3.26 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
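The `%timeit` lines above are IPython magics. Outside IPython, a roughly comparable measurement can be taken with the standard `timeit` module. The sketch below times only `re.search` so that it runs without third-party packages; the English sample text, the `bench` helper, and the loop counts are assumptions for illustration, and the numbers it prints are not comparable to the results reported above.

```python
import re
import statistics
import timeit

# Sample text chosen for this sketch (not the benchmark text used above).
text = "Why shop here? Because shipping is fast and quality is guaranteed."
pattern = re.compile(r"(shipping.*fast)")


def bench(func, number=10000, repeat=7):
    """Return (mean, stdev) of the per-call time in seconds over `repeat` runs."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    per_call = [t / number for t in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


mean, stdev = bench(lambda: pattern.search(text))
print(f"{mean * 1e9:.0f} ns ± {stdev * 1e9:.0f} ns per loop (7 runs, 10000 loops each)")
```

To time pyrefo or refo the same way, the lambda can be swapped for the corresponding `search(...)` call from the snippets above.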
Download files
Source Distribution
pyrefo-0.2.tar.gz (23.7 kB)
Built Distribution
Hashes for pyrefo-0.2-cp37-cp37m-macosx_10_13_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | a516e6296b60e55117970537a5888dcde33f975f850ef9c63b595025ed4a37fa |
MD5 | de4c424775084555e27c12fda74b59ab |
BLAKE2b-256 | 2241557811b5ac2e4f80f57341ccaf2c6d58728cc32ac84461a2c047d2ba3e93 |