Pattern clustering
This tool clusters lines of text according to a collection of input patterns modeled as regular expressions.
This work has been published in:
[ICPR’2022] A novel pattern-based edit distance for automatic log parsing, Maxime Raynal, Marc-Olivier Buob, Georges Quénot.
Features
Forms groups of homogeneous lines using a pattern-based distance built on customizable patterns.
Configured by default to use common patterns (IP addresses, numeric values, etc.).
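The general idea of pattern-aware grouping can be illustrated with a small, self-contained sketch. It is not this tool's algorithm or API. The published ICPR 2022 distance is defined differently. Here, each regex match is simply replaced by a placeholder, a plain token-level Levenshtein distance is computed, and lines are grouped greedily. The pattern names (`IPV4`, `HEX`, `INT`), their regexes, the functions `abstract`, `token_distance` and `cluster`, and the `max_dist` threshold are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical patterns loosely mirroring the "common patterns" above.
# Dicts keep insertion order, so more specific patterns are applied first.
PATTERNS: Dict[str, str] = {
    "IPV4": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
    "HEX": r"\b0x[0-9a-fA-F]+\b",
    "INT": r"\b\d+\b",
}


def abstract(line: str, patterns: Dict[str, str] = PATTERNS) -> List[str]:
    """Replace every pattern match by <NAME>, then split the line into tokens."""
    for name, regex in patterns.items():
        line = re.sub(regex, f"<{name}>", line)
    return line.split()


def token_distance(a: List[str], b: List[str]) -> int:
    """Classic Levenshtein distance computed over token sequences (not characters)."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, start=1):
        curr = [i]
        for j, tb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ta
                curr[j - 1] + 1,           # insert tb
                prev[j - 1] + (ta != tb),  # substitute (free if tokens match)
            ))
        prev = curr
    return prev[-1]


def cluster(lines: List[str], max_dist: int = 1) -> List[List[str]]:
    """Greedy single pass: attach each line to the first cluster whose
    representative (its first member) is within max_dist, else open a new one."""
    clusters: List[Tuple[List[str], List[str]]] = []  # (representative tokens, members)
    for line in lines:
        tokens = abstract(line)
        for rep, members in clusters:
            if token_distance(tokens, rep) <= max_dist:
                members.append(line)
                break
        else:
            clusters.append((tokens, [line]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    logs = [
        "connection from 10.0.0.1 port 22",
        "connection from 192.168.1.7 port 443",
        "disk /dev/sda1 usage 93 percent",
        "disk /dev/sdb2 usage 12 percent",
    ]
    # Expected: one group of "connection" lines and one group of "disk" lines.
    for group in cluster(logs):
        print(group)
```

Abstracting values before measuring distance is what makes `10.0.0.1` and `192.168.1.7` look identical. A greedy pass with a fixed threshold is the simplest possible grouping strategy. It depends on input order and is chosen here only for brevity.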
License
This project is licensed under the BSD-3-Clause license; see the LICENSE file.
More about pattern-clustering
For more information, feel free to visit the wiki:
Acks
The skeleton package was created with Cookiecutter and the francois-durand/package_helper_2 project template.
The Sphinx part is inspired by Sphinx-Autosummary-Recursion.
History
0.1.0 (2022-05-11): First release
First release on PyPI.
0.2.0 (2022-06-02): CI
Updated tox.ini and GitHub Actions (work in progress).
0.3.0 (2022-06-22): Bug fixes and CI improvements
Fixed sphinx local build
Fixed bumpversion
Added experiment notebooks and datasets
Improved test suite
0.3.1 (2022-06-22): Bug fixes and CI improvements
Fixed Read the Docs build
0.4.1 (2022-06-24): Bug fixes and CI improvements
Fixed Read the Docs build
Implemented console script (cli)
Reworked PatternClusteringEnv class
Bug fixes
Updated documentation
0.4.2 (2022-06-24): Added entry points
Added pattern-distance entry point; see pattern-distance --help.
Added pattern-clustering-mkconf entry point. The resulting JSON file may be passed to the pattern-distance and pattern-clustering commands.
0.5.0 (2022-06-25): Added entry points
Bug fixes in notebooks/
Removed unused patterns
1.0.0 (2022-07-01): Checked experiments
Checked experiments in notebooks/
Fixed a warning related to the documentation build
Improved tests