Library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). Re-implementation of OpenFst in Rust.
Rustfst
This repo contains a Rust implementation of weighted finite-state transducers, along with a Python binding.
Rustfst is a library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). Weighted finite-state transducers are automata where each transition has an input label, an output label, and a weight. The more familiar finite-state acceptor is represented as a transducer with each transition's input and output label equal. Finite-state acceptors are used to represent sets of strings (specifically, regular or rational sets); finite-state transducers are used to represent binary relations between pairs of strings (specifically, rational transductions). The weights can be used to represent the cost of taking a particular transition.
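To make the definitions above concrete, here is a minimal, std-only sketch of a weighted transducer. It is not rustfst's API: the names `ToyFst`, `Transition`, `transduce`, and `example_fst` are hypothetical, the labels and weights are arbitrary, and the lookup assumes at most one outgoing transition per input label in each state. Weights are treated as costs that add up along a path.

```rust
// Hypothetical, std-only model of a weighted transducer; NOT rustfst's API.
type Label = u32;
type StateId = usize;

#[derive(Debug, Clone, PartialEq)]
struct Transition {
    ilabel: Label,
    olabel: Label,
    weight: f32, // cost of taking this transition
    next: StateId,
}

#[derive(Debug, Default)]
struct ToyFst {
    start: Option<StateId>,
    // transitions[s] lists the transitions leaving state s
    transitions: Vec<Vec<Transition>>,
    // finals[s] is Some(cost) when s is a final state
    finals: Vec<Option<f32>>,
}

impl ToyFst {
    fn add_state(&mut self) -> StateId {
        self.transitions.push(Vec::new());
        self.finals.push(None);
        self.transitions.len() - 1
    }

    fn add_transition(&mut self, from: StateId, tr: Transition) {
        self.transitions[from].push(tr);
    }

    /// An acceptor is a transducer whose input and output labels
    /// are equal on every transition.
    fn is_acceptor(&self) -> bool {
        self.transitions.iter().flatten().all(|t| t.ilabel == t.olabel)
    }

    /// Maps an input string to (output string, total cost).
    /// Assumption: at most one transition per input label in each state.
    fn transduce(&self, input: &[Label]) -> Option<(Vec<Label>, f32)> {
        let mut state = self.start?;
        let mut output = Vec::new();
        let mut cost = 0.0;
        for &label in input {
            let tr = self.transitions[state].iter().find(|t| t.ilabel == label)?;
            output.push(tr.olabel);
            cost += tr.weight;
            state = tr.next;
        }
        // The path only counts if it ends in a final state.
        let final_cost = self.finals[state]?;
        Some((output, cost + final_cost))
    }
}

fn example_fst() -> ToyFst {
    let mut fst = ToyFst::default();
    let s0 = fst.add_state();
    let s1 = fst.add_state();
    let s2 = fst.add_state();
    fst.start = Some(s0);
    fst.add_transition(s0, Transition { ilabel: 3, olabel: 5, weight: 10.0, next: s1 });
    fst.add_transition(s0, Transition { ilabel: 5, olabel: 7, weight: 18.0, next: s2 });
    fst.finals[s1] = Some(31.0);
    fst.finals[s2] = Some(45.0);
    fst
}

fn main() {
    let fst = example_fst();
    println!("{:?}", fst.transduce(&[3]));
    println!("is acceptor: {}", fst.is_acceptor());
}
```

In this sketch the input string `[3]` is related to the output string `[5]` at a total cost of 41.0 (transition cost 10.0 plus final cost 31.0), and the machine is not an acceptor because its input and output labels differ.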
FSTs have key applications in speech recognition and synthesis, machine translation, optical character recognition, pattern matching, string processing, machine learning, and information extraction and retrieval, among others. Often a weighted transducer is used to represent a probabilistic model (e.g., an n-gram model or a pronunciation model). FSTs can be optimized by determinization and minimization, models can be applied to hypothesis sets (also represented as automata) or cascaded by finite-state composition, and the best results can be selected by shortest-path algorithms.
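As an illustration of selecting the best result, the sketch below computes the cheapest path cost over a small hypothesis set in the tropical semiring, where alternatives are combined with `min` and costs along a path are added. It is a std-only toy, not rustfst's shortest-path implementation: `Edge`, `best_cost`, and `example_best_cost` are hypothetical names, and the Bellman-Ford-style relaxation assumes a valid start state and no negative-cost cycles.

```rust
// Hypothetical, std-only sketch; NOT rustfst's shortest-path implementation.
struct Edge {
    from: usize,
    to: usize,
    weight: f32,
}

/// Tropical "plus": keep the cheaper of two alternatives.
fn tropical_plus(a: f32, b: f32) -> f32 {
    a.min(b)
}

/// Tropical "times": add costs along a path.
fn tropical_times(a: f32, b: f32) -> f32 {
    a + b
}

/// Cheapest total cost from `start` to any final state, final cost included.
/// Assumptions: `start < num_states` and there are no negative-cost cycles.
fn best_cost(
    num_states: usize,
    start: usize,
    edges: &[Edge],
    finals: &[(usize, f32)],
) -> Option<f32> {
    // INFINITY is the tropical zero (unreachable), 0.0 the tropical one.
    let mut dist = vec![f32::INFINITY; num_states];
    dist[start] = 0.0;
    // Bellman-Ford-style relaxation: num_states - 1 passes over all edges.
    for _ in 1..num_states {
        for edge in edges {
            let candidate = tropical_times(dist[edge.from], edge.weight);
            dist[edge.to] = tropical_plus(dist[edge.to], candidate);
        }
    }
    let best = finals
        .iter()
        .map(|&(state, final_cost)| tropical_times(dist[state], final_cost))
        .fold(f32::INFINITY, tropical_plus);
    if best.is_finite() {
        Some(best)
    } else {
        None
    }
}

fn example_best_cost() -> Option<f32> {
    // Two competing hypotheses that both end in state 3.
    let edges = [
        Edge { from: 0, to: 1, weight: 10.0 },
        Edge { from: 0, to: 2, weight: 18.0 },
        Edge { from: 1, to: 3, weight: 12.0 },
        Edge { from: 2, to: 3, weight: 1.0 },
    ];
    best_cost(4, 0, &edges, &[(3, 2.0)])
}

fn main() {
    println!("{:?}", example_best_cost());
}
```

Here the path through state 2 costs 18.0 + 1.0 + 2.0 = 21.0 and beats the path through state 1 at 10.0 + 12.0 + 2.0 = 24.0, even though its first transition is more expensive.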
References
The implementation is heavily inspired by the work of Mehryar Mohri, Cyril Allauzen, and Michael Riley:
- Weighted automata algorithms
- The design principles of a weighted finite-state transducer library
- OpenFst: A general and efficient weighted finite-state transducer library
- Weighted finite-state transducers in speech recognition
Example
use anyhow::Result;
use rustfst::prelude::*;
use rustfst::algorithms::determinize::determinize;
use rustfst::algorithms::rm_epsilon::rm_epsilon;

fn main() -> Result<()> {
    // Create an empty wFST
    let mut fst = VectorFst::<TropicalWeight>::new();

    // Add some states
    let s0 = fst.add_state();
    let s1 = fst.add_state();
    let s2 = fst.add_state();

    // Set s0 as the start state
    fst.set_start(s0)?;

    // Add a transition from s0 to s1
    fst.add_tr(s0, Tr::new(3, 5, 10.0, s1))?;

    // Add a transition from s0 to s2
    fst.add_tr(s0, Tr::new(5, 7, 18.0, s2))?;

    // Set s1 and s2 as final states
    fst.set_final(s1, 31.0)?;
    fst.set_final(s2, 45.0)?;

    // Iterate over all the paths in the wFST
    for p in fst.paths_iter() {
        println!("{:?}", p);
    }

    // A lot of operations are available to modify/optimize the FST.
    // Here are a few examples:

    // - Remove useless states.
    connect(&mut fst)?;

    // - Optimize the FST by merging states with the same behaviour.
    minimize(&mut fst)?;

    // - Copy all the input labels to the output.
    project(&mut fst, ProjectType::ProjectInput);

    // - Remove epsilon transitions.
    rm_epsilon(&mut fst)?;

    // - Compute an equivalent but deterministic FST.
    fst = determinize(&fst)?;

    Ok(())
}
Benchmark against OpenFst
Some time ago, I benchmarked almost every linear FST algorithm and compared the results with OpenFst. You can find the results here:
Spoiler alert: Rustfst is faster on all those algorithms 😅
For the other algorithms, I'm finishing the implementation and will then run another round of benchmarks.
At the moment, the main algorithm missing in Rustfst is composition, which will be implemented shortly. All the important algorithms, such as minimization and determinization, are already implemented but not benchmarked and thus not (necessarily) optimized.
Documentation
The documentation for the latest released version is available at https://docs.rs/rustfst
Release process
- Use the script update_version.sh to update the version of every package.
- Push
- Push a new tag with the prefix rustfst-v
Example:
./update_version.sh 0.9.1-alpha.6
git commit -am "Release 0.9.1-alpha.6"
git push
git tag -a rustfst-v0.9.1-alpha.6 -m "Release rustfst 0.9.1-alpha.6"
git push --tags
Optionally, if this is a major release, create a GitHub release in the UI.
Projects contained in this repository
This repository contains two main projects:
- rustfst is the Rust re-implementation.
- rustfst-python is the Python binding of rustfst.
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.