Finnish syllabifier and compound segmenter
FinnSyll
FinnSyll is a Python library that syllabifies words according to Finnish syllabification principles. It is also equipped with a Finnish compound splitter. More details/docs to come.
Installation
$ pip install FinnSyll
Basic usage
First, instantiate a FinnSyll object.
>>> from finnsyll import FinnSyll
>>> f = FinnSyll()
To syllabify:
>>> f.syllabify('runoja')
['ru.no.ja']  # internal syllable boundaries are indicated with '.'
To segment compounds:
>>> f.split('sosiaalidemokraattien')
'sosiaali=demokraattien'  # internal word boundaries are indicated with '='
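Since both methods mark boundaries inside ordinary strings, the pieces can be recovered with plain string splitting. The following is a minimal sketch; the helper names `syllables` and `constituents` are hypothetical and are not part of FinnSyll.

```python
def syllables(syllabified):
    """Split a dotted syllabification such as 'ru.no.ja' into its syllables."""
    return syllabified.split('.')


def constituents(segmented):
    """Split a segmented compound such as 'sosiaali=demokraattien' into words."""
    return segmented.split('=')


# The input strings below are the outputs shown in the examples above.
print(syllables('ru.no.ja'))
print(constituents('sosiaali=demokraattien'))
```

These helpers only post-process the returned strings, so they work the same way on any syllabification or segmentation that uses the '.' and '=' markers.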
Optional arguments
The syllabifier can be customized with two optional parameters: variation and split_compounds.
variation
Instantiating a FinnSyll object with variation=True (default) allows the syllabifier to return multiple syllabifications when variation is predicted. In this mode, the syllabifier returns a list, even when only one syllabification is predicted. Setting variation to False causes the syllabifier to return a string containing the first predicted syllabification.
Variation:
>>> f = FinnSyll(variation=True)
>>> f.syllabify('runoja')
['ru.no.ja']
>>> f.syllabify('vapaus')
['va.pa.us', 'va.paus']
No variation:
>>> f = FinnSyll(variation=False)
>>> f.syllabify('runoja')
'ru.no.ja'
>>> f.syllabify('vapaus')
'va.pa.us'
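Because the return type depends on the variation setting (a list when True, a string when False), calling code that must handle both may want to normalize the result first. A minimal sketch, assuming Python 3; the helper name `as_list` is hypothetical and is not part of FinnSyll.

```python
def as_list(result):
    """Return syllabifications as a list, whichever variation setting produced them."""
    if isinstance(result, str):
        # variation=False: a single string holding the first predicted syllabification
        return [result]
    # variation=True: already a list of one or more syllabifications
    return list(result)


# The inputs below are the outputs shown in the two examples above.
print(as_list('va.pa.us'))
print(as_list(['va.pa.us', 'va.paus']))
```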
split_compounds
When a FinnSyll object is instantiated with split_compounds=True (default), the syllabifier first attempts to split the input into constituent words before syllabifying it. This forces the syllabifier to insert a syllable boundary between the identified constituent words. The syllabifier skips this step if split_compounds is set to False.
Compound splitting:
>>> f = FinnSyll(split_compounds=True)
>>> f.syllabify('rahoituserien')  # rahoitus=erien
['ra.hoi.tus.e.ri.en']
No compound splitting:
>>> f = FinnSyll(split_compounds=False)
>>> f.syllabify('rahoituserien')
['ra.hoi.tu.se.ri.en']  # incorrect
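The effect of splitting first can be illustrated with a toy sketch: if each constituent is syllabified on its own and the results are joined with '.', a syllable boundary always falls on the word boundary. This is not FinnSyll's implementation. The function names and the two-entry lookup table are hypothetical stand-ins, with the per-word syllabifications read off the compound output shown above.

```python
def syllabify_compound(segmented, syllabify_word):
    """Syllabify each '='-separated constituent separately, then join with '.'."""
    parts = segmented.split('=')
    return '.'.join(syllabify_word(part) for part in parts)


# Hypothetical stand-in for a word-level syllabifier (illustration only).
_TOY_LEXICON = {'rahoitus': 'ra.hoi.tus', 'erien': 'e.ri.en'}


def toy_syllabify(word):
    """Look up a fixed syllabification; raises KeyError for unknown words."""
    return _TOY_LEXICON[word]


print(syllabify_compound('rahoitus=erien', toy_syllabify))
```

Without the split, nothing stops the final 's' of the first constituent from being grouped with the following vowel, which is the 'tu.se' error shown in the example above.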