
Burmese text normalizer, wordbreak, converter, cleaner and phonemizer for speech related tasks.

BURMESE PHONEMIZER AND CLEANER (BPC)

Burmese language data preparation for speech-related tasks.

Installation

$ pip install bpc

or

$ pip install git+https://github.com/1chimaruGin/Burmese_Phomizer_and_Cleaner.git

Usage

For text cleaning

from bpc import Cleaner

cc = Cleaner()
cc.clean_text("မင်္ဂလာပါ? မင်္ဂလာပါ။ ၀န်းရံ ဝ၁၂၃၄ 5B")

# output: မင်္ဂလာပါ မင်္ဂလာပါ။ ဝန်းရံ ၀၁၂၃၄ ၅ဘီ
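
The output above shows two kinds of digit normalization: the digit zero (U+1040) mistyped for the letter wa (U+101D) and vice versa, and ASCII digits converted to Burmese digits. The following is a minimal standalone sketch of that idea, not the implementation used inside bpc. The function names `fix_zero_wa` and `to_myanmar_digits` and the adjacency heuristics are assumptions made for illustration only.

```python
import re

WA = "\u101D"                 # MYANMAR LETTER WA
ZERO = "\u1040"               # MYANMAR DIGIT ZERO
MM_DIGITS = "\u1040-\u1049"   # range of Myanmar digits, for use in a character class

# Map ASCII digits 0-9 to Myanmar digits U+1040-U+1049.
ASCII_TO_MM = {ord(str(i)): chr(0x1040 + i) for i in range(10)}


def to_myanmar_digits(text: str) -> str:
    """Replace ASCII digits with the corresponding Myanmar digits."""
    return text.translate(ASCII_TO_MM)


def fix_zero_wa(text: str) -> str:
    """Heuristically repair zero/wa confusion (hypothetical helper)."""
    # A letter wa next to a Myanmar digit is treated as a mistyped zero.
    text = re.sub(f"{WA}(?=[{MM_DIGITS}])", ZERO, text)
    text = re.sub(f"(?<=[{MM_DIGITS}]){WA}", ZERO, text)
    # A zero that does not follow a digit and is followed by a Myanmar
    # letter or sign (U+1000-U+103F) is treated as a mistyped letter wa.
    text = re.sub(f"(?<![{MM_DIGITS}]){ZERO}(?=[\u1000-\u103F])", WA, text)
    return text


if __name__ == "__main__":
    print(fix_zero_wa("\u1040\u1014\u103A\u1038\u101B\u1036 \u101D\u1041\u1042\u1043\u1044"))
    print(to_myanmar_digits("5"))
```

The rules only look at neighbouring characters, so ambiguous cases, such as a lone zero between spaces, are left untouched.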

For phonemization

from bpc import BurmesePhonemizer

bp = BurmesePhonemizer()
bp.text_to_phone("မင်္ဂလာပါ")

# output: ['m', 'ŋ', 'ɡ', 'l', 't', 's', 'p', 'ˈe']

For data preparation

from bpc.dataset import PrepareDataset

dataset = PrepareDataset()
dataset.prepare_data(path='path/to/dataset', method='kfold', save=True)
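
For readers unfamiliar with the `kfold` method named above, the sketch below illustrates the general k-fold idea on a plain list of items. It is not the code behind `PrepareDataset`. The function `kfold_splits`, its parameters, and the example file names are hypothetical and exist only for this illustration.

```python
import random


def kfold_splits(items, k=5, seed=0):
    """Yield (train, valid) pairs so that every item is in validation exactly once."""
    items = list(items)
    # Shuffle with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(items)
    # Deal the shuffled items round-robin into k folds.
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        valid = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, valid


if __name__ == "__main__":
    # Hypothetical utterance file names.
    utterances = [f"utt_{n:03d}.wav" for n in range(10)]
    for fold_id, (train, valid) in enumerate(kfold_splits(utterances, k=5)):
        print(fold_id, len(train), len(valid))
```

Each fold serves once as the validation set while the remaining folds form the training set, which is useful when a speech corpus is too small for a single fixed held-out split.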

References

Citations

@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}

@article{Bernard2021,
  doi = {10.21105/joss.03958},
  url = {https://doi.org/10.21105/joss.03958},
  year = {2021},
  publisher = {The Open Journal},
  volume = {6},
  number = {68},
  pages = {3958},
  author = {Mathieu Bernard and Hadrien Titeux},
  title = {Phonemizer: Text to Phones Transcription for Multiple Languages in Python},
  journal = {Journal of Open Source Software}
}

