udon 0.1

Normalizes lengthened English expressions.

Latest Version: 0.1.1

udon

Udon is a text normalizer for lengthened English expressions, that is, words stretched out with repeated letters.

(e.g., Udon converts “cooooooooooooooollllllllllllll” to “cool”)

This module is based on the following paper:

Samuel Brody and Nicholas Diakopoulos. Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! using word lengthening to detect sentiment in microblogs. In EMNLP2011, pp. 562-570, 2011.

http://aclweb.org/anthology//D/D11/D11-1052.pdf

Installation

$ pip install udon

Usage

Import udon

>>> import udon

Normalize sentence

>>> udon.normalize_sentence('you are coooolll!!!')
you are cool!
  • normalize_sentence(str)

Normalize word

>>> udon.normalize_word('okayyyyy')
okay
  • normalize_word(str)

Shorten repeated substrings down to a threshold, without using a dictionary

>>> udon.cut_repeat('mamisaaaaaan', 1)
mamisan
>>> udon.cut_repeat('okayyyyy', 2)
okayy
  • cut_repeat(str, threshould)
    • Note that this method doesn't use the lengthened-expression normalization table (e.g., cooll -> cool). To normalize such expressions, use normalize_word() or normalize_sentence().
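
The difference between the two approaches can be illustrated with a minimal sketch. This is not udon's actual implementation: the names cut_repeat_sketch, normalize_word_sketch, and TOY_TABLE are hypothetical, the table entries are made up for illustration, and the sketch assumes that repeats are single characters and that the table is consulted after runs are cut down to two letters.

```python
import re

# Hypothetical toy table; udon's real normalization table is not shown here.
TOY_TABLE = {"cooll": "cool", "okayy": "okay"}


def cut_repeat_sketch(text, threshold):
    """Collapse each run of a repeated character to at most `threshold` copies."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    # (.) captures one character; \1{threshold,} requires at least `threshold`
    # further copies, so only runs longer than `threshold` are matched.
    pattern = re.compile(r"(.)\1{%d,}" % threshold)
    return pattern.sub(lambda m: m.group(1) * threshold, text)


def normalize_word_sketch(word, table=TOY_TABLE):
    """Cut runs to two letters, then look the result up in the table (assumed flow)."""
    shortened = cut_repeat_sketch(word, 2)
    # Fall back to the shortened form when the table has no entry.
    return table.get(shortened, shortened)


print(cut_repeat_sketch("mamisaaaaaan", 1))
print(cut_repeat_sketch("okayyyyy", 2))
print(normalize_word_sketch("coooolll"))
```

Cutting runs alone cannot tell whether "cooll" should keep its double letters, which is why a table lookup (or a dictionary) is needed for the final step.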

TODO

  • Support Japanese lengthened expressions

Contributions are welcome!

License

  • This module is licensed under MIT License.

CHANGES

0.1 (2014-03-14)

First release.

 