
Corpse 0.2a

A corpus linguistics tool for Python 3.

corpse is a computational corpus linguistics tool. It provides an effective tokenizing and pre-editing algorithm, along with objects for organizing corpus research.

corpse has a tokenizing method that can parse text in several ways. By default, the algorithm parses the text into a list of tuples, one tuple per sentence, so that n-gram research can be carried out easily.
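A minimal sketch of sentence-bounded tokenizing and n-gram extraction in Python 3, for illustration only. The function names `tokenize` and `ngrams`, and the regex rules for splitting sentences and words, are assumptions, not corpse's own API or algorithm. Keeping one tuple per sentence means an n-gram never spans a sentence boundary.

```python
import re
from typing import List, Tuple

# Hypothetical sketch, NOT corpse's actual tokenizer.
# Assumption: a sentence ends with ".", "!" or "?" followed by whitespace,
# and a token is a run of word characters (Unicode-aware for str in Python 3).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
WORD = re.compile(r"\w+")


def tokenize(text: str) -> List[Tuple[str, ...]]:
    """Split text into sentences, then each sentence into a tuple of lowercase tokens."""
    sentences = SENTENCE_END.split(text.strip())
    # Skip fragments that contain no word characters at all.
    return [tuple(WORD.findall(s.lower())) for s in sentences if WORD.search(s)]


def ngrams(sentences: List[Tuple[str, ...]], n: int) -> List[Tuple[str, ...]]:
    """Collect n-grams inside each sentence tuple only."""
    result: List[Tuple[str, ...]] = []
    for sent in sentences:
        for i in range(len(sent) - n + 1):
            result.append(sent[i:i + n])
    return result
```

For example, `tokenize("The cat sat. The dog ran!")` gives `[('the', 'cat', 'sat'), ('the', 'dog', 'ran')]`, and the bigrams of that result do not include `('sat', 'the')`, because the two sentences are kept apart.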

corpse has three objects: Language, Tag and Text. Language represents the language of a text, so that texts in a given language can be retrieved from a multilingual database. Tag is a simple way to define, in broad terms, what a text is about. For instance, a text-type study based on Reiss’ typology could be run on a database by creating tags such as “informative”, “expressive” or “operative”. Finally, Text is the third object.

The Text object is the core object of the module. Creating one requires Tag objects, a Language object and an “about” section. The Language object also accepts an “about” argument, which describes in more detail what the object is about.
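The object model described above can be sketched with Python dataclasses. This is a hypothetical illustration: the class fields, the `has_tag` method and the `select` helper are assumptions made for the example, not corpse's real classes or signatures. It shows how Language and Tag objects let a multilingual, tagged collection of Text objects be filtered.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sketch of the Language / Tag / Text model; all names and
# fields here are assumptions, not corpse's actual API.


@dataclass(frozen=True)
class Language:
    name: str          # e.g. "English"
    about: str = ""    # optional, more detailed description


@dataclass(frozen=True)
class Tag:
    name: str          # e.g. "informative" (a Reiss text type)


@dataclass
class Text:
    content: str
    language: Language
    tags: List[Tag]
    about: str

    def has_tag(self, tag_name: str) -> bool:
        """True if any attached Tag has the given name."""
        return any(t.name == tag_name for t in self.tags)


def select(database: List[Text], language: str, tag: str) -> List[Text]:
    """Return the texts written in `language` that carry the tag `tag`."""
    return [t for t in database if t.language.name == language and t.has_tag(tag)]
```

Freezing Language and Tag makes them hashable value objects that many Text objects can share, while Text stays mutable so its tags can be edited during research.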

corpse v0.2a is still an alpha version and remains experimental, so it is not suitable for large research projects. corpse will also be developed for Python 3 only, because of its Unicode support.

File                 Type     Uploaded on   Size
Corpse-0.2a.tar.gz   Source   2014-10-05    3 KB