Ontonotes-5-parsing: a parser that transforms the Ontonotes 5.0 corpus into a simple JSON format.
Project description
A simple parser for the well-known Ontonotes 5 dataset: https://catalog.ldc.upenn.edu/LDC2013T19
This dataset is very useful for experiments with NER (Named Entity Recognition). Moreover, Ontonotes 5 covers three languages (English, Arabic, and Chinese), which makes it attractive for experiments with multilingual NER. However, the source format of Ontonotes 5 is very intricate, in my view. Accordingly, the goal of this project is to create a parser that transforms Ontonotes 5 into a simple JSON format. In this format, each annotated sentence is represented as a dictionary with five keys: text, morphology, syntax, entities, and language. In turn, morphology, syntax, and entities are themselves dictionaries, each describing labels (part-of-speech labels, syntactic tags, or entity classes) and their bounds in the corresponding text.
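To make the five-key layout more concrete, here is a minimal sketch of what one sentence record might look like and how its labelled spans could be read back. The sample sentence, the label names, and the assumption that bounds are stored as `[start, end)` character offsets are illustrative only and are not taken from the parser's actual output; the helper `spans_to_strings` is hypothetical as well.

```python
import json

# Hypothetical record. The five keys come from the description above;
# the label names and the [start, end) character-offset layout of the
# bounds are assumptions made for illustration.
sentence = {
    "text": "John lives in Paris.",
    "language": "english",
    "morphology": {"NNP": [[0, 4], [14, 19]], "VBZ": [[5, 10]], "IN": [[11, 13]]},
    "syntax": {"NP": [[0, 4], [14, 19]], "VP": [[5, 19]]},
    "entities": {"PERSON": [[0, 4]], "GPE": [[14, 19]]},
}


def spans_to_strings(text, layer):
    """Map each label in a layer to the substrings covered by its bounds."""
    return {
        label: [text[start:end] for start, end in bounds]
        for label, bounds in layer.items()
    }


entity_strings = spans_to_strings(sentence["text"], sentence["entities"])
print(json.dumps(entity_strings))
```

The same helper can be applied to the `morphology` or `syntax` layer, since under this assumption all three share the label-to-bounds shape.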
More detailed information about this parser is available in the short documentation: https://github.com/nsu-ai/ontonotes-5-parsing/blob/master/readme.md
Download files
Hashes for ontonotes-5-parsing-0.0.4.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 489c4a6b2915496c3e3b7419a00c909e28ff1fb0231f50da6ca8373350eadc14
MD5 | 51271f8ba528b574891f639c7a3ae985
BLAKE2b-256 | d763be0dc965ccd194eca9d4bbd92ad9c72dc122076e87bdf22aafe8a7b1bf55
Hashes for ontonotes_5_parsing-0.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 5cbad455a9a53cbbce605f4dd3d5c2ef027611fbf479a000e34805e468d8cfc9
MD5 | f4ab68b2afa7e239820240b13aafe752
BLAKE2b-256 | f1a9a938f64892cceec678e225e3722d270677fe36ad25ba951358f1007ec910
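A downloaded file can be checked against the digests listed above. The following is a minimal sketch using Python's standard `hashlib` module; the function names are hypothetical, and the local path to the downloaded wheel is whatever you saved it as.

```python
import hashlib

# SHA256 digest listed above for ontonotes_5_parsing-0.0.4-py3-none-any.whl
EXPECTED_WHEEL_SHA256 = (
    "5cbad455a9a53cbbce605f4dd3d5c2ef027611fbf479a000e34805e468d8cfc9"
)


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("ontonotes_5_parsing-0.0.4-py3-none-any.whl", EXPECTED_WHEEL_SHA256)` should return `True` for an intact download saved under that name in the current directory.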