Ontonotes-5-parsing: a parser that transforms the Ontonotes 5.0 corpus into a simple JSON format.
Project description
A simple parser for the well-known Ontonotes 5 dataset https://catalog.ldc.upenn.edu/LDC2013T19
This dataset is very useful for experiments with NER (Named Entity Recognition). Moreover, Ontonotes 5 covers three languages (English, Arabic, and Chinese), which makes it especially attractive for experiments with multilingual NER. However, the source format of Ontonotes 5 is, in my view, quite intricate. Accordingly, the goal of this project is to create a parser that transforms Ontonotes 5 into a simple JSON format. In this format, each annotated sentence is represented as a dictionary with five keys: text, morphology, syntax, entities, and language. In turn, morphology, syntax, and entities are dictionaries too: each one describes labels (part-of-speech tags, syntactic tags, or entity classes) together with their bounds in the corresponding text.
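To illustrate how a record in this five-key format might be consumed, here is a minimal sketch. It assumes, hypothetically, that each label dictionary maps a label to a list of [start, end) character offsets into `text`; the sentence, tags, `"english"` language value, and the helper `flatten_spans` are invented for illustration, so check the project's readme for the actual schema before relying on this.

```python
import json
from typing import Dict, List, Sequence, Tuple

# ASSUMPTION: each label dictionary maps a label to a list of
# [start, end) character offsets into "text". The real schema may differ.
Spans = Dict[str, List[Sequence[int]]]

# Hypothetical record in the five-key format (invented sentence and tags).
sample = {
    "text": "Barack Obama visited Paris .",
    "morphology": {"NNP": [(0, 6), (7, 12), (21, 26)], "VBD": [(13, 20)], ".": [(27, 28)]},
    "syntax": {"S": [(0, 28)], "NP": [(0, 12), (21, 26)], "VP": [(13, 26)]},
    "entities": {"PERSON": [(0, 12)], "GPE": [(21, 26)]},
    "language": "english",
}


def flatten_spans(text: str, labels: Spans) -> List[Tuple[str, str, int, int]]:
    """Turn {label: [(start, end), ...]} into (label, surface, start, end), ordered by position."""
    items = []
    for label, bounds in labels.items():
        for start, end in bounds:  # unpacking works for tuples and for JSON lists like [0, 12]
            items.append((label, text[start:end], start, end))
    # Sort by (start, end) so spans come out in reading order.
    return sorted(items, key=lambda item: (item[2], item[3]))


# After a JSON round trip the tuples become lists; flatten_spans handles both.
record = json.loads(json.dumps(sample))
for label, surface, start, end in flatten_spans(record["text"], record["entities"]):
    print(f"{label:<8} {surface!r} [{start}, {end})")
```

The same helper applies unchanged to the morphology and syntax dictionaries, since all three share the label-to-bounds shape described above.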
You can find more detailed information about this Ontonotes 5 parser in the short documentation: https://github.com/nsu-ai/ontonotes-5-parsing/blob/master/readme.md
Download files
Source distribution: ontonotes-5-parsing-0.0.3.tar.gz
Built distribution: ontonotes_5_parsing-0.0.3-py3-none-any.whl
Hashes for ontonotes-5-parsing-0.0.3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | e738739cddbce514c616a70d91df21fad30eee62b454e3a6d66f499afea899ae
MD5 | 0708b5e1daa4b5689f8c7e2792125602
BLAKE2b-256 | b6ccb50f8896a53d1feb944eaace7e2305f04ae5ffddfb901e644e496747b963
Hashes for ontonotes_5_parsing-0.0.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 421f1a306a3b0f856f126d15df16cef435388fef2edde680814aa3e8f5cdee2c
MD5 | 77989fe49dd79a1ad04405c35ff32aa2
BLAKE2b-256 | 4bb143db934bda9472bcc129f634963f3f2c0cf636fa3bbd3c8935b4d852adce
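The published SHA256 digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard `hashlib`; the helper names (`sha256_hex`, `read_chunks`, `verify`) are hypothetical, and the expected digests are copied from the release 0.0.3 hash listings on this page.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator

# SHA256 digests published for release 0.0.3 of ontonotes-5-parsing.
EXPECTED_SHA256 = {
    "ontonotes-5-parsing-0.0.3.tar.gz": "e738739cddbce514c616a70d91df21fad30eee62b454e3a6d66f499afea899ae",
    "ontonotes_5_parsing-0.0.3-py3-none-any.whl": "421f1a306a3b0f856f126d15df16cef435388fef2edde680814aa3e8f5cdee2c",
}


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Hash a stream of byte chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path: Path, chunk_size: int = 1 << 16) -> Iterator[bytes]:
    """Read a file in fixed-size chunks so large archives never sit fully in memory."""
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            yield chunk


def verify(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published SHA256 digest for {path.name}")
    return sha256_hex(read_chunks(path)) == expected
```

For example, `verify(Path("ontonotes-5-parsing-0.0.3.tar.gz"))` returns `True` only for an unmodified copy of the source distribution. Alternatively, pip's hash-checking mode can enforce the same digest directly via a requirements line such as `ontonotes-5-parsing==0.0.3 --hash=sha256:<digest>`.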