
corenlp-python 3.2.0-3

A Stanford Core NLP wrapper

Latest Version: 3.4.1-1

# A Python wrapper for the Java Stanford Core NLP tools

This is a fork of Dustin Smith's stanford-corenlp-python, a Python interface to Stanford CoreNLP. It can be used either as a Python package or run as a JSON-RPC server.

## Edited
* Update to Stanford CoreNLP v3.2.0
* Fix many bugs & improve performance
* Use jsonrpclib for stability and performance
* Allow constants such as the Stanford CoreNLP directory to be passed as arguments
* Adjust parameters to avoid timeouts under high load
* Fix a problem with long text input, by Johannes Castner (stanford-corenlp-python)
* Packaging

## Requirements
* pexpect
* unidecode
* jsonrpclib (optional)

## Download and Usage

To use this program, you must download and unpack the zip file containing Stanford's CoreNLP package. By default, the wrapper looks for the Stanford CoreNLP folder as a subdirectory of the directory from which the script is run.

In other words:

    sudo pip install pexpect unidecode jsonrpclib # jsonrpclib is optional
    git clone
    cd corenlp-python

Then, to launch a server:

    python corenlp/

Optionally, you can specify a host or port:

    python corenlp/ -H -p 3456

That runs a public JSON-RPC server on port 3456. You can also specify the Stanford CoreNLP directory:

    python corenlp/ -S stanford-corenlp-full-2013-06-20/
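
For orientation, here is a rough sketch of what a JSON-RPC exchange with such a server looks like on the wire. It assumes JSON-RPC 2.0 framing and a remote method named `parse`; the helper names, the request id, and the abbreviated response are made up for illustration. The server hands back its result as a JSON string, so the client decodes twice: once for the envelope and once for the payload.

```python
import json

def build_parse_request(text, request_id=1):
    """Build the JSON body of a JSON-RPC 2.0 call to the `parse` method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "parse",
        "params": [text],
        "id": request_id,
    })

def decode_parse_response(body):
    """Unwrap a JSON-RPC response whose result is itself a JSON string."""
    envelope = json.loads(body)
    if envelope.get("error"):
        raise RuntimeError(envelope["error"])
    return json.loads(envelope["result"])

# Illustrative response only; a real one carries the full parse.
sample_response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": json.dumps({"sentences": [{"text": "Hello world!"}]}),
})

request_body = build_parse_request("Hello world!")
parsed = decode_parse_response(sample_response)
```

In practice a client library such as jsonrpclib builds and sends the request for you; the sketch only shows why the result still has to be passed through a JSON decoder.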

Assuming the server is running on port 8080 and the CoreNLP directory is `stanford-corenlp-full-2013-06-20/` in the current directory, the following client code shows an example parse:

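    import jsonrpclib
    from simplejson import loads

    # Connect to the JSON-RPC server started above
    server = jsonrpclib.Server("http://localhost:8080")

    # parse() returns a JSON string, so decode it into a dictionary
    result = loads(server.parse("Hello world!  It is so beautiful."))
    print "Result", result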

That returns a dictionary containing the key `sentences` and, when applicable, `coref`. The value of `sentences` is a list with one dictionary per sentence; each contains `parsetree`, `text`, `tuples` (the dependencies), and `words` (part of speech, NER tag, and other token-level information):

    {u'sentences': [{u'parsetree': u'(ROOT (S (VP (NP (INTJ (UH Hello)) (NP (NN world)))) (. !)))',
                     u'text': u'Hello world!',
                     u'tuples': [[u'dep', u'world', u'Hello'],
                                 [u'root', u'ROOT', u'world']],
                     u'words': [[u'Hello',
                                 {u'CharacterOffsetBegin': u'0',
                                  u'CharacterOffsetEnd': u'5',
                                  u'Lemma': u'hello',
                                  u'NamedEntityTag': u'O',
                                  u'PartOfSpeech': u'UH'}],
                                [u'world',
                                 {u'CharacterOffsetBegin': u'6',
                                  u'CharacterOffsetEnd': u'11',
                                  u'Lemma': u'world',
                                  u'NamedEntityTag': u'O',
                                  u'PartOfSpeech': u'NN'}],
                                [u'!',
                                 {u'CharacterOffsetBegin': u'11',
                                  u'CharacterOffsetEnd': u'12',
                                  u'Lemma': u'!',
                                  u'NamedEntityTag': u'O',
                                  u'PartOfSpeech': u'.'}]]},
                    {u'parsetree': u'(ROOT (S (NP (PRP It)) (VP (VBZ is) (ADJP (RB so) (JJ beautiful))) (. .)))',
                     u'text': u'It is so beautiful.',
                     u'tuples': [[u'nsubj', u'beautiful', u'It'],
                                 [u'cop', u'beautiful', u'is'],
                                 [u'advmod', u'beautiful', u'so'],
                                 [u'root', u'ROOT', u'beautiful']],
                     u'words': [[u'It',
                                 {u'CharacterOffsetBegin': u'14',
                                  u'CharacterOffsetEnd': u'16',
                                  u'Lemma': u'it',
                                  u'NamedEntityTag': u'O',
                                  u'PartOfSpeech': u'PRP'}],
                                [u'is',
                                 {u'CharacterOffsetBegin': u'17',
                                  u'CharacterOffsetEnd': u'19',
                                  u'Lemma': u'be',
                                  u'NamedEntityTag': u'O',
                                  u'PartOfSpeech': u'VBZ'}],
                                [u'so',
                                 {u'CharacterOffsetBegin': u'20',
                                  u'CharacterOffsetEnd': u'22',
                                  u'Lemma': u'so',
                                  u'NamedEntityTag': u'O',
                                  u'PartOfSpeech': u'RB'}],
                                [u'beautiful',
                                 {u'CharacterOffsetBegin': u'23',
                                  u'CharacterOffsetEnd': u'32',
                                  u'Lemma': u'beautiful',
                                  u'NamedEntityTag': u'O',
                                  u'PartOfSpeech': u'JJ'}],
                                [u'.',
                                 {u'CharacterOffsetBegin': u'32',
                                  u'CharacterOffsetEnd': u'33',
                                  u'Lemma': u'.',
                                  u'NamedEntityTag': u'O',
                                  u'PartOfSpeech': u'.'}]]}],
     u'coref': [[[[u'It', 1, 0, 0, 1], [u'Hello world', 0, 1, 0, 2]]]]}
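
Once decoded, the result is plain nested lists and dictionaries, so it can be walked with ordinary Python. The sketch below uses a trimmed copy of the output above; the helper names (`tagged_tokens`, `dependencies`, `coref_pairs`) are hypothetical and not part of the package. It reads each `tuples` entry as relation, governor, dependent, and treats each `coref` entry as a pair of mentions whose first element is the mention text; both readings are inferred from the sample output rather than from documented behavior.

```python
# A trimmed result with the same shape as the sample output.
result = {
    "sentences": [
        {
            "text": "Hello world!",
            "tuples": [["dep", "world", "Hello"], ["root", "ROOT", "world"]],
            "words": [
                ["Hello", {"Lemma": "hello", "PartOfSpeech": "UH"}],
                ["world", {"Lemma": "world", "PartOfSpeech": "NN"}],
                ["!", {"Lemma": "!", "PartOfSpeech": "."}],
            ],
        }
    ],
    "coref": [[[["It", 1, 0, 0, 1], ["Hello world", 0, 1, 0, 2]]]],
}

def tagged_tokens(sentence):
    """Return (token, part-of-speech) pairs for one sentence dictionary."""
    return [(token, info["PartOfSpeech"]) for token, info in sentence["words"]]

def dependencies(sentence):
    """Return the dependency entries of one sentence as tuples."""
    return [tuple(dep) for dep in sentence["tuples"]]

def coref_pairs(parsed):
    """Return the mention texts of each coref pair; empty if there is no `coref` key."""
    pairs = []
    for chain in parsed.get("coref", []):
        for first, second in chain:
            pairs.append((first[0], second[0]))
    return pairs

first_sentence = result["sentences"][0]
tokens = tagged_tokens(first_sentence)
deps = dependencies(first_sentence)
mentions = coref_pairs(result)
```

Using `parsed.get("coref", [])` keeps the helper safe for inputs where no coreference was found and the key is absent.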

If you do not want to use JSON-RPC, load the module directly instead:

    from corenlp import StanfordCoreNLP
    corenlp_dir = "stanford-corenlp-full-2013-06-20/"
    corenlp = StanfordCoreNLP(corenlp_dir)  # wait a few minutes...
    corenlp.raw_parse("Parse it")

If you need to parse long texts (more than 30-50 sentences), you must use the `batch_parse` function. It reads text files from an input directory and returns a generator that yields one dictionary of parse results per file:

    from corenlp import batch_parse
    corenlp_dir = "stanford-corenlp-full-2013-06-20/"
    raw_text_directory = "sample_raw_text/"
    parsed = batch_parse(raw_text_directory, corenlp_dir)  # returns a generator object
    print list(parsed)  #=> [{'coref': ..., 'sentences': ..., 'file_name': 'new_sample.txt'}]
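
Since `batch_parse` works on a directory of text files, long inputs first have to be written to disk. The following is a minimal sketch of that preparation step, using only the standard library; `write_documents` is a hypothetical helper, and the file names and the choice of UTF-8 encoding are assumptions rather than requirements stated by the package.

```python
import os
import tempfile

def write_documents(documents, directory):
    """Write each (name, text) pair to `directory` as a UTF-8 text file."""
    paths = []
    for name, text in documents:
        path = os.path.join(directory, name)
        with open(path, "wb") as handle:
            handle.write(text.encode("utf-8"))
        paths.append(path)
    return paths

# A fresh temporary directory stands in for sample_raw_text/.
raw_text_directory = tempfile.mkdtemp(prefix="sample_raw_text_")
paths = write_documents(
    [("doc1.txt", u"Hello world!  It is so beautiful."),
     ("doc2.txt", u"Parse it.")],
    raw_text_directory,
)
# raw_text_directory can now be passed to batch_parse together with corenlp_dir.
```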

The function uses the XML output feature of Stanford CoreNLP. To get all of the information CoreNLP produces, set the `raw_output` option to true; CoreNLP's XML is then returned as a dictionary without format conversion.

    parsed = batch_parse(raw_text_directory, corenlp_dir, raw_output=True)

Note: `batch_parse` requires xmltodict. Install it with `sudo pip install xmltodict`.
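
Because `batch_parse` returns a generator, results can be handled one file at a time, and the generator is exhausted after a single pass. The sketch below illustrates this with `fake_batch_parse`, a hypothetical stand-in that yields dictionaries shaped like the ones described above; it does not call CoreNLP.

```python
def fake_batch_parse(file_names):
    """Hypothetical stand-in for batch_parse: yields one dictionary per file."""
    for name in file_names:
        yield {"file_name": name, "sentences": [], "coref": []}

parsed = fake_batch_parse(["a.txt", "b.txt"])

# Handle results one file at a time instead of building a full list first.
seen = [doc["file_name"] for doc in parsed]

# A generator can be consumed only once; a second pass yields nothing.
leftover = list(parsed)
```

If the results are needed more than once, store them with `list(...)` on the first pass, at the cost of holding every parse in memory.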

## Developers
* Hiroyoshi Komatsu
* Johannes Castner

| File | Type | Py Version | Uploaded on | Size |
| --- | --- | --- | --- | --- |
| corenlp-python-3.2.0-3.tar.gz (md5) | Source | | 2013-09-03 | 20KB |