bllipparser

Python bindings for the BLLIP natural language parser

Development Status
- 4 - Beta
Intended Audience
- Science/Research
License
- OSI Approved :: Apache Software License
Natural Language
- English
Operating System
- POSIX
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

The BLLIP parser (also known as the Charniak-Johnson parser or Brown Reranking Parser) is described in the paper Charniak and Johnson (Association for Computational Linguistics, 2005). This code provides a Python interface to the parser. Note that it does not include any parsing models; these must be downloaded separately (for example, the WSJ self-trained parsing model). Primary maintenance of the parser takes place on GitHub.

Basic usage

The easiest way to construct a parser is with the load_unified_model_dir class method. A unified model is a directory that contains two subdirectories, parser/ and reranker/, each holding the corresponding model files:

>>> from bllipparser import RerankingParser, tokenize
>>> rrp = RerankingParser.load_unified_model_dir('/path/to/model/')
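Since load_unified_model_dir expects the parser/ and reranker/ subdirectories described above, it can help to check the layout before loading. The sketch below is a hypothetical helper (missing_model_subdirs is not part of bllipparser); it only checks that the two subdirectories exist and does not inspect the model files inside them.

```python
import os


def missing_model_subdirs(model_dir):
    """Hypothetical helper (not part of bllipparser): return the names of
    the expected unified-model subdirectories that are absent.

    An empty list means both parser/ and reranker/ exist; the model files
    inside them are not checked.
    """
    return [name for name in ('parser', 'reranker')
            if not os.path.isdir(os.path.join(model_dir, name))]
```

For example, confirm that missing_model_subdirs('/path/to/model/') returns an empty list before calling RerankingParser.load_unified_model_dir('/path/to/model/').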

To parse a single sentence, call parse(). The parser produces an n-best list of the n most likely parses of the sentence (default: n=50). Typically you only want the top parse, but the others are available as well:

>>> nbest_list = rrp.parse('This is a sentence.')

Getting information about the top parse:

>>> print repr(nbest_list[0])
ScoredParse('(S1 (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN sentence))) (. .)))', parser_score=-29.621201629004183, reranker_score=-7.9273829816098731)
>>> print nbest_list[0].ptb_parse
(S1 (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN sentence))) (. .)))
>>> print nbest_list[0].parser_score
-29.621201629
>>> print nbest_list[0].reranker_score
-7.92738298161
>>> print len(nbest_list)
50
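The ptb_parse string is a Penn Treebank-style bracketing in which each word sits inside a preterminal node such as (DT This). A minimal sketch for pulling out the (word, tag) pairs with a regular expression follows. tagged_words is a hypothetical helper, not part of bllipparser, and it assumes words contain no literal spaces or parentheses (PTB-style output normally escapes brackets as -LRB- and -RRB-).

```python
import re

# A preterminal looks like "(DT This)" or "(. .)": an opening paren, a tag,
# whitespace, then a word, where neither tag nor word contains spaces or parens.
PRETERMINAL = re.compile(r'\(([^\s()]+)\s+([^\s()]+)\)')


def tagged_words(ptb_parse):
    """Hypothetical helper: return (word, tag) pairs in sentence order."""
    return [(word, tag) for tag, word in PRETERMINAL.findall(ptb_parse)]


example = '(S1 (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN sentence))) (. .)))'
print(tagged_words(example))
# [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('sentence', 'NN'), ('.', '.')]
```

Phrasal nodes such as (NP (DT This)) never match, because their second element starts with a parenthesis, so only the leaves and their tags are returned.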

If you have already tokenized the text (for example, with an existing tokenizer), pass the tokens as a list of strings:

>>> nbest_list = rrp.parse(['This', 'is', 'a', 'pretokenized', 'sentence', '.'])

The reranker can be disabled by setting rerank=False:

>>> nbest_list = rrp.parse('Parser only!', rerank=False)

To parse text with existing part-of-speech (POS) tag constraints (soft constraints), use parse_tagged(). In this example, token 0 (‘Time’) should have tag VB and token 1 (‘flies’) should have tag NNS:

>>> rrp.parse_tagged(['Time', 'flies'], possible_tags={0 : 'VB', 1 : 'NNS'})[0]
ScoredParse('(S1 (NP (VB Time) (NNS flies)))', parser_score=-53.94938875760073, reranker_score=-15.841407102717749)

You don’t need to specify a tag for all words: token 0 (‘Time’) should have tag VB and token 1 (‘flies’) is unconstrained:

>>> rrp.parse_tagged(['Time', 'flies'], possible_tags={0 : 'VB'})[0]
ScoredParse('(S1 (S (VP (VB Time) (NP (VBZ flies)))))', parser_score=-54.390430751112156, reranker_score=-17.290145080887005)

You can specify multiple tags for each token: token 0 (‘Time’) should have tag VB, JJ, or NN and token 1 (‘flies’) is unconstrained:

>>> rrp.parse_tagged(['Time', 'flies'], possible_tags={0 : ['VB', 'JJ', 'NN']})[0]
ScoredParse('(S1 (NP (NN Time) (VBZ flies)))', parser_score=-42.82904107213723, reranker_score=-12.865900776775314)
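In the examples above, possible_tags maps a token index to either a single tag or a list of allowed tags, and unconstrained tokens are simply omitted. If your constraints start out as (token, tags) pairs, a minimal sketch for building both arguments is shown below. split_constraints is a hypothetical helper, not part of bllipparser, and using None to mean "unconstrained" is an assumption of this sketch.

```python
def split_constraints(tagged_tokens):
    """Hypothetical helper: turn (token, tags) pairs into (tokens, possible_tags).

    tags may be None (unconstrained), a single tag string, or a sequence of
    tags. Unconstrained tokens are left out of possible_tags entirely.
    """
    tokens = []
    possible_tags = {}
    for index, (token, tags) in enumerate(tagged_tokens):
        tokens.append(token)
        if tags is None:
            continue  # no constraint for this token
        if isinstance(tags, str):
            possible_tags[index] = tags  # exactly one allowed tag
        else:
            possible_tags[index] = list(tags)  # several allowed tags
    return tokens, possible_tags
```

For instance, split_constraints([('Time', ['VB', 'JJ', 'NN']), ('flies', None)]) yields the same arguments as the last example: (['Time', 'flies'], {0: ['VB', 'JJ', 'NN']}), which can then be passed as rrp.parse_tagged(tokens, possible_tags=possible_tags).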

If all you need is a tokenizer, use tokenize():

>>> tokenize("Tokenize this sentence, please.")
['Tokenize', 'this', 'sentence', ',', 'please', '.']

Release history

- 2021.11.7 (Nov 7, 2021)
- 2016.9.11 (Sep 12, 2016)
- 2015.12.3 (Dec 4, 2015)
- 2015.08.18 (Aug 18, 2015)
- 2015.08.15 (Aug 16, 2015)
- 2015.07.23 (Jul 23, 2015)
- 2015.07.08 (Jul 8, 2015)
- 2015.01.11 (Jan 12, 2015)
- 2014.08.29b, pre-release (Aug 29, 2014)
- 2014.02.09 (Feb 9, 2014)
- 2013.10.16-1 (Oct 17, 2013), this version
- 2013.10.16 (Oct 16, 2013)

Download files


Source Distribution

bllipparser-2013.10.16-1.tar.gz (475.6 kB)

Uploaded Oct 17, 2013 Source

Hashes for bllipparser-2013.10.16-1.tar.gz
Algorithm	Hash digest
SHA256	`9ca12eb677ed18db293d08f1c895406eeb405ff4bdfc864a6e04ea7efe06833d`
MD5	`3b9c8eb5197e8cb51e768a0e0555face`
BLAKE2b-256	`711e4fd93b81bc3dddc5b35fa33888b4ab33145ebce7d5ce78f7baec794f2465`
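The published SHA256 digest can be checked locally after downloading the source distribution. A minimal sketch using Python's standard hashlib module follows; the helper names are hypothetical, and the tarball path you pass in is wherever you saved the file.

```python
import hashlib

# SHA256 digest listed above for bllipparser-2013.10.16-1.tar.gz
EXPECTED_SHA256 = '9ca12eb677ed18db293d08f1c895406eeb405ff4bdfc864a6e04ea7efe06833d'


def sha256_of_file(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_published_hash('bllipparser-2013.10.16-1.tar.gz') should return True for an intact download. Reading in chunks keeps memory use constant regardless of file size.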