lexicalrichness

A small module to compute textual lexical richness

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Natural Language
- English

Project description

LexicalRichness

A small Python module to compute textual lexical richness measures.

Installation

$ pip install lexicalrichness

Quickstart

>>> from lexicalrichness import LexicalRichness

# Sample text to analyze.
>>> text = """Measure of textual lexical diversity, computed as the mean length of sequential words in
                a text that maintains a minimum threshold TTR score.

                Iterates over words until TTR scores falls below a threshold, then increase factor
                counter by 1 and start over. McCarthy and Jarvis (2010, pg. 385) recommends a factor
                threshold in the range of [0.660, 0.750].
                (McCarthy 2005, McCarthy and Jarvis 2010)"""

# Instantiate a new text object (pass use_TextBlob=True to use the TextBlob tokenizer).
>>> lex = LexicalRichness(text)

# Return word count.
>>> lex.words
57

# Return (unique) term count.
>>> lex.terms
39

# Return type-token ratio (TTR) of text.
>>> lex.ttr
0.6842105263157895

# Return root type-token ratio (RTTR) of text.
>>> lex.rttr
5.165676192553671

# Return corrected type-token ratio (CTTR) of text.
>>> lex.cttr
3.6526846651686067

# Return mean segmental type-token ratio (MSTTR).
>>> lex.msttr(segment_window=25)
0.88

# Return moving average type-token ratio (MATTR).
>>> lex.mattr(window_size=25)
0.8351515151515151

# Return Measure of Textual Lexical Diversity (MTLD).
>>> lex.mtld(threshold=0.72)
46.79226361031519

# Return hypergeometric distribution diversity (HD-D) measure.
>>> lex.hdd(draws=42)
0.7468703323966486
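
The sample text above doubles as a description of MTLD. The following is a minimal sketch of that procedure, not the library's own implementation. The names `_mtld_pass` and `mtld_sketch` are hypothetical, the input is assumed to be an already tokenized list of words, and two details are assumptions taken from the usual presentation of MTLD: a partial factor for the leftover words, and averaging a forward and a backward pass.

```python
def _mtld_pass(words, threshold):
    """One directional pass: count full factors plus a partial factor for leftovers."""
    factors = 0.0
    seen = set()   # unique terms in the current segment
    count = 0      # words in the current segment
    for word in words:
        count += 1
        seen.add(word)
        if len(seen) / count < threshold:
            # TTR fell below the threshold: close this factor and start over.
            factors += 1
            seen.clear()
            count = 0
    if count:
        # Leftover segment: credit a partial factor in proportion to how far
        # its TTR has dropped from 1 toward the threshold (assumption).
        factors += (1 - len(seen) / count) / (1 - threshold)
    if factors == 0:
        raise ValueError("TTR never dropped below 1; MTLD is undefined here")
    return len(words) / factors


def mtld_sketch(words, threshold=0.72):
    """Mean of a forward and a backward pass over the word list (assumption)."""
    forward = _mtld_pass(words, threshold)
    backward = _mtld_pass(words[::-1], threshold)
    return (forward + backward) / 2
```

The result is the mean number of words per factor, so a higher score means the text sustains a TTR above the threshold for longer stretches.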

Attributes and properties

wordlist	list of words
words	number of words (w)
terms	number of unique terms (t)
tokenizer	tokenizer used
ttr	type-token ratio computed as t / w (Chotlos 1944, Templin 1957)
rttr	root TTR computed as t / sqrt(w) (Guiraud 1954, 1960)
cttr	corrected TTR computed as t / sqrt(2w) (Carroll 1964)
Herdan	log(t) / log(w) (Herdan 1960, 1964)
Summer	log(log(t)) / log(log(w)) (Summer 1966)
Dugast	(log(w) ** 2) / (log(w) - log(t)) (Dugast 1978)
Maas	(log(w) - log(t)) / (log(w) ** 2) (Maas 1972)
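
The closed-form measures in the table depend only on the word count w and the term count t, so they can be checked by hand. Below is a small sketch that evaluates them directly. The function name `basic_richness` is hypothetical, and the natural logarithm is an assumption: the table does not state the base, and the Summer, Dugast and Maas values depend on it.

```python
import math


def basic_richness(w, t):
    """Evaluate the table's formulas for w words and t unique terms.

    Requires 1 < t < w so that every logarithm and denominator is defined.
    """
    log_w, log_t = math.log(w), math.log(t)  # natural log (assumption)
    return {
        "ttr": t / w,
        "rttr": t / math.sqrt(w),
        "cttr": t / math.sqrt(2 * w),
        "Herdan": log_t / log_w,
        "Summer": math.log(log_t) / math.log(log_w),
        "Dugast": log_w ** 2 / (log_w - log_t),
        "Maas": (log_w - log_t) / log_w ** 2,
    }


# Word and term counts from the Quickstart example.
scores = basic_richness(w=57, t=39)
for name, value in scores.items():
    print(name, value)
```

With w = 57 and t = 39, the ttr, rttr and cttr entries agree with the Quickstart values up to floating-point rounding. Note that Dugast and Maas are reciprocals of each other.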

Methods

msttr	Mean segmental TTR (Johnson 1944)
mattr	Moving average TTR (Covington 2007, Covington and McFall 2010)
mtld	Measure of Textual Lexical Diversity (McCarthy 2005, McCarthy and Jarvis 2010)
hdd	HD-D (McCarthy and Jarvis 2007)
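
To make the difference between the two windowed measures concrete, here is a rough sketch of MSTTR and MATTR written from their usual definitions. It is not the library's code: `msttr_sketch` and `mattr_sketch` are hypothetical names, the input is assumed to be a plain list of tokens, and dropping the trailing partial segment in MSTTR is an assumption.

```python
def msttr_sketch(words, segment_window):
    """Mean TTR over consecutive, non-overlapping segments of equal length."""
    starts = range(0, len(words) - segment_window + 1, segment_window)
    segments = [words[i:i + segment_window] for i in starts]
    if not segments:
        raise ValueError("segment_window is larger than the text")
    # A trailing segment shorter than segment_window is ignored (assumption).
    return sum(len(set(s)) / segment_window for s in segments) / len(segments)


def mattr_sketch(words, window_size):
    """Mean TTR over every window of window_size words, sliding one word at a time."""
    if window_size > len(words):
        raise ValueError("window_size is larger than the text")
    starts = range(len(words) - window_size + 1)
    windows = [words[i:i + window_size] for i in starts]
    return sum(len(set(w)) / window_size for w in windows) / len(windows)


tokens = "a b a c a b d d".split()
print(msttr_sketch(tokens, 4))  # two segments, each with 3 unique terms out of 4
print(mattr_sketch(tokens, 4))  # five overlapping windows
```

MSTTR looks at each word once, while MATTR reuses every word in up to `window_size` windows, which makes it less sensitive to where the segment boundaries happen to fall.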

Accessing method docstrings

>>> import inspect

# docstring for hdd (HD-D)
>>> print(inspect.getdoc(LexicalRichness.hdd))

Hypergeometric distribution diversity (HD-D) score.

For each term (t) in the text, compute the probability (p) of getting at least one appearance
of t with a random draw of size n < N (text size). The contribution of t to the final HD-D
score is p * (1/n). The final HD-D score thus sums over p * (1/n) with p computed for
each term t. Described in McCarthy and Jarvis 2007, pp. 465-466.
(McCarthy and Jarvis 2007)

Parameters
__________
draws: int
    Number of random draws in the hypergeometric distribution (default=42).

Returns
_______
float
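
The docstring's recipe can be written out in a few lines. This is a sketch under stated assumptions rather than the library's implementation: `hdd_sketch` is a hypothetical name, the input is a plain list of tokens, and `math.comb` requires Python 3.8 or later. The probability of seeing a term at least once is computed as one minus the hypergeometric probability of drawing it zero times.

```python
from collections import Counter
from math import comb


def hdd_sketch(words, draws=42):
    """Sum, over all terms, of P(term appears at least once in `draws` draws) / draws."""
    n_words = len(words)
    if draws < 1 or draws > n_words:
        raise ValueError("draws must be between 1 and the number of words")
    total = comb(n_words, draws)  # number of ways to draw `draws` words
    score = 0.0
    for freq in Counter(words).values():
        # Ways to draw without ever picking this term, as a share of all draws.
        p_absent = comb(n_words - freq, draws) / total
        score += (1 - p_absent) / draws
    return score


print(hdd_sketch(["a", "a", "b", "c"], draws=2))
```

When `draws` equals the text length, every term is certain to appear, so the score reduces to the plain type-token ratio.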

History

0.1.0 (2018-05-09)

First release on PyPI.

0.1.1 (2018-05-09)

Changed lexicalrichness filename to LexicalRichness

Release history

0.5.1

Aug 27, 2023

0.5.0

Mar 6, 2023

0.4.1

Feb 7, 2023

0.4.0

Jan 11, 2023

0.3.1

Dec 25, 2022

0.3.0

Oct 29, 2022

0.2.0

Aug 20, 2022

0.1.10

Aug 20, 2022

0.1.9

Jun 4, 2022

0.1.6

Jun 3, 2022

0.1.4

Nov 13, 2021

0.1.3

May 27, 2018

0.1.2

May 9, 2018

This version

0.1.1

May 9, 2018

0.1.0

May 9, 2018

Download files

Source Distribution

lexicalrichness-0.1.1.tar.gz (11.5 kB)

Uploaded May 9, 2018 Source

Hashes for lexicalrichness-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`a5536d40985ad79f9d891acb6266ed2d3dfd838fba92bf706faf3f8d2293435f`
MD5	`1b7fb4353482c49e6088f5d578cb92c9`
BLAKE2b-256	`5cf257554457e97666a9bef85e762688f368c6654a1921afa5cc118ae5a470b4`