nameparser

A simple Python module for parsing human names into their individual components.

Attributes

HumanName.title

HumanName.first

HumanName.middle

HumanName.last

HumanName.suffix

Supports 3 comma placement variations for names of people in Latin-based languages.

Title Firstname Middle Middle Lastname Suffix

Lastname, Title Firstname Middle Middle[,] Suffix [, Suffix]

Title Firstname M Lastname, Suffix [, Suffix]

Examples:

Doe-Ray, Col. John A. Jérôme III

Dr. Juan Q. Xavier de la Vega II

Juan Q. Xavier Velasquez y Garcia, Jr.
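The three layouts above differ mainly in where the commas fall. The following is a minimal sketch of that distinction, not nameparser's own parsing logic: the helper names `is_suffix` and `comma_format` and the small `SUFFIXES` set are assumptions made up for illustration.

```python
# Hypothetical helpers, not part of nameparser: classify a name string by
# where its commas fall, mirroring the three layouts listed above.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "phd", "md"}  # illustrative subset


def is_suffix(piece):
    """True if a comma-separated part looks like a suffix such as 'Jr.'."""
    return piece.replace(".", "").strip().lower() in SUFFIXES


def comma_format(full_name):
    """Return a label naming which of the three comma layouts was used."""
    parts = [p.strip() for p in full_name.split(",") if p.strip()]
    if len(parts) == 1:
        return "no comma"            # Title First Middle Last Suffix
    if all(is_suffix(p) for p in parts[1:]):
        return "suffix after comma"  # Title First M Last, Suffix [, Suffix]
    return "last name first"         # Last, Title First Middle[,] Suffix


for example in (
    "Dr. Juan Q. Xavier de la Vega II",
    "Juan Q. Xavier Velasquez y Garcia, Jr.",
    "Doe-Ray, Col. John A. Jérôme III",
):
    print(comma_format(example), "<-", example)
```

Once the layout is known, each comma-separated part can be split on whitespace and sorted into the title, first, middle, last and suffix attributes.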

Capitalization Support

The HumanName class can try to guess the correct capitalization of names entered in all upper or all lower case. It will not adjust the case of names entered in mixed case.

bob v. de la macdole-eisenhower phd -> Bob V. de la MacDole-Eisenhower Ph.D.
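A rough sketch of how such a guess could work, assuming small lookup tables for lowercase particles and fixed-form suffixes plus a naive "Mac"/"Mc" rule. The tables, the regex and the function names here are invented for illustration; the library's own rules live in nameparser.constants and may differ.

```python
import re

# Illustrative tables assumed for this sketch only.
LOWERCASE_WORDS = {"de", "la", "y", "van", "von"}
EXCEPTIONS = {"phd": "Ph.D.", "md": "M.D.", "ii": "II", "iii": "III", "iv": "IV"}
# Naive rule: it would also turn "macy" into "MacY".
MAC = re.compile(r"^(ma?c)(\w+)$", re.IGNORECASE)


def cap_word(word):
    """Capitalize one hyphen-free word using the tables above."""
    key = word.replace(".", "").lower()
    if key in LOWERCASE_WORDS:
        return word.lower()
    if key in EXCEPTIONS:
        return EXCEPTIONS[key]
    match = MAC.match(word)
    if match:
        return match.group(1).capitalize() + match.group(2).capitalize()
    return word.capitalize()


def guess_capitalization(name):
    """Recapitalize names that are all lower or all upper case."""
    if name != name.lower() and name != name.upper():
        return name  # mixed case: assume the author typed it deliberately
    words = []
    for word in name.split():
        words.append("-".join(cap_word(part) for part in word.split("-")))
    return " ".join(words)


print(guess_capitalization("bob v. de la macdole-eisenhower phd"))
print(guess_capitalization("Shirley Maclaine"))
```

The mixed-case check is what keeps a name like "Shirley Maclaine" from being rewritten as "Shirley MacLaine".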

Over 100 unit tests with example names. Should be Unicode safe but it’s fairly untested. Post a ticket for any names that fail and I will try to fix them.

HumanName instances will pass an equals (==) test if their lower case unicode representations are the same.
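The comparison rule can be sketched with a small stand-in class. `SimpleName` is hypothetical and takes already-parsed parts; it only illustrates comparing lower-cased string representations and is not how HumanName is implemented.

```python
class SimpleName:
    """Hypothetical stand-in for HumanName that holds pre-parsed parts."""

    def __init__(self, title="", first="", middle="", last="", suffix=""):
        self.parts = [title, first, middle, last, suffix]

    def __str__(self):
        # Join the non-empty parts in display order.
        return " ".join(p for p in self.parts if p)

    def __eq__(self, other):
        # Equal when the lower-cased string forms match.
        return str(self).lower() == str(other).lower()

    def __hash__(self):
        # Keep hashing consistent with the case-insensitive __eq__.
        return hash(str(self).lower())


a = SimpleName("Dr.", "Juan", "Q. Xavier", "de la Vega", "III")
b = SimpleName("dr.", "juan", "Q. xavier", "de la vega", "III")
print(a == b)
print(a == "dr. juan q. xavier de la vega iii")
```

Because the comparison goes through the string form, two names written in different comma layouts compare equal once they parse to the same parts.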

Output Format

The format of the strings returned by unicode() can be adjusted using standard Python string formatting. The format string’s format() method is passed a dictionary of the name’s parts.
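The mechanism is plain `str.format()` with keyword arguments, which can be tried without the library. In this sketch the `parts` dictionary, the `default_format` string and the `render` helper are assumptions for illustration.

```python
# Name parts as they might be handed to the format string.
parts = {
    "title": "Rev",
    "first": "John",
    "middle": "A. Kenneth",
    "last": "Doe",
    "suffix": "III",
}

default_format = "{title} {first} {middle} {last} {suffix}"  # assumed default
custom_format = "{last}, {title} {first} {middle}, {suffix}"


def render(fmt, parts):
    """Fill the format string, then collapse runs of whitespace."""
    text = fmt.format(**parts)
    return " ".join(text.split())


print(render(default_format, parts))
print(render(custom_format, parts))
```

Collapsing whitespace keeps an empty part, such as a missing title, from leaving a double space in the output. An empty part next to a literal comma would still leave the comma behind.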

>>> name = HumanName("Rev John A. Kenneth Doe III")
>>> unicode(name)
"Rev John A. Kenneth Doe III"
>>> name.string_format = "{last}, {title} {first} {middle}, {suffix}"
>>> unicode(name)
"Doe, Rev John A. Kenneth, III"

Usage

>>> from nameparser import HumanName
>>> name = HumanName("Dr. Juan Q. Xavier de la Vega III")
>>> name.title
u'Dr.'
>>> name.first
u'Juan'
>>> name.middle
u'Q. Xavier'
>>> name.last
u'de la Vega'
>>> name.suffix
u'III'
>>> name.full_name = "Doe-Ray, Col. John A. Jérôme III"
>>> name.title
u'Col.'
>>> name.first
u'John'
>>> name.middle
u'A. Jérôme'
>>> name.last
u'Doe-Ray'
>>> name.suffix
u'III'
>>> name.full_name = "Juan Q. Xavier Velasquez y Garcia, Jr."
>>> name.title
u''
>>> name.first
u'Juan'
>>> name.middle
u'Q. Xavier'
>>> name.last
u'Velasquez y Garcia'
>>> name.suffix
u'Jr.'
>>> name.middle = "Jason Alexander"
>>> name.middle
u'Jason Alexander'
>>> name
<HumanName : [
    Title: ''
    First: 'Juan'
    Middle: 'Jason Alexander'
    Last: 'Velasquez y Garcia'
    Suffix: 'Jr.'
]>
>>> name = HumanName("Dr. Juan Q. Xavier de la Vega III")
>>> name2 = HumanName("de la vega, dr. juan Q. xavier III")
>>> name == name2
True
>>> len(name)
5
>>> list(name)
['Dr.', 'Juan', 'Q. Xavier', 'de la Vega', 'III']
>>> name[1:-1]
[u'Juan', u'Q. Xavier', u'de la Vega']
>>> name = HumanName('bob v. de la macdole-eisenhower phd')
>>> name.capitalize()
>>> unicode(name)
u'Bob V. de la MacDole-Eisenhower Ph.D.'
>>> # Don't touch good names
>>> name = HumanName('Shirley Maclaine')
>>> name.capitalize()
>>> unicode(name)
u'Shirley Maclaine'

Customizing the Parser with Your Own Constants

Recognition of titles, prefixes, suffixes and conjunctions is provided by matching the lower case characters of a name piece with pre-defined sets located in nameparser.constants. You can adjust them to suit your needs by passing your own set of constants when instantiating a new HumanName object. Be sure to use the lower case representation with no punctuation.

prefixes_c = PREFIXES

titles_c = TITLES

suffixes_c = SUFFIXES

conjunctions_c = CONJUNCTIONS

capitalization_exceptions_c = CAPITALIZATION_EXCEPTIONS
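The reason custom entries must be lower case with no punctuation is that each name piece is normalized before the set lookup. Below is a minimal sketch of that matching, assuming a tiny stand-in `PREFIXES` set and hypothetical helpers `normalize` and `is_prefix`; the real sets are the ones in nameparser.constants.

```python
import string

PREFIXES = {"de", "la", "van", "von"}  # illustrative subset, not the library's set


def normalize(piece):
    """Lower-case a name piece and drop punctuation before the set lookup."""
    return piece.lower().translate(str.maketrans("", "", string.punctuation))


def is_prefix(piece, prefixes=PREFIXES):
    """True if the normalized piece is in the given prefix set."""
    return normalize(piece) in prefixes


# Extend the defaults with a set union, leaving the original set untouched.
custom_prefixes = PREFIXES | {"te"}

print(is_prefix("Te"))
print(is_prefix("Te", custom_prefixes))
print(is_prefix("St.", PREFIXES | {"st"}))
```

Building a new set with `|` rather than mutating the shared one keeps the change local to the objects that receive it.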

Example

>>> from nameparser import HumanName
>>> from nameparser.constants import PREFIXES
>>>
>>> prefixes_c = PREFIXES | set(['te'])
>>> hn = HumanName(prefixes_c=prefixes_c)
>>> hn.full_name = "Te Awanui-a-Rangi Black"
>>> hn
<HumanName : [
    Title: ''
    First: 'Te Awanui-a-Rangi'
    Middle: ''
    Last: 'Black'
    Suffix: ''
]>

Contributing via Google Code

Feel free to post new issues to the Google Code project. The easiest way to submit changes is to create a clone of the Google project and commit changes to your clone with mercurial. I’ll happily pull changes that include tests from any clone. Create your clone here:

http://code.google.com/p/python-nameparser/source/clones

Then check out your clone:

hg clone https://code.google.com/r/your-clone-name

Make your changes, add your tests, then push them to your clone.

hg push -b default

Then file a pull request in Google Code. To pull new changes from the canonical repository and apply them to your working directory:

hg pull -u https://code.google.com/r/python-nameparser

Testing

Run tests.py to see if your changes broke anything.

./tests.py

You can also pass a string as the first argument to see how a specific name will be parsed.

$ ./tests.py "Secretary of State Hillary Rodham-Clinton"
<HumanName : [
    Title: 'Secretary of State'
    First: 'Hillary'
    Middle: ''
    Last: 'Rodham-Clinton'
    Suffix: ''
]>

Naming Practices and Resources

US_Census_Surname_Data_2000

Naming_practice_guide_UK_2006

Wikipedia_Naming_conventions

Wikipedia_List_Of_Titles

Release Log

0.2.8 - Oct 25, 2013

Add support for Python 3.3+. Thanks to @corbinbs.

0.2.7 - Feb 13, 2013

Fix bug with multiple conjunctions in title

Add legal and crown titles

0.2.6 - Feb 12, 2013

Fix python 2.6 import error on logging.NullHandler

0.2.5 - Feb 11, 2013

Set logging handler to NullHandler

Remove ‘ben’ from PREFIXES because it’s more common as a name than a prefix.

Deprecate BlankHumanNameError. Do not raise exceptions if full_name is empty string.

0.2.4 - Feb 10, 2013

Adjust logging, don’t set basicConfig. Fix #10 and #26.

Fix handling of single lower case initials that are also conjunctions, e.g. “john e smith”. Re #11.

Fix handling of initials with no space separation, e.g. “E.T. Jones”. Fix #11.

Do not remove period from first name, when present.

Remove ‘e’ from PREFIXES because it is handled as a conjunction.

Python 2.7+ required to run the tests. Mark known failures.

tests/test.py can now take an optional name argument that will return repr() for that name.

0.2.3 - Fix overzealous “Mac” regex

0.2.2 - Fix parsing error

0.2.0

Significant refactor of parsing logic. Handle conjunctions and prefixes before parsing into attribute buckets.

Support attribute overriding by assignment.

Support multiple titles.

Lowercase titles constants to fix bug with comparison.

Move documentation to README.rst, add release log.

0.1.4 - Use set() in constants for improved speed. setuptools compatibility - sketerpot

0.1.3 - Add capitalization feature - twotwo

0.1.2 - Add slice support

Release history

1.1.3

Sep 21, 2023

1.1.2

Nov 14, 2022

1.1.1

Jan 29, 2022

1.1.0

Jan 4, 2022

1.0.6

Feb 8, 2020

1.0.5

Dec 12, 2019

1.0.4

Jun 27, 2019

1.0.3

Apr 19, 2019

1.0.2

Oct 26, 2018

1.0.1

Sep 1, 2018

1.0.0

Aug 31, 2018

0.5.8

Aug 20, 2018

0.5.7

Jun 16, 2018

0.5.6

Jan 15, 2018

0.5.5

Jan 11, 2018

0.5.4

Dec 7, 2017

0.5.3

Jun 28, 2017

0.5.2

Mar 20, 2017

0.5.1

Aug 12, 2016

0.5.0

Aug 10, 2016

0.4.1

Jul 26, 2016

0.4.0

Jun 2, 2016

0.3.16

Mar 24, 2016

0.3.15

Mar 21, 2016

0.3.14

Mar 19, 2016

0.3.13

Mar 15, 2016

0.3.12

Mar 14, 2016

0.3.11

Oct 18, 2015

0.3.10

Sep 20, 2015

0.3.9

Sep 5, 2015

0.3.8

Sep 3, 2015

0.3.7

Aug 31, 2015

0.3.6

Aug 6, 2015

0.3.5

Aug 4, 2015

0.3.4

Mar 2, 2015

0.3.3

Aug 4, 2014

0.3.2

Jul 17, 2014

0.3.1

Jul 5, 2014

0.3.0

Jul 4, 2014

0.2.10

May 17, 2014

0.2.9

Apr 2, 2014

This version

0.2.8

Oct 25, 2013

0.2.7

Feb 14, 2013

0.2.6

Feb 13, 2013

0.2.5

Feb 12, 2013

0.2.4

Feb 11, 2013

0.2.3

Oct 7, 2012

0.2.2

Aug 24, 2012

0.2.0

Jan 16, 2012

0.1.4

Jan 13, 2012

0.1.3

Feb 4, 2011

Download files

Source Distribution

nameparser-0.2.8.tar.gz (11.8 kB)

Uploaded Oct 25, 2013 Source

Hashes for nameparser-0.2.8.tar.gz
Algorithm	Hash digest
SHA256	`3c02900fa66280d8e414a25ce1c43c18d9108b92d50c9fafa2ae2aa31d7b318c`
MD5	`e497a1875dcd4be1c65a63a0ebf8b096`
BLAKE2b-256	`96dcf0c71e8b1643180454f6f40000ba4b016fcba6f0e9ab94e105adfac59f05`