
A maximum-strength name parser for record linkage.



nominally: a maximum-strength name parser for record linkage

License: AGPL 3.0+

🔗 Names

Nominally simplifies and parses a personal name written in Western name order into six core fields: title, first, middle, last, suffix, and nickname.

Typically, nominally is used to parse entire lists or pd.Series of names en masse. This package includes a command line tool to parse a single name for convenient one-off testing and examples.

Nominally produces fields intended for comparisons between or within datasets. As such, names come out formatted for data without regard to human syntactic preference: "de von ausfern, mr johann g" rather than "Mr. Johann G. de von Ausfern".
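
To make the comparison use case concrete, here is a minimal sketch that treats two parsed results as plain dicts keyed by the six fields and checks whether selected fields agree. The helper names blocking_key and fields_agree, and the choice of which fields to compare, are illustrative assumptions and not part of nominally.

```python
from typing import Mapping, Sequence, Tuple


def blocking_key(parsed: Mapping[str, str]) -> Tuple[str, str]:
    """Hypothetical helper: (last name, first initial) for grouping candidates."""
    return (parsed.get("last", ""), parsed.get("first", "")[:1])


def fields_agree(
    a: Mapping[str, str],
    b: Mapping[str, str],
    fields: Sequence[str] = ("first", "last", "suffix"),
) -> bool:
    """Hypothetical helper: True when every listed field is identical in both."""
    return all(a.get(field, "") == b.get(field, "") for field in fields)


# Two records shaped like the six-field output (values typed by hand here).
record_a = {"title": "mr", "first": "james", "middle": "",
            "last": "blankinsop", "suffix": "jr", "nickname": "jimmy"}
record_b = {"title": "", "first": "james", "middle": "",
            "last": "blankinsop", "suffix": "jr", "nickname": ""}

print(blocking_key(record_a) == blocking_key(record_b))      # True
print(fields_agree(record_a, record_b))                      # True
print(fields_agree(record_a, record_b, fields=("title",)))   # False
```

Because the fields are already lowercased and stripped of punctuation, exact string equality is enough for this sketch; a real linkage project would likely layer fuzzier comparisons on top.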

📓 Getting Started

Call parse_name() to parse out the six core fields:

$ python -q
>>> from nominally import parse_name
>>> parse_name("Blankinsop, Jr., Mr. James 'Jimmy'")
{
  'title': 'mr',
  'first': 'james',
  'middle': '',
  'last': 'blankinsop',
  'suffix': 'jr',
  'nickname': 'jimmy'
}
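
Because nominally is typically run over whole lists, the single call above can be wrapped in a loop. The following is a minimal sketch: parse_many is a hypothetical helper, not part of nominally, and it takes the parser as an argument so that parse_name (or any callable returning a dict of fields) can be passed in. Parsing each distinct string only once, and storing the input under a 'raw' key, are assumptions about what is convenient for datasets with many duplicates.

```python
from typing import Callable, Dict, Iterable, List, Mapping


def parse_many(
    names: Iterable[str],
    parser: Callable[[str], Mapping[str, str]],
) -> List[Dict[str, str]]:
    """Hypothetical helper: parse every name, reusing results for repeats."""
    cache: Dict[str, Dict[str, str]] = {}
    rows: List[Dict[str, str]] = []
    for raw in names:
        if raw not in cache:
            # Copy into a plain dict so each cached result is independent.
            cache[raw] = dict(parser(raw))
        # Keep the original string next to the parsed fields.
        rows.append({"raw": raw, **cache[raw]})
    return rows


# Intended use, assuming nominally is installed:
#   from nominally import parse_name
#   rows = parse_many(["DR. PEACHES BARTKOWICZ", "Smith, Jane"], parse_name)
```

For a pd.Series of names, names.apply(parse_name) is the analogous one-liner and yields a Series of dicts.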

Dive into the Name class to parse and recreate a string...

>>> from nominally import Name
>>> n = Name("DR. PEACHES BARTKOWICZ")
>>> n
Name({'title': 'dr', 'first': 'peaches', 'middle': '', 'last': 'bartkowicz', 'suffix': '', 'nickname': ''})
>>> str(n)
'bartkowicz, dr peaches'

...or use the dict...

>>> dict(n)
{
  'title': 'dr',
  'first': 'peaches',
  'middle': '',
  'last': 'bartkowicz',
  'suffix': '',
  'nickname': ''
}
>>> list(n.values())
['dr', 'peaches', '', 'bartkowicz', '', '']

...or retrieve a more elaborate set of attributes...

>>> n.report()
{
  'raw': 'DR. PEACHES BARTKOWICZ',
  'cleaned': {'dr peaches bartkowicz'},
  'parsed': 'bartkowicz, dr peaches',
  'list': ['dr', 'peaches', '', 'bartkowicz', '', ''],
  'title': 'dr',
  'first': 'peaches',
  'middle': '',
  'last': 'bartkowicz',
  'suffix': '',
  'nickname': ''
}

...or capture individual attributes.

>>> n.first
'peaches'
>>> n['last']
'bartkowicz'
>>> n.get('title')
'dr'
>>> n.raw
'DR. PEACHES BARTKOWICZ'

🖥️ Command Line

For a quick report, invoke the nominally command line tool:

$ nominally "DR. PEACHES BARTKOWICZ"
       raw: DR. PEACHES BARTKOWICZ
   cleaned: dr. peaches bartkowicz
    parsed: bartkowicz, dr peaches
      list: ['dr', 'peaches', '', 'bartkowicz', '', '']
     title: dr
     first: peaches
    middle:
      last: bartkowicz
    suffix:
  nickname:

🔬 Worked Examples

Binder hosts live Jupyter notebooks walking through examples of nominally.

     csv.ipynb on mybinder.org

     pandas_simple.ipynb on mybinder.org

These notebooks and additional examples reside in the Nominally Examples repository.
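
Separately from those notebooks, here is a minimal sketch of the CSV use case using only the standard library: it writes already-parsed rows (plain dicts with the six fields) out as CSV text. The names rows_to_csv and FIELDS are hypothetical and introduced only for this illustration; the sample row is copied from the "DR. PEACHES BARTKOWICZ" output shown earlier.

```python
import csv
import io
from typing import Iterable, Mapping

# The six core fields, in the order the README lists them in parsed output.
FIELDS = ["title", "first", "middle", "last", "suffix", "nickname"]


def rows_to_csv(rows: Iterable[Mapping[str, str]]) -> str:
    """Hypothetical helper: render parsed-name dicts as CSV text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Missing fields become empty strings; extra keys are ignored.
        writer.writerow({field: row.get(field, "") for field in FIELDS})
    return buffer.getvalue()


parsed = {"title": "dr", "first": "peaches", "middle": "",
          "last": "bartkowicz", "suffix": "", "nickname": ""}
print(rows_to_csv([parsed]), end="")
# title,first,middle,last,suffix,nickname
# dr,peaches,,bartkowicz,,
```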

🧙 Author

Matt VanEseltine

https://pypi.org/user/matvan/

matvan@umich.edu

https://github.com/vaneseltine

https://twitter.com/vaneseltine

https://stackoverflow.com/users/7846185/matt-vaneseltine

💡 Acknowledgements

Nominally started as a fork of the python-nameparser package and has benefited considerably from this origin, especially the wealth of examples and tests developed for python-nameparser.


Source Distribution: nominally-1.1.0.tar.gz (29.0 kB)

Built Distribution: nominally-1.1.0-py3-none-any.whl (33.0 kB, Python 3)
