
ctnamecleaner 0.10.1

Replace village names and commonly-misspelled Connecticut town names with real town/city names.

# CT Name Cleaner

Resolve village names, colloquial Connecticut town names, and common misspellings of Connecticut town names to their official town names.

This is based on an R package of the same name by my colleague Andrew Ba Tran.

This installs a command line script, ctclean, as well as a library particularly meant for use within Jupyter notebooks.

by Jake Kara

### Installation

pip install ctnamecleaner

### Command line util


    $ ctclean NewPreston
    WASHINGTON
    $ ctclean "New Preston"
    WASHINGTON

When nothing is found, return None:

    $ ctclean NotGonnaFindItsVille
    None

Set a custom value to return on error with the --error or -e flag:

    $ ctclean NotGonnaFindItsVille --error "Ruh Roh"
    Ruh Roh
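To make the flag behavior above concrete, here is a minimal sketch of how a command like ctclean could be wired up with argparse. It is not the package's actual implementation: the `TOWNS` dictionary, the `run` helper and the normalization step (dropping spaces and ignoring case) are assumptions made for illustration.

```python
import argparse

# Hypothetical stand-in for the real lookup table.
TOWNS = {"newpreston": "WASHINGTON"}


def build_parser():
    parser = argparse.ArgumentParser(prog="ctclean")
    parser.add_argument("name", help="place name to clean")
    parser.add_argument("-e", "--error", default=None,
                        help="value to print when no match is found")
    return parser


def run(argv):
    """Return the text the command would print for the given arguments."""
    args = build_parser().parse_args(argv)
    key = args.name.replace(" ", "").lower()  # assumed normalization
    return str(TOWNS.get(key, args.error))


print(run(["New Preston"]))
print(run(["NotGonnaFindItsVille"]))
print(run(["NotGonnaFindItsVille", "-e", "Ruh Roh"]))
```

Because the default for the error option is None, an unmatched name prints the string None unless a replacement is supplied.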

### Use with Pandas dataframes

See HELP.txt in this directory and the notebook in the demo/ folder of this repo for an example of translating an entire column with the clean(), clean_col() and clean_dataframe() methods. clean_dataframe() uses pandas’ DataFrame.join() method, so it’s faster than applying the clean() method with a lambda function yourself.
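The difference between the two approaches can be sketched with plain pandas. This does not use the package itself: the small `lookup` table, the `clean_name` column label and the upper-casing step are assumptions standing in for whatever the library does internally.

```python
import pandas as pd

# Hypothetical lookup table; the real one ships with the package.
lookup = pd.DataFrame({
    "name": ["NEW PRESTON", "WASHINGTON"],
    "clean_name": ["WASHINGTON", "WASHINGTON"],
}).set_index("name")

df = pd.DataFrame({"town": ["New Preston", "Washington", "Nowhere"]})

# Row-by-row approach: one Python-level function call per row.
mapping = lookup["clean_name"].to_dict()
df["via_apply"] = df["town"].apply(lambda t: mapping.get(t.upper()))

# Join approach: normalize the key once, then let pandas align the
# "key" column against the lookup table's index in a single operation.
df["key"] = df["town"].str.upper()
joined = df.join(lookup, on="key")

print(joined[["town", "via_apply", "clean_name"]])
```

Both columns hold the same matches, and rows without a match come back as missing values. The join avoids a Python call per row, which is where a speed advantage on large frames would come from.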

### Extending with other data

Not in CT? Want to map other things? Just make a spreadsheet and put it anywhere, online or locally, that pandas’ read_csv() can open, and then pass its location to the constructor to customize the lookup class.

>>> l = lookup.Lookup(csv_url="http://path/to/your/sheet",
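As a rough idea of what such a spreadsheet could look like, the sketch below builds a two-column CSV in memory and reads it with pandas. The column label `clean_name` and the sample rows are made up for illustration; `name` matches the documented default of raw_name_col. With the real class, the two labels would presumably be passed as raw_name_col and clean_name_col.

```python
import io

import pandas as pd

# Stand-in for a spreadsheet; read_csv accepts a file path or URL the same way.
sheet = io.StringIO(
    "name,clean_name\n"
    "Hollywood,Los Angeles\n"
    "Venice,Los Angeles\n"
)

table = pd.read_csv(sheet)

# A plain dict is enough to show the raw-name -> clean-name mapping.
mapping = dict(zip(table["name"], table["clean_name"]))

print(mapping.get("Venice"))
print(mapping.get("Atlantis"))
```

The second lookup has no row in the sheet, so it falls through to None, mirroring the default behavior described for the package.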

### Contents of HELP.txt

Below this point is auto documentation from the lookup class generated from

Help on module ctlookup.lookup in ctlookup:

ctlookup.lookup - Main module for CT Name Cleaner


class Lookup
Lookup class for CT place names, or any other DF for that matter

Methods defined here:

__init__(self, raw_name_col='name', clean_name_col='', csv_url=None, use_inet_csv=False)
Constructor for Lookup

No need to use parameters unless you are specifying a different
source URL.

raw_name_col : string, optional
The name of the column with input names, like “New Preston”

Only use if you’re using a different source spreadsheet.

clean_name_col : string, optional
The name of the column with output names, like “Washington”

Only use if you’re using a different source spreadsheet.

csv_url : string, optional
A valid local file or remote url to use as an alternative
source spreadsheet.

use_inet_csv : boolean, optional
Force a reload of the spreadsheet from the web to reflect any
new additions since it was bundled with this python package.

Defaults to False. The list doesn’t change too much anymore.

clean(self, raw_name, error=None)
Get a clean place name (e.g. input “New Preston” and get

raw_name : string
The input name of the place, such as a village or a
common misspelling of a town name

error : obj, optional
The default to return if no match is found

Defaults to None

Returns a string, or None (or whatever was passed as the error
parameter) if no match is found
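One way to put the error parameter to work is to pass a sentinel and collect the names that failed to match. The `clean` function below is a stand-in written for this example, with a made-up table and an assumed normalization (whitespace removed, case ignored); it only mirrors the documented signature.

```python
# Hypothetical miniature lookup table keyed by normalized names.
_TABLE = {"NEWPRESTON": "WASHINGTON", "WASHINGTON": "WASHINGTON"}

# Sentinel object that cannot collide with a real town name.
UNMATCHED = object()


def clean(raw_name, error=None):
    """Return the official name for raw_name, or `error` if unknown."""
    key = "".join(str(raw_name).split()).upper()  # assumed normalization
    return _TABLE.get(key, error)


names = ["New Preston", "Washington", "NotGonnaFindItsVille"]

# Names that need manual review because nothing matched.
unmatched = [n for n in names if clean(n, error=UNMATCHED) is UNMATCHED]
print(unmatched)
```

Using a dedicated sentinel instead of None or a string avoids confusing a failed lookup with a legitimate value.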

clean_col(self, series, error=None)
Clean a Pandas Series of place names

series : Pandas Series
A series containing place names that need to be cleaned

error : obj, optional
Value to use if no match is found for a given place.

Defaults to None

Meant as a less opinionated version of clean_dataframe

clean_dataframe(self, df, town_col, error=None)
Clean an entire column of place names


df : Pandas DataFrame
DataFrame containing the column to clean

town_col : valid column label
Label of column containing town names to clean

error : obj, optional
Default value to use when no match is found.

Defaults to None

I plan to deprecate this but leave it in place for
backward-compatibility. Use clean_col instead.

| File | Type | Py Version | Uploaded on | Size |
| --- | --- | --- | --- | --- |
| ctnamecleaner-0.10.1.tar.gz (md5) | Source | | 2018-03-03 | 13KB |