Skip to main content

Geograpy ======== Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city. Install & Setup --------------- Grab the package using ``pip`` (this will take a few minutes) :: pip install geograpy Geograpy uses `NLTK <http://www.nltk.org/>`__ for entity recognition, so you'll also need to download the models we're using. Fortunately there's a command that'll take care of this for you. :: geograpy-nltk Basic Usage ----------- Import the module, give some text or a URL, and presto. :: import geograpy url = 'http://www.bbc.com/news/world-europe-26919928' places = geograpy.get_place_context(url=url) Now you have access to information about all the places mentioned in the linked article. - ``places.countries`` *contains a list of country names* - ``places.regions`` *contains a list of region names* - ``places.cities`` *contains a list of city names* - ``places.other`` *lists everything that wasn't clearly a country, region or city* Note that the ``other`` list might be useful for shorter texts, to pull out information like street names, points of interest, etc, but at the moment is a bit messy when scanning longer texts that contain possessive forms of proper nouns (like "Russian" instead of "Russia"). But Wait, There's More ---------------------- In addition to listing the names of discovered places, you'll also get some information about the relationships between places. - ``places.country_regions`` *regions broken down by country* - ``places.country_cities`` *cities broken down by country* - ``places.address_strings`` *city, region, country strings useful for geocoding* Last But Not Least ------------------ While a text might mention many places, it's probably focused on one or two, so Geograpy also breaks down countries, regions and cities by number of mentions. - ``places.country_mentions`` - ``places.region_mentions`` - ``places.city_mentions`` Each of these returns a list of tuples. The first item in the tuple is the place name and the second item is the number of mentions. For example: :: [('Russian Federation', 14), (u'Ukraine', 11), (u'Lithuania', 1)] If You're Really Serious ------------------------ You can of course use each of Geograpy's modules on their own. For example: :: from geograpy import extraction e = extraction.Extractor(url='http://www.bbc.com/news/world-europe-26919928') e.find_entities() # You can now access all of the places found by the Extractor print e.places Place context is handled in the ``places`` module. For example: :: from geograpy import places pc = places.PlaceContext(['Cleveland', 'Ohio', 'United States']) pc.set_countries() print pc.countries #['United States'] pc.set_regions() print pc.regions #['Ohio'] pc.set_cities() print pc.cities #['Cleveland'] print pc.address_strings #['Cleveland, Ohio, United States'] And of course all of the other information shown above (``country_regions`` etc) is available after the corresponding ``set_`` method is called. Credits ------- Geograpy uses the following excellent libraries: - `NLTK <http://www.nltk.org/>`__ for entity recognition - `newspaper <https://github.com/codelucas/newspaper>`__ for text extraction from HTML - `jellyfish <https://github.com/sunlightlabs/jellyfish>`__ for fuzzy text match - `pycountry <https://pypi.python.org/pypi/pycountry>`__ for country/region lookups Geograpy uses the following data sources: - `GeoLite2 <http://dev.maxmind.com/geoip/geoip2/geolite2/>`__ for city lookups - `ISO3166ErrorDictionary <https://github.com/bodacea/countryname/blob/master/countryname/databases/ISO3166ErrorDictionary.csv>`__ for common country mispellings *via `Sara-Jayne Terp <https://github.com/bodacea>`__* Hat tip to `Chris Albon <https://github.com/chrisalbon>`__ for the name.

Project description

Geograpy
========

Extract place names from a URL or text, and add context to those names -- for
example distinguishing between a country, region or city.

## Install & Setup

Grab the package using `pip` (this will take a few minutes)

pip install geograpy

Geograpy uses [NLTK](http://www.nltk.org/) for entity recognition, so you'll also need
to download the models we're using. Fortunately there's a command that'll take
care of this for you.

geograpy-nltk

## Basic Usage

Import the module, give some text or a URL, and presto.

import geograpy
url = 'http://www.bbc.com/news/world-europe-26919928'
places = geograpy.get_place_context(url=url)

Now you have access to information about all the places mentioned in the linked
article.

* `places.countries` _contains a list of country names_
* `places.regions` _contains a list of region names_
* `places.cities` _contains a list of city names_
* `places.other` _lists everything that wasn't clearly a country, region or city_

Note that the `other` list might be useful for shorter texts, to pull out
information like street names, points of interest, etc, but at the moment is
a bit messy when scanning longer texts that contain possessive forms of proper
nouns (like "Russian" instead of "Russia").

## But Wait, There's More

In addition to listing the names of discovered places, you'll also get some
information about the relationships between places.

* `places.country_regions` _regions broken down by country_
* `places.country_cities` _cities broken down by country_
* `places.address_strings` _city, region, country strings useful for geocoding_

## Last But Not Least

While a text might mention many places, it's probably focused on one or two, so
Geograpy also breaks down countries, regions and cities by number of mentions.

* `places.country_mentions`
* `places.region_mentions`
* `places.city_mentions`

Each of these returns a list of tuples. The first item in the tuple is the place
name and the second item is the number of mentions. For example:

[('Russian Federation', 14), (u'Ukraine', 11), (u'Lithuania', 1)]

## If You're Really Serious

You can of course use each of Geograpy's modules on their own. For example:

from geograpy import extraction

e = extraction.Extractor(url='http://www.bbc.com/news/world-europe-26919928')
e.find_entities()

# You can now access all of the places found by the Extractor
print e.places

Place context is handled in the `places` module. For example:

from geograpy import places

pc = places.PlaceContext(['Cleveland', 'Ohio', 'United States'])

pc.set_countries()
print pc.countries #['United States']

pc.set_regions()
print pc.regions #['Ohio']

pc.set_cities()
print pc.cities #['Cleveland']

print pc.address_strings #['Cleveland, Ohio, United States']

And of course all of the other information shown above (`country_regions` etc)
is available after the corresponding `set_` method is called.


## Credits

Geograpy uses the following excellent libraries:

* [NLTK](http://www.nltk.org/) for entity recognition
* [newspaper](https://github.com/codelucas/newspaper) for text extraction from HTML
* [jellyfish](https://github.com/sunlightlabs/jellyfish) for fuzzy text match
* [pycountry](https://pypi.python.org/pypi/pycountry) for country/region lookups

Geograpy uses the following data sources:

* [GeoLite2](http://dev.maxmind.com/geoip/geoip2/geolite2/) for city lookups
* [ISO3166ErrorDictionary](https://github.com/bodacea/countryname/blob/master/countryname/databases/ISO3166ErrorDictionary.csv) for common country mispellings _via [Sara-Jayne Terp](https://github.com/bodacea)_

Hat tip to [Chris Albon](https://github.com/chrisalbon) for the name.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

geograpy-0.2.4.tar.gz (1.3 MB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page