![status](https://img.shields.io/badge/status-wip-lightgrey.svg) ![license](https://img.shields.io/badge/license-MIT-blue.svg)

<!--
![docker pulls](https://img.shields.io/docker/pulls/jupyter/base-notebook.svg) ![docker stars](https://img.shields.io/docker/stars/jupyter/base-notebook.svg) [![](https://images.microbadger.com/badges/image/jupyter/base-notebook.svg)](https://microbadger.com/images/jupyter/base-notebook "jupyter/base-notebook image metadata")
-->

# &#x2692; Prospecting &#x2692; <!-- &#9874; -->

This project started as an effort to predict a 'prospect score' for each business in a list of current and (predominantly) potential customers.

While the initial goal was to provide a list to help prioritize sales opportunities (e.g., rank-ordering prospects by state), I also had some ideas about tying Google Sheets into my typical ML workflow (data profiling > clean/transform > performance reporting > delivery of final predictions > revisiting column treatments > and so on). OAuth 2.0 is used for Google API authentication when using the `SheetsApi` and `DriveApi` classes, and the usual Sheets sharing options are available if you want to invite collaborators.
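
To give a feel for the Sheets integration, here is a minimal sketch of the intended usage. The `SheetsApi` class does live in `prospecting/api.py`, but the method names shown (`authenticate`, `get`) and the DataFrame return value are illustrative assumptions rather than a documented interface; the spreadsheet ID is the `projectname_metadata` sheet linked in the table below.

```python
# Illustrative sketch only: SheetsApi lives in prospecting/api.py, but the
# authenticate()/get() calls below are assumed placeholders, not a documented API.
from prospecting.api import SheetsApi

METADATA_SS_ID = '17R9V5tefzFzMXBi2i9SOybhqwzF7PSlse9OO99BfDxQ'  # projectname_metadata

sheets = SheetsApi()   # assumed to read OAuth 2.0 client credentials from credentials/
sheets.authenticate()  # assumed to run the OAuth flow (or refresh a stored token)

# assumed to pull the 'metadata' tab into a pandas DataFrame
df_metadata = sheets.get(METADATA_SS_ID, 'metadata')
```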

I'll be updating this README and the documentation in general. In the interim, as an example of how Google Sheets is used, the table below outlines the spreadsheets and tabs I have found useful. While I cannot share my original prospecting dataset, I used an old [InnoCentive challenge dataset](https://github.com/reidbradley/prospecting/blob/master/data/README.md) as an example.

| spreadsheet | sheet | note
| --- | --- | ---
| [**projectname_metadata**](https://docs.google.com/spreadsheets/d/17R9V5tefzFzMXBi2i9SOybhqwzF7PSlse9OO99BfDxQ/) | _metadata_ | Control logic for column-processing treatments; used by Python to determine how each column is processed. The functions in `process.py` rely on information from this tab (see the processing sketch below the table).
| | _raw_descr_ | Descriptive information about raw data (`df_raw`)
| | _clean_descr_ | Descriptive information about cleaned dataset (`df_clean`)
| [**projectname_model_reporting**](https://docs.google.com/spreadsheets/d/1dG5lQfqthqshz45Rs94VLSSWmSrS60b1iw7cT4Rqevs/) | _session_report_ | Summarizes model performance; I plan to make this the main performance tab. A "session" is an instance of the `ModelSession` class, which is used to share access to train/test sets.
| | _cv_results_ | If `GridSearchCV` is used, the `GridSearchCV.cv_results_` reports are saved here, showing performance by fold for each parameter set evaluated (see the reporting sketch below the table)
| | _model_types_ | A simple lookup table, used by the Python script as a reference when building the report for the `session_report` tab
| | _\_plots_ | <a href="https://docs.google.com/spreadsheets/d/1dG5lQfqthqshz45Rs94VLSSWmSrS60b1iw7cT4Rqevs/pubchart?oid=1358454056&format=interactive"><img src="https://docs.google.com/spreadsheets/d/1dG5lQfqthqshz45Rs94VLSSWmSrS60b1iw7cT4Rqevs/pubchart?oid=1358454056&format=image" alt="performance report" height="115px"></a>&nbsp;<a href="https://docs.google.com/spreadsheets/d/1dG5lQfqthqshz45Rs94VLSSWmSrS60b1iw7cT4Rqevs/pubchart?oid=6448021&format=interactive"><img src="https://docs.google.com/spreadsheets/d/1dG5lQfqthqshz45Rs94VLSSWmSrS60b1iw7cT4Rqevs/pubchart?oid=6448021&format=image" alt="performance report subset" height="115px"></a>
| **projectname_predictions** | _predictions_ | Final predictions, with probabilities
| | _lookupmaster_ | A lookup table with the master list of prospects / entities of interest, plus miscellaneous information to join with the predictions
| | _README_ | Intended as an FYI tab providing an overview of prediction health (e.g., the number of correct/incorrect predictions)
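
The pattern behind the _metadata_ tab is one row per raw column plus a treatment flag that the cleaning code looks up. The sketch below is hypothetical: the actual column headers in the tab and the actual function names in `process.py` are not documented here, so it only illustrates the metadata-driven approach.

```python
# Hypothetical sketch of metadata-driven cleaning; the 'column'/'treatment' headers
# and the treatment names are assumptions, not the real metadata schema.
import pandas as pd

def apply_treatments(df_raw: pd.DataFrame, df_metadata: pd.DataFrame) -> pd.DataFrame:
    """Apply a per-column treatment (drop / fill / cast) read from the metadata tab."""
    df_clean = df_raw.copy()
    for _, row in df_metadata.iterrows():
        col, treatment = row['column'], row['treatment']
        if col not in df_clean.columns:
            continue
        if treatment == 'drop':
            df_clean = df_clean.drop(columns=[col])
        elif treatment == 'fill_zero':
            df_clean[col] = df_clean[col].fillna(0)
        elif treatment == 'to_numeric':
            df_clean[col] = pd.to_numeric(df_clean[col], errors='coerce')
    return df_clean
```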



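For the _cv_results_ tab, the flow is roughly: run `GridSearchCV`, flatten `cv_results_` into a table, and push it to the sheet. Everything below is a sketch: the estimator, parameter grid, and toy data are placeholders, and the commented `sheets.update(...)` call stands in for whatever write method `SheetsApi` actually exposes.

```python
# Hypothetical sketch of feeding GridSearchCV.cv_results_ into the cv_results tab.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# toy data standing in for the train split shared via a ModelSession
X_train, y_train = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {'n_estimators': [100, 300], 'max_depth': [None, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)

# one row per candidate parameter set, with per-fold test scores
cv_report = pd.DataFrame(search.cv_results_)
values = [cv_report.columns.tolist()] + cv_report.astype(str).values.tolist()
# sheets.update(REPORTING_SS_ID, 'cv_results', values)  # assumed SheetsApi write method
```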

## Overview

* The project directory contains:
```
.
├── .dockerignore
├── .gitattributes # For CRLF correction
├── .gitignore
├── credentials/ # Not necessarily best practice, but convenient
│   ├── README.md
│   └── certs/
│       └── README.md
├── data/
│   ├── README.md
│   └── tmp/ # Logs saved here
│       ├── README.md
│       └── joblib/ # Used by scikit learn when running in Docker container
│           └── README.md
├── Dockerfile # See README_detail.md for more info
├── LICENSE.md
├── jupyter_notebook_config.py # See README_detail.md for more info
├── mplimporthook.py # Used by Dockerfile
├── notebooks/ # Jupyter Notebooks
├── prospecting/
│   ├── __init__.py
│   ├── env.py # Check here for environment variables required
│   ├── utils.py
│   ├── api.py # Google Sheets and Google Drive API classes
│   ├── process.py # Data cleaning functions, relies on info in metadata tab
│   ├── model.py
│   ├── report.py
│   ├── errors.py
│   └── version.py
├── README.md
├── requirements_nonconda.txt # Used by Dockerfile
├── scripts/
│   └── hash_jupyter_pw.py # Create hashed password to use with Docker container (see the sketch below)
├── start-notebook.sh # Used by Dockerfile
├── start.sh # Used by Dockerfile
└── start-singleuser.sh # Used by Dockerfile
```
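
The contents of `scripts/hash_jupyter_pw.py` are not reproduced here, but a script like it typically just wraps `notebook.auth.passwd` (classic Jupyter Notebook) to produce the hash you paste into `jupyter_notebook_config.py` as `c.NotebookApp.password`:

```python
# Rough equivalent of scripts/hash_jupyter_pw.py (the real script is not shown here):
# prompt for a password and print the salted hash expected by jupyter_notebook_config.py.
from getpass import getpass

from notebook.auth import passwd  # classic Jupyter Notebook (< 7.x)

if __name__ == '__main__':
    print(passwd(getpass('Password to hash: ')))
```
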
&#x2692; &#x2692; &#x2692;
