
# auto_ml
> Get a trained and optimized machine learning predictor at the push of a button (and, admittedly, an extended coffee break while your computer does the heavy lifting and you get to claim you're "compiling": https://xkcd.com/303/).


## Installation

- `pip install auto_ml`

OR

- `git clone https://github.com/ClimbsRocks/auto_ml`
- `pip install -r requirements.txt`


## Getting Started

```
from auto_ml import Predictor

# col_to_predict is a placeholder: the name of the attribute you want to predict
col_desc_dictionary = {col_to_predict: 'output'}

ml_predictor = Predictor(type_of_estimator='classifier', column_descriptions=col_desc_dictionary)
# Can pass in type_of_estimator='regressor' as well

ml_predictor.train(list_of_dictionaries)
# Wait for the machine to learn all the complex and beautiful patterns in your data...

ml_predictor.predict(new_data)
# Where new_data is also a list of dictionaries
```

### Advice

Before you go any further, try running the code. Load up some dictionaries in Python, where each dictionary is a row of data. Make a `column_descriptions` dictionary that tells us which attribute name in each row represents the value we're trying to predict. Pass all that into `auto_ml`, and see what happens!
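For example, a minimal end-to-end run might look like the sketch below. The dataset, column names, and values are made up purely for illustration.

```
from auto_ml import Predictor

# Toy training data: each dictionary is one row, and 'sold' is the value we want to predict.
training_data = [
    {'sqft': 1100, 'num_bedrooms': 3, 'sold': 1},
    {'sqft': 800, 'num_bedrooms': 2, 'sold': 0},
    {'sqft': 1500, 'num_bedrooms': 4, 'sold': 1},
    {'sqft': 650, 'num_bedrooms': 1, 'sold': 0},
]

# Tell auto_ml which attribute is the output we're trying to predict.
column_descriptions = {'sold': 'output'}

ml_predictor = Predictor(type_of_estimator='classifier', column_descriptions=column_descriptions)
ml_predictor.train(training_data)

# New rows have the same shape, minus the 'sold' attribute.
print(ml_predictor.predict([{'sqft': 1200, 'num_bedrooms': 3}]))
```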

Everything else in these docs assumes you have done at least the above. Start there and everything else will build on top. But this part gets you the output you're probably interested in, without unnecessary complexity.


## Docs

The full docs are available at https://auto_ml.readthedocs.io
Again, though, I'd strongly recommend running this on an actual dataset before referencing the docs any further.


## What this project does

Automates the whole machine learning process, making it easy to use both for analytics and for getting real-time predictions in production.

A quick buzzword overview of what this project automates:

- Analytics (pass in data, and auto_ml will tell you the relationship of each variable to the value you're trying to predict).
- Feature Engineering (particularly around dates, and soon, NLP).
- Robust Scaling (scaling all values into the range 0 to 1 in a way that is robust to outliers and works with sparse matrices).
- Feature Selection (picking only the features that actually prove useful).
- Data formatting (turning a list of dictionaries into a sparse matrix, one-hot encoding categorical variables, and taking the natural log of y for regression problems; see the sketch after this list).
- Model Selection (which model works best for your problem).
- Hyperparameter Optimization (what hyperparameters work best for that model).
- Ensembling Subpredictors (automatically training up models to predict smaller problems within the meta problem).
- Ensembling Weak Estimators (automatically training up weak models on the larger problem itself, to inform the meta-estimator's decision).
- Big Data (feed it lots of data).
- Unicorns (you could conceivably train it to predict what is a unicorn and what is not).
- Ice Cream (mmm, tasty...).
- Hugs (this makes it much easier to do your job, hopefully leaving you more time to hug those you care about).
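
To make the data-formatting bullet above concrete, here is a rough sketch of the kind of transformation being described. It uses scikit-learn directly rather than auto_ml's actual internals, so treat it as an illustration, not the implementation.

```
import numpy as np
from sklearn.feature_extraction import DictVectorizer

rows = [
    {'sqft': 1100, 'neighborhood': 'north'},
    {'sqft': 800, 'neighborhood': 'south'},
]

# DictVectorizer one-hot encodes string values and returns a sparse matrix,
# which is roughly what "turning a list of dictionaries into a sparse matrix" means.
vectorizer = DictVectorizer(sparse=True)
X = vectorizer.fit_transform(rows)

# For regression problems, the target gets log-transformed.
y = np.array([250000.0, 180000.0])
y_log = np.log(y)
```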


### Passing in your own feature engineering function

You can pass in your own function to perform feature engineering on the data. This will be called as the first step in the pipeline that `auto_ml` builds out.

You will be passed the entire X dataset (not the y dataset), and are expected to return the entire X dataset in the same order.

The advantage of including it in the pipeline is that it will then be applied to any data you want predictions on later. You will also eventually be able to run GridSearchCV over any parameters you include here.

Limitations:
You cannot alter the length or ordering of the X dataset, since you will not have a chance to modify the y dataset. If you want to perform filtering, perform it before you pass in the data to train on.
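
As a sketch, such a function might look like the one below. The derived feature and the keyword argument used to pass the function into `train` are assumptions for illustration; check the docs for the exact argument name.

```
import math

def add_engineered_features(X):
    # X is the full feature dataset (here, a list of row dictionaries).
    # Return it with the same length and in the same order.
    for row in X:
        if 'price' in row:
            # Hypothetical derived feature, purely for illustration.
            row['log_price'] = math.log(row['price'])
    return X

# The keyword argument name below is an assumption -- see the docs for the real one.
# ml_predictor.train(training_data, user_input_func=add_engineered_features)
```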



<!-- # WARNING: PERSON WORKING!
This project is under active development. If you want to help, please check out the issues! 'Til it's done, you stand a better chance resurrecting the Wooly Mammoth than getting this package to behave like a properly trained adult.

## Getting Started

1. `pip install auto_ml`
1. `from auto_ml import Predictor`
1. `ml_predictor = Predictor(type_of_algo='classifier', column_descriptions=col_desc_dictionary)` can pass in `type_of_algo='regressor'` as well.
1. `ml_predictor.train(my_formatted_but_otherwise_raw_training_data)`
1. `ml_predictor.predict(new_data)`

That's it!

## Formatting the data

The only tricky part you have to do is get your training data into the format specified.

### Training data format
1. Must be a list (or other iterable) filled with python dictionaries.
1. The first dictionary in the list is essentially the header row. Here you've gotta specify some basic information about each "column" of data in the other dictionaries. This object should essentially have the same attributes as the following objects, except the values stored in each attribute will tell us information about that "column" of data.
1. The non-header-row objects can be "sparse". That is, they don't have to have all the properties. So if you are missing data for a certain row, or have a property that only applies to certain rows, you can include it or not at your discretion.

#### Header row information

1. `attribute_name: 'output'` The first object in your training data must specify one of your attributes as the output column that we're interested in training on. This is what the `auto_ml` predictor will try to predict.
1. `attribute_name: 'categorical'` All attribute names that hold a string in any of the rows after the header row will be encoded as categorical data. If, however, you have any numerical columns that you want encoded as categorical data, you can specify that here.
1. `attribute_name: 'nlp'` If any of your data is a text field that you'd like to run some Natural Language Processing on, specify that in the header row. Data stored in this attribute will be encoded using TF-IDF, along with some other feature engineering (count of some aggregations like total capital letters, punctuation characters, smiley faces, etc., as well as a sentiment prediction of that text).


### Future API features that I definitely haven't built out yet
1. `grid_search` aka, `optimize_the_foobar_out_of_this_predictor`. Sit down for a long coffee break. Better yet, go make some cold brew. Come back when the cold brew's ready. As amped as you are on all that caffeine is as amped as this highly optimized algo will be. They'll also both take about the same amount of time to prepare. Both are best done overnight.
1. Support for multiple nlp columns.
1. `print_analytics_output` For the curious out there, sometimes we want to know what features are important. This option will let you figure that out.

### Future internal features that you'll never see but will make this much better
1. Mostly, all kinds of stats-y feature engineering
- RobustScaler
- Handling correlated features
- etc.
- These will be mostly used by GridSearchCV, and are probably not things that you'll get to specify unless you dive into the internals of the project.
1. Feature selection
1. The ability to pass in a param_grid of your own to run during GridSearchCV that will override any of the properties we would use ourselves. Properties that are not valid will be logged to the console and summarily ignored. Yeah, it'll be ugly. That's what an MVP is for. Besides, you can handle it if you're diving this deep into the project.
1. Ensembling of results. Honestly, probably not all that practical, as it will likely increase the computation time for making each prediction rather dramatically. Worth mentioning in case some other contributor wants to add it in, as it's likely highly useful for competitions. But, not super great for production environments, so I'll probably ignore it until a future where I get very bored.

Just for kicks, here's how we'd implement ensembling:
Create our own custom transformer class.
This class will have a bunch of weak classifiers (non-tuned perceptron, LinearRegression, etc.).
This custom transformer class will then use each of these weak predictors in a FeatureUnion to get predictions on each row, and append those predictions to that row's features.
Then, we'll just continue on our merry way to the standard big predictor, using each of these weak predictions as features. It probably wouldn't increase the complexity too much, since we're using FeatureUnion to compute predictions in parallel...
Heavily caveat all this with how ensembling tends to overfit, so we'd probably have to build in significantly more complexity to evaluate all this on a holdout set of data.
Just thoughts for a future future scenario in which I've already conquered all my other ML ambitions and found myself with bored time on my hands again...
-->
