numerox

Numerox is a Numerai tournament toolbox written in Python

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Operating System
- OS Independent
Topic
- Scientific/Engineering

Project description

Numerox is a Numerai tournament toolbox written in Python.

All you have to do is create a model. Take a look at model.py for examples.

Once you have a model, numerox will do the rest. First download the Numerai dataset and then load it (there is no need to unzip it):

>>> import numerox as nx
>>> nx.download_dataset('numerai_dataset.zip')
>>> data = nx.load_zip('numerai_dataset.zip')
>>> data
region    live, test, train, validation
rows      884544
era       98, [era1, eraX]
x         50, min 0.0000, mean 0.4993, max 1.0000
y         mean 0.499961, fraction missing 0.3109

Let’s use the logistic regression model in numerox to run 5-fold cross validation on the training data:

>>> model = nx.model.logistic()
>>> prediction1 = nx.backtest(model, data, verbosity=1)
logistic(inverse_l2=1e-05)
      logloss   auc     acc     ystd
mean  0.692974  0.5226  0.5159  0.0023  |  region   train
std   0.000224  0.0272  0.0205  0.0002  |  eras     85
min   0.692360  0.4550  0.4660  0.0020  |  consis   0.7647
max   0.693589  0.5875  0.5606  0.0027  |  75th     0.6931

OK, the results are good enough for a demo, so let’s make a submission file for the tournament:

>>> prediction2 = nx.production(model, data)
logistic(inverse_l2=1e-05)
      logloss   auc     acc     ystd
mean  0.692993  0.5157  0.5115  0.0028  |  region   validation
std   0.000225  0.0224  0.0172  0.0000  |  eras     12
min   0.692440  0.4853  0.4886  0.0028  |  consis   0.7500
max   0.693330  0.5734  0.5555  0.0028  |  75th     0.6931
>>> prediction2.to_csv('logistic.csv')  # 6 decimal places by default
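
The summary tables report logloss, auc, acc and ystd. The pure-Python sketch below shows one standard way to compute these quantities. It assumes that acc means thresholding predictions at 0.5 and that ystd is the standard deviation of the predictions; those definitions, and all function names here, are assumptions for illustration rather than numerox's own code.

```python
import math


def logloss(y, p, eps=1e-15):
    """Mean binary cross-entropy of probabilities p against 0/1 targets y."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)  # clip so log() never sees 0
        total -= yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total / len(y)


def accuracy(y, p):
    """Fraction of rows where thresholding p at 0.5 reproduces y (assumed definition)."""
    return sum((pi > 0.5) == (yi == 1) for yi, pi in zip(y, p)) / len(y)


def auc(y, p):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def ystd(p):
    """Population standard deviation of the predictions (assumed meaning of ystd)."""
    mean_p = sum(p) / len(p)
    return math.sqrt(sum((pi - mean_p) ** 2 for pi in p) / len(p))
```

A constant prediction of 0.5 scores a logloss of ln 2 ≈ 0.6931, which is why the logloss values above sit just below 0.6931. The pairwise AUC here is quadratic in the number of rows, so it only suits small inputs.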

There is no overlap in ids between prediction1 (train) and prediction2 (tournament), so you can add (concatenate) them if you’re into that. Let’s go ahead and save the result:

>>> prediction = prediction1 + prediction2
>>> prediction.save('logloss_1e-05.pred')  # HDF5
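
Adding predictions only makes sense when their ids are disjoint; otherwise one prediction would silently overwrite another. The helper below is a hypothetical dict-based sketch of that rule (concat_predictions is not a numerox function, and numerox stores predictions differently, saving them to HDF5).

```python
def concat_predictions(first, second):
    """Merge two {id: prediction} dicts, refusing to overwrite shared ids."""
    overlap = first.keys() & second.keys()
    if overlap:
        raise ValueError(f"ids present in both inputs: {sorted(overlap)[:5]}")
    merged = dict(first)
    merged.update(second)
    return merged
```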

Once you have run and saved several predictions, you can make a report:

>>> report = nx.report.load_report('/round79', extension='pred')
>>> report.performance(data['train'], sort_by='logloss')
logloss   auc     acc     ystd    consis (train; 85 eras)
0.692455  0.5215  0.5149  0.0219  0.6824        logistic_1e-03
0.692487  0.5224  0.5159  0.0121  0.7294        logistic_1e-04
0.692565  0.5236  0.5162  0.0086  0.7294  extratrees_nfeature7
0.692581  0.5206  0.5143  0.0253  0.6000        logistic_1e-02
0.692629  0.5240  0.5164  0.0074  0.7294  extratrees_nfeature5
0.692704  0.5200  0.5140  0.0273  0.5412        logistic_1e-01
0.692747  0.5232  0.5162  0.0055  0.7647  extratrees_nfeature3
0.692831  0.5238  0.5163  0.0042  0.7647  extratrees_nfeature2
0.692974  0.5226  0.5159  0.0023  0.7647        logistic_1e-05

The lowest logloss on the train data was achieved by logistic_1e-03. Let’s look at its per-era performance on the validation data:

>>> report.performance_per_era(data['validation'], 'logistic_1e-03')
logistic_1e-03
       logloss   auc     acc     ystd
era86  0.691499  0.5322  0.5296  0.0220
era87  0.689715  0.5552  0.5371  0.0219
era88  0.692501  0.5189  0.5167  0.0220
era89  0.694544  0.4954  0.4916  0.0218
era90  0.691133  0.5349  0.5230  0.0221
era91  0.692794  0.5140  0.5061  0.0218
era92  0.694579  0.4933  0.4906  0.0217
era93  0.694098  0.4983  0.4954  0.0218
era94  0.688417  0.5752  0.5591  0.0218
era95  0.691734  0.5265  0.5224  0.0216
era96  0.693184  0.5119  0.5092  0.0215
era97  0.693276  0.5077  0.5089  0.0215
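
Per-era numbers come from grouping rows by era before scoring. A minimal sketch, assuming that the consis column is the fraction of eras whose logloss beats ln 2 ≈ 0.6931 (the logloss of a constant 0.5 prediction); per_era_logloss and consistency are hypothetical helpers, not numerox functions.

```python
import math
from collections import defaultdict


def logloss(y, p, eps=1e-15):
    """Mean binary cross-entropy of probabilities p against 0/1 targets y."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)  # clip so log() never sees 0
        total -= yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total / len(y)


def per_era_logloss(eras, y, p):
    """Group rows by era label, then score each era separately."""
    groups = defaultdict(lambda: ([], []))
    for era, yi, pi in zip(eras, y, p):
        groups[era][0].append(yi)
        groups[era][1].append(pi)
    return {era: logloss(ys, ps) for era, (ys, ps) in groups.items()}


def consistency(era_losses, benchmark=math.log(2)):
    """Fraction of eras whose logloss is below the benchmark (assumed: ln 2)."""
    return sum(loss < benchmark for loss in era_losses.values()) / len(era_losses)
```

Under that assumed definition, logistic_1e-03 beats ln 2 in 7 of the 12 validation eras listed above (era86, era87, era88, era90, era91, era94 and era95), a consistency of about 0.5833.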

Both the production and backtest functions are just very thin wrappers around the run function:

>>> prediction = nx.run(model, splitter, verbosity=2)

where splitter iterates through the (fit, predict) splits of the data. Numerox comes with five splitters:

tournament_splitter   fit: train; predict: tournament (production)
validation_splitter   fit: train; predict: validation
cheat_splitter        fit: train+validation; predict: tournament
cv_splitter           k-fold cross validation across train eras (backtest)
split_splitter        single split of train data across eras
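
The run loop can be pictured as: for each (fit, predict) split, train on the fit rows, predict the predict rows, and collect the results by id. The sketch below assumes rows are plain dicts with id, region and y keys, and that a model is any function fit_predict(dfit, dpredict) returning one prediction per predict row. run_splits, train_vs_rest_splitter and mean_model are hypothetical stand-ins, not numerox's implementations.

```python
def mean_model(dfit, dpredict):
    """Toy model: predict the mean fit-set target for every predict row."""
    ys = [r["y"] for r in dfit]
    mean_y = sum(ys) / len(ys)
    return [mean_y] * len(dpredict)


def train_vs_rest_splitter(rows):
    """Yield one split: fit on train rows, predict on all other regions."""
    dfit = [r for r in rows if r["region"] == "train"]
    dpredict = [r for r in rows if r["region"] != "train"]
    yield dfit, dpredict


def run_splits(fit_predict, splitter):
    """Call fit_predict once per split and collect predictions keyed by row id."""
    predictions = {}
    for dfit, dpredict in splitter:
        yhat = fit_predict(dfit, dpredict)
        for row, value in zip(dpredict, yhat):
            if row["id"] in predictions:
                raise ValueError(f"id {row['id']} predicted more than once")
            predictions[row["id"]] = value
    return predictions


rows = [
    {"id": "a", "region": "train", "y": 1},
    {"id": "b", "region": "train", "y": 0},
    {"id": "c", "region": "validation", "y": 1},
    {"id": "d", "region": "live", "y": None},
]
print(run_splits(mean_model, train_vs_rest_splitter(rows)))  # {'c': 0.5, 'd': 0.5}
```

Because the splitter is just an iterable of splits, swapping one splitter for another changes what gets fit and predicted without touching the model or the loop.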

For example, here’s how you would reproduce the backtest function:

>>> splitter = nx.cv_splitter(data, kfold=5, seed=0)
>>> prediction = nx.run(model, splitter)

and the production function:

>>> splitter = nx.tournament_splitter(data)
>>> prediction = nx.run(model, splitter)
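
Cross validation across train eras presumably keeps all rows of an era in the same fold, so no era appears on both the fit and predict sides of a split. A minimal sketch of such an era-grouped k-fold splitter under that assumption (era_kfold_splitter is hypothetical; numerox's cv_splitter may assign eras to folds differently):

```python
import random


def era_kfold_splitter(rows, kfold=5, seed=0):
    """Yield kfold (fit, predict) splits of the train rows, keeping each era in one fold."""
    train = [r for r in rows if r["region"] == "train"]
    eras = sorted({r["era"] for r in train})
    random.Random(seed).shuffle(eras)  # seeded so the folds are reproducible
    for k in range(kfold):
        held_out = set(eras[k::kfold])  # every kfold-th shuffled era is held out in fold k
        dfit = [r for r in train if r["era"] not in held_out]
        dpredict = [r for r in train if r["era"] in held_out]
        yield dfit, dpredict
```

Every train row lands on the predict side of exactly one fold, which is what lets the backtest report a prediction for every train row.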

Warning

This preview release has minimal unit test coverage (yikes!) and the code has seen little use. The next release will likely break any code you write using numerox; the API is not yet stable. Please report any bugs or such on GitHub.

Data class

You can create a data object from the zip archive provided by Numerai:

>>> import numerox as nx
>>> data = nx.load_zip('numerai_dataset.zip')
>>> data
region    live, test, train, validation
rows      884544
era       98, [era1, eraX]
x         50, min 0.0000, mean 0.4993, max 1.0000
y         mean 0.499961, fraction missing 0.3109

But that is slow (~7 seconds), which is painful for dedicated overfitters. Let’s create an HDF5 archive:

>>> data.save('numerai_dataset.hdf')
>>> data2 = nx.load_data('numerai_dataset.hdf')

That loads quickly (~0.2 seconds) but takes more disk space than the compressed zip archive.

Data indexing is done by rows, not columns:

>>> data[data.y == 0]
region    train, validation
rows      304813
era       97, [era1, era97]
x         50, min 0.0000, mean 0.4993, max 1.0000
y         mean 0.000000, fraction missing 0.0000

You can also index with special strings. Here are two examples:

>>> data['era92']
region    validation
rows      6048
era       1, [era92, era92]
x         50, min 0.0308, mean 0.4993, max 1.0000
y         mean 0.500000, fraction missing 0.0000

>>> data['tournament']
region    live, test, validation
rows      348831
era       13, [era86, eraX]
x         50, min 0.0000, mean 0.4992, max 1.0000
y         mean 0.499966, fraction missing 0.7882
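
Row indexing can be modeled as filtering a list of rows. The toy class below is a hypothetical sketch of the semantics shown above (boolean masks, era names, region names, and the special string 'tournament', taken here to mean every non-train region, as the output above suggests); it is not numerox's Data class. With numpy arrays, data.y == 0 yields the boolean mask directly, whereas the sketch builds it with a list comprehension.

```python
class ToyData:
    """Hypothetical stand-in for row-based indexing; not numerox's Data class."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)

    @property
    def y(self):
        """Targets as a new list, one entry per row."""
        return [r["y"] for r in self.rows]

    def __getitem__(self, key):
        if isinstance(key, str):
            if key.startswith("era"):
                return ToyData(r for r in self.rows if r["era"] == key)
            if key == "tournament":
                # Assumed meaning: every region except train.
                return ToyData(r for r in self.rows if r["region"] != "train")
            return ToyData(r for r in self.rows if r["region"] == key)
        # Anything else is treated as a boolean mask with one entry per row.
        mask = list(key)
        if len(mask) != len(self.rows):
            raise IndexError("mask length must equal the number of rows")
        return ToyData(r for r, keep in zip(self.rows, mask) if keep)


rows = [
    {"era": "era1", "region": "train", "y": 0},
    {"era": "era1", "region": "train", "y": 1},
    {"era": "era92", "region": "validation", "y": 0},
    {"era": "eraX", "region": "live", "y": None},
]
data = ToyData(rows)
zeros = data[[yi == 0 for yi in data.y]]  # rows whose target is 0
```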

If you wish to extract more than one era (I hate these eras):

>>> data.era_isin(['era92', 'era93'])
region    validation
rows      12086
era       2, [era92, era93]
x         50, min 0.0177, mean 0.4993, max 1.0000
y         mean 0.500000, fraction missing 0.0000

You can do the same with regions:

>>> data.region_isin(['test', 'live'])
region    live, test
rows      274966
era       1, [eraX, eraX]
x         50, min 0.0000, mean 0.4992, max 1.0000
y         mean nan, fraction missing 1.0000

Or you can remove regions (or eras):

>>> data.region_isnotin(['test', 'live'])
region    train, validation
rows      609578
era       97, [era1, era97]
x         50, min 0.0000, mean 0.4993, max 1.0000
y         mean 0.499961, fraction missing 0.0000
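
era_isin, region_isin and region_isnotin behave like set-membership filters. A hypothetical one-function sketch over row dicts (isin is not a numerox function):

```python
def isin(rows, field, values, invert=False):
    """Keep rows whose field is in values, or not in values when invert=True."""
    wanted = set(values)
    return [r for r in rows if (r[field] in wanted) != invert]
```

Keeping and removing the same regions should partition the data, and the outputs above agree: 274966 rows in test and live plus 609578 rows in train and validation gives 884544, the full row count.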

You can concatenate data objects (as long as the ids don’t overlap) by adding them together. Let’s add validation era92 to the training data:

>>> data['train'] + data['era92']
region    train, validation
rows      541761
era       86, [era1, era92]
x         50, min 0.0000, mean 0.4993, max 1.0000
y         mean 0.499960, fraction missing 0.0000

Or, let’s go crazy:

>>> nx.concat_data([data['live'], data['era1'], data['era92']])
region    live, train, validation
rows      19194
era       3, [era1, eraX]
x         50, min 0.0000, mean 0.4992, max 1.0000
y         mean 0.499960, fraction missing 0.3544

You can pull out numpy arrays (copies, not views) like so: data.ids, data.era, data.region, data.x, data.y.
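
Returning copies means you can modify an extracted array without changing the data object. The numpy sketch below illustrates the difference between a copy and a view using a hypothetical Holder class; it is not numerox's implementation.

```python
import numpy as np


class Holder:
    """Hypothetical container that hands out copies of its array."""

    def __init__(self, x):
        self._x = np.asarray(x, dtype=float)

    @property
    def x(self):
        # Return a copy so callers can modify the result without touching the stored data.
        return self._x.copy()


h = Holder([[0.1, 0.2], [0.3, 0.4]])
x = h.x
x[0, 0] = 99.0  # changes only the copy; h._x is untouched
```

A slice such as h._x[0] would instead be a view sharing memory with the stored array, so writing to it would change the container.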

Numerox comes with a small dataset to play with:

>>> nx.play_data()
region    live, test, train, validation
rows      8795
era       98, [era1, eraX]
x         50, min 0.0259, mean 0.4995, max 0.9913
y         mean 0.502646, fraction missing 0.3126

It is about 1% of a regular Numerai dataset, so it contains around 60 rows per era.

Install

This is what you need to run numerox:

python
setuptools
numpy
pandas
pytables
sklearn
requests
nose

Install with pip:

$ sudo pip install numerox

After you have installed numerox, run the unit tests (please report any failures):

>>> import numerox as nx
>>> nx.test()

Resources

Ask usage questions on rocket.chat.
Report bugs on GitHub.

License

Numerox is distributed under the GPL v3+. See the LICENSE file for details.

Release history

4.1.8

Dec 8, 2020

4.1.7

Aug 25, 2020

4.1.6

May 12, 2020

4.1.5

May 11, 2020

4.1.4

May 10, 2020

4.1.3

May 10, 2020

4.1.2

May 9, 2020

4.1.1

Jan 17, 2020

4.0.0

Jul 12, 2019

3.7.0

Mar 7, 2019

3.6.0

Dec 14, 2018

3.5.0

Nov 9, 2018

3.4.0

Oct 29, 2018

3.3.0

Oct 20, 2018

3.2.0

Oct 18, 2018

3.1.0

Sep 24, 2018

3.0.0

Sep 11, 2018

2.7.0

Aug 29, 2018

2.6.0

Aug 23, 2018

2.5.0

Aug 20, 2018

2.4.0

Aug 15, 2018

2.3.0

Aug 10, 2018

2.2.0

Aug 9, 2018

2.1.0

Jun 27, 2018

2.0.1

Jun 13, 2018

2.0.0

Jun 11, 2018

1.6.0

Jun 4, 2018

1.5.0

May 23, 2018

1.4.0

May 11, 2018

1.3.0

May 1, 2018

1.2.0

Apr 25, 2018

1.1.0

Apr 23, 2018

1.0.0

Apr 13, 2018

0.9.0

Apr 9, 2018

0.8.0

Feb 19, 2018

0.7.0

Feb 12, 2018

0.6.0

Feb 8, 2018

0.5.0

Feb 5, 2018

0.4.0

Jan 29, 2018

0.3.1

Jan 22, 2018

0.3.0

Jan 15, 2018

0.2.0

Jan 8, 2018

0.1.2

Jan 2, 2018

0.1.1

Dec 21, 2017

0.1.0

Dec 16, 2017

0.0.9

Dec 10, 2017

0.0.8

Dec 1, 2017

0.0.7

Nov 27, 2017

0.0.6

Nov 22, 2017

0.0.5

Nov 21, 2017

0.0.4

Nov 20, 2017

0.0.3

Nov 17, 2017

0.0.2

Nov 16, 2017

This version

0.0.1

Nov 14, 2017

0.0.1.dev2 pre-release

Nov 5, 2017

0.0.1.dev1 pre-release

Nov 3, 2017

0.0.1.dev0 pre-release

Nov 3, 2017

Download files

Source Distribution

numerox-0.0.1.tar.gz (1.8 MB)

Uploaded Nov 14, 2017 Source

Hashes for numerox-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`e557d1aed03088a4166a61dbaea46cc5be9d55c9d1df2d3170eae091e639744b`
MD5	`0dc815a210a0f52d8f9fee4e2c5ee4b4`
BLAKE2b-256	`bccc77b1e2bcf357833d6a44c899cd866fe399aa4f28d12d5a583042f4e55566`
