Python client for the OKA repository


oka - Client for OKA repository


Overview

oka is the client for the OKA repository. It also provides utilities for processing data.

Installation

...as a standalone lib

# Set up a virtualenv. 
python3 -m venv venv
source venv/bin/activate

# Install from PyPI...
pip install --upgrade pip
pip install -U oka
pip install -U "oka[full]"  # the 'full' extra adds optional functionality (recommended); quoted so shells such as zsh do not expand the brackets

# ...or, install from updated source code.
pip install git+https://github.com/rabizao/oka

...from source

sudo apt install python3.8-venv python3.8-dev python3.8-distutils # For Debian-like systems.
git clone https://github.com/rabizao/oka.git
cd oka
python3.8 -m venv venv
source venv/bin/activate
pip install -e .
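
After either installation route, it can help to confirm that the package is visible from the active virtualenv. The snippet below is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is made up for illustration and is not part of oka.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report what the current environment sees for the "oka" distribution.
found = installed_version("oka")
print(f"oka {found}" if found else "oka is not installed in this environment")
```

If the message says the package is missing, the virtualenv is probably not activated in the current shell.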

Usage

Hello world

from oka import Oka, generate_token, toy_df

# Create a pandas dataframe.
df = toy_df()
print(df.head())
"""
   attr1  attr2  class
0    5.1    6.4      0
1    1.1    2.5      1
2    6.1    3.6      0
3    1.1    3.5      1
4    3.1    2.5      0
"""

# Login.
token = generate_token("http://localhost:5000")
client = Oka(token, "http://localhost:5000")

# Store.
id = client.send(df)

# Store again.
id = client.send(df)
"""
Content already stored for id iJ_e4463c51904e9efb800533d25082af2a7bf77
"""

# Fetch.
df = client.get(id)

print(df.head())
"""
   attr1  attr2  class
0    5.1    6.4      0
1    1.1    2.5      1
2    6.1    3.6      0
3    1.1    3.5      1
4    3.1    2.5      0
"""
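
The second `send` above returns the same id and reports that the content is already stored, which suggests that ids are derived from the data itself. How oka actually computes its ids is not described here. The sketch below only illustrates the general idea with a hypothetical in-memory `ContentStore` that uses SHA-256 over raw bytes as a stand-in; it needs no server.

```python
import hashlib


class ContentStore:
    """Toy in-memory store keyed by a hash of the content (not oka's real scheme)."""

    def __init__(self):
        self._items = {}

    def send(self, content):
        # The id depends only on the bytes, so identical content maps to one id.
        key = hashlib.sha256(content).hexdigest()
        if key in self._items:
            print(f"Content already stored for id {key}")
        else:
            self._items[key] = content
        return key

    def get(self, key):
        return self._items[key]


store = ContentStore()
csv = b"attr1,attr2,class\n5.1,6.4,0\n1.1,2.5,1\n"
first_id = store.send(csv)
second_id = store.send(csv)  # prints the "already stored" notice
print(first_id == second_id)  # True
print(store.get(first_id) == csv)  # True
```

With this kind of identifier, sending the same data twice is harmless: nothing is duplicated and the caller gets back the id it already had.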

DataFrame by hand

import pandas as pd
from oka import Oka, generate_token

# Create a pandas dataframe.
df = pd.DataFrame(
    [[1, 2, "+"],
     [3, 4, "-"]],
    index=["row 1", "row 2"],
    columns=["col 1", "col 2", "class"],
)
print(df.head())
"""
       col 1  col 2 class
row 1      1      2     +
row 2      3      4     -
"""

# Login.
token = generate_token("http://localhost:5000")
client = Oka(token, "http://localhost:5000")

# Store.
id = client.send(df)

# Store again.
id = client.send(df)
"""
Content already stored for id f7_6b9deafec2562edde56bfdc573b336b55cb16
"""

# Fetch.
df = client.get(id)

print(df.head())
"""
       col 1  col 2 class
row 1      1      2     +
row 2      3      4     -
"""

Machine Learning workflow

from pprint import pprint

from idict import idict, let
from idict.function.classification import fit, predict
from idict.function.evaluation import split
from sklearn.ensemble import RandomForestClassifier as RF

d = idict.fromtoy() >> split >> let(fit, algorithm=RF, Xin="Xtr", yin="ytr") >> let(predict, Xin="Xts")
print(d.z)
"""
[1 0 1 0 1 1 0]
"""

pprint(d.history)
"""
{'fit--------------------------------idict': {'code': 'def f(algorithm=None, '
                                                      "config={}, Xin='X', "
                                                      "yin='y', "
                                                      "output='model', "
                                                      '**kwargs):\n'
                                                      'obj = '
                                                      'algorithm(**config)\n'
                                                      'obj.fit(kwargs[Xin], '
                                                      'kwargs[yin])\n'
                                                      'return {output: obj, '
                                                      "'_history': ...}",
                                              'description': 'Induce a model.',
                                              'name': 'fit',
                                              'parameters': {'Xin': 'Xtr',
                                                             'algorithm': <class 'sklearn.ensemble._forest.RandomForestClassifier'>,
                                                             'config': {},
                                                             'output': 'model',
                                                             'yin': 'ytr'}},
 'predict----------------------------idict': {'code': "def f(input='model', "
                                                      "Xin='X', yout='z', "
                                                      '**kwargs):\n'
                                                      'return {yout: '
                                                      'kwargs[input].predict(kwargs[Xin]), '
                                                      "'_history': ...}",
                                              'description': 'Predict values '
                                                             'according to a '
                                                             'model.',
                                              'name': 'predict',
                                              'parameters': {'Xin': 'Xts',
                                                             'input': 'model',
                                                             'yout': 'z'}},
 'split------------------------------idict': {'code': "def f(input=['X', 'y'], "
                                                      'seed=0, test_pct=33, '
                                                      '**kwargs):\n'
                                                      "if input != ['X', "
                                                      "'y']:\n"
                                                      '    raise '
                                                      'Exception(f"Not '
                                                      'implemented for input '
                                                      "different than ['X', "
                                                      '\'y\']: {input}")\n'
                                                      'from '
                                                      'sklearn.model_selection '
                                                      'import '
                                                      'train_test_split\n'
                                                      'args = '
                                                      '[kwargs[input[i]] for i '
                                                      'in range(len(input))]\n'
                                                      'Xtr, Xts, ytr, yts = '
                                                      'train_test_split(*args, '
                                                      'test_size=test_pct / '
                                                      '100, shuffle=True, '
                                                      'stratify=args[1], '
                                                      'random_state=seed)\n'
                                                      "return {'Xtr':Xtr, \n"
                                                      " 'ytr':ytr,  "
                                                      "'Xts':Xts,  'yts':yts,  "
                                                      "'_history':...}",
                                              'description': 'Split data in '
                                                             'two sets.',
                                              'name': 'split',
                                              'parameters': {'input': ['X',
                                                                       'y'],
                                                             'seed': 0,
                                                             'test_pct': 33}}}
"""
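
The `>>` chaining and the `history` field above can be pictured with a much smaller toy. The sketch below is an assumption-laden illustration, not idict's implementation: the `Pipe` class and the `majority` step are hypothetical, and this `split` simply halves the data instead of doing the shuffled, stratified split recorded in the history above.

```python
class Pipe:
    """Toy dict wrapper: `data >> step` applies `step` and logs it (not idict's code)."""

    def __init__(self, fields, history=None):
        self.fields = dict(fields)
        self.history = dict(history or {})

    def __rshift__(self, step):
        # A step receives the current fields as keyword arguments and returns new fields.
        produced = step(**self.fields)
        entry = {"name": step.__name__, "description": step.__doc__}
        return Pipe({**self.fields, **produced}, {**self.history, step.__name__: entry})


def split(X, y, **kwargs):
    """Split data in two sets."""
    half = len(X) // 2  # simplification: first half trains, second half tests
    return {"Xtr": X[:half], "ytr": y[:half], "Xts": X[half:], "yts": y[half:]}


def majority(Xtr, ytr, Xts, **kwargs):
    """Predict the most frequent training label."""
    label = max(set(ytr), key=ytr.count)
    return {"z": [label] * len(Xts)}


d = Pipe({"X": [[1], [2], [3], [4]], "y": [0, 0, 1, 1]}) >> split >> majority
print(d.fields["z"])    # [0, 0]
print(list(d.history))  # ['split', 'majority']
```

Each step only adds fields and never mutates the previous object, so every intermediate result stays available and the history records which functions produced the final fields.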

More info

Aside from the papers on identification and on similarity (not ready yet), the PyPI package, and the GitHub repository, a lower-level perspective is provided in the API documentation.

Grants

This work was supported by FAPESP under the supervision of Prof. André C. P. L. F. de Carvalho at CEPID-CeMEAI (Grants 2013/07375-0 – 2019/01735-0).

