One-stop time series analysis tool, supporting time series data preprocessing, feature engineering, model training, model evaluation, model prediction, etc. Based on spinesTS and darts.

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

PipelineTS

[中文文档]

One-stop time series analysis tool, supporting time series data preprocessing, feature engineering, model training, model evaluation, model prediction, etc. Based on spinesTS and darts.

Installation

# if you don't want to use the prophet model
# run this
python -m pip install PipelineTS[core]

# if you want to use all models
# run this
python -m pip install PipelineTS[all]

Quick Start [notebook]

list all available models

from PipelineTS.dataset import LoadWebSales

init_data = LoadWebSales()[['date', 'type_a']]

valid_data = init_data.iloc[-30:, :]
data = init_data.iloc[:-30, :]
accelerator = 'auto'  # Specify Computing Device

from PipelineTS.pipeline import ModelPipeline

# list all models
ModelPipeline.list_all_available_models()

[output]:
['prophet',
 'auto_arima',
 'catboost',
 'lightgbm',
 'xgboost',
 'wide_gbrt',
 'd_linear',
 'n_linear',
 'n_beats',
 'n_hits',
 'tcn',
 'tft',
 'gau',
 'stacking_rnn',
 'time2vec',
 'multi_output_model',
 'multi_step_model',
 'transformer',
 'random_forest',
 'tide']

Training

from sklearn.metrics import mean_absolute_error

pipeline = ModelPipeline(
    time_col='date',
    target_col='type_a',
    lags=30,
    random_state=42,
    metric=mean_absolute_error,
    metric_less_is_better=True,
    accelerator=accelerator,  # Supported values for accelerator: `auto`, `cpu`, `tpu`, `cuda`, `mps`.
)

# training all models
pipeline.fit(data, valid_data=valid_data)

# use best model to predict next 30 steps data point
res = pipeline.predict(30)

Training and prediction of a single model

Without predict specify series [notebook]

Code

from PipelineTS.dataset import LoadMessagesSentDataSets
import pandas as pd

# ------------------- Data Preprocessing -------------------
# convert time col, the date column is assumed to be date_col
time_col = 'date'
target_col = 'ta'
lags = 60  # Ahead of the window size, the data will be split into multiple sequences of lags for training
n = 40 # How many steps to predict, in this case how many days to predict

# you can also load data with pandas
# init_data = pd.read_csv('/path/to/your/data.csv')
init_data = LoadMessagesSentDataSets()[[time_col, target_col]]

init_data[time_col] = pd.to_datetime(init_data[time_col], format='%Y-%m-%d')

# split trainning set and test set
valid_data = init_data.iloc[-n:, :]
data = init_data.iloc[:-n, :]
print("data shape: ", data.shape, ", valid data shape: ", valid_data.shape)
data.tail(5)

# data visualization
from PipelineTS.plot import plot_data_period
plot_data_period(
    data.iloc[-300:, :], 
    valid_data, 
    time_col=time_col, 
    target_col=target_col, 
    labels=['Train data', 'Valid_data']
)

# training and predict
from PipelineTS.nn_model import TiDEModel
tide = TiDEModel(
    time_col=time_col, target_col=target_col, lags=lags, random_state=42, 
    quantile=0.9, enable_progress_bar=False, enable_model_summary=False
)
tide.fit(data)
tide.predict(n)

With predict specify series [notebook]

Code

from PipelineTS.dataset import LoadMessagesSentDataSets
import pandas as pd

# ------------------- Data Preprocessing -------------------
# convert time col, the date column is assumed to be date_col
time_col = 'date'
target_col = 'ta'
lags = 60  # Ahead of the window size, the data will be split into multiple sequences of lags for training
n = 40 # How many steps to predict, in this case how many days to predict

# you can also load data with pandas
# init_data = pd.read_csv('/path/to/your/data.csv')
init_data = LoadMessagesSentDataSets()[[time_col, target_col]]

init_data[time_col] = pd.to_datetime(init_data[time_col], format='%Y-%m-%d')

# split trainning set and test set
valid_data = init_data.iloc[-n:, :]
data = init_data.iloc[:-n, :]
print("data shape: ", data.shape, ", valid data shape: ", valid_data.shape)
data.tail(5)

# data visualization
from PipelineTS.plot import plot_data_period
plot_data_period(
    data.iloc[-300:, :], 
    valid_data, 
    time_col=time_col, 
    target_col=target_col, 
    labels=['Train data', 'Valid_data']
)

# training and predict
from PipelineTS.nn_model import TiDEModel
tide = TiDEModel(
    time_col=time_col, target_col=target_col, lags=lags, random_state=42, 
    quantile=0.9, enable_progress_bar=False, enable_model_summary=False
)
tide.fit(data)
tide.predict(n, data=valid_data)

ModelPipeline Module

# If you need to configure the model
from xgboost import XGBRegressor
from catboost import CatBoostRegressor
from PipelineTS.pipeline import ModelPipeline, PipelineConfigs

# If you want to try multiple configurations of a model at once for comparison or tuning purposes, you can use `PipelineConfigs`.
# This feature allows for customizing the models returned by each `ModelPipeline.list_all_available_models()` call.
# The first one is the name of the model, which needs to be in the list of available models provided by ModelPipeline.list_all_available_models(). 
# If you want to customize the name of the model, then the second argument can be a string of the model name, 
# otherwise, the second one is of type dict. The dict can have three keys: 'init_configs', 'fit_configs', 'predict_configs', or any combination of them. 
# The remaining keys will be automatically filled with default parameters.
# Among them, 'init_configs' represents the initialization parameters of the model, 'fit_configs' represents the parameters during model training, 
# and 'predict_configs' represents the parameters during model prediction.

pipeline_configs = PipelineConfigs([
    ('lightgbm', 'lightgbm_linear_tree', {'init_configs': {'verbose': -1, 'linear_tree': True}}),
    ('multi_output_model', {'init_configs': {'verbose': -1}}),
    ('multi_step_model', {'init_configs': {'verbose': -1}}),
    ('multi_output_model', {
        'init_configs': {'estimator': XGBRegressor, 'random_state': 42, 'kwargs': {'verbosity': 0}}
    }
     ),
    ('multi_output_model', {
        'init_configs': {'estimator': CatBoostRegressor, 'random_state': 42, 'verbose': False}
    }
     ),
])

	model_name	model_name_after_rename	model_configs
0	lightgbm	lightgbm_linear_tree	{'init_configs': {'verbose': -1, 'linear_tree': True}, 'fit_configs': {}, 'predict_configs': {}}
1	multi_output_model	multi_output_model_1	{'init_configs': {'verbose': -1}, 'fit_configs': {}, 'predict_configs': {}}
2	multi_output_model	multi_output_model_2	{'init_configs': {'estimator': <class 'xgboost.sklearn.XGBRegressor'>, 'random_state': 42, 'kwargs': {'verbosity': 0}}, 'fit_configs': {}, 'predict_configs': {}}
3	multi_output_model	multi_output_model_3	{'init_configs': {'estimator': <class 'catboost.core.CatBoostRegressor'>, 'random_state': 42, 'verbose': False}, 'fit_configs': {}, 'predict_configs': {}}
4	multi_step_model	multi_step_model_1	{'init_configs': {'verbose': -1}, 'fit_configs': {}, 'predict_configs': {}}

Non-Interval Forecasting [notebook]

from sklearn.metrics import mean_absolute_error

from PipelineTS.pipeline import ModelPipeline

pipeline = ModelPipeline(
    time_col=time_col, 
    target_col=target_col, 
    lags=lags, 
    random_state=42, 
    metric=mean_absolute_error, 
    metric_less_is_better=True,
    configs=pipeline_configs,
    include_init_config_model=False,
    use_standard_scale=False,  # False for MinMaxScaler, True for StandardScaler, None means no data be scaled
    # include_models=['d_linear', 'random_forest', 'n_linear', 'n_beats'],  # specifying the model used
    # exclude_models=['catboost', 'tcn', 'transformer'],  # exclude specified models
    # Note that `include_models` and `exclude_models` cannot be specified simultaneously.
    accelerator=accelerator,
    # Now we can directly input the "modelname__'init_params'" parameter to instantiate the models in ModelPipeline.
    # Note that it is double underline. 
    # When it is duplicated with the ModelPipeline class keyword parameter, the ModelPipeline clas keyword parameter is ignored
    d_linear__lags=50,
    n_linear__random_state=1024,
    n_beats__num_blocks=3,
    random_forest__n_estimators=200,
    n_hits__accelerator='cpu', # Since using mps backend for n_hits model on mac gives an error, cpu backend is used as an alternative
    tft__accelerator='cpu', # tft, same question, but if you use cuda backend, you can just ignore this two configurations.
)

pipeline.fit(data, valid_data)

Get the model parameters in ModelPipeline

# Gets all configurations for the specified model， default to best model
pipeline.get_model_all_configs(model_name='wide_gbrt')

Plotting the forecast results

# use best model to predict next 30 steps data point
prediction = pipeline.predict(n, model_name=None)  # You can use `model_name` to specify the pre-trained model in the pipeline when using Python.

plot_data_period(init_data.iloc[-100:, :], prediction, 
                 time_col=time_col, target_col=target_col)

Interval prediction [notebook]

from sklearn.metrics import mean_absolute_error

from PipelineTS.pipeline import ModelPipeline

pipeline = ModelPipeline(
    time_col=time_col,
    target_col=target_col,
    lags=lags,
    random_state=42,
    metric=mean_absolute_error,
    metric_less_is_better=True,
    configs=pipeline_configs,
    include_init_config_model=False,
    use_standard_scale=False,
    with_quantile_prediction=True,  # turn on the quantile prediction switch, if you like
    accelerator=accelerator,
    # models=['wide_gbrt']  # Specify the model
    n_hits__accelerator='cpu',
    tft__accelerator='cpu',
)

pipeline.fit(data, valid_data)

Plotting the forecast results

# use best model to predict next 30 steps data point
prediction = pipeline.predict(n, model_name=None)  # You can use `model_name` to specify the pre-trained model in the pipeline when using Python.

plot_data_period(init_data.iloc[-100:, :], prediction, 
                 time_col=time_col, target_col=target_col)

Model and ModelPipeline saving and loading

from PipelineTS.io import load_model, save_model

# save
save_model(path='/path/to/save/your/fitted_model_or_pipeline.zip', model=pipeline)
# load
pipeline = load_model('/path/to/save/your/fitted_model_or_pipeline.zip')

Project details

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

This version

0.3.11

Jan 24, 2024

0.3.9

Oct 31, 2023

0.3.8

Oct 31, 2023

0.3.7

Oct 27, 2023

0.3.6

Oct 27, 2023

0.3.5

Oct 20, 2023

0.3.4

Oct 17, 2023

0.3.3

Oct 17, 2023

0.3.2

Oct 17, 2023

0.3.1

Oct 17, 2023

0.3.0

Oct 17, 2023

0.2.3

Oct 13, 2023

0.2.2

Oct 12, 2023

0.2.1

Oct 12, 2023

0.2.0

Oct 11, 2023

0.1.5

Oct 9, 2023

0.1.1

Oct 9, 2023

0.1.0

Oct 9, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

PipelineTS-0.3.11.tar.gz (40.4 kB view hashes)

Uploaded Jan 24, 2024 Source

Built Distribution

PipelineTS-0.3.11-py3-none-any.whl (59.5 kB view hashes)

Uploaded Jan 24, 2024 Python 3

Hashes for PipelineTS-0.3.11.tar.gz

Hashes for PipelineTS-0.3.11.tar.gz
Algorithm	Hash digest
SHA256	`8e4a38d5984ffc7af86c0e37b919cd0c37a582e989ec8dbe58c2cab6081846db`
MD5	`d9ddca304f31a971e191d3fb445704c9`
BLAKE2b-256	`bfb7ade788e31cdb7df077fa1407eb7effa8e3bc439b9987339d5cf3fe7f5a30`

Hashes for PipelineTS-0.3.11-py3-none-any.whl

Hashes for PipelineTS-0.3.11-py3-none-any.whl
Algorithm	Hash digest
SHA256	`80b1cfb03892cd7df5b1469edca2659f50420b29037a2396f33465a193e915a2`
MD5	`c5a1083b8d6a3a2d0d438eee67d00cac`
BLAKE2b-256	`d2eea4649f2921d2e9b9bf9ebc23a7386d66b89f55004224a48943b04f81192c`