Temporian is a Python package for feature engineering of temporal data, focusing on preventing common modeling errors and providing a simple and powerful API, a first-class iterative development experience, and efficient and well-tested implementations of common and not-so-common temporal data preprocessing functions.

Note: Temporian development is in alpha.

Temporian is an open-source Python library for preprocessing ⚡ and feature engineering 🛠 temporal data 📈 for machine learning applications 🤖. It is a library tailor-made to address the unique characteristics and complexities of time-related data, such as time-series and transactional data.

Temporal data is any form of data that represents a state in time. In Temporian, temporal datasets contain events, each of which consists of values for one or more attributes at a given timestamp. Common examples of temporal data are transaction logs, sensor signals, and weather patterns. For more, see What is Temporal data.
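To make the notion of an event concrete, here is a minimal plain-Python sketch. It does not use Temporian's own data structures: the `Event` class and the sensor readings are hypothetical and exist only for illustration. Each event pairs a timestamp with values for one or more attributes, and the events do not need to be evenly spaced in time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for one event: a timestamp plus attribute values."""
    timestamp: datetime
    features: dict  # attribute name -> value


# Hypothetical sensor readings; note the irregular spacing between timestamps.
readings = [
    Event(datetime(2023, 1, 1, 8, 0), {"temperature": 20.5, "humidity": 0.41}),
    Event(datetime(2023, 1, 1, 8, 5), {"temperature": 20.7, "humidity": 0.40}),
    Event(datetime(2023, 1, 1, 9, 30), {"temperature": 22.1, "humidity": 0.38}),
]

# Minutes elapsed between consecutive events.
gaps = [
    (later.timestamp - earlier.timestamp).total_seconds() / 60
    for earlier, later in zip(readings, readings[1:])
]
print(gaps)  # [5.0, 85.0]
```

The uneven gaps are what "non-uniformly sampled" means in practice: the data cannot be treated as a fixed-step series without resampling it first.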

Key features

  • Unified data processing 📈: Temporian operates natively on many forms of temporal data, including multivariate time-series, multi-index time-series, and non-uniformly sampled data.

  • Iterative and interactive development 📊: When prototyping, users can iteratively preprocess, analyze, and visualize temporal data in real time with tools such as notebooks. In production, users can easily reuse, apply, and scale these implementations to larger datasets.

  • Avoids future leakage 😰: Future leakage occurs when, during training, a model is exposed to data from future events. This leaks information that would not otherwise be available to the model and can result in overfitting. Temporian operators do not create leakage by default. Users can also use Temporian to programmatically detect whether specific signals were exposed to future leakage.

  • Flexible runtime ☁️: Temporian programs can run seamlessly in-process in Python, or on large datasets using Apache Beam.

  • Highly optimized 🔥: Temporian's core is implemented and optimized in C++, so large amounts of data can be handled in-process. In some cases, Temporian is 1000x faster than other libraries.
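Future leakage, mentioned in the feature list above, can be illustrated with a small plain-Python sketch. This is not how Temporian detects leakage; the function names and the sales figures are hypothetical. The idea is that a feature computed at position `i` leaks if changing a later observation changes its value.

```python
def trailing_sum(values, i, width):
    """Sum of the `width` values ending at index i: present and past only."""
    return sum(values[max(0, i - width + 1): i + 1])


def centered_sum(values, i, width):
    """Sum of a window centered on index i: also reads values after i."""
    half = width // 2
    return sum(values[max(0, i - half): i + half + 1])


def leaks_next_value(feature_fn, values, i, width):
    """Return True if altering the observation at i + 1 changes the feature at i."""
    tampered = list(values)
    tampered[i + 1] += 1000  # modify only the next (future) observation
    return feature_fn(values, i, width) != feature_fn(tampered, i, width)


daily_sales = [10, 12, 9, 15, 11, 14]  # hypothetical data
print(leaks_next_value(trailing_sum, daily_sales, 2, 3))  # False
print(leaks_next_value(centered_sum, daily_sales, 2, 3))  # True
```

The trailing window is safe to use as a training feature because it depends only on what was known at index `i`; the centered window is not.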

QuickStart

Installation

Temporian is available on PyPI. Install it with pip:

pip install temporian

Minimal example

The following example uses a dataset, sales.csv, which contains transactional data. Here is a preview of the data:

$ head sales.csv
timestamp,store,price,count
2022-01-01,CA,27.42,61.9
2022-01-01,TX,98.55,18.02
2022-01-02,CA,32.74,14.93
2022-01-15,TX,48.69,83.99
...

The following code computes a 7-day moving sum of sales for each store, visualizes the output with a plot, and exports the result to a CSV file.

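import temporian as tp

# Load the sales records from the CSV file.
input_data = tp.from_csv("sales.csv")

# Index the data by store so that each store is processed separately.
per_store = input_data.set_index("store")

# Sum the price over a 7-day moving window.
weekly_sum = per_store["price"].moving_sum(window_length=tp.duration.days(7))

# Plot the result
weekly_sum.plot()

# Save the results
tp.to_csv(weekly_sum, "store_sales_moving_sum.csv")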

Check the Getting Started tutorial to try it out!
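For readers who want to see what the moving sum in the minimal example computes, the following plain-Python sketch reproduces the idea without Temporian, using the four rows shown in the sales.csv preview. It assumes that the window at time t covers the interval (t - 7 days, t], which is an assumption about the boundary convention, and the helper name `trailing_sum_per_store` is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# Rows copied from the sales.csv preview: (timestamp, store, price).
rows = [
    (date(2022, 1, 1), "CA", 27.42),
    (date(2022, 1, 1), "TX", 98.55),
    (date(2022, 1, 2), "CA", 32.74),
    (date(2022, 1, 15), "TX", 48.69),
]


def trailing_sum_per_store(rows, window=timedelta(days=7)):
    """For each event, sum the prices of the same store in (t - window, t]."""
    by_store = defaultdict(list)
    for ts, store, price in rows:
        by_store[store].append((ts, price))

    result = {}
    for store, events in by_store.items():
        events.sort()
        result[store] = [
            (ts, sum(p for t, p in events if ts - window < t <= ts))
            for ts, _ in events
        ]
    return result


sums = trailing_sum_per_store(rows)
for store, series in sorted(sums.items()):
    for ts, total in series:
        print(store, ts.isoformat(), round(total, 2))
```

With these rows, the second CA event adds the previous day's price to its own, while the TX event on 2022-01-15 stands alone because the earlier TX event is 14 days old and falls outside the window. This quadratic loop is only meant to show the semantics, not to be efficient.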

Next steps

New users should refer to the 3 minutes to Temporian page, which provides a quick overview of the key concepts and operations of Temporian.

After reading that page, visit the User Guide for a deep dive into the major concepts, operators, conventions, and practices of Temporian. For a hands-on learning experience, work through the Tutorials or refer to the API reference.

Documentation

The documentation 📚 is available at temporian.readthedocs.io. The 3 minutes to Temporian ⏰️ page is the best way to start.

Contributing

Contributions to Temporian are welcome! Check out the contributing guide to get started.

Credits

Temporian is developed in collaboration between Google and Tryolabs.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • temporian-0.1.3.tar.gz (194.5 kB): Source

Built Distributions

  • temporian-0.1.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (537.8 kB): CPython 3.11, manylinux (glibc 2.17+), x86-64

  • temporian-0.1.3-cp311-cp311-macosx_12_0_x86_64.whl (493.2 kB): CPython 3.11, macOS 12.0+, x86-64

  • temporian-0.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (537.9 kB): CPython 3.10, manylinux (glibc 2.17+), x86-64

  • temporian-0.1.3-cp310-cp310-macosx_12_0_x86_64.whl (491.8 kB): CPython 3.10, macOS 12.0+, x86-64

  • temporian-0.1.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (538.2 kB): CPython 3.9, manylinux (glibc 2.17+), x86-64

  • temporian-0.1.3-cp39-cp39-macosx_12_0_x86_64.whl (492.0 kB): CPython 3.9, macOS 12.0+, x86-64

  • temporian-0.1.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (537.0 kB): CPython 3.8, manylinux (glibc 2.17+), x86-64

  • temporian-0.1.3-cp38-cp38-macosx_12_0_x86_64.whl (491.8 kB): CPython 3.8, macOS 12.0+, x86-64