pandarallel

An easy to use library to speed up computation (by parallelizing on multi CPUs) with pandas.

# pandaral·lel
An easy to use library to speed up computation (by parallelizing on multi CPUs) with [pandas](https://pandas.pydata.org/).

## Requirements
- [pandas](https://pypi.org/project/pandas/)
- [pyarrow](https://pypi.org/project/pyarrow/)

## Warnings
- Version 1.0 of this library has not been released yet. The API may change at any time.
- Parallelization has a cost (instantiating new processes, transmitting data via shared memory, etc.), so it is efficient only if the amount of computation to parallelize is high enough. For very small amounts of data, parallelization is not always worth it.
- Functions applied should NOT be lambda functions.

```python
import pandarallel
from math import sin

# FORBIDDEN: lambda function
df.parallel_apply(lambda x: sin(x**2), axis=1)

# ALLOWED: named function
def func(x):
    return sin(x**2)

df.parallel_apply(func, axis=1)
```
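
This README does not say why lambdas are disallowed. One common reason in multiprocessing-based libraries, assumed here rather than confirmed for pandarallel, is that functions are sent to worker processes with the standard `pickle` module, which serializes functions by name and so cannot handle an anonymous lambda. A minimal sketch of that behaviour (the helper `can_pickle` is hypothetical and exists only for this illustration):

```python
import pickle
from math import sin


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:  # pickle signals failure with PicklingError or AttributeError
        return False


def func(x):
    return sin(x**2)


# A lambda has no importable name, so pickle cannot reference it.
print("lambda picklable:", can_pickle(lambda x: sin(x**2)))
# A named function defined at module level is pickled by reference to its name.
print("named function picklable:", can_pickle(func))
```

If this assumption holds, defining the function with `def` at module level is what makes it transferable to the workers.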

## Examples
An example of each API is available in [examples.ipynb](https://github.com/nalepae/pandarallel/blob/master/examples.ipynb).

## Benchmark
For the `DataFrame.apply` example in [examples.ipynb](https://github.com/nalepae/pandarallel/blob/master/examples.ipynb), here is a benchmark comparing the "standard" `apply` with `parallel_apply` (error bars are too small to be displayed).

Computer used for this benchmark:
- OS: Linux Ubuntu 16.04
- Hardware: Intel Core i7 @ 3.40 GHz (8 cores, **but 4 "truly parallelizable" CPUs**)
- Number of workers (parallel processes) used: 4

![Benchmark](https://github.com/nalepae/pandarallel/blob/master/docs/apply_vs_parallel_apply.png)

For this example, `parallel_apply` runs approximately 3.7 times faster than the "standard" `apply`.
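
To put the 3.7 figure in context, it can be compared with the 4 workers used. A rough sketch of the arithmetic follows; parallel efficiency is a standard metric, not one the benchmark itself reports.

```python
speedup = 3.7    # measured: parallel_apply vs. standard apply
n_workers = 4    # parallel processes used in the benchmark

# Efficiency = achieved speedup / ideal speedup (one unit per worker).
efficiency = speedup / n_workers
print(f"parallel efficiency: {efficiency:.1%}")
```

An efficiency close to 1 means the process start-up and data transfer overhead was small relative to the computation in this particular example.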

## API
First, you have to import `pandarallel` (don't forget the double _l_):
```python
import pandarallel
```
### DataFrame.parallel_apply

If `df` is a pandas DataFrame, and `func` a function to apply to this DataFrame, replace
```python
df.apply(func, axis=1)
```
by
```python
df.parallel_apply(func, axis=1)
```

_Note: ``apply`` with ``axis=0`` is not yet implemented._
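
As a sketch of what `func` receives with `axis=1`, here is the serial pandas baseline on a toy DataFrame. The column names `a` and `b` and the data are made up for illustration. Each call gets one row as a Series, so `func` picks out the values it needs by column name. The `parallel_apply` line is left as a comment because it needs pandarallel installed.

```python
from math import sin

import pandas as pd

# Toy data; the column names are assumptions for this example only.
df = pd.DataFrame({"a": [0.0, 1.0, 2.0], "b": [3.0, 4.0, 5.0]})


def func(row):
    # With axis=1, `row` is a pandas Series holding one row of df.
    return sin(row["a"] ** 2) + sin(row["b"] ** 2)


serial = df.apply(func, axis=1)  # standard, single-process pandas
# parallel = df.parallel_apply(func, axis=1)  # replacement described in this README
print(serial)
```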

### Series.parallel_map
If `series` is a pandas Series (aka a DataFrame column), and `func` a function to apply to this Series, replace
```python
series.map(func)
```
by
```python
series.parallel_map(func)
```
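
Unlike the row-wise case, `Series.map` calls `func` once per element, so `func` receives a scalar. A minimal serial sketch with made-up data, with the parallel call left as a comment since it needs pandarallel installed:

```python
from math import sin

import pandas as pd

series = pd.Series([0.0, 1.0, 2.0])  # toy data for illustration


def func(x):
    # `x` is a single element of the Series, here a float.
    return sin(x**2)


serial = series.map(func)  # standard, single-process pandas
# parallel = series.parallel_map(func)  # replacement described in this README
print(serial)
```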

### DataFrame.groupby.parallel_apply
If `df` is a pandas DataFrame, `col_name` is the name of a column of this DataFrame, and `func` is a function to apply to each group of rows sharing the same value in this column, replace
```python
df.groupby(col_name).apply(func)
```
by
```python
df.groupby(col_name).parallel_apply(func)
```
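
With `groupby`, `func` is called once per group and receives that group's rows as a DataFrame. A minimal serial sketch follows; the columns `key` and `value` are hypothetical, and the parallel call is left as a comment since it needs pandarallel installed.

```python
import pandas as pd

# Toy data; column names are assumptions for this example only.
df = pd.DataFrame({"key": ["x", "y", "x", "y"], "value": [1, 2, 3, 4]})
col_name = "key"


def func(group):
    # `group` is the sub-DataFrame of rows sharing one value of `key`.
    return group["value"].sum()


serial = df.groupby(col_name).apply(func)  # standard, single-process pandas
# parallel = df.groupby(col_name).parallel_apply(func)  # replacement described in this README
print(serial)
```

Here the result is a Series indexed by the distinct values of `key`, with one aggregated value per group.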

Release history

1.6.5

May 2, 2023

1.6.4

Jan 15, 2023

1.6.3

Aug 9, 2022

1.6.2

Aug 3, 2022

1.6.1

Mar 15, 2022

1.6.0

Mar 14, 2022

1.5.8

Mar 12, 2022

1.5.7

Mar 3, 2022

1.5.6

Mar 3, 2022

1.5.5

Feb 6, 2022

1.5.4

Oct 17, 2021

1.5.3

Oct 4, 2021

1.5.2

Feb 4, 2021

1.5.1

Aug 25, 2020

1.5.0

Aug 24, 2020

1.4.8

Apr 5, 2020

1.4.7

Apr 5, 2020

1.4.6

Mar 1, 2020

1.4.5

Jan 20, 2020

1.4.4

Jan 1, 2020

1.4.3

Jan 1, 2020

1.4.2

Nov 28, 2019

1.4.1

Nov 11, 2019

1.4.0

Nov 9, 2019

1.3.4

Nov 2, 2019

1.3.3

Oct 6, 2019

1.3.2

Aug 3, 2019

1.3.1

Aug 2, 2019

1.3.0

Jul 23, 2019

1.2.0

Jul 9, 2019

1.1.1

May 13, 2019

1.1.0

Apr 4, 2019

1.0.0

Apr 1, 2019

0.1.7

Mar 31, 2019

0.1.6

Mar 31, 2019

0.1.5

Mar 26, 2019

0.1.4

Mar 24, 2019

0.1.3

Mar 24, 2019

0.1.2

Mar 16, 2019

0.1.1

Mar 11, 2019

This version

0.1.0

Mar 10, 2019

Download files

Source Distribution

pandarallel-0.1.0.tar.gz (3.5 kB)

Uploaded Mar 10, 2019 Source

Hashes for pandarallel-0.1.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `e1206292271366c6936bd50475933068bad262559e21477005b808f5f93d2479` |
| MD5 | `746f5c85ba791348b6f4c80806aa2d57` |
| BLAKE2b-256 | `c9671245bb2b72b7b427c39c70d4b56f332f9abbaaffb57cad29f558158108b3` |