parallelbar

Parallel processing with progress bars

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Parallelbar

Parallelbar displays the progress of tasks in a process pool for methods such as map, imap and imap_unordered. It is built on the tqdm module and Python's standard multiprocessing library.

Installation

pip install parallelbar
or
pip install --user git+https://github.com/dubovikmaster/parallelbar.git

Example

from parallelbar import progress_imap, progress_map, progress_imapu
from parallelbar.tools import cpu_bench, fibonacci

Let's create a list of 100 numbers and, as a baseline, run the toy function cpu_bench over it with the built-in map before trying progress_map with default parameters:

tasks = [1_000_000 + i for i in range(100)]

%%time
list(map(cpu_bench, tasks))

Wall time: 52.6 s

OK, by default this runs on a single core of my i7-9700F, and it took about 52 seconds. Let's parallelize the calculation across all 8 cores and watch the progress. This is easily done by replacing the standard map function with progress_map.

if __name__=='__main__':
    progress_map(cpu_bench, tasks)

Great! We got a speedup of about 6 times, and we were able to watch the progress as well. What about the progress on each core of your CPU?

if __name__=='__main__':
    progress_map(cpu_bench, tasks, core_progress=True)
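
To put the 6x figure quoted above in context, here is a quick back-of-the-envelope calculation that uses only the numbers from the text (52.6 s serial, roughly 6 times faster, 8 cores). The helper names are made up for this illustration, and the parallel wall time is derived from the quoted speedup rather than measured.

```python
def parallel_time(serial_seconds, speedup):
    """Wall time implied by a given speedup over the serial run."""
    return serial_seconds / speedup


def parallel_efficiency(speedup, n_workers):
    """Fraction of ideal linear scaling that was actually achieved."""
    return speedup / n_workers


serial = 52.6   # seconds, from the single-core run above
speedup = 6     # approximate figure quoted in the text
cores = 8       # number of cores used in the text

print(f"implied parallel wall time: {parallel_time(serial, speedup):.1f} s")
print(f"parallel efficiency: {parallel_efficiency(speedup, cores):.0%}")
```

Under these assumptions the parallel run takes about 8.8 s, which is 75% of what perfectly linear scaling over 8 cores would give.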

Of course, you can also specify the number of cores and the chunk size:

if __name__=='__main__':
    tasks = [500_000 + i for i in range(100)]
    progress_map(cpu_bench, tasks, n_cpu=4, chunk_size=1, core_progress=True)
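
To make the effect of chunk_size more concrete, the sketch below shows how a task list can be cut into chunks before it is handed to workers. It assumes that chunk_size plays the same role as the chunksize argument of multiprocessing.Pool.map, which the text above does not state explicitly. split_into_chunks is a hypothetical helper written for this illustration, not part of either library.

```python
from itertools import islice


def split_into_chunks(tasks, chunk_size):
    """Yield consecutive lists of at most chunk_size items from tasks."""
    iterator = iter(tasks)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


tasks = [500_000 + i for i in range(10)]

# chunk_size=1: every task is dispatched on its own, so there is one
# message per task (finest granularity, most communication overhead).
print(len(list(split_into_chunks(tasks, 1))))

# chunk_size=4: fewer, larger messages; with a plain multiprocessing.Pool
# the results of a chunk come back together once the chunk is finished.
print([len(chunk) for chunk in split_into_chunks(tasks, 4)])
```

Small chunks tend to suit a few long tasks, while larger chunks reduce overhead when there are many short ones.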

You can also easily use progress_imap and progress_imapu, the analogs of the imap and imap_unordered methods of the Pool() class:

%%time
if __name__=='__main__':
    tasks = [20 + i for i in range(15)]
    result = progress_imap(fibonacci, tasks, chunk_size=1, core_progress=False)

Wall time: 2.08 s

result

Problems of the naive approach

Why can't I do something simpler? Let's take the standard imap method and iterate over it in a tqdm loop, collecting the results from the processes:

from multiprocessing import Pool
from tqdm.auto import tqdm

if __name__=='__main__':
    with Pool() as p:
        tasks = [20 + i for i in range(15)]
        pool = p.imap(fibonacci, tasks)
        result = []
        for i in tqdm(pool, total=len(tasks)):
            result.append(i)

It looks good, doesn't it? But let's make the first task very heavy for a core. To do this, I will insert the number 38 at the beginning of the tasks list. Let's see what happens:

if __name__=='__main__':
    with Pool() as p:
        tasks = [20 + i for i in range(15)]
        tasks.insert(0, 38)  # the heavy task is now the first one
        pool = p.imap(fibonacci, tasks)  # imap yields results in task order
        result = []
        for i in tqdm(pool, total=len(tasks)):
            result.append(i)

This is a fiasco. Because imap yields results in task order, the loop cannot advance until the slow first task has finished, so the progress bar hung on the first task and then jumped to 100% at the end. Let's repeat the same experiment with the progress_imap function:

if __name__=='__main__':
    tasks = [20 + i for i in range(15)]
    tasks.insert(0, 38)
    # No explicit Pool is needed: progress_imap manages the pool itself.
    result = progress_imap(fibonacci, tasks)

The progress_imap function takes care of collecting the results and closing the process pool for you. To be fair, the naive approach described above does work with the standard imap_unordered method, but that method does not guarantee the order of the returned results, which is often critically important.
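
One way to get both behaviours at once, progress that advances as soon as any task finishes and results in the original order, is to tag each task with its index, consume imap_unordered, and put each result back in its place. The following is a minimal stdlib-only sketch of that idea, not parallelbar's actual implementation. The names ordered_with_progress, _call_indexed and on_progress are hypothetical, and ThreadPool (which exposes the same imap_unordered method as Pool) is used only so that the snippet runs without the `__main__` guard.

```python
import time
from multiprocessing.pool import ThreadPool


def _call_indexed(args):
    """Run func(task) and return the result together with its position."""
    index, func, task = args
    return index, func(task)


def ordered_with_progress(func, tasks, n_workers=4, on_progress=None):
    """Return results in task order, reporting progress per finished task."""
    tasks = list(tasks)
    results = [None] * len(tasks)
    jobs = [(i, func, task) for i, task in enumerate(tasks)]
    with ThreadPool(n_workers) as pool:
        done = 0
        for index, value in pool.imap_unordered(_call_indexed, jobs):
            results[index] = value  # restore the original position
            done += 1
            if on_progress is not None:
                on_progress(done, len(tasks))
    return results


def slow_square(x):
    """Toy task: larger inputs take longer, like the heavy first task."""
    time.sleep(x / 100)
    return x * x


ticks = []
squares = ordered_with_progress(
    slow_square, [30, 1, 2, 3],
    on_progress=lambda done, total: ticks.append(done),
)
print(squares)  # [900, 1, 4, 9]
print(ticks)    # [1, 2, 3, 4]
```

The callback fires once per completed task regardless of which task it was, so a slow first task no longer freezes the display. In practice on_progress could simply advance a tqdm bar. With a real process Pool, func would also have to be a picklable, module-level function.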

License

MIT license

Project details

Release history

2.4 (Nov 14, 2023)
2.3.1 (Nov 10, 2023)
1.3.1 (Feb 7, 2023)
1.3.0 (Feb 7, 2023)
1.2.0 (Dec 21, 2022)
1.1.3 (Dec 20, 2022)
1.0.2 (Nov 16, 2022)
0.3.0 (Jul 30, 2022)
0.2.15 (Jul 27, 2022)
0.2.14 (Jul 25, 2022)
0.2.10 (Jul 22, 2022)
0.2.8 (Jul 22, 2022)
0.2.6 (Jul 21, 2022), yanked; reason: bug
0.1.19 (Oct 20, 2021)
0.1.18 (Sep 7, 2021)
0.1.17 (Sep 6, 2021), this version
0.1.16 (Aug 25, 2021)
0.1.13 (Aug 6, 2021)

Download files

Source Distribution

parallelbar-0.1.17.tar.gz (4.7 kB), uploaded Sep 6, 2021

Built Distribution

parallelbar-0.1.17-py3-none-any.whl (5.4 kB, Python 3), uploaded Sep 6, 2021

Hashes for parallelbar-0.1.17.tar.gz
Algorithm	Hash digest
SHA256	`0a0a4f8d28b5c5bfe88f47bdadb559c4c0c4ffaa4845009618642756b178d758`
MD5	`003c29298bdbc6698c938038815c0b3f`
BLAKE2b-256	`14045a17b55bcd989ea4ee9f91274c05ef91ed561a54a9512191ec7a5e01a6f8`

Hashes for parallelbar-0.1.17-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a3d174be522c89f1841341c34614a30e6be33263c1ee217da013938ec967b743`
MD5	`5ffb28068216b3a3f130cd84a9091ecd`
BLAKE2b-256	`4fc8e0392d45b14b8b27d5e2758a8282590269de9684e1db72bfdb0d9d0fdc6e`
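
The digests above can be checked against a downloaded file with the standard hashlib module. This is a minimal sketch: sha256_of_file and matches_published_hash are hypothetical helper names, and the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, block_size=65536):
    """Return the hex SHA256 digest of the file at path, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published one, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the sdist was downloaded into the current directory:
# matches_published_hash(
#     "parallelbar-0.1.17.tar.gz",
#     "0a0a4f8d28b5c5bfe88f47bdadb559c4c0c4ffaa4845009618642756b178d758",
# )
```

Reading in blocks keeps memory use constant regardless of file size.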