multicore

Simpler Python multiprocess coding. Persistent workers, memory maps for minimum overhead.

Intended Audience
- Developers
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
- Python
Topic
- Internet :: WWW/HTTP :: Dynamic Content

Project description

Python Multicore

A module that makes it easy to parallelize Python code.

Installation

Install or add multicore to your Python path.

Python supports multi-threading, but the global interpreter lock (GIL) prevents a single process from using all CPU cores for CPU-heavy tasks. The recommended way around the GIL is Python’s multiprocessing library, but that brings its own challenges, most notably that sharing data between sub-processes is limited.
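To make the GIL workaround concrete, here is a minimal sketch that uses only the standard library's multiprocessing module, not multicore itself. The names count_primes and count_primes_parallel are hypothetical helpers invented for this illustration.

```python
import multiprocessing


def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, processes=2):
    """Run count_primes for each limit in a pool of worker processes."""
    # Each worker is a separate interpreter with its own GIL, so the
    # calls can run on different cores at the same time.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    print(count_primes_parallel([10000, 20000, 30000]))
```

Note that the arguments and results are pickled on their way to and from the workers, which is the data-sharing cost the paragraph above refers to.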

The goal of the multicore library is to make it as simple as possible to parallelize code while incurring the least amount of overhead.

Features

- Persistent pool of workers, enabling persistent database connections.
- Memory maps for inter-process communication. Much faster than multiprocessing’s own IPC or even pipes.
- Can take the system load average into account to decide whether parallelization is worth it at any given time.
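The load average check in the last feature can be sketched with the standard library. This is an illustration under assumptions, not multicore's own code: normalized_load and worth_parallelizing are hypothetical names, and dividing by the core count is one reading of a threshold expressed as if for a single-core machine.

```python
import os


def normalized_load(load_1min, cpu_count):
    """Express a 1-minute load average as if for a single-core machine."""
    return load_1min / max(cpu_count, 1)


def worth_parallelizing(threshold, load_1min=None, cpu_count=None):
    """Return True when the per-core load is at or below the threshold."""
    if load_1min is None:
        # os.getloadavg() is only available on Unix-like systems.
        load_1min = os.getloadavg()[0]
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    return normalized_load(load_1min, cpu_count) <= threshold
```

On a four-core machine with a 1-minute load average of 2.0, the normalized load is 0.5, so a threshold of 0.75 would allow parallel execution and a threshold of 0.25 would not.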

Architecture

Python Multicore is effectively an in-memory queue that is processed by a fixed set of workers. It uses memory mapping to avoid the latency imposed by a queuing system such as Celery.
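As a rough illustration of memory-mapped communication between processes, the sketch below has a worker write a length-prefixed payload into a memory-mapped file that the parent then reads. It uses only the standard library and is not multicore's implementation; the buffer layout and the names create_buffer, write_result and read_result are assumptions made for this example.

```python
import mmap
import multiprocessing
import os
import struct
import tempfile

SIZE = 1024  # fixed size of the shared buffer in bytes
HEADER = struct.Struct("<I")  # 4-byte little-endian payload length


def create_buffer():
    """Create a zero-filled file to back the memory map; return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x00" * SIZE)
    return path


def write_result(path, payload):
    """Worker side: store a length-prefixed payload in the mapped file."""
    if len(payload) > SIZE - HEADER.size:
        raise ValueError("payload does not fit in the buffer")
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), SIZE) as mm:
        mm[HEADER.size:HEADER.size + len(payload)] = payload
        mm[:HEADER.size] = HEADER.pack(len(payload))


def read_result(path):
    """Parent side: read the payload back out of the mapped file."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), SIZE) as mm:
        (length,) = HEADER.unpack(mm[:HEADER.size])
        return bytes(mm[HEADER.size:HEADER.size + length])


if __name__ == "__main__":
    path = create_buffer()
    try:
        worker = multiprocessing.Process(
            target=write_result, args=(path, b"hello from the worker")
        )
        worker.start()
        worker.join()
        print(read_result(path))
    finally:
        os.remove(path)
```

Only the file path crosses the process boundary here; the payload itself is exchanged through the mapped memory rather than through a pipe or an external queue.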

Usage

Let’s render 100 users. Always break a large task into smaller tasks, but not too small: if the ranges are too small, the per-task overhead outweighs the benefit of running them in parallel. For example:

import time

from multicore import initialize, shutdown, Task
from multicore.utils import ranges


# Note the scoping of the "items" variable and the functions
items = range(100)


def as_string(item):
    return str(item)


def expensive_as_string(item):
    time.sleep(0.01)
    return str(item)


def multi_expensive_as_string(start, end):
    return ",".join([expensive_as_string(item) for item in items[start:end]])


if __name__ == "__main__":

    # Needs to be called only once for lifetime of process
    initialize()

    # Example 1: trivial (and slightly pointless) usage
    task = Task()
    for i in range(20):
        task.run(as_string, i)
    print(", ".join(task.get()))

    # Example 2: divide job optimally using ranges function
    task = Task()
    for start, end in ranges(items):
        # Note we don't pass items because pickling is expensive and defeats
        # the purpose of the exercise.
        task.run(multi_expensive_as_string, start, end)
    print(", ".join(task.get()))

    # Stop the multicore workers
    shutdown()
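
The example relies on ranges to produce (start, end) pairs, but how multicore chooses them is not described here. The sketch below shows one plausible way to split a job into near-equal slices; split_ranges is a hypothetical helper, not the library's ranges function, and the number of parts is assumed to be chosen by the caller (for instance, one per worker).

```python
def split_ranges(total, parts):
    """Yield (start, end) pairs covering range(total) in at most parts chunks."""
    if total <= 0:
        return
    # Never create more chunks than there are items.
    parts = max(1, min(parts, total))
    size, extra = divmod(total, parts)
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        yield start, end
        start = end


if __name__ == "__main__":
    for start, end in split_ranges(100, 4):
        print(start, end)
```

Each pair can be used directly as a slice, as in items[start:end], so only two integers need to be passed to a worker instead of the data itself.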

The Task constructor accepts an optional parameter max_load_average. If the load average for the last minute is larger than this threshold, then None is returned and your code must cater for the sequential code path. The threshold is expressed as if for a single-core machine, so it is typically less than one.
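A minimal sketch of catering for the sequential code path follows. It assumes that creating the task is the call that yields None, which the text above does not state explicitly. To keep the example self-contained, render_all takes a hypothetical make_task factory that stands in for something like `lambda: Task(max_load_average=0.5)`.

```python
def render_all(items, render, make_task):
    """Render items in parallel when a task is available, else sequentially.

    make_task is a hypothetical zero-argument factory; it is assumed to
    return None when the load average is above the threshold.
    """
    task = make_task()
    if task is None:
        # Load is too high: fall back to the sequential code path.
        return [render(item) for item in items]
    for item in items:
        task.run(render, item)
    return list(task.get())


if __name__ == "__main__":
    # With a factory that always returns None, the fallback path runs.
    print(render_all(range(5), str, lambda: None))
```

Keeping both paths behind one function means the calling code does not need to know whether the work ran in parallel.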

The run method accepts an optional parameter serialization_format, which is one of pickle (the default), json or string. Pickle is slow but safe. If you know what type of data you have (you should!), set this as appropriate.
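The trade-off between the three formats can be illustrated with the standard library alone. The encode and decode helpers below are hypothetical and only mirror the format names used above; they are not multicore's internals.

```python
import json
import pickle


def encode(value, serialization_format="pickle"):
    """Turn a value into bytes using the named format."""
    if serialization_format == "pickle":
        return pickle.dumps(value)
    if serialization_format == "json":
        return json.dumps(value).encode("utf-8")
    if serialization_format == "string":
        return str(value).encode("utf-8")
    raise ValueError("unknown format: %s" % serialization_format)


def decode(data, serialization_format="pickle"):
    """Turn bytes produced by encode back into a value."""
    if serialization_format == "pickle":
        return pickle.loads(data)
    if serialization_format == "json":
        return json.loads(data.decode("utf-8"))
    if serialization_format == "string":
        return data.decode("utf-8")
    raise ValueError("unknown format: %s" % serialization_format)


if __name__ == "__main__":
    for fmt in ("pickle", "json", "string"):
        print(fmt, decode(encode("rendered user", fmt), fmt))
```

Pickle preserves Python types such as tuples, JSON maps them onto its own smaller set of types (a tuple comes back as a list), and the string format always yields text, which is sufficient when the result is already a rendered string.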

The run method also accepts an optional parameter use_dill with default value False. Dill is a library that can often pickle things that the standard pickler cannot, but it is slightly slower.

Hedley Roos

Changelog

0.1.1

Update readme with a working example.
Guard against attempting to re-use a completed task.

0.1

Initial release.

Release history

0.1.1: Aug 13, 2017
0.1: Aug 7, 2017

Download files

Source Distribution

multicore-0.1.1.tar.gz (23.1 kB)

Uploaded Aug 13, 2017 Source

Built Distribution

multicore-0.1.1-py2.7.egg (15.7 kB)

Uploaded Aug 13, 2017 Source

Hashes for multicore-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`c45d3f594afcdeb162cc38db2a16567cc82bfad04d346956ea42088fc62b6186`
MD5	`5fc435be02bb933b43bc592b06e0f73c`
BLAKE2b-256	`dd08aa2a4f2d602d0c2e32e26fc0e2e7f4e97b0aeda6bf4c7fa74b6aaff9f2fb`

Hashes for multicore-0.1.1-py2.7.egg
Algorithm	Hash digest
SHA256	`b87846c0ad7881590827041e4a27008f1708a1bdc90650d6eba5d5a5ac70959e`
MD5	`abac231f12d6466a3353a489fe413088`
BLAKE2b-256	`53d9b3c89d9c2c93d367589ff79de3f27c2820b105bcffffe0ce1f68f52ba9c8`