A profiler for Numba


Profila: a profiler for Numba

This profiler is sponsored by my book on writing fast low-level code in Python, which uses Numba for most of its examples.

Here's what Profila output looks like:

$ python -m profila annotate -- scripts_for_tests/simple.py
# Total samples: 328 (54.9% non-Numba samples, 1.8% bad samples)

## File `/home/itamarst/devel/profila/scripts_for_tests/simple.py`
Lines 10 to 15:

  0.3% |     for i in range(len(timeseries)):
       |         # This should be the most expensive line:
 38.7% |         result[i] = (7 + timeseries[i] / 9 + (timeseries[i] ** 2) / 7) / 5
       |     for i in range(len(result)):
       |         # This should be cheaper:
  4.3% |         result[i] -= 1
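The percentages appear to be relative to all samples, whether or not they landed in Numba code. The three annotated lines add up to 43.3% (0.3% + 38.7% + 4.3%). Adding the 54.9% non-Numba samples and the 1.8% bad samples gives 100%.

Assuming that reading, here is a minimal sketch of how such a summary could be computed from raw samples. The sample representation is hypothetical and is not profila's internal data model:

- a `(filename, line)` tuple for a sample in a Numba-compiled line;
- the strings `"non-numba"` and `"bad"` for the other two categories.

```python
from collections import Counter


# Hypothetical sample representation (not profila's actual data model):
#   ("path/to/file.py", lineno) -> sample landed on a line of Numba-compiled code
#   "non-numba"                  -> sample landed somewhere else
#   "bad"                        -> sample could not be attributed
def summarize(samples):
    """Return (per-line %, non-Numba %, bad %), all relative to the total sample count."""
    if not samples:
        raise ValueError("no samples collected")
    total = len(samples)
    counts = Counter(samples)

    def pct(n):
        return 100.0 * n / total

    per_line = {
        key: pct(n)
        for key, n in counts.items()
        if key not in ("non-numba", "bad")
    }
    return per_line, pct(counts["non-numba"]), pct(counts["bad"])


samples = [("simple.py", 12)] * 4 + [("simple.py", 15)] + ["non-numba"] * 4 + ["bad"]
per_line, non_numba, bad = summarize(samples)
print(f"# Total samples: {len(samples)} ({non_numba:.1f}% non-Numba samples, {bad:.1f}% bad samples)")
for (path, lineno), p in sorted(per_line.items()):
    print(f"{p:5.1f}% | {path}:{lineno}")
```

Because every sample falls into exactly one bucket, the per-line figures plus the two summary figures always sum to 100%. That matches the example output.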

Installation

Currently tested on Linux only; macOS support may be added in the future.

You'll need gdb installed. On Ubuntu or Debian you can do:

apt-get install gdb

On Red Hat-based systems:

dnf install gdb

Install this library using pip:

pip install profila
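Before profiling, it can help to confirm that the prerequisites are in place. The sketch below is a small, hypothetical helper and is not part of profila. Using only the standard library, it checks for:

- a `gdb` executable on `PATH`;
- importable `profila` and `numba` packages.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report whether the tools needed for profiling can be found (hypothetical helper)."""
    return {
        # gdb must be installed and on PATH, as described above.
        "gdb": shutil.which("gdb") is not None,
        # Installed with `pip install profila`.
        "profila": importlib.util.find_spec("profila") is not None,
        # The code being profiled uses Numba, so it must be importable too.
        "numba": importlib.util.find_spec("numba") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```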

Usage

If you usually run your script like this:

$ python yourscript.py --arg1=200

Instead run it like this:

$ python -m profila annotate -- yourscript.py --arg1=200
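You might want to launch profiling from another Python program, such as a benchmark driver. In that case, the same command line can be built as a list and passed to `subprocess`. This is a hypothetical wrapper, not part of profila's API.

- Using `sys.executable` ensures the run uses the same interpreter (and virtual environment) that has profila installed.
- The `--` conventionally marks the end of profila's own arguments. Everything after it is your script and its arguments.

```python
import subprocess
import sys


def profila_command(script, *script_args):
    """Build the argv for `python -m profila annotate -- script args...` (hypothetical helper)."""
    return [sys.executable, "-m", "profila", "annotate", "--", script, *script_args]


def run_profiled(script, *script_args):
    """Run the script under profila; the report is printed to the terminal as usual."""
    return subprocess.run(profila_command(script, *script_args), check=True)
```

For example, `run_profiled("yourscript.py", "--arg1=200")` is equivalent to the shell command above.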

Sampling is done every 10 milliseconds, so you need to make sure your Numba code runs for a sufficiently long time; one second of runtime yields only about 100 samples. For example, you can run your function in a loop until a number of seconds has passed:

from time import time

from numba import njit

@njit
def myfunc():
    # ... your Numba code goes here ...
    pass

start = time()
# Run for 3 seconds:
while (time() - start) < 3:
    myfunc()
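A `@njit` function without explicit signatures is compiled on its first call. In the loop above, that compilation therefore eats into the 3-second window.

The sketch below is a hypothetical helper, not part of profila. It makes one warm-up call before starting the clock, so the whole window is spent running compiled code. It also estimates the sample count from the 10 millisecond interval.

```python
import time

# Profila samples every 10 milliseconds (per this README).
SAMPLE_INTERVAL_SECONDS = 0.010


def expected_samples(duration_seconds, interval_seconds=SAMPLE_INTERVAL_SECONDS):
    """Approximate number of samples collected while code runs for duration_seconds."""
    return round(duration_seconds / interval_seconds)


def run_repeatedly(func, duration_seconds, *args):
    """Call func(*args) in a loop for about duration_seconds; return the number of timed calls.

    func is called once before the clock starts. For an @njit function without
    explicit signatures, that first call is when Numba compiles it, so the timed
    loop then spends its whole budget running compiled code.
    """
    func(*args)  # warm-up call (triggers JIT compilation for lazily compiled functions)
    calls = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_seconds:
        func(*args)
        calls += 1
    return calls
```

For example, `run_repeatedly(myfunc, 3)` could replace the `while` loop above. That gives roughly `expected_samples(3)`, or about 300 samples, inside compiled code, on top of whatever the rest of the script contributes.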

The limitations of profiling output

1. The compiled code isn't the same as the input code

Compilers like Numba run optimization passes that transform the code to make it faster. That means the running code doesn't necessarily map one-to-one to the original code; different lines might be combined, for example.

As far as I can tell, Numba does give you a reasonable mapping, but you can't assume that each source line maps one-to-one to executed code.

2. Adding the necessary info can change the performance of your code

To profile, additional debug information needs to be added during compilation; specifically, the NUMBA_DEBUGINFO environment variable is set. This might change runtime characteristics slightly, because it increases the memory size of the compiled code.
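To gauge how much the debug information affects your own code, you could time the same benchmark script with NUMBA_DEBUGINFO switched off and on. The sketch below is hypothetical and not part of profila.

- Each run uses a fresh interpreter, which is a simple way to make sure the variable is set before Numba is imported.
- The timing is coarse wall-clock time, including interpreter startup and compilation. For finer numbers, have the benchmark script time its own hot loop.

```python
import os
import subprocess
import sys
import time


def env_with_debuginfo(enabled):
    """Copy of the current environment with NUMBA_DEBUGINFO switched on ("1") or off ("0")."""
    env = dict(os.environ)  # copy, so os.environ itself is left untouched
    env["NUMBA_DEBUGINFO"] = "1" if enabled else "0"
    return env


def time_script(path, enabled, *args):
    """Run `python path args...` in a fresh interpreter and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run([sys.executable, path, *args], env=env_with_debuginfo(enabled), check=True)
    return time.perf_counter() - start
```

For example, you could compare `time_script("benchmark.py", enabled=False)` with `time_script("benchmark.py", enabled=True)`. Here `benchmark.py` stands for your own script.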

3. Compiled code is impacted by CPU effects that aren't visible in profiling

Instruction-level parallelism, branch mispredictions, SIMD, and the CPU memory caches all have a significant impact on runtime performance, but they don't show up in profiling. I'm writing a book about this if you want to learn more.

Development

To contribute to this library, first check out the code. Then create a new virtual environment:

cd profila
python -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest


Release history

0.2.1: Feb 20, 2024
0.2.0: Jan 31, 2024
0.1.1: Jan 30, 2024 (this version)

Download files

Source distribution: profila-0.1.1.tar.gz (15.1 kB), uploaded Jan 30, 2024

Built distribution: profila-0.1.1-py3-none-any.whl (11.3 kB, Python 3), uploaded Jan 30, 2024

Hashes for profila-0.1.1.tar.gz

Algorithm	Hash digest
SHA256	`c95712861f2256a17c905c82c1f1de01f708a40a748f1f9550c3818d8db24007`
MD5	`1ed58d7bb5ebf0c464318d7b53723c16`
BLAKE2b-256	`d167285c332801a8780458d59d62b929d198fb97f26a4f54af751f1a50b4af1e`

Hashes for profila-0.1.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`5b12b93ba0e709250990e5e40f5c8a16e6dc98c89c75df293832eac01fa2b745`
MD5	`40e4fcb14f3580dbb521c21089f1209e`
BLAKE2b-256	`65b3828bbb4387cdb7a3f562b0f7f31abc6b69fb4a9cbea20d8277d611a4aa7f`
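To check a downloaded file against the published SHA256 digest, the standard library's `hashlib` is enough. This is a minimal sketch. It reads the file in chunks so large files don't need to fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 digest equals the published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches("profila-0.1.1.tar.gz", "c95712861f2256a17c905c82c1f1de01f708a40a748f1f9550c3818d8db24007")` should return `True` for an intact download of the source distribution.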