Skip to main content

A library of high-performant graph algorithms.

Project description

graph_mate

Python binding for the set of graph crates.

graph_mate is a Python API that provides a collection of high-performant graph algorithms. It provides implementations for directed and undirected graphs.

Graphs can be created programatically or read from the Graph500 input format.

The library is implemented in Rust and uses rayon for running graph creation and algorithm execution in parallel without holding on to the Global Interpreter Lock or using multiprocessing.

The graph itself is implemented as a Compressed-Sparse-Row (CSR) data structure which is tailored for fast and concurrent access to the graph topology.

graph_mate provides a few graph algorithms. The algorithm implementations are designed to run efficiently on large-scale graphs with billions of nodes and edges.

Note: The development is mainly driven by Neo4j developers. However, the library is not an official product of Neo4j.

Usage

Installation

graph_mate is available on PyPI:

pip install graph-mate

It is currently build for x86_64 on Windows/Mac/Linux and as universal for Apple Silicon Macs.

If you need to use graph_mate for a different architecture or system, please refer to the manual installation.

Usage

What is a graph?

A graph consists of nodes and edges where edges connect exactly two nodes. A graph can be either directed, i.e., an edge has a source and a target node or undirected where there is no such distinction.

In a directed graph, each node u has outgoing and incoming neighbors. An outgoing neighbor of node u is any node v for which an edge (u, v) exists. An incoming neighbor of node u is any node v for which an edge (v, u) exists.

In an undirected graph there is no distinction between source and target node. A neighbor of node u is any node v for which either an edge (u,v) or (v, u) exists.

How to create graphs

Currently there are two ways to build a graph. You can either load graphs in the Graph500 format, for example by downloading them from the LDBC Graphalytics site. Alternatively, you can provide a numpy edge list using from_numpy.

import graph_mate as gm
import numpy as np

# Let's load a small graph:
#    (a)-->(b)-->(c)-->(d), (a)-->(c), (b)-->(d)
# To load from an edge list, we need to create a 2d numpy array of `uint32`s.
edge_list = np.array([
    # (a)-->(b)
    [0, 1],
    # (a)-->(c)
    [0, 2],
    # (b)-->(c)
    [1, 2],
    # (b)-->(d)
    [1, 3],
    # (c)-->(d)
    [2, 3]
], dtype=np.uint32)

To build a directed graph, you would create a graph_mate.DiGraph and an undirected on with graph_mate.Graph.

# We can load a directed graph from the edge list
directed = gm.DiGraph.from_numpy(edge_list)

# Or we can load an undirected graph
undirected = gm.Graph.from_numpy(edge_list)

To make assertions easier, we can create graphs with a sorted adjacency list by providing an optional second argument of type graph_mate.Layout.

directed = gm.DiGraph.from_numpy(edge_list, gm.Layout.Sorted)

undirected = gm.Graph.from_numpy(edge_list, gm.Layout.Sorted)

When loading from a numpy edge list, the data is not shared but copied into the graph. The numpy arrays can be deleted afterwards.

We can inspect the graph with a few methods.

assert directed.node_count() == 4
assert directed.edge_count() == 5

assert directed.out_degree(1) == 2
assert directed.in_degree(1) == 1

assert np.array_equal(directed.out_neighbors(1), [2, 3])
assert np.array_equal(directed.in_neighbors(1), [0])

Neighbors are returned as a numpy array view into the graph, which means we cannot modify the array.

neighbors = directed.out_neighbors(1)
try:
    neighbors[0] = 42
except ValueError as e:
    assert str(e) == "assignment destination is read-only"

In order to use the neighbors as a Python list and not a numpy array, we can use copy_ methods.

neighbors = directed.copy_out_neighbors(1)

assert neighbors == [2, 3]
assert type(neighbors) == list

For undirected graphs, we don't have directional methods for the degree or the neighbors.

assert undirected.node_count() == 4
assert undirected.edge_count() == 5

assert undirected.degree(1) == 3

assert np.array_equal(undirected.neighbors(1), [0, 2, 3])

How to run algorithms

In the following we will demonstrate running Page Rank, a graph algorithm to determine the importance of nodes in a graph based on the number and quality of their incoming edges. Page Rank requires a directed graph and returns the rank value for each node.

# https://en.wikipedia.org/wiki/PageRank#/media/File:PageRanks-Example.svg

graph = gm.DiGraph.from_numpy(np.array([
    (1,2), # B->C
    (2,1), # C->B
    (4,0), # D->A
    (4,1), # D->B
    (5,4), # E->D
    (5,1), # E->B
    (5,6), # E->F
    (6,1), # F->B
    (6,5), # F->E
    (7,1), # G->B
    (7,5), # F->E
    (8,1), # G->B
    (8,5), # G->E
    (9,1), # H->B
    (9,5), # H->E
    (10,1), # I->B
    (10,5), # I->E
    (11,5), # J->B
    (12,5), # K->B
], dtype=np.uint32))

pr_result = graph.page_rank(max_iterations=10, tolerance=1e-4, damping_factor=0.85)

assert pr_result.ran_iterations == 10

expected = np.array([
    0.024064068,
    0.3145448,
    0.27890152,
    0.01153846,
    0.029471997,
    0.06329483,
    0.029471997,
    0.01153846,
    0.01153846,
    0.01153846,
    0.01153846,
    0.01153846,
    0.01153846,
], dtype=np.float32)

assert np.array_equal(pr_result.scores(), expected)

Example Notebooks

For more examples and demos, please refer to the notebooks in the notebooks directory.

Developing

The Python extension is written using PyO3 together with maturin.

One-time setup

# Run the command from the extension directory, not the git root
# cd crates/mate

python -m venv .env
source .env/bin/activate
pip install -r requirements/dev.txt

Once-per-new-terminal setup

Make sure that you're activating the venv in every new terminal where you want to develop.

source .env/bin/activate

Building the extension

Build in debug mode.

maturin develop

Build in release mode.

maturin develop --release

Rebuild the extension in release mode 2 seconds after the last file change. This is an optional step.

cargo watch --shell 'maturin develop --release' --delay 2

Testing

Running the tests

pytest tests

Formatting and linting

# Runs code formatter https://pypi.org/project/black/
black tests

# Sort imports using https://pypi.org/project/isort/
isort tests

# Verify with https://pypi.org/project/flake8/
flake8 tests

# Very types using http://mypy-lang.org
mypy .

License: MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

graph_mate-0.1.1.tar.gz (83.6 kB view hashes)

Uploaded Source

Built Distributions

graph_mate-0.1.1-cp38-abi3-win_amd64.whl (345.9 kB view hashes)

Uploaded CPython 3.8+ Windows x86-64

graph_mate-0.1.1-cp38-abi3-win32.whl (323.6 kB view hashes)

Uploaded CPython 3.8+ Windows x86

graph_mate-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB view hashes)

Uploaded CPython 3.8+ manylinux: glibc 2.17+ x86-64

graph_mate-0.1.1-cp38-abi3-manylinux_2_12_i686.manylinux2010_i686.whl (1.4 MB view hashes)

Uploaded CPython 3.8+ manylinux: glibc 2.12+ i686

graph_mate-0.1.1-cp38-abi3-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (952.7 kB view hashes)

Uploaded CPython 3.8+ macOS 10.9+ universal2 (ARM64, x86-64) macOS 10.9+ x86-64 macOS 11.0+ ARM64

graph_mate-0.1.1-cp38-abi3-macosx_10_7_x86_64.whl (492.8 kB view hashes)

Uploaded CPython 3.8+ macOS 10.7+ x86-64

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page