ConnectorX

Load data from databases to dataframes, the fastest way.

ConnectorX enables you to load data from databases into Python in the fastest and most memory-efficient way.

What you need is one line of code:

import connectorx as cx

cx.read_sql("postgresql://username:password@server:port/database", "SELECT * FROM lineitem")

Optionally, you can accelerate the data loading using parallelism by specifying a partition column.

import connectorx as cx

cx.read_sql("postgresql://username:password@server:port/database", "SELECT * FROM lineitem", partition_on="l_orderkey", partition_num=10)

The function partitions the query by evenly splitting the value range of the specified column into the given number of partitions. ConnectorX assigns one thread to each partition to load and write data in parallel. Currently, partitioning is supported on integer columns for SPJA queries.
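
To make the idea concrete, here is a minimal sketch of the splitting in plain Python. It is only an illustration under the assumption of contiguous integer ranges; the actual partitioning happens inside ConnectorX's Rust core, and the split_queries helper and the made-up bounds below are hypothetical.

# Hypothetical sketch: split a query into per-partition range queries on an
# integer column. ConnectorX's real splitting is implemented in Rust.
def split_queries(query, partition_on, lo, hi, partition_num):
    step = (hi - lo) // partition_num + 1
    parts = []
    for i in range(partition_num):
        lower = lo + i * step
        upper = min(lower + step - 1, hi)
        parts.append(
            f"SELECT * FROM ({query}) AS t "
            f"WHERE t.{partition_on} >= {lower} AND t.{partition_on} <= {upper}"
        )
    return parts

# For example, with made-up bounds for l_orderkey, this yields 10 range
# queries, one per thread:
for q in split_queries("SELECT * FROM lineitem", "l_orderkey", 1, 6000000, 10):
    print(q)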

Check out more detailed usage and examples below.

Installation

pip install connectorx

Performance

We compared different Python solutions that provide a read_sql function by loading a 10x TPC-H lineitem table (8.6 GB) from Postgres into a DataFrame, using 4 cores of parallelism.

[Figure: time chart, lower is better.]

[Figure: memory consumption chart, lower is better.]

In conclusion, ConnectorX uses up to 3x less memory and 11x less time.

How does ConnectorX achieve such lightning speed while keeping the memory footprint low?

We observe that existing solutions copy the data multiple times, to varying degrees, when downloading it from the database. Additionally, implementing a data-intensive application in Python brings extra overhead.

ConnectorX is written in Rust and follows the "zero-copy" principle. This allows it to make full use of the CPU by being cache- and branch-predictor-friendly. Moreover, the architecture of ConnectorX ensures that the data is copied exactly once, directly from the source to the destination.

Detailed Usage and Examples

API

connectorx.read_sql(conn: str, query: Union[List[str], str], *, return_type: str = "pandas", protocol: str = "binary", partition_on: Optional[str] = None, partition_range: Optional[Tuple[int, int]] = None, partition_num: Optional[int] = None)

Run the SQL query and download the data from the database into a Pandas DataFrame.

Parameters

  • conn(str): Connection string URI. Currently only PostgreSQL is supported.
  • query(string or list of string): SQL query or list of SQL queries for fetching data.
  • return_type(string, optional(default "pandas")): The return type of this function. Currently only "pandas" is supported.
  • protocol(string, optional(default "binary")): The transfer protocol used to fetch data from the database.
  • partition_on(string, optional(default None)): The column to partition the result.
  • partition_range(tuple of int, optional(default None)): The value range of the partition column.
  • partition_num(int, optional(default None)): The number of partitions to generate.

Examples

  • Read a DataFrame from a SQL query using a single thread

    import connectorx as cx
    
    postgres_url = "postgresql://username:password@server:port/database"
    query = "SELECT * FROM lineitem"
    
    cx.read_sql(postgres_url, query)
    
  • Read a DataFrame in parallel using 10 threads by automatically partitioning the provided SQL query on the partition column (partition_range will be automatically queried if not given)

    import connectorx as cx
    
    postgres_url = "postgresql://username:password@server:port/database"
    query = "SELECT * FROM lineitem"
    
    cx.read_sql(postgres_url, query, partition_on="partition_col", partition_num=10)
    
  • Read a DataFrame in parallel using 2 threads by manually providing two partition SQL queries (the schemas of all the query results should be the same)

    import connectorx as cx
    
    postgres_url = "postgresql://username:password@server:port/database"
    queries = ["SELECT * FROM lineitem WHERE partition_col <= 10", "SELECT * FROM lineitem WHERE partition_col > 10"]
    
    cx.read_sql(postgres_url, queries)
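
  • Read a DataFrame in parallel while also providing the partition value range explicitly (a sketch: the (0, 1000000) bounds below are made up for illustration; when partition_range is omitted, ConnectorX queries the column's min/max automatically)

    import connectorx as cx
    
    postgres_url = "postgresql://username:password@server:port/database"
    query = "SELECT * FROM lineitem"
    
    # partition_range supplies the min/max of partition_col up front so the
    # range-discovery query can be skipped; the bounds here are illustrative.
    cx.read_sql(postgres_url, query, partition_on="partition_col", partition_range=(0, 1000000), partition_num=10)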
    

Next Plan

Check out our discussions to participate in deciding our next plan!

Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions

  • connectorx-0.1.1-cp39-cp39-win_amd64.whl (1.8 MB): CPython 3.9, Windows x86-64
  • connectorx-0.1.1-cp39-cp39-manylinux2014_x86_64.whl (2.1 MB): CPython 3.9, manylinux2014 x86-64
  • connectorx-0.1.1-cp39-cp39-macosx_10_15_intel.whl (2.6 MB): CPython 3.9, macOS 10.15+ Intel
  • connectorx-0.1.1-cp38-cp38-win_amd64.whl (1.8 MB): CPython 3.8, Windows x86-64
  • connectorx-0.1.1-cp38-cp38-manylinux2014_x86_64.whl (2.1 MB): CPython 3.8, manylinux2014 x86-64
  • connectorx-0.1.1-cp38-cp38-macosx_10_15_intel.whl (2.6 MB): CPython 3.8, macOS 10.15+ Intel
  • connectorx-0.1.1-cp37-cp37m-win_amd64.whl (1.8 MB): CPython 3.7m, Windows x86-64
  • connectorx-0.1.1-cp37-cp37m-manylinux2014_x86_64.whl (2.1 MB): CPython 3.7m, manylinux2014 x86-64
  • connectorx-0.1.1-cp37-cp37m-macosx_10_15_intel.whl (2.6 MB): CPython 3.7m, macOS 10.15+ Intel
