jx-bigquery - JSON Expressions for BigQuery

Status

Feb 2020 - Active but incomplete: Can insert tidy JSON documents into BigQuery while managing the schema.

Overview

The library is intended to manage multiple BigQuery tables to give the illusion of one table with a dynamically managed schema.
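As a rough illustration of "one table with a dynamically managed schema", the sketch below unions the column definitions of several tables into a single logical schema. The function name `merge_schemas` is hypothetical, and raising an error on conflicting types is an assumption; this page does not describe how the library itself resolves such conflicts.

```python
from typing import Dict, Iterable


def merge_schemas(schemas: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Union name -> type mappings from several tables (expansion only)."""
    merged: Dict[str, str] = {}
    for schema in schemas:
        for name, type_ in schema.items():
            # Assumption: a column may be added, but never change type
            if name in merged and merged[name] != type_:
                raise ValueError(
                    f"column {name!r} has conflicting types: "
                    f"{merged[name]} vs {type_}"
                )
            merged[name] = type_
    return merged


shard_a = {"id": "integer", "submit_time": "time"}
shard_b = {"id": "integer", "last_modified": "time"}
combined = merge_schemas([shard_a, shard_b])
```

Here `combined` holds every column seen in either table, which is what a reader of the logical table would expect to query.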

Background

partition - Big data is split into separate containers based on age. This lets queries on recent data use fewer resources, and lets old data be dropped quickly.
cluster - A "cluster" is another name for the sort order of the data in a partition. Sorting by the most commonly looked-up columns makes queries faster.
id - The set of columns that identifies the document
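The two ideas above can be mimicked in plain Python to make them concrete. This is only a sketch: the helper `partition_and_cluster` is hypothetical, the daily partition granularity is an assumption, and timestamps are assumed to be epoch seconds. BigQuery does this work server-side.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_and_cluster(documents, partition_field, cluster_fields):
    """Group documents by UTC day, then sort each group by the cluster fields."""
    partitions = defaultdict(list)
    for doc in documents:
        # Assumption: the partition field holds epoch seconds
        moment = datetime.fromtimestamp(doc[partition_field], tz=timezone.utc)
        partitions[moment.date().isoformat()].append(doc)
    for docs in partitions.values():
        docs.sort(key=lambda d: tuple(d[f] for f in cluster_fields))
    return dict(partitions)


docs = [
    {"id": 2, "submit_time": 1581379200, "last_modified": 1581379300},
    {"id": 1, "submit_time": 1581379260, "last_modified": 1581379400},
    {"id": 3, "submit_time": 1581465600, "last_modified": 1581465700},
]
by_day = partition_and_cluster(docs, "submit_time", ["id", "last_modified"])
```

A query limited to one day only needs to read one entry of `by_day`, and dropping old data amounts to deleting whole entries.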

Configuration

table - Any name you wish to give to this table series
top_level_fields - BigQuery demands that control columns are top-level. Define them here.
partition -
- field - The dot-delimited field used to partition the tables (must be a time field)
- expire - When BigQuery will automatically drop your data.
id - The identification of documents
- field - The set of columns that uniquely identifies a document
- version - The column used to determine the age of a document, so that newer versions replace older ones
cluster - Columns used to sort the partitions
schema - name: type dictionary - needed when there is no data and BigQuery demands column definitions
sharded - boolean - Set to true to let this library track multiple tables. This allows schema migration (expansion only) and faster inserts from many machines
account_info - The information BigQuery provides to connect
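To show how the `id` settings might be used together, here is a minimal sketch that keeps one document per id. It assumes the intent is that the document with the greatest `version` value wins; the function `latest_versions` is hypothetical and is not taken from the library.

```python
def latest_versions(documents, id_fields, version_field):
    """Keep, for each id, only the document with the greatest version value."""
    latest = {}
    for doc in documents:
        key = tuple(doc[f] for f in id_fields)  # id may span several columns
        current = latest.get(key)
        if current is None or doc[version_field] > current[version_field]:
            latest[key] = doc
    return list(latest.values())


docs = [
    {"id": 1, "last_modified": 10, "state": "old"},
    {"id": 2, "last_modified": 5, "state": "only"},
    {"id": 1, "last_modified": 20, "state": "new"},
]
current_docs = latest_versions(docs, ["id"], "last_modified")
```

With the example configuration, `id_fields` would be `["id"]` and `version_field` would be `"last_modified"`.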

Example

{
    "table": "my_table_name",
    "top_level_fields": {},
    "partition": {
        "field": "submit_time",
        "expire": "2year"
    },
    "id": {
        "field": "id",
        "version": "last_modified"
    },
    "cluster": [
        "id",
        "last_modified"
    ],
    "schema": {
        "id": "integer",
        "submit_time": "time",
        "last_modified": "time"
    },
    "sharded": true,
    "account_info": {
        "private_key_id": {
            "$ref": "env://BIGQUERY_PRIVATE_KEY_ID"
        },
        "private_key": {
            "$ref": "env://BIGQUERY_PRIVATE_KEY"
        },
        "type": "service_account",
        "project_id": "my-project-id",
        "client_email": "me@my_project.iam.gserviceaccount.com",
        "client_id": "12345",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/my-project.iam.gserviceaccount.com"
    }
}
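The `{"$ref": "env://..."}` entries keep secrets out of the configuration file by pointing at environment variables. This page does not say which loader resolves them, so the sketch below is a hypothetical stand-in (`resolve_env_refs` is not a library function) that shows one way such references could be expanded.

```python
import os

ENV_PREFIX = "env://"


def resolve_env_refs(node, environ=os.environ):
    """Replace {"$ref": "env://NAME"} objects with the value of NAME."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if len(node) == 1 and isinstance(ref, str) and ref.startswith(ENV_PREFIX):
            # Raises KeyError if the variable is not set
            return environ[ref[len(ENV_PREFIX):]]
        return {k: resolve_env_refs(v, environ) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_env_refs(v, environ) for v in node]
    return node


settings = {
    "account_info": {
        "private_key_id": {"$ref": "env://BIGQUERY_PRIVATE_KEY_ID"},
        "type": "service_account",
    }
}
# A plain dict stands in for the real environment in this example
resolved = resolve_env_refs(settings, environ={"BIGQUERY_PRIVATE_KEY_ID": "abc123"})
```

Passing `environ` explicitly makes the helper easy to test; in real use the default `os.environ` would be read.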

Usage

Set up a Dataset with an application name

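    # assumed import path; adjust if the installed package layout differs
    from jx_bigquery import bigquery

    destination = bigquery.Dataset(
        dataset=application_name,
        kwargs=settings
    ).get_or_create_table(settings.destination)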

Insert documents as you please

    destination.extend(documents)
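When documents arrive as a long stream, one option is to call `extend` in fixed-size batches. The sketch below assumes only that the destination has an `extend` method accepting a list, as shown above; `in_batches` and `FakeDestination` are hypothetical names used so the example runs without a BigQuery account.

```python
from itertools import islice


def in_batches(documents, batch_size):
    """Yield lists of at most batch_size documents from any iterable."""
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


class FakeDestination:
    """Stand-in for the table object; records what extend() receives."""

    def __init__(self):
        self.rows = []
        self.calls = 0

    def extend(self, documents):
        self.calls += 1
        self.rows.extend(documents)


destination = FakeDestination()
for batch in in_batches(({"id": i} for i in range(5)), batch_size=2):
    destination.extend(batch)
```

Whether batching helps in practice depends on how the library buffers inserts internally, which this page does not describe.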

Release history

3.85.20198 - Jul 16, 2020
3.83.20197 - Jul 15, 2020
3.81.20196 - Jul 14, 2020
3.80.20196 - Jul 14, 2020
3.78.20194 - Jul 12, 2020
3.62.20101 - Apr 10, 2020
3.61.20093 - Apr 2, 2020
3.59.20089 - Mar 29, 2020
3.55.20074 - Mar 14, 2020
3.48.20042 - Feb 11, 2020 (this version)
3.47.20042 - Feb 11, 2020
3.45.20031 - Jan 31, 2020
3.42.20031 - Jan 31, 2020
3.38.20029 - Jan 29, 2020
3.34.20028 - Jan 28, 2020

Download files

Source Distribution

jx-bigquery-3.48.20042.tar.gz (29.7 kB), source, uploaded Feb 11, 2020

Hashes for jx-bigquery-3.48.20042.tar.gz

Algorithm	Hash digest
SHA256	`c9e22d07fbf3cb5032ecaefa849ef422e2b5c7f87615bdfd3dd13b0b834d94a2`
MD5	`0cb090458be65f7a9b4b4fcad2331e22`
BLAKE2b-256	`d95ec35409b2f4274a950ae8a0a629b3167cfd927dc543b7d2ee38db7657aa18`