
# migbq

Upload RDBMS table data to BigQuery tables.

## Requirement

* Python
  - CPython 2.7.x

* RDBMS (hereafter, DB)
  - Microsoft SQL Server
  - MySQL (in development)

* Table spec
  - every table must have a numeric primary key field

* DB user grants
  - SELECT, INSERT, UPDATE, CREATE
  - must be able to read the DB's metadata (the [INFORMATION_SCHEMA] database)
  - some metadata tables are created in the source RDBMS
  - (if you don't want tables created in the source DB, you can use SQLite instead: fork this project and edit the source)

* Google Cloud SDK
  - the Google Cloud SDK must be installed
  - https://cloud.google.com/sdk/downloads
  - https://cloud.google.com/sdk/gcloud/reference/auth/login

* pymssql / FreeTDS
  - http://www.pymssql.org/en/stable/

## Install

```bash
# build pymssql against the bundled FreeTDS
export PYMSSQL_BUILD_WITH_BUNDLED_FREETDS=1
pip install migbq
```

## Usage

### Write a configuration file

* the configuration format is similar to Embulk (http://www.embulk.org)

### Example

#### General config file
* config.yml

```yml
in:
  type: mssql
  host: localhost
  user: USER
  password: PASSWORD
  port: 1433
  database: DATABASE
  tables:
    - tbl
    - tbl2
    - tbl3
  batch_size: 50000
  temp_csv_path: /temp/pymig_csv
  temp_csv_path_complete: /temp/pymig_csv_complete
out:
  type: bigquery
  project: GCP_PROJECT
  dataset: BQ_DATASET
```

#### jinja2 template

* config.j2.yml
  - variables come from environment variables only
  - the file extension must be **.j2.yml**

```yml
in:
  type: mssql
  {% include "mssql-connect.yml" %}
  tables:
    - tbl
    - tbl2
    - tbl3
  batch_size: 50000
  temp_csv_path: /temp/pymig_csv
  temp_csv_path_complete: /temp/pymig_csv_complete
out:
  type: bigquery
  project: {{ env.GCP_PROJECT }}
  dataset: BQ_DATASET
```


### Run

#### (1) Execute

```bash
bqmig run config.yml
```

#### (2) Check job completion

```bash
bqmig check config.yml
```


#### (3) Check that table counts match

```bash
bqmig sync config.yml
```

* the count check is based on the primary key; a sketch of the idea is shown below.
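
The exact comparison is internal to migbq, but conceptually it is a primary-key-bounded count on both sides. A minimal sketch, assuming the example table `tbl`, primary key `idx`, and a placeholder upper bound of 123 (not the exact queries migbq runs):

```sql
-- Source DB side (sketch only)
SELECT COUNT(*) FROM tbl WHERE idx <= 123;

-- BigQuery side (standard SQL; project/dataset names from the example config)
SELECT COUNT(*) FROM `GCP_PROJECT.BQ_DATASET.tbl` WHERE idx <= 123;
```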

### Run Forever

* you can schedule migbq with crontab
* migbq holds an exclusive process lock, so it is safe to schedule it every minute
* you must schedule both **run** and **check**


## Description

### run command

**[1]** select RDBMS table metadata
- get the table's primary key name from the RDBMS metadata tables (see the sketch below)
- get the column names and types from the RDBMS metadata tables
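
The exact metadata queries are internal to migbq; a minimal sketch of the kind of lookup involved, assuming Microsoft SQL Server and the example table `tbl`:

```sql
-- Primary key column name of a table (sketch; not necessarily the query migbq runs)
SELECT kcu.COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
  ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY' AND tc.TABLE_NAME = 'tbl';

-- Column names and types of the same table
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'tbl';
```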

**[2]** select the RDBMS primary key value range
- get the min / max PK of the table, as sketched below
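
For example, assuming the primary key column `idx` (a sketch, not necessarily the exact query migbq runs):

```sql
-- Find the primary key range that will be paginated over
SELECT MIN(idx), MAX(idx) FROM tbl;
```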

**[3]** select data in the primary key range
- select rows from the PK min up to min + batch_size

```sql
select * from tbl where idx >= 1 and idx < 100;
```

- write the result to the file **pymig-tbl-idx-1-100**
- gzip the CSV

**[4]** upload the CSV file to BigQuery
- direct upload to the BigQuery table, not via GCS (quota exceeded errors can occur)

**[5]** Repeat 1~4 until the max primary key is passed.

For example, with batch_size 100 and a max PK of 321, the RDBMS queries execute as below.

```sql

select * from tbl where idx >= 1 and idx < 100;
select * from tbl where idx >= 100 and idx < 200;
select * from tbl where idx >= 200 and idx < 300;
select * from tbl where idx >= 300 and idx < 400;

-- end

```

### check command

* check whether each BigQuery job ID has finished.
* retry failed jobs.


### Log file of program

* log files are created in the [log] subdirectory of the config file's directory

### Pid file of program

* a pid file guarantees a single process per command via an exclusive file lock. It is created in the directory below.


```
/tmp
```

### load metadata table

#### META: migrationmetadata

* one row is inserted each time a 'select' runs

| field name | type | description | sample value | etc |
| ----: |--------|----------------------------------------|-----------------|-------------|
| tableName | STRING | target [tableName] | tbl | Primary Key |
| firstPk | INTEGER | [tableName]'s min primary key value | 1 | |
| lastPk | INTEGER | [tableName]'s max primary key value | 123 | |
| currentPk | STRING | [tableName]'s last completed primary key value | 20 | |
| regDate | DATETIME| this row's insert date | 2017-11-29 01:02:03 | |
| modDate | DATETIME| firstPk / lastPk modification date | 2017-11-29 01:02:03 | |
| endDate | DATETIME| date when currentPk reached lastPk | 2017-11-29 11:22:33 | |
| pkName | STRING | [tableName]'s primary key name | idx | |
| rowCnt | INTEGER | [tableName]'s count(*) | 123 | |
| pageTokenCurrent | STRING | not used currently | tbl | |
| pageTokenNext | STRING | not used currently | tbl | |
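
Because this table lives in the source RDBMS, you can inspect migration progress directly. A minimal sketch, assuming the table is created under the name shown above and using the columns documented above:

```sql
-- Sketch: check how far each table's migration has progressed
SELECT tableName, pkName, currentPk, lastPk, endDate
FROM migrationmetadata;
```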

#### LOG: migrationmetadatalog

* sequence
  - run: a row is inserted into this table when 'select [tableName]' is executed
  - run: the row is updated when the BigQuery jobId is created
  - check: the row's jobComplete and checkComplete fields are updated when the BigQuery jobId check finishes

| field name | type | description | sample value | etc |
| ----: |--------|----------------------------------------|-----------------|-------------|
| idx | BigInt | PK | 1 | Primary Key, auto increment |
| tableName | STRING | [tableName] | tbl | Primary Key |
| regDate | DATETIME | row insert date | 2017-11-29 01:02:03 | |
| endDate | DATETIME | date when the jobId became 'DONE' | 2017-11-29 11:22:33 | |
| pkName | STRING | [tableName]'s primary key name | idx | |
| cnt | INTEGER | BigQuery API: statistics.load.outputRows | 123 | |
| pkUpper | INTEGER | for each 'select' executed: [pkName] <= [pkUpper] | 100 | |
| pkLower | INTEGER | for each 'select' executed: [pkName] > [pkLower] | 0 | |
| pkCurrent | INTEGER | same as pkUpper | 99 | |
| jobId | STRING | BigQuery upload job's jobId | job-adf132f31rf3f | |
| errorMessage | STRING | written when the jobId check result is 'ERROR' | ERROR:bigquery quota exceed | |
| checkComplete | INTEGER | set by the check command | 1 | |
| jobComplete | INTEGER | check command's jobId check result: success=1, fail=-1 | 1 | |
| pageToken | STRING | miscellaneous use | | |
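
A similar sketch for spotting failed load jobs before the next `check` run, using the columns documented above (jobComplete = -1 marks a failed check result):

```sql
-- Sketch: list failed BigQuery load jobs recorded by migbq
SELECT idx, tableName, jobId, pkLower, pkUpper, errorMessage
FROM migrationmetadatalog
WHERE jobComplete = -1;
```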


## Roadmap

* parallel loading is not supported yet.


 