spark_datax_tools


spark_datax_tools is a Python library of helper functions for DataX schemas: it generates nomenclature, adapter and transfer tickets, and schema files.

Installation

The package is published on PyPI, so it can be installed by running:

pip install spark-datax-tools
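
To confirm that the installation succeeded, one option is to query the installed distribution metadata with the standard library. This is a minimal sketch; the helper name `installed_version` is only an illustration and is not part of spark_datax_tools.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "spark-datax-tools" is the distribution name used in the pip command above.
    found = installed_version("spark-datax-tools")
    print(found if found is not None else "spark-datax-tools is not installed")
```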

Usage

Wrapper functions for DataX tasks:

Nomenclature Datax
================================
table_name = "t_pmfi_lcl_suppliers_purchases"
origen = "host"
destination = "hdfs"
datax_generated_nomenclature(table_name=table_name, 
                             origen=origen, 
                             destination=destination, 
                             output=True)




List of adapters
================================
datax_list_adapters()




Generated Ticket Adapter
============================================================
adapter_id = "ADAPTER_HDFS_OUTSTAGING"
parameter = {"uuaa":"na8z"}
datax_generated_ticket_adapter(adapter_id=adapter_id, 
                               parameter=parameter, 
                               is_dev=True
)
                               
                               
                               
Generated Ticket Transfer
============================================================
folder = "CR-PEMFIMEN-T02"
job_name = "PMFITP4012"
crq = "CRQ100000"
periodicity = "mensual"
hour = "10AM"
weight = "50MB"
# table_name is reused from the nomenclature example above
table_name = "t_pmfi_lcl_suppliers_purchases"
origen = "host"
destination = "hdfs"

datax_generated_ticket_transfer(
    folder=folder,
    job_name=job_name,
    crq=crq,
    periodicity=periodicity,
    hour=hour,
    weight=weight,
    table_name=table_name,
    origen=origen,
    destination=destination,
    is_dev=True
)
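
Because the transfer ticket takes many keyword arguments, one option is to hold them in a dictionary and unpack it at the call site. The sketch below uses a hypothetical stand-in, `fake_ticket_transfer`, so that it runs without the library. The stand-in only echoes its arguments and says nothing about what `datax_generated_ticket_transfer` actually does.

```python
def fake_ticket_transfer(**kwargs):
    """Hypothetical stand-in: returns a copy of the keyword arguments it received."""
    return dict(kwargs)


# Same keyword names and values as in the call above.
ticket_params = {
    "folder": "CR-PEMFIMEN-T02",
    "job_name": "PMFITP4012",
    "crq": "CRQ100000",
    "periodicity": "mensual",
    "hour": "10AM",
    "weight": "50MB",
    "table_name": "t_pmfi_lcl_suppliers_purchases",
    "origen": "host",
    "destination": "hdfs",
    "is_dev": True,
}

# Unpack the dictionary into keyword arguments.
received = fake_ticket_transfer(**ticket_params)
```

With the library installed, the same dictionary could presumably be passed as `datax_generated_ticket_transfer(**ticket_params)`, since the call above uses exactly these keyword names.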
                               
     
                               
Generated Schema JSON Artifactory
============================================================
path_json = "lclsupplierspurchases.output.schema"
is_schema_origen_in = True
schema_type = "host"
convert_string = False

datax_generated_schema_artifactory(
    path_json=path_json,
    is_schema_origen_in=is_schema_origen_in,
    schema_type=schema_type,
    convert_string=convert_string
)
           
   
   
   
Generated Schema Json Datum
============================================================
from pyspark.sql import SparkSession

# Local Spark session used to read the fields file
spark = SparkSession.builder.master("local[*]").appName("SparkAPP").getOrCreate()
path="fields_pe_datum2.csv"
table_name="t_pmfi_lcl_suppliers_purchases"
origen="host"
destination="hdfs"
storage_zone="master"

datax_generated_schema_datum(
    spark=spark,
    path=path,
    table_name=table_name,
    origen=origen,
    destination=destination,
    storage_zone=storage_zone,
    convert_string=False
)

License

Apache License 2.0.

New features v1.0

BugFix

choco install visualcpp-build-tools

Reference

Jonathan Quiza github.
Jonathan Quiza RumiMLSpark.
Jonathan Quiza linkedin.

Release history

Version	Release date
0.6.6 (this version)	Mar 1, 2024
0.6.5	Mar 1, 2024
0.6.4	Feb 28, 2024
0.6.3	Feb 28, 2024
0.6.2	Feb 28, 2024
0.6.1	Feb 28, 2024
0.5.9	Feb 26, 2024
0.5.8	Feb 14, 2024
0.5.5	Feb 14, 2024
0.5.4	Feb 11, 2024
0.5.3	Feb 1, 2024
0.5.2	Jan 17, 2024
0.5.1	Jan 17, 2024
0.5.0	Jul 6, 2023
0.4	Jun 27, 2023
0.3.2	Jun 25, 2023
0.3.1	Jun 17, 2023
0.3.0	Jun 12, 2023
0.2.0	Jun 12, 2023

Download files

Source Distribution

spark_datax_tools-0.6.6.tar.gz (14.5 kB), uploaded Mar 1, 2024

Built Distribution

spark_datax_tools-0.6.6-py3-none-any.whl (16.5 kB), uploaded Mar 1, 2024, Python 3

Hashes for spark_datax_tools-0.6.6.tar.gz

Algorithm	Hash digest
SHA256	`58cb3d673ba009a42acefb1bf44040bd8f302c7df8f072077fbcd25e368ab840`
MD5	`4f7754bccd11ed025c10d8e9a069ac81`
BLAKE2b-256	`4654cf6694dd99a1b2db1ce295467bad80dec0380750927ac88a6c5076dc45a0`

Hashes for spark_datax_tools-0.6.6-py3-none-any.whl

Algorithm	Hash digest
SHA256	`711d026f2e2385d762addbf338e4674e37be4cf8aae1b2bd15cb34e11bdf2643`
MD5	`3c8f3c224dc80f8b6541fff8b867e0af`
BLAKE2b-256	`bd2c253d79fd9589855fc9ee7997f5b7e6eec16f39263af6eea35d2307600d59`
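
A downloaded file can be checked against the SHA256 digests listed above using only the standard library. The following is a minimal sketch; the helper names `sha256_of_file` and `matches_sha256` are illustrative, and the commented usage assumes the archive has already been downloaded into the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the source distribution sits in the current directory:
# expected = "58cb3d673ba009a42acefb1bf44040bd8f302c7df8f072077fbcd25e368ab840"
# print(matches_sha256("spark_datax_tools-0.6.6.tar.gz", expected))
```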