ETL processes with easy config
Project description
Overview
mETL is an ETL tool designed specifically to load the elective data needed by CEU. The programme can also be used more generally, to load practically any kind of data. It was written in Python, with particular attention to optimal memory usage, after assessing the capabilities of the Brewery tool.
Capabilities
The current version supports the most widespread file formats, together with data migration and data migration packages. These include:
Source types:
CSV, TSV, XLS, Google SpreadSheet, Fixed width file
PostgreSQL, MySQL, Oracle, SQLite, Microsoft SQL Server
JSON, XML, YAML
Target types:
CSV, TSV, XLS - an existing file can also be continued
Fixed width file
PostgreSQL, MySQL, Oracle, SQLite, Microsoft SQL Server - existing data can also be modified
JSON, XML, YAML
During the development of the programme we tried to cover the whole course of processing with the most widespread transformation steps, programme structures and mutation steps. Accordingly, the programme provides the following transformations by default:
Add: Adds an arbitrary number to a value.
Clean: Removes punctuation marks (dots, commas, etc.).
ConvertType: Modifies the type of the field to another type.
Homogenize: Converts accented letters to unaccented ones (NFKD normalization).
LowerCase: Converts to lower case.
Map: Changes the value of a field to another value.
RemoveWordsBySource: Using another source, it removes certain words.
ReplaceByRegexp: Makes a change (replaces) by a regular expression.
ReplaceWordsBySource: Replaces words using another source.
Set: Sets a certain value.
Split: Splits the value into words at spaces and keeps a given interval of them.
Stem: Reduces words to their stem (root).
Strip: Removes the unnecessary spaces and/or other characters from the beginning and ending of the value.
Sub: Subtracts a given number from a given value.
Title: Capitalizes the first letter of every word.
UpperCase: Converts to upper case.
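To illustrate how transformations of this kind can be chained on a single field value, here is a minimal sketch in plain Python. It does not use mETL's own API or configuration format; the function names `homogenize`, `clean` and `apply_chain` are hypothetical and only mimic the behaviour described in the list above.

```python
import string
import unicodedata


def homogenize(value: str) -> str:
    """Decompose to NFKD and drop the combining marks (the accents)."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean(value: str) -> str:
    """Remove ASCII punctuation marks such as dots and commas."""
    return value.translate(str.maketrans("", "", string.punctuation))


def apply_chain(value, steps):
    """Apply each transformation in order, feeding the result forward."""
    for step in steps:
        value = step(value)
    return value


# Strip, Clean, Homogenize and Title applied one after another.
result = apply_chain("  éva, kovács.  ", [str.strip, clean, homogenize, str.title])
print(result)
```

The order of the steps matters: stripping first means no stray spaces are left at the edges of the value once the punctuation has been removed.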
Manipulations fall into four groups:
Modifier
Modifiers are objects that receive a whole line (record) and return a whole line. While processing, they change values using the related values of other fields.
JoinByKey: Merges and joins two different records.
Order: Orders lines according to the given conditions.
Set: Sets a value using a fixed value scheme, a function or another source.
SetWithMap: Sets a value of a complex type using a given map.
TransformField: Applies a regular field transformation during manipulation.
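The record-in, record-out idea behind modifiers can be sketched in plain Python as follows. This is an illustration under stated assumptions, not mETL's implementation: records are modelled as dictionaries, and `set_full_name` and `join_by_key` are hypothetical helpers loosely mirroring Set and JoinByKey.

```python
from operator import itemgetter


def set_full_name(record: dict) -> dict:
    """Record in, record out: derive one field from two others."""
    updated = dict(record)  # copy so the input record is left untouched
    updated["full_name"] = f"{record['first_name']} {record['last_name']}"
    return updated


def join_by_key(records, lookup, key):
    """Merge each record with the lookup row that shares its key value."""
    for record in records:
        yield {**record, **lookup.get(record[key], {})}


people = [
    {"id": 2, "first_name": "Ada", "last_name": "Lovelace"},
    {"id": 1, "first_name": "Alan", "last_name": "Turing"},
]
countries = {1: {"country": "UK"}, 2: {"country": "UK"}}

joined = join_by_key((set_full_name(p) for p in people), countries, "id")
# Ordering needs all records at once, unlike the per-record steps above.
ordered = sorted(joined, key=itemgetter("id"))
print(ordered)
```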
Filter
Their function is primarily filtering. They are used when we would like to evaluate or get rid of incomplete or faulty records resulting from an earlier transformation.
DropByCondition: Drops the record if a condition is met.
DropBySource: Whether the record is dropped is decided by whether or not it is present in another file.
DropField: Does not decrease the number of records, but fields can be deleted with it.
KeepByCondition: Keeps the record only if a condition is met.
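A minimal sketch of the filter behaviour described above, written as plain Python generators. The function names follow the filter names for readability, but the signatures are assumptions made for this example and are not taken from mETL.

```python
def drop_by_condition(records, condition):
    """Yield only the records for which the condition is false."""
    for record in records:
        if not condition(record):
            yield record


def keep_by_condition(records, condition):
    """Yield only the records for which the condition is true."""
    for record in records:
        if condition(record):
            yield record


def drop_field(records, field):
    """Keep every record, but remove one field from each of them."""
    for record in records:
        yield {k: v for k, v in record.items() if k != field}


rows = [
    {"name": "Ada", "email": "ada@example.com", "tmp": 1},
    {"name": "", "email": "nobody@example.com", "tmp": 2},
    {"name": "Alan", "email": None, "tmp": 3},
]

without_empty_names = drop_by_condition(rows, lambda r: not r["name"])
with_email = keep_by_condition(without_empty_names, lambda r: r["email"] is not None)
filtered = list(drop_field(with_email, "tmp"))
print(filtered)
```

Because each step is a generator, records flow through the chain one at a time, which keeps memory usage low for large sources.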
Expand
Expanders are used when we would like to add more values to the current source.
Append: Appends a new source file, identical in structure to the one in use, after the current one.
AppendBySource: Appends a new source after the original one.
Field: Takes columns as parameters and puts them, together with their values, into another column.
BaseExpander: Class used for expansion, primarily when we would like to multiply a record.
ListExpander: Splits list-type elements and puts them into separate lines.
Melt: Fixes the given columns and shows the rest of the columns as key-value pairs.
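The two expanders that change the shape of the data the most, ListExpander and Melt, can be illustrated with the following plain Python sketch. It assumes records are dictionaries; `list_expander` and `melt`, as well as the output field names `key` and `value`, are hypothetical choices for this example and not mETL's API.

```python
def list_expander(records, field):
    """Emit one record per element of the list stored in `field`."""
    for record in records:
        for item in record[field]:
            yield {**record, field: item}


def melt(records, fixed):
    """Keep the fixed columns and emit the others as key-value pairs."""
    for record in records:
        base = {name: record[name] for name in fixed}
        for key, value in record.items():
            if key not in fixed:
                yield {**base, "key": key, "value": value}


expanded = list(list_expander([{"id": 1, "tags": ["a", "b"]}], "tags"))
melted = list(melt([{"id": 1, "height": 180, "weight": 75}], fixed=["id"]))
print(expanded)
print(melted)
```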
Aggregator
Aggregators are used to connect and arrange data.
Avg: Used to determine the arithmetic mean.
Count: Used to count records.
Sum: Used to determine sums.
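As an illustration of what the three aggregators compute, the sketch below groups records by one field and derives the count, sum and average of another. It is plain Python under the assumption that records are dictionaries; the `aggregate` helper and its parameters are hypothetical and not part of mETL.

```python
from collections import defaultdict


def aggregate(records, group_field, value_field):
    """Group records by one field and compute count, sum and avg of another."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_field]].append(record[value_field])
    return {
        key: {
            "count": len(values),
            "sum": sum(values),
            "avg": sum(values) / len(values),
        }
        for key, values in groups.items()
    }


sales = [
    {"region": "north", "amount": 10},
    {"region": "north", "amount": 30},
    {"region": "south", "amount": 5},
]

summary = aggregate(sales, "region", "amount")
print(summary)
```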