Powerful [R2]RML engine to create RDF knowledge graphs from heterogeneous data sources.
Morph-KGC is an engine that constructs RDF knowledge graphs from heterogeneous data sources with the R2RML and RML mapping languages. Morph-KGC is built on top of pandas and it leverages mapping partitions to significantly reduce execution times and memory consumption for large data sources.
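As a rough intuition for why partitioning helps, the toy sketch below groups mapping rules so that no two groups can emit the same triple; duplicates then only need to be tracked within one group at a time. This is an illustration only, not Morph-KGC's implementation: the rule format, the `RULES` and `ROWS` data, and the choice to partition by predicate alone are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy rules: (subject template, constant predicate, source column).
RULES = [
    ("http://ex.org/person/{id}", "http://xmlns.com/foaf/0.1/name", "name"),
    ("http://ex.org/person/{id}", "http://xmlns.com/foaf/0.1/age", "age"),
    ("http://ex.org/user/{id}", "http://xmlns.com/foaf/0.1/name", "name"),
]

# Hypothetical source rows; the third row duplicates the first.
ROWS = [
    {"id": "1", "name": "Ada", "age": "36"},
    {"id": "2", "name": "Alan", "age": "41"},
    {"id": "1", "name": "Ada", "age": "36"},
]


def partition_rules(rules):
    """Group rules by predicate (assumed criterion for this sketch).

    Rules with different constant predicates can never emit the same triple,
    so each group can be de-duplicated independently of the others.
    """
    groups = defaultdict(list)
    for rule in rules:
        groups[rule[1]].append(rule)
    return list(groups.values())


def materialize_partition(rules, rows):
    """Generate the de-duplicated triples of a single partition."""
    triples = set()
    for subject_template, predicate, column in rules:
        for row in rows:
            subject = subject_template.format(**row)
            triples.add(f'<{subject}> <{predicate}> "{row[column]}" .')
    return triples


def materialize(rules, rows):
    """Yield triples one partition at a time.

    Only the set for the current partition is held in memory.
    """
    for group in partition_rules(rules):
        yield from sorted(materialize_partition(group, rows))


triples = list(materialize(RULES, ROWS))
```

Because the groups are disjoint by construction, the output contains no duplicates even though no global set of triples is ever built.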
Features :sparkles:
- Supports the R2RML and RML mapping languages.
- User-friendly mappings with YARRRML.
- Transformation functions with RML-FNML, including Python user-defined functions.
- RDF-star generation with RML-star.
- RML views over tabular data sources and JSON files.
- Integration with RDFLib, Oxigraph and Kafka.
- Optimized to materialize large knowledge graphs.
- Remote data and mapping files.
- Input data formats:
  - Relational databases: MySQL, PostgreSQL, Oracle, Microsoft SQL Server, MariaDB, SQLite.
  - Tabular files: CSV, TSV, Excel, Parquet, Feather, ORC, Stata, SAS, SPSS, ODS.
  - Hierarchical files: JSON, XML.
  - In-memory data structures: Python Dictionaries, DataFrames.
  - Cloud data lake solutions: Databricks.
Documentation :bookmark_tabs:
Tutorial :woman_teacher:
Learn quickly with the tutorial in Google Colaboratory!
Getting Started :rocket:
PyPI is the fastest way to install Morph-KGC:
pip install morph-kgc
We recommend installing Morph-KGC in a virtual environment.
To run the engine from the command line, execute the following:
python3 -m morph_kgc config.ini
Check the documentation to see how to generate the configuration INI file. An example INI file is also available.
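If the configuration has to be produced programmatically, a minimal sketch with Python's standard `configparser` could look like the following. The section name `DataSource1` and the `mappings` and `db_url` keys are the ones used in this README; the `build_config` helper is hypothetical and not part of Morph-KGC, and any further options should be taken from the documentation.

```python
import configparser
import io


def build_config(mappings, db_url, section="DataSource1"):
    """Return INI text with one data source section (hypothetical helper)."""
    # interpolation=None keeps '%' characters in URLs or passwords literal.
    parser = configparser.ConfigParser(delimiters=(":",), interpolation=None)
    parser[section] = {"mappings": mappings, "db_url": db_url}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


config_text = build_config(
    "/path/to/mapping/mapping_file.rml.ttl",
    "mysql+pymysql://user:password@localhost:3306/db_name",
)
```

The resulting text can be written to `config.ini` for the command-line invocation above.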
It is also possible to run Morph-KGC as a library with RDFLib, Oxigraph and Kafka:
import morph_kgc
# generate the triples and load them to an RDFLib graph
g_rdflib = morph_kgc.materialize('/path/to/config.ini')
# work with the RDFLib graph
q_res = g_rdflib.query('SELECT DISTINCT ?classes WHERE { ?s a ?classes }')
# generate the triples and load them to Oxigraph
g_oxigraph = morph_kgc.materialize_oxigraph('/path/to/config.ini')
# work with Oxigraph
q_res = g_oxigraph.query('SELECT DISTINCT ?classes WHERE { ?s a ?classes }')
# the methods above also accept the config as a string
config = """
[DataSource1]
mappings: /path/to/mapping/mapping_file.rml.ttl
db_url: mysql+pymysql://user:password@localhost:3306/db_name
"""
g_rdflib = morph_kgc.materialize(config)
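To consume the query result, one option is to iterate over its rows. The sketch below wraps the query from the example above in a small helper; `distinct_classes` is a hypothetical name, and the code assumes an RDFLib graph, whose `query` method returns an iterable of tuple-like rows.

```python
def distinct_classes(graph):
    """Return the sorted class IRIs found in an RDFLib-style graph.

    Assumes graph.query() yields tuple-like rows, as RDFLib does.
    """
    query = "SELECT DISTINCT ?classes WHERE { ?s a ?classes }"
    # Each row has one bound variable (?classes) at position 0.
    return sorted(str(row[0]) for row in graph.query(query))


# Usage with the graph built above (not executed here):
#   for class_iri in distinct_classes(g_rdflib):
#       print(class_iri)
```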
License :unlock:
Morph-KGC is available under the Apache License 2.0.
Author & Contact :mailbox_with_mail:
Ontology Engineering Group, Universidad Politécnica de Madrid.
Citing :speech_balloon:
If you use Morph-KGC in your work, please cite the Semantic Web journal (SWJ) paper:
@article{arenas2024morph,
title = {{Morph-KGC: Scalable knowledge graph materialization with mapping partitions}},
author = {Arenas-Guerrero, Julián and Chaves-Fraga, David and Toledo, Jhon and Pérez, María S. and Corcho, Oscar},
journal = {Semantic Web},
publisher = {IOS Press},
issn = {2210-4968},
year = {2024},
doi = {10.3233/SW-223135},
volume = {15},
number = {1},
pages = {1-20}
}
Sponsor :shield:
Hashes for morph_kgc-2.7.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 241c9526f41cce20310f5ad45c2af4841f330f2c7dd6ada347077bbccae32963
MD5 | 2764d83b9aecfe5761e529145afaecc3
BLAKE2b-256 | 35241c042fc0a4b9c0bf6d7fe267e9f1f51124f1eb7f8e5d0e90064672df7093