Project description

wiki2neo

Produce Neo4j import CSVs from Wikipedia database dumps to build a graph of links between Wikipedia pages.

Installation

$ pip install wiki2neo

Usage

Usage: wiki2neo [OPTIONS] [WIKI_XML_INFILE]

  Parse Wikipedia pages-articles-multistream.xml dump into two Neo4j import
  CSV files:

      Node (Page) import, headers=["title:ID", "id"]
      Relationships (Links) import, headers=[":START_ID", ":END_ID"]

  Reads from stdin by default, pass [WIKI_XML_INFILE] to read from file.

Options:
  -p, --pages-outfile FILENAME  Node (Pages) CSV output file  [default: pages.csv]
  -l, --links-outfile FILENAME  Relationships (Links) CSV output file  [default: links.csv]
  --help                        Show this message and exit.
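
To make the two CSV layouts concrete, here is a minimal Python sketch that writes files with the same headers. It is not wiki2neo's own code. The function names and the sample rows (including the page ids) are made up for illustration. Because the header "title:ID" marks the title column as the node ID, the sketch assumes link rows reference pages by title.

```python
import csv
import io

# Header rows as listed in the usage text above.
PAGE_HEADER = ["title:ID", "id"]
LINK_HEADER = [":START_ID", ":END_ID"]


def write_pages(out, pages):
    """Write (title, page_id) pairs as a Neo4j node import CSV."""
    writer = csv.writer(out)
    writer.writerow(PAGE_HEADER)
    writer.writerows(pages)


def write_links(out, links):
    """Write (source_title, target_title) pairs as a relationship import CSV."""
    writer = csv.writer(out)
    writer.writerow(LINK_HEADER)
    writer.writerows(links)


# Made-up sample data, for illustration only.
pages_csv = io.StringIO()
links_csv = io.StringIO()
write_pages(pages_csv, [("Graph theory", 12401), ("Leonhard Euler", 17902)])
write_links(links_csv, [("Graph theory", "Leonhard Euler")])

print(pages_csv.getvalue())
print(links_csv.getvalue())
```

The csv module quotes titles that contain commas, quotes or newlines, which matters for Wikipedia titles such as "Paris, Texas".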

Import the resulting CSVs into Neo4j:

$ neo4j-admin import --nodes:Page pages.csv \
        --relationships:LINKS_TO links.csv \
        --ignore-duplicate-nodes --ignore-missing-nodes --multiline-fields
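
The --ignore-duplicate-nodes and --ignore-missing-nodes flags are understood here to make the importer skip repeated node IDs and relationships whose start or end node is absent. The following Python sketch estimates how many rows those flags would affect on a small sample. It is not part of wiki2neo or Neo4j. The function name summarize and the sample rows are hypothetical, and the sketch assumes the headers shown in the usage text. It holds every title in memory, so it is meant for samples rather than a full dump.

```python
import csv
import io


def summarize(pages_file, links_file):
    """Count duplicate page IDs and links whose endpoints are not in the page set."""
    seen = set()
    duplicates = 0
    for row in csv.DictReader(pages_file):
        title = row["title:ID"]
        if title in seen:
            duplicates += 1  # a repeated node ID
        seen.add(title)

    total = 0
    dangling = 0
    for row in csv.DictReader(links_file):
        total += 1
        if row[":START_ID"] not in seen or row[":END_ID"] not in seen:
            dangling += 1  # at least one endpoint has no matching page row
    return {
        "pages": len(seen),
        "duplicate_pages": duplicates,
        "links": total,
        "dangling_links": dangling,
    }


# Made-up sample data: page "A" appears twice, and page "C" is never defined.
sample_pages = io.StringIO("title:ID,id\nA,1\nB,2\nA,3\n")
sample_links = io.StringIO(":START_ID,:END_ID\nA,B\nA,C\n")
report = summarize(sample_pages, sample_links)
print(report)
```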

Wikipedia dumps are distributed in compressed xml.bz2 format. The simplest usage is to pipe the decompressed output straight into wiki2neo:

$ bzcat pages-articles-multistream.xml.bz2 | wiki2neo
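
Where bzcat is not available, the same pipeline can be approximated from Python. The sketch below stream-decompresses the dump with the standard bz2 module and writes it to any binary stream, such as the stdin of a wiki2neo subprocess. The helper names decompress_to and run_wiki2neo are hypothetical, and run_wiki2neo assumes the wiki2neo command is on PATH.

```python
import bz2
import shutil
import subprocess


def decompress_to(bz2_path, sink, chunk_size=1024 * 1024):
    """Stream-decompress a .bz2 file into a writable binary stream.

    bz2.open reads multi-stream files, so the whole dump is decompressed
    chunk by chunk without loading it into memory.
    """
    with bz2.open(bz2_path, "rb") as source:
        shutil.copyfileobj(source, sink, chunk_size)


def run_wiki2neo(bz2_path):
    """Feed the decompressed dump to wiki2neo on stdin (assumes it is on PATH)."""
    with subprocess.Popen(["wiki2neo"], stdin=subprocess.PIPE) as proc:
        decompress_to(bz2_path, proc.stdin)
    # Leaving the with-block closes stdin and waits for wiki2neo to finish.
    return proc.returncode


# Example call (not executed here):
# run_wiki2neo("pages-articles-multistream.xml.bz2")
```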

Release history

0.0.3 (this version): Feb 23, 2019

0.0.2: Feb 19, 2019

0.0.1: Feb 19, 2019

Download files

Source Distribution

wiki2neo-0.0.3.tar.gz (2.7 kB)

Uploaded Feb 23, 2019 Source

Built Distribution

wiki2neo-0.0.3-py2.py3-none-any.whl (2.1 kB)

Uploaded Feb 23, 2019 Python 2 Python 3

Hashes for wiki2neo-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`b5aa13d070b7bb184c663b873de121e240fb0e1db18ec55bba32915a2302733d`
MD5	`4d2f1514934c378c887e69613c0e43e9`
BLAKE2b-256	`42bf0aafbbffef69b36aacf69afc5177eceebd262f9122461224d35ab596e0ba`

Hashes for wiki2neo-0.0.3-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`be81aa623ad52ead3415a26bb2fcaa3ad1e72173c516f9f2512485a163c01090`
MD5	`c56855c07b4a7202f87ae7ae39a1401c`
BLAKE2b-256	`a95a93dc8634b60808a00fa2b8eab2e54e48fa396817c27c1e653ed11fbf7885`