Query Language for Wikipedia

Project description

WikipediaQL: querying structured data from Wikipedia

WikipediaQL is an experimental query language, executable script, and Python library for querying structured data from Wikipedia. A query looks like this:

$ wikipedia_ql --page "Guardians of the Galaxy (film)" \
    '{
      page@title as "title";
      section[heading="Cast"] as "cast" >> {
          li >> text:matches("^(.+?) as (.+?):") >> {
              text-group[group=1] as "actor";
              text-group[group=2] as "character"
          }
      };
      section[heading="Critical response"] >> {
          sentence:contains("Rotten Tomatoes") as "RT ratings" >> {
              text:matches("\d+%") as "percent";
              text:matches("(\d+) (critic|review)") >> text-group[group=1] as "reviews";
              text:matches("[\d.]+/10") as "overall"
          }
      }
    }'

title: Guardians of the Galaxy (film)
RT ratings:
  overall: 7.8/10
  percent: 92%
  reviews: '334'
cast:
- actor: Chris Pratt
  character: Peter Quill / Star-Lord
- actor: Zoe Saldaña
  character: Gamora
...
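
To make the regex-based selectors in the query more concrete, here is a minimal sketch of what `text:matches(...)` followed by `text-group[group=N]` roughly amounts to, written with Python's standard `re` module only. It does not use or reproduce the wikipedia_ql library; the function names `parse_cast` and `parse_ratings` are hypothetical, and the input strings are illustrative stand-ins built from the output above, not text quoted from Wikipedia. Whether WikipediaQL applies its patterns with search or full-match semantics is an assumption here (search is used).

```python
import re

# Hypothetical stand-ins for <li> texts in the "Cast" section.
# The part after the colon is placeholder prose, not a Wikipedia quote.
cast_items = [
    "Chris Pratt as Peter Quill / Star-Lord: placeholder description.",
    "Zoe Saldaña as Gamora: placeholder description.",
]

# Same pattern as text:matches("^(.+?) as (.+?):") in the query.
CAST_PATTERN = re.compile(r"^(.+?) as (.+?):")


def parse_cast(items):
    """Return one {"actor", "character"} dict per item that matches."""
    rows = []
    for item in items:
        match = CAST_PATTERN.search(item)
        if match:
            # text-group[group=1] and text-group[group=2] in the query
            rows.append({"actor": match.group(1), "character": match.group(2)})
    return rows


# Hypothetical sentence assembled from the values shown in the output above.
rt_sentence = (
    "Rotten Tomatoes reports 92% approval based on 334 reviews, "
    "with an average rating of 7.8/10."
)


def parse_ratings(sentence):
    """Apply the three patterns from the "RT ratings" block to one sentence."""
    percent = re.search(r"\d+%", sentence)
    reviews = re.search(r"(\d+) (critic|review)", sentence)
    overall = re.search(r"[\d.]+/10", sentence)
    return {
        "percent": percent.group(0) if percent else None,
        # Only the first capture group is kept, as in text-group[group=1].
        "reviews": reviews.group(1) if reviews else None,
        "overall": overall.group(0) if overall else None,
    }


cast = parse_cast(cast_items)
ratings = parse_ratings(rt_sentence)
```

Captured groups come back as strings, which is consistent with `reviews` appearing as the quoted value `'334'` in the output above.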

Read the full README.md on GitHub.

Download files

Source Distribution

wikipedia_ql-0.0.6.tar.gz (25.5 kB)

Uploaded Source

Built Distribution

wikipedia_ql-0.0.6-py3-none-any.whl (23.8 kB)

Uploaded Python 3