Extract Social Media

Extract social media links from websites.

Many websites link to their Facebook, Twitter, LinkedIn, and YouTube accounts, and these links can be invaluable for building a 360-degree view of a company.

This library extracts links or handles for the most commonly used international social media networks.

  • Free software: MIT license

  • Python versions: 2.7, 3.4+

Features

  • Extract social media links/handles from html content

  • Also attempts to extract links/handles from widgets, scripts, etc.

  • Supports most widely used social networks

    • facebook

    • linkedin

    • twitter

    • youtube

    • github

    • google plus

    • pinterest

    • instagram

    • snapchat

    • flipboard

    • flickr

    • weibo

    • periscope

    • telegram

    • soundcloud

    • feedburner

    • vimeo

    • slideshare

    • vkontakte

    • xing
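To make the idea of link extraction concrete, here is a minimal sketch that collects `href` and `data-href` attribute values and keeps those pointing at a known social media host. It is not the library's implementation: `SOCIAL_DOMAINS`, `LinkCollector`, and `find_social_links` are hypothetical names introduced for illustration, the domain list is a small subset, and the code uses only the Python 3 standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical subset of domains, for illustration only.
SOCIAL_DOMAINS = {
    "facebook.com",
    "twitter.com",
    "linkedin.com",
    "youtube.com",
    "github.com",
}


class LinkCollector(HTMLParser):
    """Collect the values of href and data-href attributes from any tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "data-href") and value:
                self.links.append(value)


def find_social_links(html):
    """Return links in `html` whose host is one of SOCIAL_DOMAINS."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for link in collector.links:
        host = (urlparse(link).hostname or "").lower()
        # Treat "www.example.com" and "example.com" as the same host.
        if host.startswith("www."):
            host = host[4:]
        if host in SOCIAL_DOMAINS:
            found.append(link)
    return found
```

Matching on the parsed host rather than on a substring of the URL avoids false positives such as a page whose path merely contains the word "twitter".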

Quickstart

import requests
from html_to_etree import parse_html_bytes
from extract_social_media import find_links_tree

res = requests.get('https://techcrunch.com/contact/')
tree = parse_html_bytes(res.content, res.headers.get('content-type'))

set(find_links_tree(tree))

{'http://pinterest.com/techcrunch/',
 'http://www.youtube.com/user/techcrunch',
 'http://www.linkedin.com/company/techcrunch',
 'https://www.facebook.com/techcrunch',
 'https://flipboard.com/@techcrunch',
 'http://instagram.com/techcrunch',
 'https://plus.google.com/+TechCrunch',
 'https://instagram.com/techcrunch',
 'https://twitter.com/techcrunch'}
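The result above contains the same Instagram profile twice, once with `http` and once with `https`. If that matters for your use case, one possible post-processing step is to collapse links that differ only by scheme, a leading `www.`, or a trailing slash. The helpers `normalize` and `dedupe` below are hypothetical and not part of the library; this sketch also ignores query strings, which may or may not be what you want.

```python
from urllib.parse import urlparse


def normalize(url):
    """Build a comparison key from host and path, ignoring scheme and 'www.'."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def dedupe(links):
    """Keep one link per normalized key, preferring the https variant."""
    best = {}
    for link in links:
        key = normalize(link)
        current = best.get(key)
        if current is None or (
            link.startswith("https://") and not current.startswith("https://")
        ):
            best[key] = link
    return sorted(best.values())
```

Applied to the set returned in the quickstart, this would keep a single Instagram entry and leave the other links untouched.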

Caveats

  • currently finds all social media links on a page

    • need to look into finding most relevant links based on link location, link context, company name, etc.
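Until relevance ranking exists, a caller can apply a simple filter of their own. The sketch below keeps only links whose path contains the company name, compared case-insensitively and ignoring punctuation. `filter_by_name` is a hypothetical helper, not part of the library, and name matching is only one of the signals mentioned above; it does not consider link location or context.

```python
import re
from urllib.parse import urlparse


def _squash(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def filter_by_name(links, company_name):
    """Return links whose URL path contains the squashed company name."""
    needle = _squash(company_name)
    if not needle:
        # Nothing to match against, so apply no filtering.
        return list(links)
    return [link for link in links if needle in _squash(urlparse(link).path)]
```

A filter like this will miss accounts whose handle differs from the company name, so it is best treated as a heuristic rather than a definitive answer.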

Credits

This package was created with Cookiecutter and the fluquid/cookiecutter-pypackage project template.

History

0.2.0 (2017-06-08)

  • better test coverage

  • accepting data-href

0.1.0 (unreleased)

  • First release on PyPI.
