Scrape top GitHub repositories and users based on keyword
Top GitHub Users Scraper
Scrape top GitHub repositories and users based on keywords.
Installation
pip install top-github-scraper
Usage
Get Top GitHub Repositories' URLs
from top_github_scraper import get_top_urls
get_top_urls(keyword="machine learning", stop_page=20)
After running the script above, a file named
top_repo_urls_<keyword>_<start_page>_<end_page>.json
will be saved to your current directory.
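Once the URL file exists, it can be read back with the standard library. The sketch below is only an illustration: `load_repo_urls` is a hypothetical helper, not part of top-github-scraper, and it assumes each `top_repo_urls_*.json` file contains a JSON list of URL strings. Because the exact file name depends on the keyword and page range, it matches files by prefix instead of rebuilding the name.

```python
import glob
import json
import os


def load_repo_urls(directory="."):
    """Collect the contents of every top_repo_urls_*.json file in a directory.

    Hypothetical helper, not part of top-github-scraper.
    """
    urls = []
    pattern = os.path.join(directory, "top_repo_urls_*.json")
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: each file holds a JSON list of URL strings.
        urls.extend(data)
    return urls
```

Calling `load_repo_urls()` with no arguments looks in the current directory, which is where the file is saved by default.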
Get Top GitHub Repositories' Information
from top_github_scraper import get_top_repos
get_top_repos("machine learning", stop_page=20)
After running the script above, 2 files named
top_repo_urls_<keyword>_<start_page>_<end_page>.json
top_repo_info_<keyword>_<start_page>_<end_page>.json
will be saved to your current directory.
Get Top GitHub Users' Profiles
from top_github_scraper import get_top_users
get_top_users("machine learning", stop_page=20)
After running the script above, 3 files named
top_repo_urls_<keyword>_<start_page>_<end_page>.json
top_repo_info_<keyword>_<start_page>_<end_page>.json
top_user_info_<keyword>_<start_page>_<end_page>.csv
will be saved to your current directory.
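The user profiles are written as CSV, so they can be loaded with Python's `csv` module. This is a minimal sketch under stated assumptions: `load_user_rows` is a hypothetical helper, not part of top-github-scraper, and it assumes the CSV has a header row. Every value comes back as a string, so numeric columns need converting before use.

```python
import csv
import glob
import os


def load_user_rows(directory="."):
    """Read every top_user_info_*.csv file in a directory into a list of dicts.

    Hypothetical helper, not part of top-github-scraper.
    """
    rows = []
    pattern = os.path.join(directory, "top_user_info_*.csv")
    for path in sorted(glob.glob(pattern)):
        # newline="" is the setting the csv module documents for file objects.
        with open(path, newline="", encoding="utf-8") as f:
            # Assumption: the first line of the file is a header row.
            rows.extend(csv.DictReader(f))
    return rows
```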
Parameters
- get_top_urls
  keyword: str. Keyword to search for (e.g., machine learning).
  save_path: str, optional. Where to save the output file, by default "top_repo_urls".
  start_page: int, optional. Page number to start scraping from, by default 0.
  stop_page: int, optional. Page number of the last page to scrape, by default 50.
- get_top_repos
  keyword: str. Keyword to search for (e.g., machine learning).
  max_n_top_contributors: int. Number of top contributors in each repository to scrape from, by default 10.
  start_page: int, optional. Page number to start scraping from, by default 0.
  stop_page: int, optional. Page number of the last page to scrape, by default 50.
  url_save_path: str, optional. Where to save the output file of URLs, by default "top_repo_urls".
  repo_save_path: str, optional. Where to save the output file of repositories' information, by default "top_repo_info".
- get_top_users
  keyword: str. Keyword to search for (e.g., machine learning).
  max_n_top_contributors: int. Number of top contributors in each repository to scrape from, by default 10.
  start_page: int, optional. Page number to start scraping from, by default 0.
  stop_page: int, optional. Page number of the last page to scrape, by default 50.
  url_save_path: str, optional. Where to save the output file of URLs, by default "top_repo_urls".
  repo_save_path: str, optional. Where to save the output file of repositories' information, by default "top_repo_info".
  user_save_path: str, optional. Where to save the output file of users' profiles, by default "top_user_info".
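To show how the parameters fit together, here is a sketch of a small wrapper that sets each one explicitly. The wrapper name `scrape_small_sample` and the `sample_*` save-path values are hypothetical. The sketch assumes the parameters can be passed by keyword under the names listed above, and that each save path acts as a prefix for the output file name, as the default file names suggest.

```python
def scrape_small_sample(keyword):
    """Scrape only the first few result pages for `keyword`.

    Hypothetical wrapper around get_top_users, for illustration only.
    """
    # Imported here so that defining this function does not require the
    # package to be importable until the function is actually called.
    from top_github_scraper import get_top_users

    # Assumption: all parameters are accepted as keyword arguments.
    get_top_users(
        keyword=keyword,
        max_n_top_contributors=5,  # fewer contributors per repository than the default 10
        start_page=0,
        stop_page=2,  # far below the default 50, to keep the run short
        url_save_path="sample_repo_urls",
        repo_save_path="sample_repo_info",
        user_save_path="sample_user_info",
    )
```

Calling `scrape_small_sample("machine learning")` performs real requests against GitHub, so a small `stop_page` is a reasonable starting point when trying the package out.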
How the Data is Scraped
top-github-scraper scrapes the owners and the contributors of the top repositories that appear in GitHub's search results for a given keyword.
For each user, top-github-scraper scrapes 16 data points:
login: Username
url: URL of the user
contributions: Number of contributions to the repository that the user is scraped from
stargazers_count: Number of stars of the repository that the user is scraped from
forks_count: Number of forks of the repository that the user is scraped from
type: Whether this account is a user or an organization
name: Name of the user
company: User's company
location: User's location
email: User's email
hireable: Whether the user is hireable
bio: Short description of the user
public_repos: Number of public repositories the user has (including forked repositories)
public_gists: Number of public gists the user has (including forked gists)
followers: Number of followers the user has
following: Number of people the user is following
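The fields above can be combined to rank the scraped accounts. The sketch below keeps individual users and orders them by follower count. It is illustrative only: `to_int` and `top_individuals` are hypothetical helpers, the sample rows are made up, and the code assumes `type` holds the string "User" for individual accounts (the value GitHub's API uses) and that numeric fields may arrive as strings or be empty.

```python
def to_int(value):
    """Convert a scraped numeric field to int, treating blanks as 0."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return 0


def top_individuals(rows, n=5):
    """Return the n individual users with the most followers.

    Hypothetical helper; `rows` is a list of dicts keyed by the field names above.
    """
    # Assumption: individual accounts have type == "User".
    people = [row for row in rows if row.get("type") == "User"]
    return sorted(people, key=lambda row: to_int(row.get("followers")), reverse=True)[:n]


# Made-up rows for illustration; real rows carry all 16 fields.
sample_rows = [
    {"login": "alice", "type": "User", "followers": "120"},
    {"login": "acme", "type": "Organization", "followers": "900"},
    {"login": "bob", "type": "User", "followers": ""},
    {"login": "carol", "type": "User", "followers": "45"},
]

print([row["login"] for row in top_individuals(sample_rows, n=2)])
# prints ['alice', 'carol']
```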