Crawl historical stock data.

pystock-crawler is a utility for crawling historical data of US stocks, including:

  • Ticker symbols listed on NYSE and NASDAQ

  • Daily prices

  • Fundamentals from 10-Q and 10-K filings on SEC EDGAR

Example Output

NYSE ticker symbols:

DDD   3D Systems Corporation
MMM   3M Company
WBAI  500.com Limited
...

Apple’s daily prices:

symbol,date,open,high,low,close,volume,adj_close
AAPL,2014-04-28,572.80,595.75,572.55,594.09,23890900,594.09
AAPL,2014-04-25,564.53,571.99,563.96,571.94,13922800,571.94
AAPL,2014-04-24,568.21,570.00,560.73,567.77,27092600,567.77
...
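
The output is plain CSV, so it can be consumed directly with the standard library. A minimal sketch, assuming an output file named out.csv with the columns shown above:

from __future__ import print_function
import csv

# Read the price CSV produced by `pystock-crawler prices`.
with open('out.csv') as f:
    for row in csv.DictReader(f):
        # All values arrive as strings; convert numeric fields as needed.
        print(row['symbol'], row['date'], float(row['close']), int(row['volume']))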

Google’s fundamentals:

symbol,end_date,amend,period_focus,doc_type,revenues,op_income,net_income,eps_basic,eps_diluted,dividend,assets,cur_assets,cur_liab,cash,equity,cash_flow_op,cash_flow_inv,cash_flow_fin
GOOG,2009-06-30,False,Q2,10-Q,5522897000.0,1873894000.0,1484545000.0,4.7,4.66,0.0,35158760000.0,23834853000.0,2000962000.0,11911351000.0,31594856000.0,3858684000.0,-635974000.0,46354000.0
GOOG,2009-09-30,False,Q3,10-Q,5944851000.0,2073718000.0,1638975000.0,5.18,5.13,0.0,37702845000.0,26353544000.0,2321774000.0,12087115000.0,33721753000.0,6584667000.0,-3245963000.0,74851000.0
GOOG,2009-12-31,False,FY,10-K,23650563000.0,8312186000.0,6520448000.0,20.62,20.41,0.0,40496778000.0,29166958000.0,2747467000.0,10197588000.0,36004224000.0,9316198000.0,-8019205000.0,233412000.0
...
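
The fundamentals CSV can be post-processed the same way. A small sketch computing a net profit margin per filing, again assuming an output file named out.csv with the columns shown above:

from __future__ import print_function
import csv

with open('out.csv') as f:
    for row in csv.DictReader(f):
        # Net margin = net_income / revenues (both columns appear above).
        revenues = float(row['revenues'])
        if revenues:
            print(row['symbol'], row['end_date'], float(row['net_income']) / revenues)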

Installation

Prerequisites:

  • Python 2.7

pystock-crawler is based on Scrapy, so you will also need to install prerequisites such as lxml and libffi for Scrapy and its dependencies. See Scrapy’s installation guide for more details.

Install with virtualenv (recommended):

pip install pystock-crawler

Or do system-wide installation:

sudo pip install pystock-crawler

Quickstart

Example 1. Google’s and Yahoo’s daily prices ordered by date:

pystock-crawler prices GOOG,YHOO -o out.csv --sort

Example 2. Daily prices of all companies listed in ./symbols.txt:

pystock-crawler prices ./symbols.txt -o out.csv

Example 3. Facebook’s fundamentals during 2013:

pystock-crawler reports FB -o out.csv -s 20130101 -e 20131231

Example 4. Fundamentals of all companies listed in ./nyse.txt, with the logs directed to ./crawling.log:

pystock-crawler reports ./nyse.txt -o out.csv -l ./crawling.log

Example 5. All ticker symbols in NYSE and NASDAQ:

pystock-crawler symbols NYSE,NASDAQ -o out.txt

Usage

Type pystock-crawler -h to see command help:

Usage:
  pystock-crawler symbols <exchanges> (-o OUTPUT) [-l LOGFILE] [--sort]
  pystock-crawler prices <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD] [-l LOGFILE] [--sort]
  pystock-crawler reports <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]  [-l LOGFILE] [--sort]
  pystock-crawler (-h | --help)
  pystock-crawler (-v | --version)

Options:
  -h --help     Show this screen
  -o OUTPUT     Output file
  -s YYYYMMDD   Start date [default: ]
  -e YYYYMMDD   End date [default: ]
  -l LOGFILE    Log output [default: ]
  --sort        Sort the result

There are three commands available:

  • pystock-crawler symbols grabs ticker symbol lists

  • pystock-crawler prices grabs daily prices

  • pystock-crawler reports grabs fundamentals

<exchanges> is a comma-separated string that specifies the stock exchanges you want to include. Only NYSE and NASDAQ are supported.

The output file of pystock-crawler symbols can be used as the <symbols> argument of the pystock-crawler prices and pystock-crawler reports commands, as in the example below.
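
For example, a two-step crawl might look like this (the file names are arbitrary):

pystock-crawler symbols NYSE -o nyse.txt
pystock-crawler prices nyse.txt -o nyse_prices.csv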

<symbols> can be a comma-separated inline string or a text file that lists one symbol per line. For example, the inline string could be AAPL,GOOG,FB, and the text file might look like this:

# This line is a comment
AAPL    Put anything you want here
GOOG    Since the text here is ignored
FB

Use -o to specify the output file. For the pystock-crawler symbols command, the output is a plain text file. For pystock-crawler prices and pystock-crawler reports, the output is CSV.

-l specifies where the crawling logs go. If not specified, the logs go to stdout.

The rows in the output CSV file are in arbitrary order by default. Use --sort to sort them by symbol and date. Avoid --sort on large output files, though, because it is slow and consumes a lot of memory.
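
If you skip --sort, you can still order a modestly sized file afterwards. A minimal sketch in Python (it also loads the whole file into memory, so the same size caveat applies; out.csv and sorted.csv are assumed file names):

import csv

with open('out.csv') as f:
    reader = csv.reader(f)
    header = next(reader)
    # Dates are in YYYY-MM-DD form, so a lexical sort is also chronological.
    rows = sorted(reader, key=lambda r: (r[0], r[1]))

with open('sorted.csv', 'wb') as f:  # 'wb' because the project targets Python 2.7
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)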

NOTE: The crawler stores its HTTP cache in a directory named .scrapy under your current working directory. The cache speeds up crawling the next time you fetch the same web pages, but it can grow quite large. If you don't need it, just delete the .scrapy directory after you're done crawling.

Developer Guide

Installing Dependencies

pip install -r requirements.txt

Running Tests

Install pytest, pytest-cov, and requests if you don’t have them:

pip install pytest pytest-cov requests

Then run the tests:

py.test

This downloads the test data from SEC EDGAR on the fly, so it takes some time and disk space. If you want to delete the test data, just delete the pystock_crawler/tests/sample_data directory.
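
If you also want a coverage report, pytest-cov (installed above) can produce one with something like:

py.test --cov=pystock_crawler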
