pageone
=======
A module for polling URLs and stats from homepages.
Install
pip install pageone
Test
Requires nose
nosetests
Usage
pageone does two things: it extracts article URLs from a site's homepage, and it uses Selenium and PhantomJS to find the relative positions of those URLs on the page.
To get stats about the positions of links, use link_stats:
from pageone import PageOne
p = PageOne(url='http://www.propublica.org/')
# print the position stats for each link found on the homepage
for link in p.link_stats():
    print(link)
This will return a list of dictionaries that look like this:
{'bucket': 4,
'datetime': datetime.datetime(2014, 6, 7, 16, 6, 3, 533818),
'font_size': 13,
'has_img': 1,
'headline': u'',
'homepage': 'http://www.propublica.org/',
'img_area': 3969,
'img_height': 63,
'img_src': u'http://www.propublica.org/images/ngen/gypsy_image_medium/mpmh_victory_drive_140x140_130514_1.jpg',
'img_width': 63,
'url': u'http://www.propublica.org/article/protect-service-members-defense-department-plans-broad-ban-high-cost-loans',
'x': 61,
'x_bucket': 1,
'y': 730,
'y_bucket': 4}
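Once you have these dictionaries, ordinary Python is enough to rank or group them. The following is a minimal sketch, not part of pageone: the sample records are hypothetical placeholders that reuse only a few of the keys shown above (`url`, `x`, `y`, `bucket`), and the helper names `rank_by_position` and `group_by_bucket` are made up for illustration. This README does not define what a bucket value means, so the sketch treats it as an opaque label.

```python
from collections import defaultdict

# Hypothetical sample records; real ones would come from
# PageOne(url=...).link_stats() and carry all the keys shown above.
sample_stats = [
    {'url': 'http://example.com/a', 'x': 61, 'y': 730, 'bucket': 4},
    {'url': 'http://example.com/b', 'x': 400, 'y': 120, 'bucket': 1},
    {'url': 'http://example.com/c', 'x': 61, 'y': 120, 'bucket': 1},
]


def rank_by_position(stats):
    """Order links top-to-bottom, then left-to-right."""
    return sorted(stats, key=lambda s: (s['y'], s['x']))


def group_by_bucket(stats):
    """Map each bucket value to the URLs that carry it, in input order."""
    groups = defaultdict(list)
    for s in stats:
        groups[s['bucket']].append(s['url'])
    return dict(groups)


ranked = rank_by_position(sample_stats)
buckets = group_by_bucket(sample_stats)
```

Sorting on the `(y, x)` pair is just one plausible reading of "prominence"; sort on `font_size` or `img_area` instead if those matter more for your analysis.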
To simply get all of the article URLs on a homepage, use articles:
from pageone import PageOne
p = PageOne(url='http://www.propublica.org/')
for article in p.articles():
    print(article)
If you also want article URLs that point to other sites, pass incl_external=True:
from pageone import PageOne
p = PageOne(url='http://www.propublica.org/')
for article in p.articles(incl_external=True):
    print(article)
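If you fetch URLs with incl_external=True and later need to tell internal links from external ones, comparing hostnames is one simple option. The sketch below is an assumption about how you might do that yourself; it is not how pageone decides what counts as external, and the helper name `is_external` and the example URLs are hypothetical.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def is_external(url, homepage):
    """Return True when url's hostname differs from the homepage's hostname."""
    return urlparse(url).netloc.lower() != urlparse(homepage).netloc.lower()


homepage = 'http://www.example.org/'
urls = [
    'http://www.example.org/article/some-story',
    'http://news.example.com/other-story',
]
# Keep only the links that leave the homepage's host.
external = [u for u in urls if is_external(u, homepage)]
```

An exact hostname match treats subdomains of the same site as external; relax the comparison if that is too strict for the site you are polling.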