Skip to main content

ruia_pyppeteer - A Ruia plugin for loading javascript - pyppeteer.

Project description

ruia-pyppeteer

A Ruia plugin for loading javascript

Notice: Works on ruia >= 0.4.9

Installation

pip install ruia_pyppeteer
# New features
pip install git+https://github.com/ruia-plugins/ruia-pyppeteer

Usage

ruia_pyppeteer will load js by using pyppeteer.

You need to pay attention when you use load_js, it will download a recent version of Chromium (~100MB). This only happens once.

Load JavaScript

import asyncio

from ruia_pyppeteer import PyppeteerRequest as Request

request = Request("https://www.jianshu.com/", load_js=True)
response = asyncio.get_event_loop().run_until_complete(request.fetch())
print(response)

Complete example

from ruia import AttrField, TextField, Item

from ruia_pyppeteer import PyppeteerSpider as Spider


class JianshuItem(Item):
    target_item = TextField(css_select='ul.list>li')
    author_name = TextField(css_select='a.name')
    author_url = AttrField(attr='href', css_select='a.name')

    async def clean_author_url(self, author_url):
        return f"https://www.jianshu.com{author_url}"


class JianshuSpider(Spider):
    start_urls = ['https://www.jianshu.com/']
    concurrency = 10

    async def parse(self, response):
        async for item in JianshuItem.get_items(html=response.html):
            # Loading js by using PyppeteerRequest
            print(item)


if __name__ == '__main__':
    JianshuSpider.start()

Enjoy it :)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ruia_pyppeteer-0.0.5.tar.gz (4.0 kB view hashes)

Uploaded Source

Built Distribution

ruia_pyppeteer-0.0.5-py3-none-any.whl (6.0 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page