scrapy util
Scrapy util
A collection of extensions built on Scrapy.
Enabling stats collection
This feature is designed to be used together with spider-admin-pro.
# URL that receives the collected run stats; the data is submitted as JSON via POST
STATS_COLLECTION_URL = "http://127.0.0.1:5001/api/collection"

# Enable the extensions
EXTENSIONS = {
    # ===========================================
    # Optional: if the collected timestamps are in UTC, swap the built-in
    # CoreStats extension for the local-time variant
    'scrapy.extensions.corestats.CoreStats': None,
    'scrapy_util.extensions.LocaltimeCoreStats': 0,
    # ===========================================
    # Optional: print how long the program ran
    'scrapy_util.extensions.ShowDurationExtension': 100,
    # Enable the stats collection extension
    'scrapy_util.extensions.StatsCollectorExtension': 100
}
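For local testing without spider-admin-pro, a small stand-in receiver can show what arrives at `STATS_COLLECTION_URL`. The following is only a sketch using the Python standard library. It assumes nothing about the payload beyond what the settings comment says (a JSON body sent via POST). The names `parse_stats_payload`, `CollectionHandler`, `RECEIVED`, and `serve` are hypothetical and are not part of scrapy-util or spider-admin-pro.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory store for received payloads (illustration only, not persistent).
RECEIVED = []


def parse_stats_payload(body: bytes) -> dict:
    """Decode a JSON request body into a dict, raising ValueError if it is not a JSON object."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


class CollectionHandler(BaseHTTPRequestHandler):
    """Hypothetical receiver for POSTs to /api/collection."""

    def do_POST(self):
        if self.path != "/api/collection":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            RECEIVED.append(parse_stats_payload(self.rfile.read(length)))
        except ValueError:
            # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError.
            self.send_error(400, "invalid JSON")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"ok": true}')


def serve(host: str = "127.0.0.1", port: int = 5001) -> None:
    """Block and serve until interrupted; host and port mirror the example URL."""
    HTTPServer((host, port), CollectionHandler).serve_forever()
```

Calling `serve()` in a separate terminal and then running a crawl with the settings above should, under these assumptions, append each submitted payload to `RECEIVED`. The field names inside the payload are not documented here, so inspect a real request before relying on any of them.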
Using the script Spider
# -*- coding: utf-8 -*-
from scrapy import cmdline
from scrapy_util.spiders import ScriptSpider


class BaiduScriptSpider(ScriptSpider):
    name = 'baidu_script'

    def execute(self):
        print("hi")


if __name__ == '__main__':
    cmdline.execute('scrapy crawl baidu_script'.split())
Download files
Source Distribution
scrapy-util-0.0.8.tar.gz (4.9 kB)
Built Distribution
Hashes for scrapy_util-0.0.8-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | cacfef22e1d8f6d8c2c47268cf310d938324ac8d0abcbb5c01d4b84d73cddf31
MD5 | 9ca4b835497edfe4070bc707aa3c276d
BLAKE2b-256 | b01af585a9fc39c24c119ad4edd624b9c18c4ca29b1cafda5f8c4e43d41c0344