Scrapy download handler that can impersonate browser fingerprints
scrapy-impersonate
scrapy-impersonate is a Scrapy download handler. This project integrates curl_cffi to perform HTTP requests, so it can impersonate browsers' TLS signatures or JA3 fingerprints.
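For readers unfamiliar with JA3, the sketch below shows roughly how such a fingerprint is derived: five fields from the TLS ClientHello are joined into a comma-separated string, and that string is hashed with MD5. The function name `ja3_hash` and the sample values are illustrative assumptions, not part of scrapy-impersonate or curl_cffi, and details such as GREASE filtering are left out.

```python
import hashlib


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Build a JA3 string from ClientHello fields and return (string, MD5 hex digest).

    Hypothetical helper for illustration only; it is not an API of this project.
    """
    fields = [
        str(version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(c) for c in curves),
        "-".join(str(p) for p in point_formats),
    ]
    ja3_string = ",".join(fields)
    digest = hashlib.md5(ja3_string.encode("ascii")).hexdigest()
    return ja3_string, digest


# Illustrative values only, not taken from a real browser handshake.
ja3_string, digest = ja3_hash(771, [4865, 4866, 4867], [0, 10, 11], [29, 23, 24], [0])
print(ja3_string)  # 771,4865-4866-4867,0-10-11,29-23-24,0
print(digest)      # 32-character hexadecimal MD5 digest
```

Because the hash depends on the exact order of ciphers and extensions, a plain Python HTTP client and a real browser generally produce different values, which is the difference this handler is meant to hide.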
Installation
pip install git+http://github.com/jxlil/scrapy-impersonate
Activation
Replace the default http and/or https download handlers through the DOWNLOAD_HANDLERS setting:
DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}
Also, be sure to install the asyncio-based Twisted reactor:
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
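If several spiders need these two settings, one option is to build them with a small helper instead of repeating the dictionaries. The following is a minimal sketch: `with_impersonate` is a hypothetical function name, not something the package provides, and it only assembles a plain dictionary using the handler and reactor paths given above.

```python
import copy

HANDLER = "scrapy_impersonate.ImpersonateDownloadHandler"
REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def with_impersonate(settings=None, schemes=("http", "https")):
    """Return a copy of `settings` with the impersonate handler and reactor added.

    Hypothetical helper: the input dictionary is never modified.
    """
    merged = copy.deepcopy(settings) if settings else {}
    handlers = merged.setdefault("DOWNLOAD_HANDLERS", {})
    for scheme in schemes:
        handlers[scheme] = HANDLER
    merged["TWISTED_REACTOR"] = REACTOR
    return merged


# Example: extend existing per-spider settings.
custom_settings = with_impersonate({"DOWNLOAD_DELAY": 1})
print(custom_settings["DOWNLOAD_HANDLERS"]["https"])
```

The resulting dictionary can be assigned to a spider's `custom_settings` attribute. Passing `schemes=("https",)` would leave the default http handler in place.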
Basic usage
Set the impersonate Request.meta key to download a request using curl_cffi:
import scrapy


class ImpersonateSpider(scrapy.Spider):
    name = "impersonate_spider"
    custom_settings = {
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_impersonate.ImpersonateDownloadHandler",
            "https": "scrapy_impersonate.ImpersonateDownloadHandler",
        },
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    }

    def start_requests(self):
        for browser in ["chrome110", "edge99", "safari15_5"]:
            yield scrapy.Request(
                "https://tls.browserleaks.com/json",
                dont_filter=True,
                meta={"impersonate": browser},
            )

    def parse(self, response):
        # ja3_hash: 773906b0efdefa24a7f2b8eb6985bf37
        # ja3_hash: cd08e31494f9531f560d64c695473da9
        # ja3_hash: 2fe1311860bc318fc7f9196556a2a6b9
        return {"ja3_hash": response.json()["ja3_hash"]}
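In this case, each request impersonates a different browser: Chrome 110 (chrome110), Edge 99 (edge99) and Safari 15.5 (safari15_5), so each response reports a different JA3 hash. The curl_cffi project lists all the browsers that you can impersonate.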
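When a spider requests many URLs, the browser can be rotated per request rather than hard-coded. The sketch below only builds the keyword arguments for `scrapy.Request`, so it runs without Scrapy installed. `impersonate_kwargs` is a hypothetical helper name, and the browser list is the one from the example above.

```python
from itertools import cycle

# Browser names taken from the spider example; other values depend on curl_cffi.
BROWSERS = ["chrome110", "edge99", "safari15_5"]


def impersonate_kwargs(urls, browsers=BROWSERS):
    """Yield keyword arguments for scrapy.Request, rotating browsers per URL.

    Hypothetical helper for illustration; not part of scrapy-impersonate.
    """
    rotation = cycle(browsers)
    for url in urls:
        yield {
            "url": url,
            "dont_filter": True,
            "meta": {"impersonate": next(rotation)},
        }


for kwargs in impersonate_kwargs(["https://tls.browserleaks.com/json"] * 4):
    print(kwargs["meta"]["impersonate"])
```

Inside `start_requests`, each dictionary could be unpacked with `yield scrapy.Request(**kwargs)`.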
Thanks
This project is inspired by the following projects:
- curl_cffi - Python binding for curl-impersonate via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
- curl-impersonate - A special build of curl that can impersonate Chrome & Firefox
- scrapy-playwright - Playwright integration for Scrapy
Hashes for scrapy-impersonate-1.0.0b1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 46f8ecbc70e95ec25b65f5388218047cfd5bc197ea3b863cd3ab76aeff58f456
MD5 | e67db2ccc2c693c54d103047d968aa1e
BLAKE2b-256 | 03becbe9252994b00e45f0c01a8233e8e6e240f7a0d011969f901fd318ea4f69
Hashes for scrapy_impersonate-1.0.0b1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ac8027d9d6206cbf71df4568f33866add41ecfcb4ea82dfc3c39761c04dc0d4b
MD5 | d42b3dfb8a6a29d4d2d771cf366d1325
BLAKE2b-256 | 59caff773583cf00a442e31cc5a981ee169d0738deac2f19876e127af438dd26