Scrapy download handler that can impersonate browser fingerprints
scrapy-impersonate
scrapy-impersonate is a Scrapy download handler. This project integrates curl_cffi to perform HTTP requests, so it can impersonate browsers' TLS signatures or JA3 fingerprints.
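For context, a JA3 fingerprint is an MD5 digest over five fields of the TLS ClientHello (version, cipher suites, extensions, elliptic curves, point formats), with the values inside a field joined by "-" and the fields joined by ",". The sketch below is not part of scrapy-impersonate: the helper names ja3_string and ja3_hash and the sample field values are illustrative assumptions, and details such as GREASE filtering are left out. It only shows why a client with a different TLS stack ends up with a different hash.

```python
import hashlib
from typing import Sequence


def ja3_string(
    version: int,
    ciphers: Sequence[int],
    extensions: Sequence[int],
    curves: Sequence[int],
    point_formats: Sequence[int],
) -> str:
    """Build the JA3 string: fields joined by ',', values within a field by '-'."""
    fields = ([version], ciphers, extensions, curves, point_formats)
    return ",".join("-".join(str(value) for value in field) for field in fields)


def ja3_hash(ja3: str) -> str:
    """Return the MD5 hex digest of a JA3 string."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


# Hypothetical ClientHello values, chosen only for illustration.
sample = ja3_string(769, [47, 53, 5, 10], [0, 10, 11], [23, 24, 25], [0])
# Same cipher suites offered in a different order.
reordered = ja3_string(769, [53, 47, 5, 10], [0, 10, 11], [23, 24, 25], [0])

print(sample)  # 769,47-53-5-10,0-10-11,23-24-25,0
print(ja3_hash(sample) != ja3_hash(reordered))  # True
```

Because the order of cipher suites and extensions feeds into the digest, a plain Python HTTP client and a real browser usually hash differently even when they send identical headers.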
Installation
pip install scrapy-impersonate
Activation
Replace the default http and/or https Download Handlers through DOWNLOAD_HANDLERS:
DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}
Also, be sure to install the asyncio-based Twisted reactor:
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
Basic usage
Set the impersonate Request.meta key to download a request using curl_cffi:
import scrapy


class ImpersonateSpider(scrapy.Spider):
    name = "impersonate_spider"
    custom_settings = {
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_impersonate.ImpersonateDownloadHandler",
            "https": "scrapy_impersonate.ImpersonateDownloadHandler",
        },
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    }

    def start_requests(self):
        for browser in ["chrome110", "edge99", "safari15_5"]:
            yield scrapy.Request(
                "https://tls.browserleaks.com/json",
                dont_filter=True,
                meta={"impersonate": browser},
            )

    def parse(self, response):
        # ja3_hash: 773906b0efdefa24a7f2b8eb6985bf37
        # ja3_hash: cd08e31494f9531f560d64c695473da9
        # ja3_hash: 2fe1311860bc318fc7f9196556a2a6b9
        return {"ja3_hash": response.json()["ja3_hash"]}
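The spider above sends the same URL once per browser. When crawling many URLs, one possible pattern is to rotate the impersonated browser across requests. The following is a minimal sketch in plain Python, so it runs without Scrapy installed; the helper name with_rotating_browser and the example URLs are assumptions made for illustration, not part of this package.

```python
from itertools import cycle
from typing import Dict, Iterable, Iterator, Tuple

# Browser names taken from the spider example above.
BROWSERS = ("chrome110", "edge99", "safari15_5")


def with_rotating_browser(
    urls: Iterable[str],
    browsers: Iterable[str] = BROWSERS,
) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Pair each URL with a Request.meta dict, cycling through the browsers."""
    # zip stops when the URLs run out; cycle repeats the browsers as needed.
    for url, browser in zip(urls, cycle(browsers)):
        yield url, {"impersonate": browser}


# Hypothetical URLs, used only to show the rotation.
urls = [f"https://example.com/page/{n}" for n in range(1, 5)]
pairs = list(with_rotating_browser(urls))
for url, meta in pairs:
    print(url, meta)
```

Inside start_requests, each pair could then be turned into a request with scrapy.Request(url, meta=meta).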
Supported browsers
The following browsers can be impersonated:
Browser | Version | Build | OS | Name |
---|---|---|---|---|
Chrome | 99 | 99.0.4844.51 | Windows 10 | chrome99 |
Chrome | 100 | 100.0.4896.75 | Windows 10 | chrome100 |
Chrome | 101 | 101.0.4951.67 | Windows 10 | chrome101 |
Chrome | 104 | 104.0.5112.81 | Windows 10 | chrome104 |
Chrome | 107 | 107.0.5304.107 | Windows 10 | chrome107 |
Chrome | 110 | 110.0.5481.177 | Windows 10 | chrome110 |
Chrome | 99 | 99.0.4844.73 | Android 12 | chrome99_android |
Edge | 99 | 99.0.1150.30 | Windows 10 | edge99 |
Edge | 101 | 101.0.1210.47 | Windows 10 | edge101 |
Safari | 15.3 | 16612.4.9.1.8 | macOS Big Sur | safari15_3 |
Safari | 15.5 | 17613.2.7.1.8 | macOS Monterey | safari15_5 |
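Since the value of the impersonate key has to match one of the names in the table above, it can help to catch typos before a request is scheduled. The sketch below is one way to do that; the helper impersonate_meta and the SUPPORTED set are hypothetical and simply mirror the table. This README does not say how the handler itself reacts to an unknown name, and other releases of curl_cffi may accept a different list, so treat the set as a snapshot.

```python
# Snapshot of the "Name" column in the table above (assumption: kept in sync by hand).
SUPPORTED = frozenset({
    "chrome99", "chrome100", "chrome101", "chrome104", "chrome107", "chrome110",
    "chrome99_android",
    "edge99", "edge101",
    "safari15_3", "safari15_5",
})


def impersonate_meta(browser: str) -> dict:
    """Return a Request.meta dict for a known browser name, or raise ValueError."""
    if browser not in SUPPORTED:
        raise ValueError(
            f"unknown browser {browser!r}; expected one of {sorted(SUPPORTED)}"
        )
    return {"impersonate": browser}


print(impersonate_meta("chrome110"))  # {'impersonate': 'chrome110'}
```

A misspelled name such as "chrome_110" then fails immediately with a ValueError instead of surfacing later during the download.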
Thanks
This project is inspired by the following projects:
- curl_cffi - Python binding for curl-impersonate via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
- curl-impersonate - A special build of curl that can impersonate Chrome & Firefox
- scrapy-playwright - Playwright integration for Scrapy