cloudscraper
A Python module to bypass Cloudflare's anti-bot page.
Description
A simple Python module to bypass Cloudflare's anti-bot page (also known as "I'm Under Attack Mode", or IUAM), implemented with Requests. Cloudflare changes their techniques periodically, so I will update this repo frequently.
This can be useful if you wish to scrape or crawl a website protected with Cloudflare. Cloudflare's anti-bot page currently just checks if the client supports Javascript, though they may add additional techniques in the future.
Due to Cloudflare continually changing and hardening their protection page, cloudscraper requires a JavaScript Engine/interpreter to solve Javascript challenges. This allows the script to easily impersonate a regular web browser without explicitly deobfuscating and parsing Cloudflare's Javascript.
For reference, this is the default message Cloudflare uses for these sorts of pages:
Checking your browser before accessing website.com.
This process is automatic. Your browser will redirect to your requested content shortly.
Please allow up to 5 seconds...
Any script using cloudscraper will sleep for ~5 seconds for the first visit to any site with Cloudflare anti-bots enabled, though no delay will occur after the first request.
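The once-per-site behaviour can be pictured with a small sketch. The class and method names here are invented for illustration only; cloudscraper tracks this internally and you do not need to write anything like this yourself:

```python
import time
from urllib.parse import urlparse

class FirstVisitDelay:
    """Sleep once per host, mimicking the ~5 s IUAM wait described above.

    Illustrative only: cloudscraper handles this internally; this class
    is invented for demonstration.
    """

    def __init__(self, delay=5.0):
        self.delay = delay
        self._seen = set()  # hosts we have already visited

    def wait_if_first_visit(self, url):
        host = urlparse(url).netloc
        if host not in self._seen:
            self._seen.add(host)
            time.sleep(self.delay)  # one-time wait for this host
            return True             # slept
        return False                # no delay on later requests
```

Subsequent requests to the same host skip the wait entirely, which matches the behaviour described above.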
Donations
If you feel like showing your love and/or appreciation for this project, then how about shouting me a coffee or beer :)
<a href="https://buymeacoff.ee/venomous" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Buy Me A Coffee" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;" ></a>
Installation
Simply run pip install cloudscraper. The PyPI package is at https://pypi.python.org/pypi/cloudscraper/
Alternatively, clone this repository and run python setup.py install.
Dependencies
- Python 3.x
- Requests >= 2.9.2
- requests_toolbelt >= 0.9.1
python setup.py install will install the Python dependencies automatically. The only components you need to install yourself are the JavaScript interpreters and/or engines you decide to use; js2py is already included in the requirements, so it needs no separate installation.
Javascript Interpreters and Engines
We support the following Javascript interpreters/engines.
- ChakraCore: requires the ChakraCore library binaries to be installed.
- js2py: >=0.67
- native: Self-made native Python solver (default)
- Node.js
- V8: We use Sony's v8eval() python module.
Usage
The simplest way to use cloudscraper is by calling create_scraper().
```python
import cloudscraper

scraper = cloudscraper.create_scraper()  # returns a CloudScraper instance
# Or: scraper = cloudscraper.CloudScraper()  # CloudScraper inherits from requests.Session
print(scraper.get("http://somesite.com").text)  # => "<!DOCTYPE html><html><head>..."
```
That's it...
Any requests made from this session object to websites protected by Cloudflare anti-bot will be handled automatically. Websites not using Cloudflare will be treated normally. You don't need to configure or call anything further, and you can effectively treat all websites as if they're not protected with anything.
You use cloudscraper exactly the same way you use Requests. cloudscraper works identically to a Requests Session object; instead of calling requests.get() or requests.post(), you call scraper.get() or scraper.post().
Consult Requests' documentation for more information.
Options
Disable Cloudflare V1
Description
If you don't want to even attempt the (deprecated) Cloudflare v1 challenge solving, you can disable it.
Parameters
| Parameter | Value | Default |
|---|---|---|
| disableCloudflareV1 | (boolean) | False |
Example
```python
scraper = cloudscraper.create_scraper(disableCloudflareV1=True)
```
Brotli
Description
Brotli decompression support has been added, and it is enabled by default.
Parameters
| Parameter | Value | Default |
|---|---|---|
| allow_brotli | (boolean) | True |
Example
```python
scraper = cloudscraper.create_scraper(allow_brotli=False)
```
Browser / User-Agent Filtering
Description
Control how and which User-Agent is "randomly" selected.
Parameters
Can be passed as an argument to create_scraper(), get_tokens(), get_cookie_string().
| Parameter | Value | Default |
|---|---|---|
| browser | (string) chrome or firefox | None |
Or
| Parameter | Value | Default |
|---|---|---|
| browser | (dict) | |
browser dict Parameters
| Parameter | Value | Default |
|---|---|---|
| browser | (string) chrome or firefox | None |
| mobile | (boolean) | True |
| desktop | (boolean) | True |
| platform | (string) 'linux', 'windows', 'darwin', 'android', 'ios' | None |
| custom | (string) | None |
Example
```python
scraper = cloudscraper.create_scraper(browser='chrome')
```
or
```python
# will give you only mobile chrome User-Agents on Android
scraper = cloudscraper.create_scraper(
    browser={
        'browser': 'chrome',
        'platform': 'android',
        'desktop': False
    }
)

# will give you only desktop firefox User-Agents on Windows
scraper = cloudscraper.create_scraper(
    browser={
        'browser': 'firefox',
        'platform': 'windows',
        'mobile': False
    }
)

# 'custom' will also try to find the given string in browsers.json.
# If a match is found, the headers and cipherSuite from that "browser" are used;
# otherwise a generic set of headers and cipherSuite is used.
scraper = cloudscraper.create_scraper(
    browser={
        'custom': 'ScraperBot/1.0',
    }
)
```
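To make the filtering semantics concrete, here is a hypothetical sketch of how a browser/platform/mobile filter could narrow a pool of User-Agent entries. The field names and the sample pool are assumptions for illustration; the real data lives in cloudscraper's browsers.json and its internal logic differs:

```python
# Small sample pool standing in for browsers.json (invented entries).
SAMPLE_POOL = [
    {'browser': 'chrome',  'platform': 'android', 'mobile': True,
     'ua': 'Mozilla/5.0 (Linux; Android 10) Chrome/90.0 Mobile'},
    {'browser': 'chrome',  'platform': 'windows', 'mobile': False,
     'ua': 'Mozilla/5.0 (Windows NT 10.0) Chrome/90.0'},
    {'browser': 'firefox', 'platform': 'windows', 'mobile': False,
     'ua': 'Mozilla/5.0 (Windows NT 10.0; rv:88.0) Firefox/88.0'},
]

def filter_user_agents(pool, browser=None, platform=None,
                       mobile=True, desktop=True):
    """Return User-Agent strings matching the requested browser/platform,
    honouring the mobile/desktop toggles (both allowed by default)."""
    out = []
    for entry in pool:
        if browser and entry['browser'] != browser:
            continue                              # wrong browser family
        if platform and entry['platform'] != platform:
            continue                              # wrong platform
        if entry['mobile'] and not mobile:
            continue                              # mobile UAs disabled
        if not entry['mobile'] and not desktop:
            continue                              # desktop UAs disabled
        out.append(entry['ua'])
    return out
```

With `browser='chrome', platform='android', desktop=False` only the mobile Android Chrome entry survives, mirroring the first example above.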
Debug
Description
Prints out header and content information of the request for debugging.
Parameters
Can be set as an attribute via your cloudscraper object or passed as an argument to create_scraper(), get_tokens(), get_cookie_string().
| Parameter | Value | Default |
|---|---|---|
| debug | (boolean) | False |
Example
```python
scraper = cloudscraper.create_scraper(debug=True)
```
Delays
Description
The Cloudflare IUAM challenge requires the browser to wait ~5 seconds before submitting the challenge answer. Use this parameter if you would like to override that delay.
Parameters
Can be set as an attribute via your cloudscraper object or passed as an argument to create_scraper(), get_tokens(), get_cookie_string().
| Parameter | Value | Default |
|---|---|---|
| delay | (float) | extracted from IUAM page |
Example
```python
scraper = cloudscraper.create_scraper(delay=10)
```
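For illustration, extracting the default delay from an IUAM-style page might look like the sketch below. The setTimeout pattern is an assumption about the page markup, not cloudscraper's actual parser:

```python
import re

def extract_iuam_delay(page_html, default=5.0):
    """Pull the challenge wait time (in seconds) out of an IUAM-style page.

    Hypothetical sketch: assumes the page schedules its submit with
    setTimeout(function(){...}, <milliseconds>).
    """
    match = re.search(r'setTimeout\(function\(\){.*?},\s*(\d+)\)',
                      page_html, re.DOTALL)
    if match:
        return int(match.group(1)) / 1000.0  # ms -> seconds
    return default                           # fall back to the usual ~5 s
```

When no explicit delay is passed, a value extracted like this is what the table above refers to by "extracted from IUAM page".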
Existing session
Description
If you already have an existing Requests session, you can pass it to the function create_scraper() to continue using that session.
Parameters
| Parameter | Value | Default |
|---|---|---|
| sess | (requests.session) | None |
Example
```python
import requests
import cloudscraper

session = requests.session()
scraper = cloudscraper.create_scraper(sess=session)
```
Note
Unfortunately, not all of a Requests session's attributes are easily transferable. If you run into problems with this, replace your initial session initialization call.

From:

```python
sess = requests.session()
```

To:

```python
sess = cloudscraper.create_scraper()
```
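If you only need the commonly transferable attributes, a minimal sketch (assuming dict-like `headers`, `cookies`, and `proxies` attributes, as on a requests.Session) could copy them across explicitly. This helper is hypothetical, not part of cloudscraper's API:

```python
def copy_session_state(src, dst):
    """Copy headers, cookies, and proxies from one session-like object
    to another.

    Hypothetical helper for illustration: attribute names mirror
    requests.Session, but other state (e.g. custom transport adapters)
    does not transfer this easily, as noted above.
    """
    dst.headers.update(src.headers)
    dst.cookies.update(src.cookies)
    dst.proxies.update(src.proxies)
```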
JavaScript Engines and Interpreters
Description
cloudscraper currently supports the following JavaScript Engines/Interpreters
- ChakraCore
- js2py
- native: Self-made native Python solver (default)
- Node.js
- V8
Parameters
Can be set as an attribute via your cloudscraper object or passed as an argument to create_scraper(), get_tokens(), get_cookie_string().
| Parameter | Value | Default |
|---|---|---|
| interpreter | (string) | native |
Example
```python
scraper = cloudscraper.create_scraper(interpreter='nodejs')
```
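If you want to choose an interpreter based on what is actually installed, a small availability probe could look like this. The probes are assumptions for illustration (the `node` binary for Node.js; the `v8eval`, `PyChakra`, and `js2py` Python modules for the others):

```python
import shutil
from importlib.util import find_spec

def pick_interpreter(preferred=('nodejs', 'v8', 'chakracore', 'js2py')):
    """Return the first available interpreter name from `preferred`,
    falling back to the built-in 'native' solver.

    Illustrative sketch: the module names used as availability probes
    are assumptions, not something cloudscraper exposes.
    """
    modules = {'v8': 'v8eval', 'chakracore': 'PyChakra', 'js2py': 'js2py'}
    for name in preferred:
        if name == 'nodejs':
            if shutil.which('node'):        # Node.js binary on PATH?
                return name
        elif find_spec(modules[name]) is not None:  # Python binding importable?
            return name
    return 'native'  # pure-Python default, always available
```

You could then pass the result through as `cloudscraper.create_scraper(interpreter=pick_interpreter())`.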
3rd Party Captcha Solvers
Description
cloudscraper currently supports the following 3rd party Captcha solvers, should you require them.
- 2captcha
- anticaptcha
- CapSolver
- CapMonster Cloud
- deathbycaptcha
- 9kw
- return_response
Note
I am working on adding more 3rd party solvers. If you wish to have a service added that is not currently supported, please raise a support ticket on GitHub.
Required Parameters
Can be set as an attribute via your cloudscraper object or passed as an argument to create_scraper(), get_tokens(), get_cookie_string().