Keywords: Selenium | ChromeDriver | Detection Evasion | Web Automation | Browser Fingerprinting
Abstract: This paper analyzes how websites detect Selenium sessions driven by ChromeDriver, focusing on evasion by modifying a specific string in the ChromeDriver binary. It details practical steps for altering the cdc_ string with Vim and Perl and verifies that the modification is effective. Additional detection mechanisms and countermeasures are also discussed, offering practical guidance for web automation testing.
Technical Principles of Selenium Detection
Websites typically detect Selenium with ChromeDriver by looking for specific JavaScript variables and document properties. When ChromeDriver controls the browser, it injects certain identifiers into the page environment, and these identifiers become the main detection targets.
Detection scripts typically check for two kinds of identifiers: variables on the window object whose names contain keywords such as "selenium" or "webdriver", and document properties whose names begin with $cdc_ or $wdc_. The presence of any of these identifiers indicates that the browser is being controlled by an automation tool.
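The matching rules described above can be sketched as a small Python function that classifies property names. This is an illustrative sketch, not any real site's detection script: the function name find_automation_markers is hypothetical, the input lists stand in for names a page might collect with Object.keys(window) and Object.keys(document), and case-insensitive keyword matching is an assumption.

```python
import re

# Window variables whose names mention "selenium" or "webdriver".
# Case-insensitive matching is an assumption of this sketch.
WINDOW_KEYWORDS = re.compile(r"selenium|webdriver", re.IGNORECASE)
# Document properties whose names start with these prefixes.
DOCUMENT_PREFIXES = ("$cdc_", "$wdc_")


def find_automation_markers(window_keys, document_keys):
    """Return the property names that match the detection rules.

    window_keys and document_keys are plain lists of property names,
    e.g. as a page might gather them with Object.keys(window) and
    Object.keys(document).
    """
    hits = [k for k in window_keys if WINDOW_KEYWORDS.search(k)]
    hits += [k for k in document_keys if k.startswith(DOCUMENT_PREFIXES)]
    return hits


if __name__ == "__main__":
    print(find_automation_markers(
        ["onload", "_Selenium_IDE_Recorder"],
        ["$cdc_asdjflasutopfhvcZLmcfl_", "title"],
    ))
```

Note that a property renamed to, say, $dog_asdjflasutopfhvcZLmcfl_ no longer matches the prefix rule, which is the premise of the binary modification discussed below.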
Key Detection Points in ChromeDriver
In ChromeDriver's implementation, the $cdc_ string serves as a crucial detection marker. The string is embedded in the ChromeDriver binary and is used to name a document property that backs ChromeDriver's internal element cache. When a page finds a document property with this prefix, it can conclude that the browser is being driven by Selenium.
The call_function.js file in the ChromeDriver source code contains the following key function:
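function getPageCache(opt_doc) {
  // Use the supplied document, or the current one by default.
  var doc = opt_doc || document;
  // Fixed property name; its $cdc_ prefix is what detection scripts match.
  var key = '$cdc_asdjflasutopfhvcZLmcfl_';
  // Create the element cache on first use (Cache is defined elsewhere in ChromeDriver's JavaScript).
  if (!(key in doc))
    doc[key] = new Cache();
  return doc[key];
}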
This function creates a property on the document object starting with $cdc_, which is precisely what detection scripts look for.
Practical Methods for Modifying ChromeDriver
Altering the cdc_ string in the ChromeDriver binary changes the name of the injected property, so scripts that search for $cdc_ no longer find it. This method does not require recompiling the source code and is relatively straightforward to implement.
Using Vim Editor for Modification
Vim can edit binary files when started with the -b flag, which puts it in binary mode so that it does not alter line endings or other bytes on save. The steps are as follows:
vim -b /path/to/chromedriver
After opening the file, execute the replacement command:
:%s/cdc_/dog_/g
Here, cdc_ is replaced with dog_. The replacement string must have the same length as the original: inserting or removing bytes would shift the offsets of everything that follows in the executable, and ChromeDriver would likely fail to run. Save and exit with the :wq! command after the modification.
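The same-length rule can be made explicit in code. The following Python sketch performs the same substitution as the Vim command on an in-memory byte string and refuses replacements of a different length; the function name replace_same_length is hypothetical and not part of any tool mentioned in this paper.

```python
def replace_same_length(data: bytes, old: bytes, new: bytes):
    """Replace every occurrence of old with new in a binary blob.

    Raises ValueError if the lengths differ, because inserting or removing
    bytes would shift every later offset in the executable.
    Returns the patched bytes and the number of replacements made.
    """
    if len(old) != len(new):
        raise ValueError(f"length mismatch: {old!r} vs {new!r}")
    count = data.count(old)
    return data.replace(old, new), count


if __name__ == "__main__":
    blob = b"\x00$cdc_asdjfl\x00cdc_x"
    patched, n = replace_same_length(blob, b"cdc_", b"dog_")
    print(n, patched)
```

To apply it to a file, one would read the binary with pathlib.Path.read_bytes(), patch the result, and write it back with write_bytes(); working on bytes rather than text avoids any encoding or line-ending conversion.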
Using Perl Script for Modification
Perl offers a more concise command-line approach:
perl -pi -e 's/cdc_/dog_/g' /path/to/chromedriver
This command edits the file in place (-i) and replaces every occurrence of cdc_; as with Vim, the replacement string must have the same length as the original.
Verification and Effectiveness Testing
After modification, verification is necessary to ensure success:
grep "cdc_" /path/to/chromedriver
If the command produces no output, every occurrence of cdc_ has been replaced. Next, launch the modified ChromeDriver; if it starts without being terminated with a "killed" message, the modification has succeeded.
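The grep check can also be scripted. The following Python sketch assumes a backup copy of the original binary was kept; the function name verify_patch is hypothetical. Besides confirming that cdc_ no longer appears, it checks that the file size is unchanged, which a same-length replacement guarantees.

```python
from pathlib import Path


def verify_patch(original: str, patched: str, marker: bytes = b"cdc_") -> bool:
    """Scripted version of the grep check, plus a file-size check.

    Returns True when the marker no longer occurs in the patched file and
    its size equals that of the original (a same-length replacement must
    not change the size).
    """
    orig_bytes = Path(original).read_bytes()
    new_bytes = Path(patched).read_bytes()
    return marker not in new_bytes and len(new_bytes) == len(orig_bytes)
```

A typical call would be verify_patch("/path/to/chromedriver.orig", "/path/to/chromedriver"), where the .orig file is the backup; the launch test described above is still needed to confirm the binary runs.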
Practical tests demonstrate that after this modification, websites previously able to detect Selenium can no longer identify the automated browser. For instance, on sites like StubHub, the original ChromeDriver would be detected and blocked immediately, whereas the modified version allows normal access.
Other Detection Mechanisms and Countermeasures
Beyond $cdc_ detection, websites may employ additional techniques:
JavaScript Environment Detection
Detection scripts examine a range of specific JavaScript properties:
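var documentDetectionKeys = [
  "__webdriver_evaluate",
  "__selenium_evaluate",
  "__webdriver_script_function",
  // More detection keys...
];
var windowDetectionKeys = [
  "_phantom",
  "__nightmare",
  "_selenium",
  // More detection keys...
];
// Flag the session if any listed key is present on document or window.
var automated =
  documentDetectionKeys.some(function (key) { return key in document; }) ||
  windowDetectionKeys.some(function (key) { return key in window; });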
User Agent and Browser Fingerprinting
Even when the user agent and most fingerprint attributes of a Selenium-driven Chrome match those of a regular Chrome, automation-specific signals remain detectable; in particular, navigator.webdriver is true in an automated session. These signals can be reduced as follows:
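from selenium import webdriver

options = webdriver.ChromeOptions()
# Stop Blink from reporting navigator.webdriver as true for this session.
options.add_argument('--disable-blink-features=AutomationControlled')
driver = webdriver.Chrome(options=options)

# Register a script that runs before any page script in every new document,
# so the page sees navigator.webdriver as undefined.
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
    "source": """
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
        })
    """
})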
Technical Implementation Considerations
When implementing these evasion strategies, several points should be noted:
Always back up the original ChromeDriver binary before modifying it, so that it can be restored if the change breaks it. The replacement string should avoid common automation-related terms; a random or unrelated string of the same length is preferable.
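Both precautions can be sketched in Python. The helper names make_replacement and backup are hypothetical, and the AVOID list of terms to exclude is an illustrative assumption, not an authoritative list. The backup is written next to the binary with an .orig suffix, also an assumption of this sketch.

```python
import random
import shutil
import string
from pathlib import Path
from typing import Optional

# Terms to keep out of the replacement (illustrative list, an assumption).
AVOID = ("cdc", "wdc", "selenium", "webdriver", "driver")


def make_replacement(length: int, rng: Optional[random.Random] = None) -> str:
    """Random lowercase string of the given length containing no AVOID term."""
    rng = rng or random.Random()
    while True:
        candidate = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        if not any(term in candidate for term in AVOID):
            return candidate


def backup(path: str) -> Path:
    """Copy path to path + '.orig' (preserving metadata) unless it already exists."""
    src = Path(path)
    dst = src.with_name(src.name + ".orig")
    if not dst.exists():
        shutil.copy2(src, dst)
    return dst
```

For example, make_replacement(3) + "_" yields a four-character substitute for cdc_ that keeps the trailing underscore. backup() deliberately never overwrites an existing .orig file, so running it a second time cannot replace the pristine copy with an already-patched one.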
Even if Selenium detection is successfully evaded, websites might still identify automated traffic through other behavioral analysis techniques, so combining multiple strategies enhances success rates.
Industry Applications and Ethical Considerations
These techniques are primarily applied in legitimate scenarios such as web automation testing, data collection, and monitoring. In practice, it is essential to adhere to website terms of service and relevant laws to avoid imposing unnecessary burdens on target sites.
For test engineers, understanding these detection and evasion mechanisms aids in developing more stable automation scripts. For security researchers, this knowledge supports the analysis of malicious crawler detection and defense technologies.