Keywords: Bash scripting | cURL | HTTP status code checking
Abstract: This article provides an in-depth exploration of technical solutions for batch checking URL HTTP status codes using Bash scripts combined with the cURL tool. By analyzing key parameters such as --write-out and --head from the top-voted Stack Overflow answer, it explains how to efficiently retrieve status codes and handle server configuration anomalies. The article also compares an alternative wget approach, offering complete script implementations and performance optimization recommendations suitable for system administrators and developers.
In web development and system maintenance, regularly checking URL accessibility is a common requirement. HTTP status codes provide a quick way to determine resource status, such as 200 for success, 404 for not found, and 500 for server internal errors. Based on high-scoring Stack Overflow answers, this article details solutions for batch detection using Bash and cURL.
cURL Core Parameter Analysis
cURL is a powerful command-line tool supporting multiple protocols. For HTTP status code retrieval, the key parameter combination is:
curl -o /dev/null --silent --head --write-out '%{http_code}\n' <url>
- -o /dev/null: Redirects the response body to the null device, suppressing unneeded output
- --silent: Silent mode; hides the progress meter and error messages
- --head: Sends a HEAD request instead of GET, retrieving only the response headers to reduce data transfer
- --write-out '%{http_code}\n': Prints a formatted string in which the %{http_code} variable expands to the status code
This approach's advantage lies in directly obtaining standard status codes returned by the server, avoiding the complexity of parsing response bodies.
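For instance, pointing the command at a reachable page (example.com is used here purely for illustration):

curl -o /dev/null --silent --head --write-out '%{http_code}\n' https://example.com

prints a single line containing the status code, such as 200.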
Complete Bash Script Implementation
Extending single URL detection to batch processing requires combining Bash loop structures. The following script reads URL lists from a file and processes them line by line:
#!/bin/bash
# Read URLs one per line and print "status_code URL" for each.
while read -r LINE; do
    curl -o /dev/null --silent --head --write-out "%{http_code} $LINE\n" "$LINE"
done < url-list.txt
Script execution flow:
- Uses a while read loop to read each line from url-list.txt
- Runs the cURL command for each URL, printing output in the form "status code URL"
- Supplies the URL list via input redirection (< url-list.txt)
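As an illustration, with a url-list.txt containing one live page and one missing page (hypothetical URLs), the output would resemble:

200 https://example.com/
404 https://example.com/no-such-page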
This implementation is simple and intuitive but has performance bottlenecks: each URL requires an independent cURL process and TCP connection, creating significant fork and connection overhead when processing large numbers of URLs.
Server Configuration Anomaly Handling
In practical applications, server configuration anomalies may occur, such as pages displaying "404 not found" but returning 200 status codes. This is typically due to server misconfiguration or custom error pages. While cURL's --write-out method cannot directly detect content, it can be enhanced as follows:
#!/bin/bash
while read -r LINE; do
    CODE=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "$LINE")
    if [ "$CODE" = "200" ]; then
        # Additional content verification for 200 status codes
        if curl --silent "$LINE" | grep -q "404"; then
            echo "WARNING: 200 but contains 404 - $LINE"
        else
            echo "$CODE $LINE"
        fi
    else
        echo "$CODE $LINE"
    fi
done < url-list.txt
This enhanced version performs a second GET request for each URL that returns a 200 status code, checking whether the response body contains the text "404". Although this increases network overhead, it can surface these configuration anomalies.
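A variation worth sketching: the HEAD-then-GET round trip can be collapsed into a single GET by letting --write-out append the status code after the body. This is a minimal sketch of that idea, assuming the same url-list.txt, and is not part of the original Stack Overflow answer:

#!/bin/bash
# Sketch: one GET per URL; the status code is appended on its own line after the body.
while read -r LINE; do
    RESPONSE=$(curl --silent --write-out '\n%{http_code}' "$LINE")
    CODE=${RESPONSE##*$'\n'}    # last line: the status code
    BODY=${RESPONSE%$'\n'*}     # everything before it: the response body
    if [ "$CODE" = "200" ] && printf '%s' "$BODY" | grep -q "404"; then
        echo "WARNING: 200 but contains 404 - $LINE"
    else
        echo "$CODE $LINE"
    fi
done < url-list.txt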
Alternative Approach: wget Method
Besides cURL, wget is another common command-line tool. The following command retrieves HTTP status codes:
wget --spider -S "http://example.com" 2>&1 | grep "HTTP/" | awk '{print $2}'
Parameter explanation:
- --spider: Spider (check) mode; verifies the URL without downloading the file
- -S: Displays the server response headers
- 2>&1: Redirects standard error to standard output, since wget writes header information to stderr
- grep "HTTP/": Filters the HTTP status lines
- awk '{print $2}': Extracts the status code (the second field)
Compared to cURL, the wget method requires more pipeline processing and text parsing, but some systems may have wget pre-installed without cURL.
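For batch use, the wget one-liner can be wrapped in the same while read loop shown earlier. A minimal sketch, assuming the same url-list.txt format (tail -1 keeps only the final status line in case wget follows redirects):

#!/bin/bash
# Print "status_code URL" per line using wget instead of cURL.
while read -r LINE; do
    CODE=$(wget --spider -S "$LINE" 2>&1 | grep "HTTP/" | awk '{print $2}' | tail -1)
    echo "$CODE $LINE"
done < url-list.txt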
Performance Optimization Recommendations
For large-scale URL detection, consider these optimization strategies:
- Parallel Processing: Use xargs -P or GNU parallel to run multiple cURL processes concurrently
- Connection Reuse: cURL's --next option allows multiple requests to be sent over a single connection, though it requires more elaborate command-line construction
- Timeout Control: Add the --max-time option to avoid long waits on unresponsive servers
- Result Caching: Save detection results to a file to avoid rechecking the same URLs (a minimal sketch follows the parallel example below)
Here's a simple parallel processing example:
#!/bin/bash
# Run up to 10 cURL processes in parallel, each with a 10-second timeout.
xargs -P 10 -I {} curl -o /dev/null --silent --head --max-time 10 --write-out "%{http_code} {}\n" "{}" < url-list.txt
This command uses xargs -P 10 to run 10 cURL processes simultaneously, with --max-time 10 setting a 10-second timeout.
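The result-caching idea from the list above can be as simple as skipping any URL that already appears in a results file. A minimal sketch, with results.txt as a hypothetical cache file:

#!/bin/bash
# Skip URLs already recorded in results.txt (hypothetical cache file).
touch results.txt
while read -r LINE; do
    # Substring match on " URL" is crude but avoids matching the status code field.
    if ! grep -qF " $LINE" results.txt; then
        curl -o /dev/null --silent --head --max-time 10 --write-out "%{http_code} $LINE\n" "$LINE" >> results.txt
    fi
done < url-list.txt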
Application Scenarios and Extensions
URL status code detection has important applications in these scenarios:
- Website Monitoring: Regularly checking availability of critical pages
- Link Validation: Verifying validity of external links in documents or websites
- API Health Checks: Monitoring status of REST API endpoints
- Crawler Preprocessing: Filtering invalid links before crawling
Script functionality can be further extended, such as:
- Adding redirect tracking with cURL's --location option (see the sketch after this list)
- Supporting different HTTP methods (GET, POST, etc.)
- Integration into CI/CD pipelines
- Generating visual reports or alerts
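As an example of the first extension, adding --location makes cURL follow redirects, and the %{url_effective} write-out variable reports the final URL reached (the URL below is illustrative):

curl -o /dev/null --silent --head --location --write-out '%{http_code} %{url_effective}\n' "http://example.com"

The output then shows the status code of the final destination rather than an intermediate 301 or 302.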
By combining Bash scripts with standard command-line tools, you can build flexible and efficient URL monitoring solutions.