Keywords: Bash scripting | cURL | HTTP status code checking
Abstract: This article provides an in-depth exploration of technical solutions for batch checking URL HTTP status codes using Bash scripts combined with the cURL tool. By analyzing key parameters such as --write-out and --head from the top-voted Stack Overflow answer, it explains how to efficiently retrieve status codes and handle server configuration anomalies. The article also compares an alternative wget approach, offering complete script implementations and performance optimization recommendations suitable for system administrators and developers.
In web development and system maintenance, regularly checking URL accessibility is a common requirement. HTTP status codes provide a quick way to determine resource status, such as 200 for success, 404 for not found, and 500 for server internal errors. Based on high-scoring Stack Overflow answers, this article details solutions for batch detection using Bash and cURL.
cURL Core Parameter Analysis
cURL is a powerful command-line tool supporting multiple protocols. For HTTP status code retrieval, the key parameter combination is:
curl -o /dev/null --silent --head --write-out '%{http_code}\n' <url>
- -o /dev/null: Redirects the response body to the null device, suppressing unneeded output
- --silent: Silent mode; hides the progress meter and error messages
- --head: Sends a HEAD request instead of GET, retrieving only the response headers to reduce data transfer
- --write-out '%{http_code}\n': Prints a formatted string in which the %{http_code} variable expands to the status code
This approach's advantage lies in directly obtaining standard status codes returned by the server, avoiding the complexity of parsing response bodies.
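For instance, pointing the command at a reachable page (example.com is used here purely for illustration):

curl -o /dev/null --silent --head --write-out '%{http_code}\n' https://example.com

prints a single line containing the status code, such as 200.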
Complete Bash Script Implementation
Extending single URL detection to batch processing requires combining Bash loop structures. The following script reads URL lists from a file and processes them line by line:
#!/bin/bash
# Read URLs one per line and print "status_code URL" for each.
while read -r LINE; do
    curl -o /dev/null --silent --head --write-out "%{http_code} $LINE\n" "$LINE"
done < url-list.txt
Script execution flow:
- Uses a while read loop to read each line from url-list.txt
- Runs the cURL command for each URL, printing output in the form "status code URL"
- Supplies the URL list via input redirection (< url-list.txt)
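As an illustration, with a url-list.txt containing one live page and one missing page (hypothetical URLs), the output would resemble:

200 https://example.com/
404 https://example.com/no-such-page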
This implementation is simple and intuitive but has performance bottlenecks: each URL requires an independent cURL process and TCP connection, creating significant fork and connection overhead when processing large numbers of URLs.
Server Configuration Anomaly Handling
In practical applications, server configuration anomalies may occur, such as pages displaying "404 not found" but returning 200 status codes. This is typically due to server misconfiguration or custom error pages. While cURL's --write-out method cannot directly detect content, it can be enhanced as follows:
#!/bin/bash
while read -r LINE; do
    CODE=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "$LINE")
    if [ "$CODE" = "200" ]; then
        # Additional content verification for 200 status codes
        if curl --silent "$LINE" | grep -q "404"; then
            echo "WARNING: 200 but contains 404 - $LINE"
        else
            echo "$CODE $LINE"
        fi
    else
        echo "$CODE $LINE"
    fi
done < url-list.txt
This enhanced version performs a second GET request for each URL that returns a 200 status code, checking whether the response body contains the text "404". Although this increases network overhead, it can surface these configuration anomalies.
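A variation worth sketching: the HEAD-then-GET round trip can be collapsed into a single GET by letting --write-out append the status code after the body. This is a minimal sketch of that idea, assuming the same url-list.txt, and is not part of the original Stack Overflow answer:

#!/bin/bash
# Sketch: one GET per URL; the status code is appended on its own line after the body.
while read -r LINE; do
    RESPONSE=$(curl --silent --write-out '\n%{http_code}' "$LINE")
    CODE=${RESPONSE##*$'\n'}    # last line: the status code
    BODY=${RESPONSE%$'\n'*}     # everything before it: the response body
    if [ "$CODE" = "200" ] && printf '%s' "$BODY" | grep -q "404"; then
        echo "WARNING: 200 but contains 404 - $LINE"
    else
        echo "$CODE $LINE"
    fi
done < url-list.txt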
Alternative Approach: wget Method
Besides cURL, wget is another common command-line tool. The following command retrieves HTTP status codes:
wget --spider -S "http://example.com" 2>&1 | grep "HTTP/" | awk '{print $2}'
Parameter explanation:
- --spider: Spider (check) mode; verifies the URL without downloading the file
- -S: Displays the server response headers
- 2>&1: Redirects standard error to standard output, since wget writes header information to stderr
- grep "HTTP/": Filters the HTTP status lines
- awk '{print $2}': Extracts the status code (the second field)
Compared to cURL, the wget method requires more pipeline processing and text parsing, but some systems may have wget pre-installed without cURL.
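For batch use, the wget one-liner can be wrapped in the same while read loop shown earlier. A minimal sketch, assuming the same url-list.txt format (tail -1 keeps only the final status line in case wget follows redirects):

#!/bin/bash
# Print "status_code URL" per line using wget instead of cURL.
while read -r LINE; do
    CODE=$(wget --spider -S "$LINE" 2>&1 | grep "HTTP/" | awk '{print $2}' | tail -1)
    echo "$CODE $LINE"
done < url-list.txt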
Performance Optimization Recommendations
For large-scale URL detection, consider these optimization strategies:
- Parallel Processing: Use xargs -P or GNU parallel to run multiple cURL processes concurrently
- Connection Reuse: cURL's --next option allows multiple requests to be sent over a single connection, though it requires more elaborate command-line construction
- Timeout Control: Add the --max-time option to avoid long waits on unresponsive servers
- Result Caching: Save detection results to a file to avoid rechecking the same URLs (a minimal sketch follows the parallel example below)
Here's a simple parallel processing example:
#!/bin/bash
# Run up to 10 cURL processes in parallel, each with a 10-second timeout.
xargs -P 10 -I {} curl -o /dev/null --silent --head --max-time 10 --write-out "%{http_code} {}\n" "{}" < url-list.txt
This command uses xargs -P 10 to run 10 cURL processes simultaneously, with --max-time 10 setting a 10-second timeout.
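The result-caching idea from the list above can be as simple as skipping any URL that already appears in a results file. A minimal sketch, with results.txt as a hypothetical cache file:

#!/bin/bash
# Skip URLs already recorded in results.txt (hypothetical cache file).
touch results.txt
while read -r LINE; do
    # Substring match on " URL" is crude but avoids matching the status code field.
    if ! grep -qF " $LINE" results.txt; then
        curl -o /dev/null --silent --head --max-time 10 --write-out "%{http_code} $LINE\n" "$LINE" >> results.txt
    fi
done < url-list.txt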
Application Scenarios and Extensions
URL status code detection has important applications in these scenarios:
- Website Monitoring: Regularly checking availability of critical pages
- Link Validation: Verifying validity of external links in documents or websites
- API Health Checks: Monitoring status of REST API endpoints
- Crawler Preprocessing: Filtering invalid links before crawling
Script functionality can be further extended, such as:
- Adding redirect tracking with cURL's --location option (see the sketch after this list)
- Supporting different HTTP methods (GET, POST, etc.)
- Integration into CI/CD pipelines
- Generating visual reports or alerts
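As an example of the first extension, adding --location makes cURL follow redirects, and the %{url_effective} write-out variable reports the final URL reached (the URL below is illustrative):

curl -o /dev/null --silent --head --location --write-out '%{http_code} %{url_effective}\n' "http://example.com"

The output then shows the status code of the final destination rather than an intermediate 301 or 302.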
By combining Bash scripts with standard command-line tools, you can build flexible and efficient URL monitoring solutions.