Keywords: Elasticsearch | Disk Watermark | Read-Only Index | cluster_block_exception | Disk Space Management
Abstract: This article provides a comprehensive analysis of the cluster_block_exception error in Elasticsearch, explaining the disk watermark mechanism and its impact on index states. Through practical examples, it demonstrates how Elasticsearch automatically sets indices to read-only mode once disk usage exceeds the 95% flood stage watermark. The article presents two main solutions: freeing up disk space combined with manually removing the read-only block, and adjusting the disk watermark configuration parameters. It also discusses different handling strategies for production versus development environments, providing concrete curl command examples and configuration changes.
Problem Phenomenon and Error Analysis
When performing document write operations in Elasticsearch, developers often encounter the error cluster_block_exception [FORBIDDEN/12/index read-only / allow delete (api)]. It indicates that the affected indices have been marked read-only, blocking normal write operations. The Elasticsearch logs typically contain a matching warning such as "flood stage disk watermark [95%] exceeded ... all indices on this node will marked read-only".
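Before applying any fix, it is worth confirming how full each node's disk actually is. Elasticsearch's cat allocation API reports per-node disk usage (the endpoint placeholder below must be replaced with your own cluster address):

```shell
# Show per-node shard counts and disk usage (see the disk.percent column)
curl -XGET "https://[YOUR_ELASTICSEARCH_ENDPOINT]:9200/_cat/allocation?v"
```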
Disk Watermark Mechanism Explanation
Elasticsearch incorporates a comprehensive disk space management mechanism that monitors disk usage through three key watermark thresholds: the low watermark (85% by default), the high watermark (90% by default), and the flood stage watermark (95% by default). Each watermark can be configured either as a percentage of disk space used or as an absolute amount of free space.
By default, the flood stage watermark is set to 95%: once disk usage on a node reaches that level, Elasticsearch triggers its protective measures. The rationale is that adequate free space remains essential for system stability and performance even on large-capacity disks. On a 1TB disk, for example, the 95% watermark still requires at least 50GB of available space.
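As a quick sanity check (a sketch assuming a decimal 1TB = 1000GB disk and the default percentage watermarks of 85%, 90%, and 95%), the free space implied by each threshold can be computed as:

```shell
# Free space implied by Elasticsearch's default percentage watermarks
# (low 85%, high 90%, flood stage 95%) on a 1000 GB disk.
disk_gb=1000
for pct in 85 90 95; do
  free_gb=$(( disk_gb * (100 - pct) / 100 ))
  echo "${pct}% used -> ${free_gb} GB still free"
done
# Prints 150, 100 and 50 GB respectively
```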
Solution One: Freeing Up Disk Space
The most direct solution involves cleaning up disk space to ensure available space exceeds the flood stage watermark threshold. However, it's important to note that even when disk space returns to safe levels, Elasticsearch does not automatically remove the read-only status from indices. Developers need to manually execute the following command to reset the read-only attribute of indices:
curl -XPUT -H "Content-Type: application/json" https://[YOUR_ELASTICSEARCH_ENDPOINT]:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'
This command uses Elasticsearch's REST API to modify settings for all indices, resetting the index.blocks.read_only_allow_delete property to null, thereby removing the read-only restriction.
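After running the reset, the change can be verified by reading the setting back; since setting it to null deletes it, a successful reset means the key no longer appears in the response (again, replace the endpoint placeholder with your own cluster address):

```shell
# List the read-only block setting across all indices; an empty result
# per index means the block has been removed.
curl -XGET "https://[YOUR_ELASTICSEARCH_ENDPOINT]:9200/_all/_settings/index.blocks.read_only_allow_delete?pretty"
```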
Solution Two: Adjusting Watermark Configuration
In certain scenarios, particularly in development environments or for specific business requirements, adjusting the default watermark configuration may be more appropriate. Elasticsearch allows dynamic modification of these parameters through the cluster settings API:
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "100gb",
    "cluster.routing.allocation.disk.watermark.high": "50gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "10gb",
    "cluster.info.update.interval": "1m"
  }
}
In this configuration example, the watermarks are switched from percentages of used space to absolute amounts of free space: when byte values are used, each watermark fires once free space drops below the given amount. Here the low watermark triggers below 100GB free, the high watermark below 50GB, and the flood stage below 10GB. The cluster information update interval is also shortened to 1 minute so that disk status is re-checked promptly.
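One caveat: transient settings are discarded on a full cluster restart (and newer Elasticsearch versions deprecate them), so a change that should survive restarts belongs under persistent instead. A sketch using the same keys, any of which can later be set to null to revert to the defaults:

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "100gb",
    "cluster.routing.allocation.disk.watermark.high": "50gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "10gb"
  }
}
```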
Environmental Differences and Best Practices
The choice between the two solutions depends on the environment. In production, freeing disk space should be the first resort, since it works with the system's design intent of protecting data integrity. In development or testing environments, appropriately adjusting the watermark configuration can avoid repeated manual maintenance.
Elasticsearch instances running in Docker are subject to the same mechanism. In one reported case, a container with ample memory (16GB) still had its indices forced read-only because the underlying disk was nearly full. In containerized deployments, therefore, it is not enough to size memory correctly; the persistent storage volume must also have adequate capacity.
Preventive Measures and Monitoring Recommendations
To prevent recurrence of similar issues, establishing a comprehensive disk space monitoring system is advised. Real-time tracking of disk usage can be achieved through Elasticsearch's built-in monitoring features or third-party monitoring tools. Additionally, regular cleanup of unnecessary index data, optimization of index mappings, and enabling index lifecycle management (ILM) can effectively prevent disk space shortages.
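As a concrete illustration of ILM, a minimal policy can delete indices once they reach a retention age, preventing stale data from filling the disk (a sketch; the policy name and the 30-day retention are placeholder choices):

```
PUT _ilm/policy/delete-after-30d
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```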
For large-scale production environments, implementing shard allocation filtering strategies to distribute indices across different data nodes can prevent excessive disk pressure on individual nodes. This distributed storage approach not only enhances system fault tolerance but also better balances disk load.
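For instance, a node whose disk is nearly full can be drained by excluding it from shard allocation, which makes the cluster move its shards onto other data nodes (a sketch; the IP address is a placeholder):

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.5"
  }
}
```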