Keywords: MySQL Optimization | InnoDB Page Cleaner | Performance Tuning | Dirty Page Management | I/O Optimization
Abstract: This paper provides an in-depth analysis of the 'page_cleaner: 1000ms intended loop took XXX ms' warning mechanism in MySQL InnoDB storage engine, examining its manifestations during high-load data import scenarios. The article elaborates on dirty page management, page cleaner thread operation principles, and the functional mechanism of the innodb_lru_scan_depth parameter. It presents comprehensive solutions based on hardware configuration and software tuning, demonstrating through practical cases how to optimize import performance by adjusting scan depth while discussing the impact of critical parameters like innodb_io_capacity and buffer pool configuration on system I/O performance.
Problem Phenomenon and Context
During large-scale data import operations in MySQL 5.7 environments, the following warning messages frequently appear in system logs:
InnoDB: page_cleaner: 1000ms intended loop took 4013ms. The settings might not be optimal. (flushed=1438 and evicted=0, during the time.)
This warning typically accompanies the "waiting for table flush" state, indicating performance bottlenecks in dirty page flushing. A common scenario involves importing 5GB SQL dump files on servers configured with 16GB RAM, 8-core processors, and CentOS 6 operating system.
Deep Analysis of InnoDB Page Cleaning Mechanism
The InnoDB storage engine employs a buffer pool mechanism to manage data pages. When data is modified, corresponding pages become "dirty pages" in memory. The page cleaner thread is responsible for periodically flushing these dirty pages to disk, ensuring data persistence and maintaining available buffer pool space.
The page cleaner executes a scanning loop every second with core responsibilities including:
- Scanning dirty page lists in the buffer pool
- Identifying pages to flush or evict based on the LRU list
- Writing dirty page data to disk
- Updating page status to "clean"
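The loop above can be pictured with a deliberately simplified model. This is an illustrative sketch, not InnoDB's actual algorithm: the 0.5 ms per-page flush cost and the fixed per-loop page cap are assumptions chosen only to show why a loop can overrun its one-second budget.

```python
# Toy model of the page cleaner's one-second loop (illustrative only;
# real InnoDB flushing logic is far more involved).
LOOP_BUDGET_MS = 1000   # intended loop interval
FLUSH_COST_MS = 0.5     # assumed average I/O cost to flush one page

def cleaner_loop(dirty_pages, scan_depth):
    """Flush up to scan_depth dirty pages; return (flushed, elapsed_ms)."""
    flushed = min(dirty_pages, scan_depth)
    elapsed_ms = flushed * FLUSH_COST_MS
    if elapsed_ms > LOOP_BUDGET_MS:
        # This is the condition that produces the log warning
        print(f"page_cleaner: 1000ms intended loop took {elapsed_ms:.0f}ms "
              f"(flushed={flushed})")
    return flushed, elapsed_ms
```

With these assumed costs, capping the per-loop work (a smaller scan depth) keeps each cycle inside its budget, while an uncapped loop over a large dirty backlog overruns it.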
The "1000ms intended loop took 4013ms" wording means a cleaning cycle took roughly four times longer than its intended one-second interval, usually because of:
- Dirty page generation rate exceeding disk I/O processing capacity
- Mismatch between buffer pool configuration and workload
- Insufficient disk subsystem performance
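The first cause can be checked with back-of-envelope arithmetic: compare the rate at which pages are dirtied against the sustained flush rate. The figures below (50 MB/s of modifications, 2000 pages/s of flush capacity) are assumptions for illustration, not measurements from the case study.

```python
# Back-of-envelope check: does dirty-page generation outpace flushing?
PAGE_SIZE = 16 * 1024            # default InnoDB page size, bytes
write_rate_bytes = 50 * 1024**2  # assumed 50 MB/s of row modifications
flush_capacity_pages = 2000      # assumed sustained flush rate, pages/s

dirty_rate_pages = write_rate_bytes / PAGE_SIZE   # pages dirtied per second
backlog_per_sec = dirty_rate_pages - flush_capacity_pages
# A positive backlog means dirty pages accumulate every second and the
# cleaner falls further behind, which eventually surfaces as the warning.
```

Here the assumed workload dirties 3200 pages/s against 2000 pages/s of flush capacity, so the backlog grows by 1200 pages every second.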
Key Parameter: innodb_lru_scan_depth
The innodb_lru_scan_depth parameter controls how far the cleaner scans down the LRU list during each loop. The default value of 1024 means the cleaner examines up to 1024 pages from the tail of each buffer pool instance's LRU list looking for pages to flush or evict. In high-write scenarios, an excessive scan depth can make each cleaning loop overrun its one-second budget:
-- Default configuration may cause performance issues
SET GLOBAL innodb_lru_scan_depth = 1024; -- Default value
-- Optimized configuration
SET GLOBAL innodb_lru_scan_depth = 256; -- Recommended adjustment
Reducing the scan depth decreases the workload per loop, allowing the cleaner to complete each flushing cycle on schedule. It's important to note that when multiple buffer pool instances are configured, each instance is scanned independently, multiplying the total per-loop workload accordingly.
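The multiplication is worth making explicit: the total pages examined per loop is the scan depth times the number of buffer pool instances.

```python
# Total per-loop scan work: innodb_lru_scan_depth applies per buffer
# pool instance, so instances multiply the cleaner's workload.
def total_scan_work(lru_scan_depth, buffer_pool_instances):
    return lru_scan_depth * buffer_pool_instances

# Default depth across 8 instances vs. the reduced setting:
default_work = total_scan_work(1024, 8)   # 8192 pages per loop
reduced_work = total_scan_work(256, 8)    # 2048 pages per loop
```

On the 8-core server from the case study, dropping the depth from 1024 to 256 cuts the per-loop scan work from 8192 to 2048 pages when 8 instances are configured.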
I/O Performance Tuning Strategies
Disk Subsystem Optimization
Page cleaning performance largely depends on disk I/O capability. For write-intensive workloads, recommendations include:
- Using SSD or NVMe storage devices
- Configuring RAID 10 arrays to improve write performance
- Ensuring sufficient I/O bandwidth and low latency
InnoDB I/O Parameter Configuration
-- Setting I/O capacity parameters (in my.cnf, or via SET GLOBAL at runtime)
innodb_io_capacity = 2000 -- IOPS budget for background tasks
innodb_io_capacity_max = 4000 -- Upper bound when flushing falls behind
-- Adjusting flush-related parameters
innodb_max_dirty_pages_pct = 75 -- Maximum percentage of dirty pages in the buffer pool
innodb_flush_method = O_DIRECT -- Direct I/O, bypassing the OS page cache
The innodb_io_capacity parameter defines the I/O operation budget available to InnoDB background tasks. When the dirty page ratio exceeds the innodb_max_dirty_pages_pct threshold, InnoDB flushes more aggressively, up to innodb_io_capacity_max, which can itself trigger the page cleaner warning if the disk cannot keep up.
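The interplay between these parameters can be sketched as a simple rate function. This is a simplified model for intuition only: real InnoDB adaptive flushing also factors in redo log fill rate and recent flush history.

```python
# Simplified sketch of how the background flush rate scales with the
# dirty-page ratio (not InnoDB's actual adaptive-flushing formula).
def flush_rate(dirty_pct, max_dirty_pct=75.0,
               io_capacity=2000, io_capacity_max=4000):
    if dirty_pct < max_dirty_pct:
        # Below the threshold: flush at a fraction of io_capacity,
        # proportional to how close we are to the limit.
        return io_capacity * (dirty_pct / max_dirty_pct)
    # At or above the threshold: flush aggressively toward the maximum.
    return io_capacity_max
```

The sharp jump at the threshold is the point the text describes: once dirty pages cross innodb_max_dirty_pages_pct, the demanded flush rate can exceed what the disk sustains, and the cleaner loop overruns.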
System Monitoring and Diagnosis
When page cleaner warnings appear, system-level monitoring should be conducted to identify bottlenecks:
-- Viewing current process status
SHOW PROCESSLIST;
-- Monitoring InnoDB status
SHOW ENGINE INNODB STATUS;
-- Checking buffer pool usage
SELECT
POOL_ID,
POOL_SIZE,
FREE_BUFFERS,
DATABASE_PAGES,
OLD_DATABASE_PAGES,
MODIFIED_DATABASE_PAGES
FROM information_schema.INNODB_BUFFER_POOL_STATS;
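The key figure to derive from that query is the per-instance dirty-page ratio, MODIFIED_DATABASE_PAGES divided by POOL_SIZE. The row values below are made up for illustration; only the calculation itself is the point.

```python
# Interpreting a row from INNODB_BUFFER_POOL_STATS (sample values are
# hypothetical; substitute the numbers from your own server).
row = {
    "POOL_ID": 0,
    "POOL_SIZE": 65536,               # pages: 1 GB at 16 KB per page
    "FREE_BUFFERS": 1024,
    "DATABASE_PAGES": 64000,
    "MODIFIED_DATABASE_PAGES": 9830,  # dirty pages awaiting flush
}

# Dirty-page ratio for this instance, as a percentage of the pool
dirty_pct = 100.0 * row["MODIFIED_DATABASE_PAGES"] / row["POOL_SIZE"]
```

Comparing this ratio against innodb_max_dirty_pages_pct shows how close the instance is to the aggressive-flushing threshold; here the sample instance sits at about 15%, well under the 75% limit.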
Using system tools to monitor disk I/O utilization:
# Monitoring disk activity
atop -d
# Viewing MySQL error log
tail -f /var/log/mysql/error.log
Case Analysis and Solutions
In the 5GB data import case, the problem was resolved through these steps:
- Problem Diagnosis: Confirmed warnings were caused by high-frequency dirty page generation with disk I/O as bottleneck
- Parameter Adjustment: Reduced innodb_lru_scan_depth from 1024 to 256
- Effect Verification: Import time significantly shortened, warning messages disappeared
- Long-term Optimization: Adjusted buffer pool size and I/O parameters based on workload characteristics
For production environments with persistent warnings, considerations may include:
- Upgrading hardware configuration (faster storage devices)
- Implementing read-write separation architecture
- Optimizing application write patterns
- Conducting regular performance benchmarking
Best Practice Recommendations
Based on MySQL 5.7 practical experience, the following configuration strategies are recommended:
- Buffer Pool Configuration: Set innodb_buffer_pool_size to 70-80% of available memory
- Instance Count: Configure innodb_buffer_pool_instances appropriately based on CPU cores
- Scan Depth: Typically set innodb_lru_scan_depth to 256 or lower
- I/O Configuration: Adjust the innodb_io_capacity family of parameters according to storage performance
- Monitoring Maintenance: Establish regular performance monitoring and parameter tuning mechanisms
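For the 16 GB server in the case study, the buffer pool sizing rule works out as follows. The 75% fraction and the 128 MB alignment (the default innodb_buffer_pool_chunk_size in MySQL 5.7) are the only inputs; the helper name is my own.

```python
# Sizing sketch: innodb_buffer_pool_size at ~75% of RAM, rounded down
# to a multiple of the chunk size (default 128 MB in MySQL 5.7).
CHUNK_MB = 128

def buffer_pool_mb(ram_mb, fraction=0.75):
    target = int(ram_mb * fraction)
    return target - target % CHUNK_MB   # align to chunk size

# 16 GB server from the case study -> a 12 GB buffer pool
suggested = buffer_pool_mb(16 * 1024)
```

The alignment matters because MySQL silently rounds the setting to a multiple of innodb_buffer_pool_chunk_size times innodb_buffer_pool_instances; computing an aligned value up front avoids surprises when verifying the setting afterwards.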
It's particularly important to note that parameter adjustments should be based on actual workload testing, as different application scenarios may require different optimization strategies. For temporary large data import operations, parameters can be temporarily adjusted and restored after completion.