Efficient Retrieval of Keys and Values by Prefix in Redis: Methods and Performance Considerations

Keywords: Redis | Key-Value Query | SCAN Command | Performance Optimization | Hash Data Structure

Abstract: This article provides an in-depth exploration of techniques for retrieving all keys and their corresponding values with specific prefixes in Redis. It analyzes the limitations of the HGETALL command, introduces the basic usage of the KEYS command along with its performance risks in production environments, and elaborates on the SCAN command as a safer alternative. Through practical code examples, the article demonstrates complete solutions from simple queries to high-performance iteration, while discussing real-world applications of hash data structures and sorted sets in Redis.

Fundamental Principles of Redis Data Querying and Limitations of HGETALL

In Redis databases, data is typically stored as key-value pairs, where keys are often designed with specific naming patterns to facilitate efficient querying. A common requirement in practical applications is retrieving data based on key prefixes, such as obtaining all document information starting with "doc:". However, many developers initially attempt to use the HGETALL command with wildcards, which stems from a misunderstanding of Redis command semantics.

The HGETALL command retrieves all fields and values of a single hash and takes exactly one argument: the literal name of that hash key. When developers attempt dbclient1.hgetall("doc:*", function(err, res) { ... }), Redis looks up one hash key whose name is literally "doc:*" rather than every key that starts with "doc:". Since such a key typically doesn't exist, the reply is empty (the Node.js client usually reports this as null). The mistake shows why the precise semantics of each Redis command matter: only commands that are documented to accept a pattern, such as KEYS or SCAN with MATCH, interpret wildcards. Everywhere else, "*" is just another character in the key name.

The KEYS Command: Basic Solution with Critical Performance Warnings

Redis provides the KEYS command to support pattern-based key queries, with basic syntax KEYS pattern, where pattern supports wildcards like "*" for any number of characters and "?" for single characters. To retrieve all keys starting with "doc:", one can execute KEYS doc:*, which returns a list of all keys matching this pattern.

However, the official Redis documentation carries a clear warning about the KEYS command: use it with extreme caution in production, because it blocks the Redis server until the entire keyspace has been examined. Redis executes commands on a single thread, so every other client waits in the meantime. In large databases containing millions of keys, KEYS can cause significant performance degradation or even service interruption. Its time complexity is O(N), where N is the total number of keys in the database, not just the number of matching keys. The KEYS command is therefore suitable only for debugging or special maintenance operations and should never be part of regular application code.
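For a one-off inspection on a development instance, the call could look like the sketch below. The helper name listKeysForDebugging and its client parameter are illustrative assumptions; client stands for a callback-style connection such as dbclient1.

```javascript
// Debugging helper only: KEYS examines the entire keyspace in one blocking call.
// `client` is assumed to be a callback-style Redis client such as dbclient1.
function listKeysForDebugging(client, prefix, callback) {
    client.keys(prefix + '*', function (err, keys) {
        if (err) {
            callback(err, null);
            return;
        }
        // Sort a copy so repeated runs are easy to compare by eye.
        callback(null, keys.slice().sort());
    });
}
```

Sorting is purely a convenience for reading the output; Redis itself returns the keys in no particular order.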

The SCAN Command: Safe Iteration Solution for Production Environments

To address the performance issues of the KEYS command, Redis version 2.8 introduced the SCAN command family (SCAN, SSCAN, HSCAN, ZSCAN), which iterates the key space incrementally. Each call takes a cursor, does a small, bounded amount of work, and returns a batch of matching keys together with the next cursor. Iteration starts with cursor 0 and is complete when the server returns cursor 0 again.

Below is a complete implementation example using the SCAN command to retrieve all keys with the "doc:" prefix:

function getAllKeysWithPrefix(prefix, callback) {
    var keys = [];
    var cursor = '0';
    
    function scan() {
        dbclient1.scan(cursor, 'MATCH', prefix + '*', 'COUNT', 100, function(err, result) {
            if (err) {
                callback(err, null);
                return;
            }
            
            cursor = result[0];
            keys = keys.concat(result[1]);
            
            if (cursor === '0') {
                // Iteration complete
                callback(null, keys);
            } else {
                // Continue iteration
                scan();
            }
        });
    }
    
    scan();
}

// Usage example
getAllKeysWithPrefix('doc:', function(err, keys) {
    if (err) {
        console.error('Scan failed:', err);
        return;
    }
    
    console.log('Found keys:', keys);
    
    // Retrieve all hash values
    keys.forEach(function(key) {
        dbclient1.hgetall(key, function(err, hashData) {
            if (err) {
                console.error('Failed to get hash:', err);
                return;
            }
            
            console.log('Key:', key, 'Data:', hashData);
        });
    });
});

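The code above illustrates three design points. First, MATCH restricts the reply to keys that fit the pattern; the filter is applied after each batch is read, so a call can return few or no keys even when the cursor is not yet 0. Second, COUNT is a hint for how much of the keyspace Redis examines per call, not a cap on the number of keys returned (the default is 10; values between 100 and 1000 are a reasonable starting point). Third, the callback re-invokes scan() until the cursor comes back as '0'. Because each call does only a bounded amount of work, other clients are served between calls, and the impact on the server stays small even on large databases. Note that SCAN may return the same key more than once, so callers that need unique keys should deduplicate the result.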
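Two refinements are worth sketching: SCAN can report the same key more than once, and collecting every key before doing any work holds the whole result in memory. The variant below is a minimal sketch under those assumptions; scanByPrefix, onPage and done are hypothetical names, and client is assumed to expose the same scan() call as dbclient1.

```javascript
// Sketch: iterate keys matching a prefix and hand each page of not-yet-seen
// keys to a caller-supplied handler instead of building one large array.
function scanByPrefix(client, prefix, onPage, done) {
    var seen = new Set(); // SCAN may return the same key more than once

    function step(cursor) {
        client.scan(cursor, 'MATCH', prefix + '*', 'COUNT', 100, function (err, result) {
            if (err) {
                done(err);
                return;
            }

            // Keep only keys that earlier pages did not already deliver.
            var fresh = result[1].filter(function (key) {
                if (seen.has(key)) {
                    return false;
                }
                seen.add(key);
                return true;
            });

            // A page can legitimately be empty while the cursor is non-zero.
            if (fresh.length > 0) {
                onPage(fresh);
            }

            var next = String(result[0]);
            if (next === '0') {
                done(null, seen.size); // total number of distinct keys
                return;
            }
            step(next);
        });
    }

    step('0');
}
```

The Set still grows with the number of distinct keys, so this trades memory for uniqueness; if the per-key work is idempotent, the deduplication can be dropped entirely.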

Data Structure Design and Query Optimization Strategies

The data model in this scenario has two components: hashes that store document metadata, and sorted sets that maintain cache ordering. This design pattern is common in Redis applications, but each structure needs its own query strategy.

For batch retrieval of hash data, once the target keys are known, the HGETALL calls can be queued and sent to the server together instead of paying one network round trip per key:

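// Batch hash retrieval: queue one HGETALL per key and send them together.
// Note: multi() wraps the queued commands in a MULTI/EXEC transaction; with
// the node_redis client, batch() queues them the same way without one.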
function getHashDataBatch(keys, callback) {
    var pipeline = dbclient1.multi();
    
    keys.forEach(function(key) {
        pipeline.hgetall(key);
    });
    
    pipeline.exec(function(err, results) {
        if (err) {
            callback(err, null);
            return;
        }
        
        var dataMap = {};
        keys.forEach(function(key, index) {
            dataMap[key] = results[index];
        });
        
        callback(null, dataMap);
    });
}

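Queuing commands this way sends them to the Redis server in a single request, which significantly reduces network round-trip time and suits batch operations well. Strictly speaking, multi() also executes the queued commands atomically as a MULTI/EXEC transaction; when atomicity is not needed, a plain pipeline (batch() in the node_redis client) avoids that overhead. For very long key lists, split the work into chunks so that a single request does not occupy the server for too long. Sorted sets can be iterated safely in the same spirit with ZSCAN, which walks the members of one sorted set and applies MATCH to the member names.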
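ZSCAN follows the same cursor protocol as SCAN, except that it takes the sorted set's key as its first argument and returns members and scores as one flat array. A minimal sketch, assuming a callback-style client like dbclient1; the helper name zscanMembers is hypothetical:

```javascript
// Sketch: collect the members of one sorted set whose names match a pattern.
// The reply to each ZSCAN call is [nextCursor, [member, score, member, score, ...]].
function zscanMembers(client, zsetKey, pattern, callback) {
    var entries = [];

    function step(cursor) {
        client.zscan(zsetKey, cursor, 'MATCH', pattern, 'COUNT', 100, function (err, result) {
            if (err) {
                callback(err, null);
                return;
            }

            // Walk the flat array two items at a time: member, then score.
            var flat = result[1];
            for (var i = 0; i < flat.length; i += 2) {
                entries.push({ member: flat[i], score: Number(flat[i + 1]) });
            }

            var next = String(result[0]);
            if (next === '0') {
                callback(null, entries);
                return;
            }
            step(next);
        });
    }

    step('0');
}
```

As with SCAN, a member can in principle be reported more than once during a long iteration, so deduplicate by member name if that matters to the caller.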

Advanced Patterns: Optimizing Queries with Auxiliary Indexes

For scenarios requiring frequent prefix-based queries, consider establishing dedicated index structures. For example, maintaining a set containing all "doc:" keys:

// Update index when adding documents
dbclient1.multi()
    .hmset("doc:3743-da23-dcdf-3213", "date", "2015-09-06 00:00:01", "size", "203")
    .sadd("index:docs", "doc:3743-da23-dcdf-3213")
    .exec(function(err, results) {
        // Process results
    });

// Directly use index for queries
function getDocsFromIndex(callback) {
    dbclient1.smembers("index:docs", function(err, keys) {
        if (err) {
            callback(err, null);
            return;
        }
        
        getHashDataBatch(keys, callback);
    });
}

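This indexing strategy makes writes more complex, but a query no longer has to walk the whole keyspace: instead of O(N) in the total number of keys, SMEMBERS is O(M) in the number of indexed documents. That makes it particularly suitable for read-heavy, write-light scenarios. The index must be kept in step with the data, so a document key that is deleted also has to be removed from the set with SREM. For very large indexes, SSCAN can iterate the set incrementally instead of returning every member in one reply.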
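The index only helps if it stays consistent with the data, so deletions need the mirror image of the write shown earlier. A minimal sketch, assuming the same "index:docs" set and a client with the multi() interface used above; the helper name removeDoc is hypothetical:

```javascript
// Sketch: delete a document hash and its index entry in one transaction,
// so readers never see an index entry that points at a missing hash.
function removeDoc(client, docKey, callback) {
    client.multi()
        .del(docKey)
        .srem('index:docs', docKey)
        .exec(function (err, replies) {
            if (err) {
                callback(err, null);
                return;
            }
            // DEL and SREM each reply with the number of items removed.
            callback(null, {
                deleted: replies[0] === 1,
                unindexed: replies[1] === 1
            });
        });
}
```

Keys that expire through a TTL bypass this path, so an index over expiring keys would need a separate cleanup step.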

Performance Testing and Monitoring Recommendations

When deploying prefix-based query solutions in practice, establish comprehensive performance monitoring. Key metrics include query response time, Redis server CPU usage, and changes in memory consumption. For the SCAN command, pay special attention to the balance between the number of iterations and the work done per iteration: a COUNT that is too small results in excessive network round trips, while a COUNT that is too large may cause high latency for each individual request.

Run benchmark tests at different data scales to determine a suitable COUNT value for your workload. Monitoring the slow query log (SLOWLOG) also helps identify potential performance bottlenecks. Redis's INFO command provides rich server status information, including keyspace statistics and memory usage details, and should be incorporated into regular monitoring.
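One way to compare COUNT settings is to time a full iteration and record how many round trips it took. The helper below is a rough sketch rather than a rigorous benchmark; timeScan is a hypothetical name, and client is assumed to be a callback-style connection such as dbclient1.

```javascript
// Sketch: run one complete SCAN iteration with a given COUNT and report
// the number of calls, the number of keys returned, and the elapsed time.
function timeScan(client, prefix, count, callback) {
    var started = process.hrtime();
    var calls = 0;
    var matched = 0;

    function step(cursor) {
        calls += 1;
        client.scan(cursor, 'MATCH', prefix + '*', 'COUNT', count, function (err, result) {
            if (err) {
                callback(err, null);
                return;
            }

            matched += result[1].length; // may include duplicates

            var next = String(result[0]);
            if (next !== '0') {
                step(next);
                return;
            }

            var elapsed = process.hrtime(started); // [seconds, nanoseconds]
            callback(null, {
                count: count,
                calls: calls,
                matched: matched,
                millis: elapsed[0] * 1e3 + elapsed[1] / 1e6
            });
        });
    }

    step('0');
}
```

Running it for several COUNT values (for example 10, 100 and 1000) against a dataset of realistic size, and repeating each run a few times, gives a first impression of where additional round trips stop being the dominant cost.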

Conclusion and Best Practices

Implementing prefix-based key-value queries in Redis requires balancing functional requirements with performance impacts. Core principles include: avoiding the KEYS command in production environments; prioritizing the SCAN command family for safe iteration; reasonably using pipeline technology to optimize batch operations; and considering index structure design based on application scenarios.

For specific implementation, follow these best practices: first, clearly define query patterns and usage frequency; second, select appropriate iteration parameters and batch sizes; then implement error handling and retry mechanisms; finally, establish comprehensive performance monitoring and alerting systems. Through these measures, query functionality can be ensured while maintaining Redis service stability and high performance.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.