Keywords: Redis | Keyspace Iteration | KEYS Command | SCAN Command | Performance Optimization | Database Operations
Abstract: This article explores the two primary methods for retrieving all keys in Redis: the KEYS command and the SCAN command. By analyzing time complexity, performance impact, and applicable scenarios, it covers the basic usage and risks of KEYS, along with the cursor-based iteration mechanism and advantages of SCAN. Concrete examples show how to traverse the keyspace safely and efficiently from redis-cli and from Python with the redis-py library, with best-practice guidance for key operations in both production and debugging environments.
Overview of Redis Keyspace Iteration
Redis, as a high-performance key-value store, organizes data in key-value pairs within a flat keyspace. In practical development and operations, there is often a need to retrieve all keys or keys matching specific patterns for tasks such as data cleanup, migration, and debugging. Redis provides two commands for keyspace iteration: KEYS and SCAN, which differ significantly in their implementation mechanisms and suitable use cases.
Fundamentals and Usage of the KEYS Command
The KEYS command is the most direct method for key retrieval in Redis, with the syntax KEYS pattern, where pattern supports glob-style pattern matching. For example, KEYS * returns a list of all keys in the database, while KEYS user:* returns all keys prefixed with "user:".
Using the KEYS command in the Redis client is straightforward:
redis-cli> KEYS *
1) "user:1"
2) "user:2"
3) "product:100"
4) "session:abc123"
The KEYS command supports various pattern matching syntaxes:
- h?llo matches hello, hallo, etc.
- h*llo matches hllo, heeeello, etc.
- h[ae]llo matches hello and hallo, but not hillo
- h[^e]llo matches hallo, hbllo, etc., but not hello
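To get a feel for these patterns without touching a server, the rules can be approximated locally. The sketch below is not Redis's own matcher: it assumes Python's fnmatch module is close enough for simple patterns, rewrites Redis's [^...] negation into fnmatch's [!...] form, and ignores backslash escapes. The helper name redis_glob_match is hypothetical.

```python
from fnmatch import fnmatchcase


def redis_glob_match(pattern: str, key: str) -> bool:
    """Approximate Redis glob matching with fnmatch (illustration only)."""
    # Redis negates a character class with '^'; fnmatch expects '!'.
    translated = pattern.replace("[^", "[!")
    return fnmatchcase(key, translated)


candidates = ("hello", "hallo", "hillo", "hllo", "heeeello")
for pattern in ("h?llo", "h*llo", "h[ae]llo", "h[^e]llo"):
    matched = [key for key in candidates if redis_glob_match(pattern, key)]
    print(pattern, "->", matched)
```

For anything beyond quick experiments, the authoritative answer is whatever the server itself returns for the pattern.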
Performance Risks and Limitations of KEYS
Although the KEYS command is easy to use, its time complexity is O(N), where N is the number of keys in the database: every key is examined no matter how few match the pattern, so execution time grows linearly with the size of the keyspace. More critically, Redis processes commands on a single thread, so while KEYS runs, every other client waits. On a large database this can cause severe latency spikes in production environments.
Consider the following Python code example illustrating the potential risks of KEYS in large databases:
import redis

# Connect to Redis
r = redis.Redis(host='localhost', port=6379, db=0)

# Risky operation: Using KEYS on a large database
try:
    all_keys = r.keys('*')  # This blocks the Redis server
    print(f"Found {len(all_keys)} keys")
except Exception as e:
    print(f"Error: {e}")
In Redis Cluster environments, the KEYS command optimizes searches for patterns that can only match keys in a single hash slot. For instance, with the pattern {a}h*llo, Redis only attempts to match keys in the slot implied by the hash tag {a}, rather than scanning the entire database.
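To see why a hash tag pins a pattern to one slot, the slot calculation can be reproduced locally. This is a minimal sketch, assuming the standard Redis Cluster rule (CRC16 of the key, or of the hash tag if one is present, modulo 16384) and assuming that binascii.crc_hqx with an initial value of 0 computes the same CRC16 variant. The function name key_slot is hypothetical.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in Redis Cluster


def key_slot(key: bytes) -> int:
    """Compute the cluster hash slot for a key, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % SLOT_COUNT


print(key_slot(b"{a}hello"))  # 15495
print(key_slot(b"{a}hallo"))  # 15495, same tag so same slot
```

Because every key that could match {a}h*llo carries the tag {a}, all of them hash to that one slot, which is what makes the single-slot search possible.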
SCAN Command: A Safe Keyspace Iteration Solution
Redis 2.8 introduced the SCAN command family, which provides cursor-based keyspace iteration. SCAN iterates incrementally, returning only a small batch of keys per call, so no single call blocks the server for long.
Basic usage of the SCAN command is as follows:
redis-cli> SCAN 0
1) "4"
2) 1) "user:1"
   2) "product:100"
redis-cli> SCAN 4
1) "0"
2) 1) "user:2"
   2) "session:abc123"
redis-cli also offers a --scan option, which runs the SCAN loop for you and accepts a --pattern filter:
$ redis-cli --scan --pattern 'user:*'
user:1
user:2
Advanced Features of the SCAN Command
The SCAN command supports COUNT and MATCH options, offering more flexible iteration control:
# Using COUNT to hint how many elements to examine per call
redis-cli> SCAN 0 COUNT 5
1) "8"
2) 1) "key:1"
   2) "key:9"
   3) "key:13"
   4) "key:29"
   5) "key:23"
# Using MATCH for pattern matching
redis-cli> SCAN 0 MATCH "user:*" COUNT 10
1) "12"
2) 1) "user:1"
   2) "user:2"
The SCAN command family also includes variants for specific data structures:
- SSCAN: For iterating over set elements
- HSCAN: For iterating over hash fields and values
- ZSCAN: For iterating over sorted set elements and scores
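These variants follow the same cursor protocol as SCAN, with the key name as an extra argument. As a rough sketch, assuming a redis-py style client whose sscan(name, cursor=..., match=..., count=...) method returns a (cursor, members) pair, a set can be walked like this. The helper name iter_set_members is hypothetical.

```python
def iter_set_members(client, name, match=None, count=None):
    """Yield the members of one set incrementally via SSCAN."""
    cursor = 0
    while True:
        cursor, members = client.sscan(name, cursor=cursor, match=match, count=count)
        yield from members
        if cursor == 0:  # the server signals the end of the pass with cursor 0
            break
```

With a real connection this would be called as iter_set_members(r, 'myset'), where the set name is only an example. The same loop shape applies to HSCAN and ZSCAN, though the shape of each returned batch differs.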
Implementation Mechanism and Characteristics of SCAN
The SCAN command is implemented based on Redis's internal dictionary hash table structure. It maintains a cursor to track iteration progress, returning a new cursor and a batch of keys with each call. Iteration is complete when the returned cursor is 0.
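The text above does not spell out how the cursor advances. In Redis's dictionary scan, the cursor is incremented with its bits in reversed order rather than counting 0, 1, 2, and so on, which is why cursor values look erratic. The sketch below illustrates that stepping rule for a fixed table size that is a power of two. It is a simplified model rather than the server's code, and the names next_cursor and visit_order are hypothetical.

```python
def next_cursor(cursor: int, table_size: int) -> int:
    """Advance a cursor by adding one to its bits in reversed order."""
    bits = table_size.bit_length() - 1  # table_size is assumed to be a power of two
    reversed_bits = int(format(cursor, f"0{bits}b")[::-1], 2)
    reversed_bits = (reversed_bits + 1) % table_size
    return int(format(reversed_bits, f"0{bits}b")[::-1], 2)


def visit_order(table_size: int) -> list:
    """Return the bucket order of one full pass, starting and ending at cursor 0."""
    order, cursor = [], 0
    while True:
        order.append(cursor)
        cursor = next_cursor(cursor, table_size)
        if cursor == 0:
            return order


print(visit_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Every bucket is visited exactly once per pass. The reversed ordering is what lets an iteration stay meaningful when the hash table is resized between calls.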
Key characteristics of the SCAN command include:
- Stateless Server: Iteration state is entirely maintained by the client; the server stores no state information
- Fault Tolerance: Iterations can be safely stopped and restarted
- Parallel Iteration: Supports multiple concurrent iterations
- Possible Duplicates: The same element may be returned multiple times during iteration
- Advisory COUNT: The COUNT parameter is only a suggestion; the actual number returned may vary
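The duplicate and advisory-COUNT properties mean client code should neither assume each key arrives exactly once nor assume a fixed batch size. One way to handle this is to collect keys into a set. This is a minimal sketch, assuming a redis-py style client whose scan(cursor=..., match=..., count=...) method returns a (cursor, keys) pair. The helper name unique_keys is hypothetical.

```python
def unique_keys(client, pattern="*", count=100):
    """Collect matching keys with SCAN, dropping any repeats."""
    seen = set()
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        seen.update(keys)  # a set absorbs keys that SCAN returns more than once
        if cursor == 0:    # cursor 0 means the full pass is done
            return seen
```

The set holds every key in memory, so this suits moderately sized result sets. For very large keyspaces it may be better to make the per-key processing idempotent instead.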
Safe Keyspace Iteration Practices in Python
In Python applications, the SCAN command should be preferred for keyspace iteration. Here is a safe implementation using the redis-py library:
import redis


def safe_iterate_keys(redis_client, pattern='*', batch_size=100):
    """
    Safely iterate over keys matching the specified pattern.

    Args:
        redis_client: Redis connection instance
        pattern: Key pattern
        batch_size: COUNT hint passed to each SCAN call

    Returns:
        list: All matching keys (may contain duplicates, since SCAN
            can return the same key more than once)
    """
    all_keys = []
    cursor = 0
    while True:
        cursor, keys = redis_client.scan(
            cursor=cursor,
            match=pattern,
            count=batch_size
        )
        all_keys.extend(keys)
        # Cursor 0 indicates iteration completion
        if cursor == 0:
            break
    return all_keys


def process_product(key):
    # Placeholder: the article does not define this helper
    print(key)


# Usage example
r = redis.Redis(host='localhost', port=6379, db=0)

# Safely retrieve all user keys
user_keys = safe_iterate_keys(r, 'user:*')
print(f"Found {len(user_keys)} user keys")

# Process every product key; SCAN fetches them from the server in batches
for key in safe_iterate_keys(r, 'product:*', batch_size=50):
    # Process each product key
    process_product(key)
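Note that safe_iterate_keys collects every matching key into one list before returning, so memory use grows with the result set. A lazier variant can hand back one batch per SCAN call. The sketch below assumes the same redis-py style scan method. The name iter_key_batches is hypothetical.

```python
def iter_key_batches(client, pattern="*", batch_size=100):
    """Yield one list of keys per SCAN call instead of building a full list."""
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=batch_size)
        if keys:  # with MATCH, a call may legitimately return no keys
            yield keys
        if cursor == 0:
            break
```

Each batch can then be processed and discarded before the next one is fetched. redis-py also ships a scan_iter method that yields keys one at a time, which may be all that is needed when per-batch handling is not required.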
Best Practices for Production Environments
Based on performance and safety considerations, the following best practices are recommended:
- Development Environment: Use KEYS for quick debugging in small databases or development settings
- Production Environment: Always use SCAN to avoid blocking the Redis server
- Batch Size: Adjust the COUNT parameter based on database size and performance requirements
- Error Handling: Implement appropriate retry and error handling mechanisms
- Monitoring: Monitor keyspace size and the performance impact of iteration operations
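The error-handling recommendation pairs naturally with SCAN's restartable design: because the cursor lives on the client, a failed call can simply be repeated with the same cursor. The following is a rough sketch under several assumptions: the client exposes a redis-py style scan method, and the caller supplies the exception types worth retrying. The defaults are Python's built-in ConnectionError and TimeoutError; with redis-py you would likely pass its own exception classes instead. The name scan_with_retry and its parameters are hypothetical.

```python
import time


def scan_with_retry(client, pattern="*", count=100,
                    retry_on=(ConnectionError, TimeoutError),
                    max_attempts=3, delay=0.5):
    """Yield keys via SCAN, retrying a failed call from the same cursor."""
    cursor = 0
    while True:
        for attempt in range(1, max_attempts + 1):
            try:
                new_cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
                break
            except retry_on:
                if attempt == max_attempts:
                    raise  # give up after the final attempt
                time.sleep(delay)
        yield from keys
        cursor = new_cursor
        if cursor == 0:
            return
```

A fixed delay keeps the sketch short. A production version might prefer exponential backoff and should still tolerate duplicate keys.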
Conclusion
Redis offers two keyspace iteration solutions, KEYS and SCAN, each suited to different scenarios. The KEYS command is simple and direct, ideal for small databases and debugging, but poses significant performance risks in production. The SCAN command, through its cursor-based iteration mechanism, provides a safe and reliable method for keyspace traversal and is especially suitable for large production environments. Developers should choose the appropriate command based on specific needs and environmental characteristics to ensure the stability and performance of Redis services.