Keywords: Python | List Comprehension | Performance Optimization
Abstract: This article provides an in-depth analysis of efficient methods for creating a list containing N independent empty sublists in Python. By comparing the performance differences among list multiplication, list comprehensions, itertools.repeat, and NumPy approaches, it reveals the critical distinction between memory sharing and independence. Experiments show that list comprehensions with itertools.repeat offer approximately 15% performance improvement by avoiding redundant integer object creation, while the NumPy method, despite bypassing Python loops, actually performs worse. Through detailed code examples and memory address verification, the article offers practical performance optimization guidance for developers.
Introduction
In Python programming, creating data structures containing multiple independent sublists is a common requirement. Many developers might initially attempt to use list multiplication [[]] * n, but this results in all sublists sharing the same memory address, causing modifications to one sublist to affect all others. This article systematically analyzes various methods for creating independent sublists and recommends the optimal solution based on performance testing results.
Problem Background and Common Pitfalls
Using [[]] * n does quickly create a list containing n empty lists, but all elements are references to the same list object. For example:
d = [[]] * 3
d[0].append(1)
print(d)  # Output: [[1], [1], [1]]
This can be confirmed by checking memory addresses:
for i in range(len(d)):
    print(id(d[i]))  # All addresses are identical
This shared reference behavior poses serious issues when independent sublist operations are required.
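As a minimal sketch of the same pitfall, the `is` operator and a set of `id()` values both show that `[[]] * n` holds n references to a single list. The variable names (`shared`, `same_object`, `distinct_ids`) are illustrative and do not come from the original article.

```python
n = 3
shared = [[]] * n

# Every slot refers to the same list object.
same_object = all(item is shared[0] for item in shared)
distinct_ids = len({id(item) for item in shared})

print(same_object)   # True
print(distinct_ids)  # 1

# Rebinding a slot (as opposed to mutating the list in it) affects only that slot.
shared[0] = [99]
print(shared)  # [[99], [], []]
```

The last step is worth noting: the aliasing only shows up through in-place mutation such as `append`, which is why the bug can go unnoticed in code that mostly reassigns elements.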
Core Methods for Performance Optimization
Standard List Comprehension
The most straightforward solution is using list comprehension:
d = [[] for _ in range(n)]
This method creates a new empty list in each iteration, so each sublist is a distinct object with its own memory address. It serves as the baseline in the performance tests, and there is still some room for optimization.
Optimization with itertools.repeat
Combining with itertools.repeat avoids creating new integer objects in each iteration:
from itertools import repeat
d = [[] for _ in repeat(None, n)]
In the tests reported here, this method is approximately 15% faster than the standard list comprehension. range(n) has to produce an integer object on every iteration, whereas repeat(None, n) hands back the same None object each time, which reduces object creation overhead.
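To make the behavior of `repeat(None, n)` concrete, the following short sketch shows what the iterator yields and confirms that the resulting sublists are still distinct objects. The value `n = 3` is chosen only for illustration.

```python
from itertools import repeat

n = 3

# repeat(None, n) yields the same None object n times, then stops.
print(list(repeat(None, n)))  # [None, None, None]

# The loop variable is ignored; a fresh [] is still built on each iteration.
d = [[] for _ in repeat(None, n)]
print(len(d))        # 3
print(d[0] is d[1])  # False
```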
NumPy Array Conversion Method
NumPy can build the same structure without a Python-level loop:
import numpy as np
d = np.empty((n, 0)).tolist()
However, tests show this method is 2.5 times slower than the list comprehension, mainly due to the overhead of converting the NumPy array to Python lists.
Traditional Loop Method
Using explicit loops can also create independent sublists:
d = []
for _ in range(n):
    d.append([])
This method performs similarly to the list comprehension but is more verbose.
Performance Comparison Analysis
Through time performance testing (using the timeit module), the relative performance ranking of various methods is:
1. [[] for _ in repeat(None, n)] - Fastest (15% faster than baseline)
2. [[] for _ in range(n)] - Baseline performance
3. Explicit loop method - Similar to baseline
4. np.empty((n, 0)).tolist() - Slowest (2.5 times slower than baseline)
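The ranking above can be checked locally with a small timeit harness. The sketch below is one possible setup, not the benchmark used for the figures in this article: the function names, the list size `n=1000`, and the repetition counts are all assumptions, and the NumPy variant is left out so the script runs with the standard library alone. Absolute and relative timings will vary with the Python version and the machine.

```python
import timeit
from itertools import repeat

def with_range(n):
    return [[] for _ in range(n)]

def with_repeat(n):
    return [[] for _ in repeat(None, n)]

def with_loop(n):
    d = []
    for _ in range(n):
        d.append([])
    return d

def benchmark(n=1000, number=200, rounds=5):
    """Return the best per-call time in seconds for each approach."""
    results = {}
    for func in (with_range, with_repeat, with_loop):
        timer = timeit.Timer(lambda: func(n))
        # Taking the minimum over several rounds reduces timing noise.
        best = min(timer.repeat(repeat=rounds, number=number))
        results[func.__name__] = best / number
    return results

if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds * 1e6:.2f} us per call")
```

Reporting the minimum of several rounds rather than the mean follows the guidance in the timeit documentation, since higher values are usually caused by other processes interfering with the measurement.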
Memory Independence Verification
To ensure sublists are truly independent, memory addresses can be verified:
from itertools import repeat
d = [[] for _ in repeat(None, 3)]
for i in range(len(d)):
    print(f"d[{i}] address: {id(d[i])}")  # Addresses are all different
d[0].append(1)
print(d)  # Output: [[1], [], []]
Only the first sublist is modified, proving memory independence.
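The same check can be wrapped in a small reusable helper. This is a sketch under the assumption that comparing `id()` values is an acceptable test; `all_independent` is a hypothetical name introduced here, not a standard library function.

```python
from itertools import repeat

def all_independent(lists):
    """Return True if no two elements of `lists` are the same object."""
    return len({id(item) for item in lists}) == len(lists)

independent = [[] for _ in repeat(None, 3)]
aliased = [[]] * 3

print(all_independent(independent))  # True
print(all_independent(aliased))      # False
```

Comparing `id()` values is reliable here because all the sublists are alive at the same time; identifiers are only guaranteed unique among objects that exist simultaneously.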
Application Scenarios and Recommendations
In scenarios requiring high-performance creation of large numbers of independent sublists (such as graph algorithms, matrix operations, data chunking), using [[] for _ in repeat(None, n)] is recommended. For general applications, standard list comprehension is sufficient and more readable. List multiplication should be avoided unless shared references are indeed required.
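As an illustration of the graph use case mentioned above, the following sketch builds an undirected adjacency list. The function `build_adjacency` and the sample edges are hypothetical and serve only to show why each node needs its own sublist.

```python
from itertools import repeat

def build_adjacency(num_nodes, edges):
    """Build an undirected adjacency list from (u, v) edge pairs."""
    # Each node needs its own neighbor list; [[]] * num_nodes would alias them.
    adjacency = [[] for _ in repeat(None, num_nodes)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    return adjacency

graph = build_adjacency(4, [(0, 1), (0, 2), (2, 3)])
print(graph)  # [[1, 2], [0], [0, 3], [2]]
```

Had the adjacency list been created with list multiplication, every `append` would have landed in the single shared list and all nodes would report the same neighbors.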
Conclusion
The fastest method measured here for creating independent sublists is a list comprehension combined with itertools.repeat, which delivers the best performance while keeping the code concise. Developers should balance performance and readability based on their specific needs and, when in doubt, verify sublist independence by comparing the memory addresses returned by id().