Keywords: Python | Boolean Lists | Index Retrieval | Performance Optimization | enumerate | itertools | numpy
Abstract: This article comprehensively examines various methods for retrieving indices of True values in Python boolean lists. By analyzing list comprehensions, itertools.compress, and numpy.where, it compares their performance differences and applicable scenarios. The article demonstrates implementation details through practical code examples and provides performance benchmark data to help developers choose optimal solutions based on specific requirements.
Problem Background and Common Pitfalls
In Python programming, it's common to process boolean lists and retrieve the indices of True values. A frequent mistake is using the list.index() method, as shown in the original code:
self.states = [False, False, False, False, True, True, False, True, False, False, False, False, False, False, False, False]
def which_switch(self):
    x = [self.states.index(i) for i in self.states if i == True]
This approach does not return the positions of all True values. list.index() always returns the index of the first matching item, so every True element in the list maps to the index of the first True. For the list above the result is [4, 4, 4] instead of the intended [4, 5, 7].
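The failure is easy to reproduce outside the class. The following minimal sketch uses a standalone `states` list (the variable names `states` and `buggy` are chosen here for illustration) and runs the same comprehension:

```python
states = [False, False, False, False, True, True, False, True,
          False, False, False, False, False, False, False, False]

# list.index(True) searches from position 0 on every call, so each True
# element is mapped to the index of the FIRST True in the list.
buggy = [states.index(i) for i in states if i == True]

print(buggy)  # [4, 4, 4]
```

The list has three True values, so the comprehension produces three results, but all of them are 4.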
Solution 1: Using enumerate with List Comprehension
The most straightforward and Pythonic approach combines the enumerate function with a list comprehension:
>>> t = [False, False, False, False, True, True, False, True, False, False, False, False, False, False, False, False]
>>> [i for i, x in enumerate(t) if x]
[4, 5, 7]
This method works because enumerate(t) returns an iterator yielding (index, value) tuples. The list comprehension [i for i, x in enumerate(t) if x] keeps only the indices whose value is truthy. For a list that contains only booleans, if x selects the same elements as if x == True, so the explicit comparison can be dropped and the code becomes more concise.
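The filter in the comprehension tests truthiness, which only matters when the list holds something other than booleans. As a small illustration (the `mixed` list below is a made-up example, not part of the original problem), the three possible conditions select different indices:

```python
mixed = [0, 1, 2, "", "on", None, True, False]

# Truthiness: any non-zero number, non-empty string, or True passes.
truthy = [i for i, x in enumerate(mixed) if x]

# Equality: only values that compare equal to True (True itself and 1).
equal_true = [i for i, x in enumerate(mixed) if x == True]

# Identity: only the bool object True.
is_true = [i for i, x in enumerate(mixed) if x is True]

print(truthy)      # [1, 2, 4, 6]
print(equal_true)  # [1, 6]
print(is_true)     # [6]
```

For a list of pure booleans, as in the switch example, all three conditions give the same result, and the plain `if x` form is the usual choice.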
Solution 2: Performance Optimization with itertools.compress
For large lists, itertools.compress offers better performance:
>>> from itertools import compress
>>> list(compress(range(len(t)), t))
[4, 5, 7]
The compress function takes two iterables: a data iterable and a selector iterable. It returns an iterator over the elements of the data iterable whose corresponding selector is truthy. Here, range(len(t)) generates the index sequence, and t serves as the selector.
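To make the behaviour of compress concrete, here is a rough pure-Python equivalent. The name `compress_sketch` is hypothetical and the function is for illustration only; the real itertools.compress is implemented in C in CPython.

```python
from itertools import compress


def compress_sketch(data, selectors):
    """Illustrative stand-in for itertools.compress, not a replacement."""
    # Pair each data element with its selector and keep the data element
    # whenever the selector is truthy. Like compress, this is lazy.
    return (d for d, s in zip(data, selectors) if s)


t = [False, True, True, False, True]

sketch_result = list(compress_sketch(range(len(t)), t))
real_result = list(compress(range(len(t)), t))

print(sketch_result)  # [1, 2, 4]
print(real_result)    # [1, 2, 4]
```

Both versions return a lazy iterator, which is why the examples wrap the call in list().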
Performance comparison for a list with 16,000 elements shows:
>>> t = t*1000
>>> %timeit [i for i, x in enumerate(t) if x]
100 loops, best of 3: 2.55 ms per loop
>>> %timeit list(compress(range(len(t)), t))
1000 loops, best of 3: 696 µs per loop
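In this measurement, itertools.compress is approximately 3.7 times faster than the list comprehension (2.55 ms versus 696 µs). The filtering loop runs inside compress itself, which avoids most of the per-element overhead of a loop executed by the Python interpreter.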
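The %timeit commands above only work in IPython or Jupyter. A comparable measurement can be sketched with the standard timeit module; the function names `with_enumerate` and `with_compress` are chosen here for illustration, and the absolute numbers will depend on the machine and Python version.

```python
import timeit
from itertools import compress

# Same 16-element pattern as in the article, repeated 1000 times.
base = [False] * 4 + [True, True, False, True] + [False] * 8
t = base * 1000  # 16,000 elements


def with_enumerate():
    return [i for i, x in enumerate(t) if x]


def with_compress():
    return list(compress(range(len(t)), t))


# Both approaches must agree before their speed is compared.
assert with_enumerate() == with_compress()

for fn in (with_enumerate, with_compress):
    # Best of 3 repeats, 100 calls each, reported per call.
    best = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    print(f"{fn.__name__}: {best * 1e6:.0f} µs per call")
```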
Solution 3: Best Performance with numpy.where
If the NumPy library is already in use, np.where provides the fastest solution:
>>> import numpy as np
>>> states = [False, False, False, False, True, True, False, True, False, False, False, False, False, False, False, False]
>>> np.where(states)[0]
array([4, 5, 7])
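A second benchmark on the same 16,000-element list covers all three methods. Its absolute timings differ from the figures above, so the two sets of numbers should not be compared directly: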
>>> from itertools import compress
>>> import numpy as np
>>> t = [False, False, False, False, True, True, False, True, False, False, False, False, False, False, False, False]
>>> t = 1000*t
# Method 1: List Comprehension
>>> %timeit [i for i, x in enumerate(t) if x]
457 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Method 2: itertools.compress
>>> %timeit list(compress(range(len(t)), t))
210 µs ± 704 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Method 3: numpy.where (Fastest)
>>> %timeit np.where(t)
179 µs ± 593 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
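In this run, np.where takes about 15% less time than itertools.compress (179 µs versus 210 µs) and is about 2.5 times faster than the list comprehension (457 µs). Two details matter when reading these numbers. First, np.where(t) is called on a plain Python list, so the timing includes converting the list to an array. Second, np.where(t) returns a tuple of arrays, so the indices themselves are np.where(t)[0]. NumPy's underlying C implementation typically provides a much larger advantage when the data is already stored in a NumPy array.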
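When the states are kept in a NumPy array, the conversion can be done once and the index lookup repeated cheaply. The sketch below assumes NumPy is installed; `np.flatnonzero` is shown as an alternative that returns the index array directly instead of a tuple.

```python
import numpy as np

states = [False] * 4 + [True, True, False, True] + [False] * 8

# Convert once and reuse the array for later lookups.
arr = np.asarray(states, dtype=bool)

# np.where with a single argument returns a tuple with one array per
# dimension, so the indices of a 1-D array are the first element.
idx_where = np.where(arr)[0]

# np.flatnonzero returns the same indices as a plain 1-D array.
idx_flat = np.flatnonzero(arr)

print(idx_where.tolist())  # [4, 5, 7]
print(idx_flat.tolist())   # [4, 5, 7]
```

Calling .tolist() converts the result back to ordinary Python integers when the rest of the program does not use NumPy.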
Practical Application Scenarios
In switchboard control systems, as described in the original problem, accurately identifying all active switch positions is crucial. Using the correct method ensures reliable system operation:
class SwitchBoard:
    def __init__(self):
        self.states = [False, False, False, False, True, True, False, True, False, False, False, False, False, False, False, False]

    def which_switch(self):
        # Using enumerate method
        return [i for i, state in enumerate(self.states) if state]
This method returns [4, 5, 7], accurately identifying all active switch positions.
Method Selection Recommendations
Based on different usage scenarios, the following selection strategy is recommended:
- Small Lists and Simple Applications: Use enumerate with a list comprehension for concise and readable code
- Large Lists and Performance-Sensitive Applications: Use itertools.compress to balance performance and code readability
- Numerical Computation Intensive Applications: Use numpy.where for optimal performance, especially when data is already in NumPy arrays
The component methods mentioned in reference articles have applications in specific domains (like Grasshopper), but Python's standard library methods are more general and efficient.
Conclusion
Retrieving indices of True values in boolean lists is a common programming task. Avoid the pitfalls of list.index() and choose appropriate tools based on specific needs: enumerate for simple scenarios, itertools.compress for performance optimization, and numpy.where for numerical computation-intensive tasks. Understanding the principles and performance characteristics of these methods helps in writing more efficient and reliable code.