Dictionary Reference Issues in Python: Analysis and Solutions for Lists Storing Identical Dictionary Objects

Nov 28, 2025 · Programming

Keywords: Python | Dictionary Reference | List Storage | Object Reference | Data Structures

Abstract: This article analyzes a common dictionary reference issue in Python. Using a practical case of extracting iframe attributes from a web page, it explains why reusing the same dictionary object inside a loop leaves a list holding several references to one dictionary. It describes Python's object reference model, presents several solutions (creating a new dictionary in each iteration, building the list with a list comprehension, and appending copies made with copy()), and closes with performance notes and best practices to help developers avoid this pitfall.

Problem Phenomenon and Analysis

A common issue when building data structures in Python is that a list meant to hold several dictionaries ends up with every element pointing to the same dictionary object. Beginners run into this often, and the cause lies in Python's core object reference mechanism.

Consider a practical scenario: extracting the attributes of every iframe tag on a web page. The code is as follows:

from pprint import pprint
from urllib.request import urlopen

from bs4 import BeautifulSoup

# url holds a host name such as "example.com" and is defined elsewhere
site = "http://" + url
with urlopen(site) as f:
    web_content = f.read()

soup = BeautifulSoup(web_content, "html.parser")
info = {}  # created once, before the loop
content = []
for iframe in soup.find_all('iframe'):
    info['src'] = iframe.get('src')
    info['height'] = iframe.get('height')
    info['width'] = iframe.get('width')
    content.append(info)  # appends a reference to the same dict every time
    print(info)

pprint(content)

During debugging, each print(info) call inside the loop shows the expected values:

{'src': 'abc.com', 'height': '0', 'width': '0'}
{'src': 'xyz.com', 'height': '0', 'width': '0'}
{'src': 'http://www.detik.com', 'height': '600', 'width': '1000'}

However, the final pprint(content) output shows that all dictionaries in the list are identical:

[{'height': '600', 'src': 'http://www.detik.com', 'width': '1000'},
 {'height': '600', 'src': 'http://www.detik.com', 'width': '1000'},
 {'height': '600', 'src': 'http://www.detik.com', 'width': '1000'}]

Root Cause: Object Reference Mechanism

The core of this issue lies in Python's object reference model. content.append(info) does not add a copy of the dictionary to the list; it adds a reference to the existing dictionary object. Because info is created only once, before the loop, each iteration overwrites the values in that single dictionary and then appends another reference to it. After the loop, every element of content refers to the same object, which holds the values from the last iframe.
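The same sharing happens with ordinary assignment, not only with append(). In the minimal sketch below (the names alias and content are chosen for illustration), a second name and a list slot both end up referring to one dictionary, so a change made through any of them is visible through the others.

```python
info = {'src': 'abc.com'}
alias = info        # assignment binds a second name; no copy is made
content = [info]    # the list slot also stores a reference, not a copy

alias['src'] = 'xyz.com'   # mutate the dictionary through the alias

print(info)                          # {'src': 'xyz.com'}
print(content[0])                    # {'src': 'xyz.com'}
print(content[0] is info is alias)   # True: one object, three references
```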

This mechanism can be verified with a simplified example:

>>> d = {}
>>> dlist = []
>>> for i in range(3):
...     d['data'] = i
...     dlist.append(d)
...     print(d)
...
{'data': 0}
{'data': 1}
{'data': 2}
>>> print(dlist)
[{'data': 2}, {'data': 2}, {'data': 2}]

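The id() function, which returns an identifier that is unique among objects alive at the same time, confirms that all list items point to the same object (the exact numbers differ from run to run):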

>>> for item in dlist:
...     print("List item points to object ID:", id(item))
...
List item points to object ID: 47472232
List item points to object ID: 47472232
List item points to object ID: 47472232
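A slightly more systematic check, sketched below, counts how many distinct objects a list actually holds and shows that a change made through one element is visible through all of them. The helper name count_distinct_objects is hypothetical, not part of any library.

```python
def count_distinct_objects(items):
    """Return how many distinct objects (by identity) a list holds."""
    # Every item stays alive inside the list, so their id() values are comparable.
    return len({id(item) for item in items})

d = {}
dlist = []
for i in range(3):
    d['data'] = i
    dlist.append(d)

print(count_distinct_objects(dlist))  # 1: three list slots, one dictionary
print(dlist[0] is dlist[2])           # True: same object, not merely equal values

# Mutating through any list slot changes the single shared dictionary.
dlist[0]['data'] = 'changed'
print(dlist)  # [{'data': 'changed'}, {'data': 'changed'}, {'data': 'changed'}]
print(d)      # {'data': 'changed'}
```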

Solutions

Method 1: Create New Dictionary Within Loop

The most direct and effective solution is to create a new dictionary object in each loop iteration:

for iframe in soup.find_all('iframe'):
    info = {}
    info['src'] = iframe.get('src')
    info['height'] = iframe.get('height')
    info['width'] = iframe.get('width')
    content.append(info)

A more concise version builds the complete dictionary with a single literal inside the loop:

for iframe in soup.find_all('iframe'):
    info = {
        "src": iframe.get('src'),
        "height": iframe.get('height'),
        "width": iframe.get('width')
    }
    content.append(info)
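To try Method 1 without installing BeautifulSoup or fetching a page, the sketch below swaps in the standard library's html.parser.HTMLParser and feeds it a hard-coded HTML snippet that stands in for the downloaded page. The class name IframeCollector is hypothetical; the point is that handle_starttag() builds a brand-new dictionary for every iframe it encounters.

```python
from html.parser import HTMLParser


class IframeCollector(HTMLParser):
    """Collect src/height/width of every <iframe> tag (stdlib-only sketch)."""

    def __init__(self):
        super().__init__()
        self.content = []

    def handle_starttag(self, tag, attrs):
        if tag != 'iframe':
            return
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        # A new dictionary for every iframe, exactly as in Method 1.
        info = {
            'src': attrs.get('src'),
            'height': attrs.get('height'),
            'width': attrs.get('width'),
        }
        self.content.append(info)


# Hard-coded sample markup standing in for the downloaded page.
html = '''
<iframe src="abc.com" width="0" height="0"></iframe>
<iframe src="xyz.com" width="0" height="0"></iframe>
<iframe src="http://www.detik.com" width="1000" height="600"></iframe>
'''

collector = IframeCollector()
collector.feed(html)
collector.close()
print(collector.content)
```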

Method 2: Using a List Comprehension

The same logic can be written as a list comprehension whose expression is a dictionary literal. The literal is evaluated once per iframe, so every element is a new dictionary:

content = [
    {
        "src": iframe.get('src'),
        "height": iframe.get('height'),
        "width": iframe.get('width')
    }
    for iframe in soup.find_all('iframe')
]
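If an actual dictionary comprehension is wanted, it can build each per-iframe dictionary from a tuple of attribute names. In the sketch below, plain dictionaries stand in for BeautifulSoup Tag objects (both support .get(key)), and ATTRS is a name chosen for illustration.

```python
# Plain dicts stand in for BeautifulSoup Tag objects; both support .get().
iframes = [
    {'src': 'abc.com', 'width': '0', 'height': '0'},
    {'src': 'xyz.com', 'width': '0', 'height': '0'},
    {'src': 'http://www.detik.com', 'width': '1000', 'height': '600'},
]

ATTRS = ('src', 'height', 'width')  # attribute names to extract

# Outer list comprehension: one element per iframe.
# Inner dictionary comprehension: evaluated anew each time, producing a new dict.
content = [{name: iframe.get(name) for name in ATTRS} for iframe in iframes]

print(content[0])               # {'src': 'abc.com', 'height': '0', 'width': '0'}
print(content[0] is content[1]) # False
```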

Method 3: Using copy() Method

Another solution keeps a single working dictionary but appends a copy of it, made with the dictionary's copy() method, on each iteration:

info = {}
for iframe in soup.find_all('iframe'):
    info['src'] = iframe.get('src')
    info['height'] = iframe.get('height')
    info['width'] = iframe.get('width')
    content.append(info.copy())

Applying the same fix to the simplified example confirms that it works:

>>> d = {}
>>> dlist = []
>>> for i in range(3):
...     d['data'] = i
...     dlist.append(d.copy())
...     print(d)
...
{'data': 0}
{'data': 1}
{'data': 2}
>>> print(dlist)
[{'data': 0}, {'data': 1}, {'data': 2}]

Checking object IDs confirms that different objects were created:

>>> for item in dlist:
...     print("List item points to object ID:", id(item))
...
List item points to object ID: 33861576
List item points to object ID: 47472520
List item points to object ID: 47458120
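Note that copy() makes a shallow copy: the outer dictionary is new, but any mutable values inside it are still shared. The sketch below assumes a hypothetical variant of the data in which height and width are stored in a nested size dictionary; copy.deepcopy() from the standard library copies the nested level as well.

```python
import copy

# Hypothetical nested layout: height/width live in an inner 'size' dict.
template = {'src': None, 'size': {'height': None, 'width': None}}
shallow = []
deep = []
for src, h, w in [('abc.com', '0', '0'), ('http://www.detik.com', '600', '1000')]:
    template['src'] = src
    template['size']['height'] = h
    template['size']['width'] = w
    shallow.append(template.copy())       # new outer dict, SAME inner 'size' dict
    deep.append(copy.deepcopy(template))  # new outer and new inner dicts

print(shallow[0])  # {'src': 'abc.com', 'size': {'height': '600', 'width': '1000'}}
print(deep[0])     # {'src': 'abc.com', 'size': {'height': '0', 'width': '0'}}
```

With the flat dictionaries used in this article, copy() is sufficient, because all values are strings (or None), which are immutable.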

Performance Considerations and Best Practices

In data processing, performance is also worth considering. A related benchmark, which converted a dataset with 100 columns and 10,000 rows into a list of dictionaries, reported an average of 876.99 milliseconds when looping over the dataset directly and 492.04 milliseconds when converting it to a PyDataset first. Those figures come from a different task and do not compare the three methods above, but they show that for large datasets the choice of data structures and conversion method can noticeably affect performance.
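The measurements cited above concern dataset conversion rather than the methods in this article. To compare the three methods on your own machine, a minimal timeit sketch like the following could be used. The input size, the run count, and the use of plain dictionaries as stand-in tags are assumptions; results vary with Python version and hardware, so no figures are given here.

```python
import timeit

RUNS = 20  # assumed number of timed runs per method

# Hypothetical input: 10,000 stand-in "tags" (plain dicts support .get like bs4 Tags).
tags = [{'src': f'site{i}.com', 'height': '0', 'width': '0'} for i in range(10_000)]


def new_dict_in_loop():
    """Method 1: a new dictionary literal on every iteration."""
    content = []
    for tag in tags:
        content.append({'src': tag.get('src'), 'height': tag.get('height'),
                        'width': tag.get('width')})
    return content


def list_comprehension():
    """Method 2: list comprehension producing a dict per tag."""
    return [{'src': tag.get('src'), 'height': tag.get('height'),
             'width': tag.get('width')} for tag in tags]


def reuse_and_copy():
    """Method 3: one working dict, appending a shallow copy each time."""
    content = []
    info = {}
    for tag in tags:
        info['src'] = tag.get('src')
        info['height'] = tag.get('height')
        info['width'] = tag.get('width')
        content.append(info.copy())
    return content


for func in (new_dict_in_loop, list_comprehension, reuse_and_copy):
    seconds = timeit.timeit(func, number=RUNS)
    print(f'{func.__name__}: {seconds / RUNS * 1000:.2f} ms per run')
```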

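Best practice recommendations: create a new dictionary in each loop iteration, either with a dictionary literal (Method 1) or inside a list comprehension (Method 2). If an existing dictionary must be reused as a template, append info.copy() rather than info itself (Method 3). When the entries of a list unexpectedly look identical, compare their id() values to check whether they are in fact the same object.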

Conclusion

Python's object reference model is central to how the language works, and using it safely requires understanding it. When working with mutable objects such as dictionaries and lists, always distinguish between a reference to an object and a copy of it. Following the patterns shown here avoids this common pitfall and leads to more robust code.

In practice, the recommended approach is to create a new dictionary in each loop iteration. This not only fixes the reference issue but also makes the code's intent clearer, which simplifies maintenance and debugging.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.