Keywords: Python | JSON parsing | file reading | error handling | data extraction
Abstract: This article provides an in-depth analysis of the common 'JSON object must be str, bytes or bytearray' error when reading JSON files in Python. Through examination of a real user case, it explains the differences and proper usage of json.loads() and json.load() functions. Starting from error causes, the article guides readers step-by-step on correctly reading JSON file contents, extracting specific fields like ['text'], and offers complete code examples with best practices. It also covers file path handling, encoding issues, and error handling mechanisms to help developers avoid common pitfalls and improve JSON data processing efficiency.
Analysis of JSON File Reading Errors
In Python programming, handling JSON data is a common task, but beginners often encounter type errors. In the user's case, the error message "the JSON object must be str, bytes or bytearray, not 'TextIOWrapper'" reveals the core issue: the json.loads() function expects a string, bytes, or bytearray as input, while the user passed a file object.
Error Code Analysis
The original erroneous code is:
with open('C:/Users/bilal butt/Desktop/PanamalEakJson.json', 'r') as lst:
    b = json.loads(lst)
    print(b['text'])
There are two key issues here: first, lst is a file object (TextIOWrapper), not a JSON string; second, the JSON file contains an array, requiring access to array elements before retrieving the text field.
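To see the failure in isolation, the following sketch reproduces the same TypeError using an in-memory file-like object (io.StringIO stands in for the user's actual file, which is not available here):

```python
import json
import io

# A file-like object stands in for the opened JSON file
fake_file = io.StringIO('[{"text": "hello"}]')

try:
    json.loads(fake_file)  # wrong: json.loads() expects a string, not a file object
except TypeError as e:
    print(e)  # message names the offending type, e.g. "... not StringIO"

# Either read the content first, or hand the file object to json.load()
fake_file.seek(0)
data = json.load(fake_file)
print(data[0]['text'])  # hello
```

Note that json.loads() rejects the file object before reading anything, which is why the error is a TypeError rather than a JSONDecodeError.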
Correct Solutions
Based on the best answer, there are two correct approaches to reading JSON files:
Method 1: Using json.loads() with File Reading
This method requires reading file content as a string first:
import json

with open('panamaleaks50k.json', 'r', encoding='utf-8') as f:
    json_string = f.read()

data = json.loads(json_string)

# Extract all text fields
all_texts = [item['text'] for item in data]
print("All text fields:", all_texts)

# Extract single text field (first element)
single_text = data[0]['text']
print("Single text field:", single_text)
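A self-contained variant of Method 1 follows, which first writes a small stand-in file so the example runs anywhere (the filename and records are illustrative, not taken from the real dataset):

```python
import json
import os
import tempfile

# Create a small stand-in JSON file: an array of objects with a 'text' field
records = [{"id": "1", "text": "first tweet"}, {"id": "2", "text": "second tweet"}]
path = os.path.join(tempfile.gettempdir(), "sample_records.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(records, f)

# Method 1: read the file content as a string, then parse it
with open(path, "r", encoding="utf-8") as f:
    data = json.loads(f.read())

all_texts = [item["text"] for item in data]
print(all_texts)  # ['first tweet', 'second tweet']
```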
Method 2: Using json.load() for Direct File Object Processing
This is a more concise approach, as json.load() is specifically designed for file objects:
import json

with open('panamaleaks50k.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Batch extraction of text fields
for item in data:
    print(item.get('text', 'Field not found'))

# Conditional extraction example (item.get('id') avoids a KeyError on records without an 'id')
specific_text = next((item['text'] for item in data if item.get('id') == '885800668862263296'), None)
Core Concept Comparison
json.loads() vs json.load():

json.loads(): the parameter must be a string (the "s" stands for string); it is used for parsing JSON-formatted strings
json.load(): the parameter must be a file object; it calls the object's read() method internally
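The relationship between the two can be demonstrated directly: for the same underlying content they produce identical results (io.StringIO stands in for an open file here):

```python
import json
import io

payload = '[{"text": "a"}, {"text": "b"}]'

# json.loads() takes the JSON string itself
from_string = json.loads(payload)

# json.load() takes a file-like object and calls its read() method for you
from_file = json.load(io.StringIO(payload))

print(from_string == from_file)  # True
```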
Practical Application Extensions
When dealing with large JSON files, consider the following optimization strategies:
import json
from pathlib import Path

# Using pathlib for path handling
json_path = Path('data/panamaleaks50k.json')

# Adding error handling
try:
    with json_path.open('r', encoding='utf-8') as file:
        data = json.load(file)
    # Safe field access: skip records that are not dicts or lack a 'text' key,
    # so no KeyError can occur
    texts = []
    for record in data:
        if isinstance(record, dict) and 'text' in record:
            texts.append(record['text'])
    print(f"Successfully extracted {len(texts)} text records")
except FileNotFoundError:
    print(f"File not found: {json_path}")
except json.JSONDecodeError as e:
    print(f"JSON decoding error: {e}")
Performance Considerations and Best Practices
1. Encoding Specification: Always explicitly specify file encoding (e.g., utf-8) to avoid cross-platform issues
2. Memory Management: For extremely large JSON files, consider using the ijson library for streaming parsing
3. Data Validation: Use the .get() method or check key existence before accessing fields
4. Path Handling: Use pathlib or os.path to ensure path compatibility
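Point 3 above can be illustrated with a short sketch over mixed-quality records (the sample data is invented for illustration):

```python
import json

# Records of mixed quality: one valid, one missing 'text', one not a dict at all
raw = '[{"text": "ok"}, {"id": 7}, "not a dict"]'
data = json.loads(raw)

texts = []
for record in data:
    if isinstance(record, dict):
        # .get() returns the default instead of raising KeyError
        texts.append(record.get('text', 'Field not found'))

print(texts)  # ['ok', 'Field not found']
```

The non-dict record is skipped entirely, while the dict missing 'text' yields the default value rather than crashing the loop.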
Conclusion
Proper handling of JSON file reading requires understanding the distinction between Python file objects and strings. json.load() provides the most direct conversion from file to JSON object, while json.loads() requires explicit file content reading. In practical development, combining appropriate error handling and data type validation enables building robust JSON data processing pipelines. For the PanamaLeaks data example, the correct methods efficiently extract all text fields, supporting subsequent text analysis or data mining tasks.