Keywords: Python Serial Communication | pySerial Optimization | Real-time Data Acquisition
Abstract: This article analyzes the performance bottlenecks encountered when using Python's pySerial library for high-speed serial communication. By comparing the readline() and inWaiting() reading methods, it shows how strongly buffer management and reading strategy affect real-time data reception. It explains how to restructure the reading logic to avoid data delays and buffer accumulation at 2 Mbps, with a code example and a performance comparison, to help developers achieve genuinely real-time data acquisition.
Problem Background and Phenomenon Analysis
In high-speed communication between embedded systems and a host computer, Python is a popular choice because of its easy-to-use pySerial library. At 2 Mbps, however, the usual reading methods often fail to meet real-time requirements. In the case examined here, the user observed that PuTTY received every message from a PIC microcontroller over an FTDI USB serial port at 2 Mbps without trouble, while a pySerial script on the same link showed severe performance degradation.
The symptoms were specific: data integrity was preserved (the message counter kept incrementing continuously), but the reception rate dropped from the expected 100-150 messages per second to roughly 5 per second. More tellingly, output continued after the transmitter had stopped, because data that had piled up in the buffer was still being drained. The script was reading more slowly than the data arrived.
Performance Bottlenecks of readline() Method
The original code read data with ser.readline(), which performs adequately at low speeds but shows clear weaknesses at high speeds. According to the pySerial documentation, readline() returns once it has received an end-of-line character (\n by default) or once the timeout expires. At 2 Mbps, this approach cannot keep pace with the incoming stream.
The key issue is the per-call cost of finding the end-of-line character. In pySerial 3.x, readline() is inherited from Python's io module, which generally assembles each line from many very small reads, so at high data rates much of the time goes to read overhead rather than to processing data. The timeout=2 setting does not slow down calls that find a terminator, since readline() returns as soon as one arrives; but whenever a terminator is missing or late, a single call can block for up to the full 2 seconds. Both effects are costly when many small packets arrive in quick succession.
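The per-byte cost can be illustrated without any hardware. The sketch below assumes the stream inherits readline() from io.RawIOBase, as pySerial 3.x's Serial class does; CountingStream is a hypothetical stand-in for a port, not part of pySerial, and simply counts how many low-level reads each strategy triggers.

```python
import io


class CountingStream(io.RawIOBase):
    """Hypothetical in-memory stand-in for a serial port that counts reads."""

    def __init__(self, payload):
        super().__init__()
        self._payload = payload
        self._pos = 0
        self.read_calls = 0

    def readable(self):
        return True

    def readinto(self, buf):
        # Every read(n) issued by the io machinery ends up here once.
        self.read_calls += 1
        chunk = self._payload[self._pos:self._pos + len(buf)]
        buf[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


payload = b"count=1\ncount=2\n"

by_line = CountingStream(payload)
first_line = by_line.readline()      # inherited from io.IOBase
line_reads = by_line.read_calls      # one low-level read per byte of the line

in_bulk = CountingStream(payload)
everything = in_bulk.read(len(payload))
bulk_reads = in_bulk.read_calls      # a single low-level read
```

On a real port each of those low-level reads carries driver overhead, which is why the number of calls matters more than the number of bytes.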
Optimization Solution with inWaiting() and read()
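To address these issues, combine inWaiting() with read(). inWaiting() reports how many bytes are currently waiting in the input queue, and read() takes the number of bytes to read, so the call returns at once with exactly the data that is available instead of waiting for a terminator. In pySerial 3.0 and later, inWaiting() is deprecated in favor of the in_waiting property, and flushInput() and flushOutput() in favor of reset_input_buffer() and reset_output_buffer(); the older names used below remain available in pySerial 3.x.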
The optimized code logic is as follows:
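import serial

# Open the port at 2 Mbps; adjust the device path for your system.
ser = serial.Serial('/dev/ttyUSB0', 2000000)
# Discard any stale data left in the driver buffers.
ser.flushInput()
ser.flushOutput()
while True:
    # Number of bytes currently waiting in the input queue.
    bytes_to_read = ser.inWaiting()
    if bytes_to_read > 0:
        # Read exactly what is available, so the call does not block.
        data_raw = ser.read(bytes_to_read)
        print(data_raw)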
The main advantage of this approach is real-time behavior: every pass reads all available data immediately, without waiting for a terminator or a timeout. At 2 Mbps, this lets the script keep up with the incoming stream and prevents data from accumulating in the buffer. One trade-off is that the loop polls continuously and will keep a CPU core busy even when no data is arriving.
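A variant of the loop above can be sketched with the pySerial 3.x names (the in_waiting property and reset_input_buffer()) and a short sleep so that an idle link does not pin a CPU core. read_available, run, and the 1 ms idle_sleep are illustrative choices, not part of pySerial; the helper accepts any object that exposes in_waiting and read(), which makes it easy to try without hardware.

```python
import time


def read_available(port, idle_sleep=0.001):
    """Return every byte queued on `port`, or b"" after a brief pause.

    `port` is any object with an `in_waiting` attribute and a `read(n)`
    method, such as a pySerial 3.x `serial.Serial` instance.
    """
    pending = port.in_waiting
    if pending:
        return port.read(pending)
    # Nothing queued: yield the CPU briefly instead of spinning.
    time.sleep(idle_sleep)
    return b""


def run(device="/dev/ttyUSB0", baudrate=2000000):
    """Example wiring with real hardware; requires the pyserial package."""
    import serial  # imported here so the helper above stays hardware-free

    with serial.Serial(device, baudrate, timeout=0) as ser:
        ser.reset_input_buffer()
        while True:
            data = read_available(ser)
            if data:
                print(data)
```

The sleep adds at most about one polling interval of latency, which is a deliberate trade against CPU usage; set idle_sleep to 0 to recover the pure polling behavior.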
Performance Comparison and Implementation Details
To verify the optimization effect, we conducted detailed performance analysis:
Reading Latency Comparison: readline() can block for up to 2 seconds per call (the configured timeout) when no end-of-line character arrives, while the inWaiting()+read() combination returns almost immediately with whatever data is available.
Buffer Management: The original method caused continuous buffer accumulation, requiring considerable time to clear even after stopping data transmission. The optimized solution ensures timely buffer cleanup through real-time reading, avoiding the "data ghost" phenomenon.
Resource Utilization: The inWaiting() method only queries status without involving actual data movement, with minimal computational overhead. Combined with read()'s batch reading, it significantly reduces system call frequency and improves overall efficiency.
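A rough calculation shows why batching matters. It assumes 8N1 framing (10 bits on the wire per byte) and a 1 ms polling interval; neither figure comes from the original setup, and both function names are illustrative.

```python
def max_bytes_per_second(baudrate, bits_per_byte=10):
    """Upper bound on payload bytes per second for a given line rate."""
    return baudrate // bits_per_byte


def bytes_per_poll(baudrate, polls_per_second, bits_per_byte=10):
    """Average bytes delivered per read if the link is saturated."""
    return max_bytes_per_second(baudrate, bits_per_byte) / polls_per_second


peak = max_bytes_per_second(2_000_000)        # 200000 bytes/s at 8N1
per_call = bytes_per_poll(2_000_000, 1000)    # 200.0 bytes per 1 ms poll
```

Under these assumptions, a saturated link read one byte at a time would need on the order of 200,000 read calls per second, whereas polling once per millisecond needs 1,000 calls that each return about 200 bytes.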
Practical Application Considerations
When implementing high-speed serial communication, the following key factors need consideration:
Data Integrity Verification: Since data segmentation no longer relies on end-of-line characters, appropriate data frame parsing logic needs to be implemented at the application layer to ensure correct message boundary identification.
System Scheduling Optimization: In virtualized environments (such as the Xubuntu VM used by the user), sufficient CPU time slices need to be allocated to serial reading tasks to avoid performance degradation due to resource competition.
Error Handling Mechanism: Add appropriate exception handling to cope with abnormal situations like serial port disconnection and data verification errors, ensuring system robustness.
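The frame parsing and error handling points can be sketched together. The example assumes newline-terminated messages, as in the readline()-based original; LineAssembler and pump_once are hypothetical names, and the port is any object exposing in_waiting and read(). Catching OSError also covers pySerial's SerialException, which derives from it.

```python
class LineAssembler:
    """Rebuild newline-terminated messages from arbitrarily split chunks."""

    def __init__(self, terminator=b"\n"):
        self._terminator = terminator
        self._pending = bytearray()

    def feed(self, chunk):
        """Add a chunk; return the completed messages, terminator removed."""
        self._pending.extend(chunk)
        # Everything before the last terminator is complete; the tail waits.
        *complete, rest = bytes(self._pending).split(self._terminator)
        self._pending = bytearray(rest)
        return complete


def pump_once(port, assembler):
    """Read what is queued and return (messages, still_connected)."""
    try:
        pending = port.in_waiting
        chunk = port.read(pending) if pending else b""
    except OSError:
        # Port vanished (for example, USB unplugged); let the caller decide
        # whether to reopen it.
        return [], False
    return assembler.feed(chunk), True
```

Because a read may end in the middle of a message, the assembler keeps the incomplete tail and prepends it to the next chunk. A checksum or counter check on each completed message would slot in after feed() returns.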
Extended Optimization Strategies
For real-time applications with higher requirements, consider the following advanced optimizations:
Multi-threaded Architecture: Separate data reading and data processing into different threads to avoid I/O blocking affecting real-time performance.
Ring Buffer: Implement custom ring buffers at the application layer to provide more efficient memory management and data flow.
Hardware Flow Control: Enable RTS/CTS flow control on supported hardware platforms to achieve hardware-level traffic management.
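The multi-threaded layout can be sketched with the standard library alone. reader_loop, start_reader, the queue size of 1024, and the 1 ms idle sleep are illustrative assumptions; the bounded queue.Queue serves as a simple fixed-capacity buffer between the threads rather than a hand-written ring buffer. Hardware flow control is a separate setting: pySerial's Serial constructor accepts rtscts=True for ports and devices that support it.

```python
import queue
import threading
import time


def reader_loop(port, chunks, stop, idle_sleep=0.001):
    """Move bytes from `port` into the `chunks` queue until `stop` is set."""
    while not stop.is_set():
        pending = port.in_waiting
        if not pending:
            time.sleep(idle_sleep)
            continue
        data = port.read(pending)
        try:
            chunks.put(data, timeout=0.1)
        except queue.Full:
            # Consumer is too slow; this sketch simply drops the chunk.
            pass


def start_reader(port, maxsize=1024):
    """Start the reader thread and return (thread, chunks, stop)."""
    chunks = queue.Queue(maxsize=maxsize)
    stop = threading.Event()
    thread = threading.Thread(
        target=reader_loop, args=(port, chunks, stop), daemon=True
    )
    thread.start()
    return thread, chunks, stop
```

The processing side calls chunks.get(timeout=...) in its own loop, and shuts the reader down with stop.set() followed by thread.join(). Dropping data when the queue is full is one possible policy; blocking the reader or counting overruns are alternatives.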
With the analysis and optimizations presented in this article, developers can achieve high-speed, low-latency serial communication in Python for industrial control, data acquisition, and other applications with demanding real-time requirements.