Algorithm Implementation and Optimization for Sorting 1 Million 8-Digit Numbers in 1MB RAM

Keywords: Memory-Constrained Sorting | Compact List Encoding | Sublist Grouping | Bit-Level Optimization | Algorithm Implementation

Abstract: This paper thoroughly investigates the challenging algorithmic problem of sorting 1 million 8-digit decimal numbers under strict memory constraints (1MB RAM). By analyzing the compact list encoding scheme from the best answer (Answer 4), it details how to utilize sublist grouping, dynamic header mapping, and efficient merging strategies to achieve complete sorting within limited memory. The article also compares the pros and cons of alternative approaches (e.g., ICMP storage, arithmetic coding, and LZMA compression) and demonstrates key algorithm implementations with practical code examples. Ultimately, it proves that through carefully designed bit-level operations and memory management, the problem is not only solvable but can be completed within a reasonable time frame.

Problem Background and Memory Constraint Analysis

In embedded systems or resource-constrained environments, sorting large-scale data often faces severe memory limitations. This problem requires using only 1MB of RAM to sort 1 million 8-digit decimal numbers (duplicates allowed) received via a TCP connection and output the sorted list through another TCP connection. The system provides 2KB for network state and buffers, with code residing in ROM and not counting against memory.

Preliminary calculations show that storing raw numbers directly (each requiring 27 bits due to 100000000 possible values) would need approximately 3.35MB, far exceeding available memory. Thus, compressed storage and efficient algorithms are essential.

Core Algorithm: Compact Sorted List

The optimal solution (Answer 4) centers on constructing a compact sorted list, dividing the number range [0, 99999999] into 781250 sublists, each covering 128 consecutive numbers. Through meticulously designed sublist header encoding and entry storage, memory usage is significantly reduced.

Sublist Encoding Scheme

The initial design uses four header types:

00: Empty sublist
01: Singleton, followed by a 7-bit number value
10: At least two distinct numbers, entries stored in non-decreasing order but with the last entry ≤ the first to mark the end (e.g., numbers 2,4,6 stored as (4,6,2))
11: Multiple repetitions of a single number, followed by the number value and repetition count encoding

For further compression, it is optimized to three header types (A, B, C), with D-type (repeated numbers) sublists encoded via "impossible" C-type patterns:

A (00): Empty sublist
B (01): Singleton sublist
C (1): Multi-number sublist, or encoding repeated numbers via specific patterns

For example, three repetitions of number N are encoded as: C{00NNNNN}{1NN0000}{0100000}, where N represents the number bits.

Memory Usage Calculation

The total bit count for a fully populated compact list is calculated as: 2*781250 (headers) + 7*1000000 (entries) - 781250/3 (header optimization savings) ≈ 8302083 bits, about 1037764 bytes. With TCP overhead added, it remains below the 1MB limit.

Algorithm Flow and Implementation Details

The algorithm is divided into four phases: initialization, number reception, sorting and merging, and output:

Initialization: Create an empty compact list and pre-allocate memory regions.
Number Reception: Read input numbers in batches, temporarily store as 32-bit integers, and sort them (e.g., using heapsort).
Merging: Merge the new number list with the existing compact list to generate a new compact list. Read and write pointers are managed to ensure the merge process does not overwrite unprocessed data.
Output: Traverse the final compact list and output numbers in order.

Key optimizations include dynamic header mapping (adjusting encoding based on sublist type frequency) and sublist grouping (10 groups of 78125 sublists each), reducing the impact of temporary expansion on memory.

Comparison with Alternative Approaches

Besides the compact list, other answers propose various creative solutions, each with limitations:

ICMP Storage Scheme (Answer 1): Uses network packets for temporary data storage, requiring only a few RAM variables. However, it generates numerous ICMP requests, performs poorly, and may be restricted by network equipment, making it unsuitable for production.
Arithmetic Coding Scheme (Answer 2): Employs arithmetic coding for data compression, managing memory with fixed arrays (e.g., circular[253250] and stage[8000]). The code is verified to work, but compression efficiency depends on data distribution, and worst-case memory requirements are hard to guarantee.
LZMA Compression Scheme (Answer 3): Combines bucket sorting with LZMA compression, achieving high compression ratios for pre-sorted data (e.g., compression factor of 0.06 for ascending sequences). However, LZMA dictionary size and memory overhead may exceed limits, and it relies on external libraries.

The compact list scheme strikes the best balance between theoretical guarantees and practical feasibility.

Performance Evaluation and Optimization Suggestions

On typical hardware (e.g., Xeon W3520), complete sorting takes about 23 seconds. The main time overhead comes from reading, writing, and merging the compact list. Performance can be further improved by:

Batch Processing: Increase the number of numbers processed per batch to reduce merge operations.
Adaptive Header Mapping: Dynamically select optimal header encoding based on real-time sublist type frequency statistics.
Memory Layout Optimization: Minimize fragmentation and ensure contiguous data storage to enhance cache efficiency.

Code Examples and Implementation Key Points

The following pseudocode illustrates the core logic of compact list merging:

function mergeCompactList(newNumbers, oldList):
    newList = initializeEmptyList()
    readPtrOld = startOf(oldList)
    readPtrNew = startOf(newNumbers)
    writePtr = startOf(newListBuffer)
    
    while readPtrNew < endOf(newNumbers) or readPtrOld < endOf(oldList):
        if readPtrNew < endOf(newNumbers) and (readPtrOld >= endOf(oldList) or newNumbers[readPtrNew] <= decodeNext(oldList, readPtrOld)):
            value = newNumbers[readPtrNew]
            readPtrNew++
        else:
            value = decodeNext(oldList, readPtrOld)
        
        encodeToCompactList(newList, writePtr, value)
    
    return newList

Actual implementation must handle sublist header decoding, number encoding/decoding, and edge cases.

Conclusion and Extended Applications

This paper demonstrates that through compact data structures and bit-level optimization, large-scale sorting under strict memory constraints is entirely feasible. The compact list scheme is not only applicable to this problem but can also be extended to other resource-constrained scenarios, such as IoT device data processing or embedded database index construction.

Future work could explore combining more efficient compression algorithms (e.g., dictionary-based encoding) with hardware acceleration (e.g., bit manipulation instructions) to further reduce time and space complexity.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.