Efficient Algorithm for Removing Duplicate Integers from an Array: An In-Place Solution Based on the Two-Pointer Technique and Element Swapping

Dec 01, 2025 · Programming

Keywords: array deduplication | in-place algorithm | two-pointer

Abstract: This paper explores an algorithm for removing duplicate elements from an integer array in place, without auxiliary data structures or pre-sorting. The core solution combines a two-pointer technique with an element-swapping strategy: each element is compared against those after it, and duplicates are overwritten with values drawn from the shrinking tail of the array, achieving deduplication in O(n²) time. The paper details the algorithm's principle, implementation, and performance characteristics, compares it with alternatives such as hashing and a merge-sort variant, and highlights its practicality in memory-constrained scenarios.

Algorithm Background and Problem Definition

In computer science, array deduplication is a classic problem that appears in data processing, database operations, and algorithm interviews. Given an array of random integers, the goal is to remove all duplicate elements in place, without additional data structures (e.g., hash tables) and without pre-sorting, leaving a sequence of unique values. For example, for the input array {4, 8, 4, 1, 1, 2, 9}, one valid result is {4, 8, 1, 2, 9, ?, ?}, where "?" denotes irrelevant values left in the array's tail after duplicate removal; the relative order of the unique values need not be preserved. This problem originates from a Microsoft interview, with constraints designed to test candidates' understanding of space efficiency and algorithm design.

Core Algorithm Principle and Implementation

Based on the best answer (Answer 3), we employ a two-pointer technique for in-place deduplication. The core idea is: traverse the array, and for each element, check all subsequent elements; whenever a duplicate is found, overwrite it with the element at the array's current end and shrink the effective range by one. This avoids the overhead of shifting elements forward while ensuring that the unique elements end up at the front. Below is the C implementation:

void rmdup(int *array, int length) {
    int *current, *end = array + length - 1;

    for (int *ptr = array; ptr < end; ptr++) {
        current = ptr + 1;                 /* scan everything after ptr */
        while (current <= end) {
            if (*current == *ptr) {
                /* Duplicate found: overwrite it with the last element of
                 * the effective range and shrink the range by one. The
                 * copied-in value is re-checked on the next iteration. */
                *current = *end--;
            } else {
                current++;
            }
        }
    }
}

Code analysis: The outer loop pointer ptr traverses the array, and the inner loop pointer current examines every element after ptr. When a duplicate is detected (*current == *ptr), the element at current is overwritten with the end element (*end) and the effective range shrinks via end--; the copied-in value is then re-checked before current advances. Otherwise, current simply moves forward. When both loops finish, the unique elements are concentrated at the array's start, though not necessarily in their original relative order: for the example input {4, 8, 4, 1, 1, 2, 9}, the routine leaves {4, 8, 9, 1, 2} at the front.

Performance Analysis and Complexity Evaluation

The algorithm's time complexity is O(n²), where n is the array length. In the worst case (all elements unique), each element is compared with every subsequent one, for exactly n*(n-1)/2 comparisons. Space complexity is O(1): only a constant amount of extra space is used, meeting the in-place requirement. Although the time complexity is high, the absence of additional storage makes the algorithm attractive for small datasets or memory-sensitive scenarios. In practice, performance is acceptable for typical inputs such as partially duplicated arrays, and the implementation remains simple and easy to verify.

Comparison with Other Methods

Turning to the other answers: hashing (Answer 2) achieves O(n) average-time deduplication but relies on a hash table or similar structure, violating this problem's constraints; its D implementation resolves hash collisions recursively and is efficient but more complex, with performance that depends on the hash distribution. A merge-sort variant (Answer 1) combines sorting and deduplication in O(n log n) time but requires modifying the sorting process and does not meet the "no sorting" requirement. The algorithm in this paper, though slower, strictly adheres to the constraints and emphasizes space efficiency, making it suitable for resource-constrained environments such as embedded systems.

Application Scenarios and Optimization Suggestions

This algorithm suits real-time systems, low-memory devices, and applications that require strict in-place operation, such as sensor data processing or maintenance of legacy hardware. Possible optimizations include early termination (e.g., skipping the inner loop once no further duplicates can exist) and bitmap marking for small-range integers (at the cost of extra space proportional to the value range). In interviews, discussing these trade-offs demonstrates comprehensive thinking about algorithm design, such as balancing time against space or adapting to different data distributions.

Conclusion

Through the two-pointer and element-swapping strategies, we have implemented an in-place array deduplication algorithm that illustrates how to work within strict constraints. Although its time complexity is O(n²), its O(1) space complexity gives it practical value in specific scenarios. Future work could explore hybrid approaches, such as incorporating hashing ideas without auxiliary structures, to further improve performance. This study deepens the understanding of fundamental algorithm design and the trade-offs behind it.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.