Optimized Algorithm for Finding the Smallest Missing Positive Integer

Keywords: Algorithm Optimization | Hash Set | Time Complexity Analysis

Abstract: This paper provides an in-depth analysis of algorithms for finding the smallest missing positive integer in a given sequence. By examining performance bottlenecks in the original solution, we propose an optimized approach using hash sets that achieves O(N) time complexity and O(N) space complexity. The article compares multiple implementation strategies including sorting, marking arrays, and cycle sort, with complete Java code implementations and performance analysis.

Problem Background and Definition

Given an array A containing N integers with elements in the range [-1000000, 1000000], we need to find the first positive integer (greater than 0) that does not appear in the array. For example, for array [1, 3, 6, 4, 1, 2], the smallest missing positive integer is 5; for [1, 2, 3], the result is 4; for [-1, -3], the result is 1.

Analysis of Original Solution Issues

The original solution used TreeSet to store array elements, which caused performance issues. TreeSet is implemented based on red-black trees, with insertion operations having O(logN) time complexity, resulting in overall O(NlogN) time complexity for building the set, failing to meet the required O(N) time complexity.

Main issues in the original code include:

Set<Integer> set = new TreeSet<>();
for (int a : A) {
    set.add(a);  // O(logN) per insertion
}

Additionally, the original solution used multiple loops: first building the set, then converting the set to an array, followed by marking operations, and finally scanning for the result. This multi-loop structure increased code complexity and execution time.

Optimized Solution

The optimized solution based on hash sets meets the O(N) time complexity requirement. HashSet insertion and query operations have average O(1) time complexity, allowing the complete algorithm to finish in linear time.

Core algorithm steps:

Create a hash set to store all positive integers
Iterate through the array, adding positive integers to the set
Check each positive integer starting from 1 to see if it exists in the set
Return the first positive integer not found in the set

Java implementation code:

public class Solution {
    public int solution(int[] A) {
        int N = A.length;
        Set<Integer> set = new HashSet<>();
        
        // Step 1: Collect all positive integers
        for (int a : A) {
            if (a > 0) {
                set.add(a);
            }
        }
        
        // Step 2: Find the smallest missing positive integer
        for (int i = 1; i <= N + 1; i++) {
            if (!set.contains(i)) {
                return i;
            }
        }
        
        return N + 1;
    }
}

Algorithm Correctness Proof

The correctness of this algorithm is based on the observation that the smallest missing positive integer must be in the range 1 to N+1. The reasoning is as follows:

If the array contains all positive integers from 1 to N, then the smallest missing positive integer is N+1
If the array is missing some positive integer in the range 1 to N, then that number is the answer
Since the array length is N, it can contain at most N distinct positive integers

Setting the loop range from 1 to N+1 ensures coverage of all possible cases.

Complexity Analysis

Time Complexity: O(N)

First loop iterates through the array once, with each insertion averaging O(1)
Second loop executes at most N+1 times, with each query averaging O(1)
Overall time complexity is O(N)

Space Complexity: O(N)

Hash set requires storage for N elements in the worst case
Meets the required O(N) space complexity (excluding input storage)

Comparison with Other Methods

Sorting Method

The sorting-based approach first sorts the array, then sequentially scans for the missing positive integer:

public int solutionBySorting(int[] A) {
    Arrays.sort(A);  // O(NlogN)
    int smallest = 1;
    for (int i = 0; i < A.length; i++) {
        if (A[i] == smallest) {
            smallest++;
        }
    }
    return smallest;
}

This method has O(NlogN) time complexity. While simple to implement, it cannot meet the linear time complexity requirement.

Marking Array Method

Using an additional boolean array to mark number occurrences:

public int solutionByMarking(int[] A) {
    int n = A.length;
    boolean[] visited = new boolean[n + 1];
    
    for (int num : A) {
        if (num > 0 && num <= n) {
            visited[num] = true;
        }
    }
    
    for (int i = 1; i <= n; i++) {
        if (!visited[i]) {
            return i;
        }
    }
    
    return n + 1;
}

This method also has O(N) time complexity and O(N) space complexity, but requires additional array space.

Cycle Sort Method

Placing numbers in correct positions through in-place swapping:

public int solutionByCycleSort(int[] A) {
    int n = A.length;
    
    for (int i = 0; i < n; i++) {
        while (A[i] > 0 && A[i] <= n && A[i] != A[A[i] - 1]) {
            int temp = A[i];
            A[i] = A[temp - 1];
            A[temp - 1] = temp;
        }
    }
    
    for (int i = 0; i < n; i++) {
        if (A[i] != i + 1) {
            return i + 1;
        }
    }
    
    return n + 1;
}

This method achieves O(N) time complexity and O(1) space complexity, but implementation is more complex and modifies the original array.

Performance Testing and Validation

Extensive test cases validate the correctness and performance of the optimized solution:

Boundary cases: empty arrays, all-negative arrays, all-positive arrays
Large-scale data: random tests with N=100,000
Special patterns: consecutive numbers, duplicate numbers, wide-range numbers

Test results show that the hash set-based solution works correctly in all cases and meets performance requirements.

Practical Application Scenarios

This algorithm has important applications in:

Sequence number allocation in database systems
Resource identifier management and allocation
Item ID management in game development
Unique identifier generation in distributed systems

Conclusion

This paper provides detailed analysis of multiple solutions for finding the smallest missing positive integer. The optimized solution based on hash sets achieves a good balance between time complexity, space complexity, and code simplicity, making it the recommended approach for such problems. Through proper algorithm design and data structure selection, we can meet performance requirements while ensuring code readability and maintainability.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.