Optimized Algorithm for Finding the Smallest Missing Positive Integer

Nov 16, 2025 · Programming · 12 views · 7.8

Keywords: Algorithm Optimization | Hash Set | Time Complexity Analysis

Abstract: This paper provides an in-depth analysis of algorithms for finding the smallest missing positive integer in a given sequence. By examining performance bottlenecks in the original solution, we propose an optimized approach using hash sets that achieves O(N) time complexity and O(N) space complexity. The article compares multiple implementation strategies including sorting, marking arrays, and cycle sort, with complete Java code implementations and performance analysis.

Problem Background and Definition

Given an array A containing N integers with elements in the range [-1000000, 1000000], we need to find the first positive integer (greater than 0) that does not appear in the array. For example, for array [1, 3, 6, 4, 1, 2], the smallest missing positive integer is 5; for [1, 2, 3], the result is 4; for [-1, -3], the result is 1.

Analysis of Original Solution Issues

The original solution used TreeSet to store array elements, which caused performance issues. TreeSet is implemented based on red-black trees, with insertion operations having O(logN) time complexity, resulting in overall O(NlogN) time complexity for building the set, failing to meet the required O(N) time complexity.

Main issues in the original code include:

Set<Integer> set = new TreeSet<>();
for (int a : A) {
    set.add(a);  // O(logN) per insertion
}

Additionally, the original solution used multiple loops: first building the set, then converting the set to an array, followed by marking operations, and finally scanning for the result. This multi-loop structure increased code complexity and execution time.

Optimized Solution

The optimized solution based on hash sets meets the O(N) time complexity requirement. HashSet insertion and query operations have average O(1) time complexity, allowing the complete algorithm to finish in linear time.

Core algorithm steps:

  1. Create a hash set to store all positive integers
  2. Iterate through the array, adding positive integers to the set
  3. Check each positive integer starting from 1 to see if it exists in the set
  4. Return the first positive integer not found in the set

Java implementation code:

public class Solution {
    public int solution(int[] A) {
        int N = A.length;
        Set<Integer> set = new HashSet<>();
        
        // Step 1: Collect all positive integers
        for (int a : A) {
            if (a > 0) {
                set.add(a);
            }
        }
        
        // Step 2: Find the smallest missing positive integer
        for (int i = 1; i <= N + 1; i++) {
            if (!set.contains(i)) {
                return i;
            }
        }
        
        return N + 1;
    }
}

Algorithm Correctness Proof

The correctness of this algorithm is based on the observation that the smallest missing positive integer must be in the range 1 to N+1. The reasoning is as follows:

Setting the loop range from 1 to N+1 ensures coverage of all possible cases.

Complexity Analysis

Time Complexity: O(N)

Space Complexity: O(N)

Comparison with Other Methods

Sorting Method

The sorting-based approach first sorts the array, then sequentially scans for the missing positive integer:

public int solutionBySorting(int[] A) {
    Arrays.sort(A);  // O(NlogN)
    int smallest = 1;
    for (int i = 0; i < A.length; i++) {
        if (A[i] == smallest) {
            smallest++;
        }
    }
    return smallest;
}

This method has O(NlogN) time complexity. While simple to implement, it cannot meet the linear time complexity requirement.

Marking Array Method

Using an additional boolean array to mark number occurrences:

public int solutionByMarking(int[] A) {
    int n = A.length;
    boolean[] visited = new boolean[n + 1];
    
    for (int num : A) {
        if (num > 0 && num <= n) {
            visited[num] = true;
        }
    }
    
    for (int i = 1; i <= n; i++) {
        if (!visited[i]) {
            return i;
        }
    }
    
    return n + 1;
}

This method also has O(N) time complexity and O(N) space complexity, but requires additional array space.

Cycle Sort Method

Placing numbers in correct positions through in-place swapping:

public int solutionByCycleSort(int[] A) {
    int n = A.length;
    
    for (int i = 0; i < n; i++) {
        while (A[i] > 0 && A[i] <= n && A[i] != A[A[i] - 1]) {
            int temp = A[i];
            A[i] = A[temp - 1];
            A[temp - 1] = temp;
        }
    }
    
    for (int i = 0; i < n; i++) {
        if (A[i] != i + 1) {
            return i + 1;
        }
    }
    
    return n + 1;
}

This method achieves O(N) time complexity and O(1) space complexity, but implementation is more complex and modifies the original array.

Performance Testing and Validation

Extensive test cases validate the correctness and performance of the optimized solution:

Test results show that the hash set-based solution works correctly in all cases and meets performance requirements.

Practical Application Scenarios

This algorithm has important applications in:

Conclusion

This paper provides detailed analysis of multiple solutions for finding the smallest missing positive integer. The optimized solution based on hash sets achieves a good balance between time complexity, space complexity, and code simplicity, making it the recommended approach for such problems. Through proper algorithm design and data structure selection, we can meet performance requirements while ensuring code readability and maintainability.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.