Keywords: array deduplication | algorithm optimization | time complexity | two-pointer technique | sorting preprocessing
Abstract: This paper provides an in-depth exploration of efficient algorithms for removing duplicate elements from arrays in Java without utilizing Set collections. By analyzing performance bottlenecks in the original nested-loop approach, we propose an optimized solution based on sorting and the two-pointer technique, reducing time complexity from O(n²) to O(n log n). The paper details the algorithmic principles, implementation steps, and performance comparisons, and includes complete code examples with complexity analysis.
Analysis of Algorithm Performance Issues
The original nested-loop algorithm degrades sharply when processing large-scale data. For an array of 1,000,000 elements, it performs on the order of n² ≈ 10¹² comparison operations, which is computationally prohibitive. The main bottleneck is the element-shifting work inside the inner loop: each time a duplicate is found, all subsequent elements must be moved forward by one position, adding further linear cost per removal.
Optimization Strategy: Sorting Preprocessing
A sorting preprocessing step can be employed to optimize the deduplication algorithm. By first sorting the array, all identical elements become clustered together, making the identification and removal of duplicates significantly more efficient. Java's Arrays.sort for primitive arrays (a dual-pivot quicksort) typically achieves O(n log n) time complexity, which is substantially better than the original algorithm's O(n²).
Two-Pointer Technique Implementation
After sorting is complete, we can utilize the two-pointer technique to remove duplicate elements. The specific implementation is as follows:
import java.util.Arrays;

public class OptimizedDuplicateRemoval {
    public static int[] removeDuplicates(int[] arr) {
        if (arr.length == 0) return arr;
        // Step 1: Sort the array so duplicates become adjacent
        Arrays.sort(arr);
        // Step 2: Use the two-pointer technique to compact unique elements
        int destination = 0;
        for (int source = 1; source < arr.length; source++) {
            if (arr[source] != arr[destination]) {
                destination++;
                arr[destination] = arr[source];
            }
        }
        // Step 3: Return the prefix containing only the unique elements
        return Arrays.copyOf(arr, destination + 1);
    }

    public static void main(String[] args) {
        int[] testArray = {3, 1, 2, 2, 3, 3, 4, 5, 5, 6, 1};
        int[] result = removeDuplicates(testArray);
        System.out.println("Deduplicated array: " + Arrays.toString(result));
    }
}
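Note that removeDuplicates sorts the caller's array in place, so the input is reordered (and partially overwritten) as a side effect. A minimal sketch of this caveat, with the two-pointer deduplication restated inline so the snippet compiles on its own:

```java
import java.util.Arrays;

public class MutationCaveat {
    // Same sort + two-pointer dedup as above, restated for self-containment
    static int[] removeDuplicates(int[] arr) {
        if (arr.length == 0) return arr;
        Arrays.sort(arr); // mutates the caller's array
        int dest = 0;
        for (int src = 1; src < arr.length; src++) {
            if (arr[src] != arr[dest]) arr[++dest] = arr[src];
        }
        return Arrays.copyOf(arr, dest + 1);
    }

    public static void main(String[] args) {
        int[] original = {3, 1, 2, 2, 1};
        removeDuplicates(original);
        // The input has been sorted and compacted in place
        System.out.println(Arrays.toString(original)); // [1, 2, 3, 2, 3]

        // Pass a clone to keep the original untouched
        int[] intact = {3, 1, 2, 2, 1};
        int[] unique = removeDuplicates(intact.clone());
        System.out.println(Arrays.toString(intact)); // [3, 1, 2, 2, 1]
        System.out.println(Arrays.toString(unique)); // [1, 2, 3]
    }
}
```

Cloning costs an extra O(n) copy, which does not change the overall O(n log n) bound.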
Algorithm Complexity Analysis
The optimized algorithm demonstrates significantly improved time complexity:
- Sorting Phase: Arrays.sort() for primitives uses a dual-pivot quicksort with average time complexity O(n log n)
- Deduplication Phase: Single pass through the array with time complexity O(n)
- Overall Complexity: O(n log n), dominated by the sorting operation
- Space Complexity: O(k) for the result copy, where k is the number of unique elements
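To make the asymptotic gap concrete, a back-of-the-envelope estimate for n = 1,000,000: the nested-loop approach performs roughly n(n-1)/2 ≈ 5×10¹¹ pairwise comparisons, while an O(n log n) sort needs on the order of n log₂ n ≈ 2×10⁷. The figures below are operation counts, not measured times:

```java
public class ComplexityEstimate {
    public static void main(String[] args) {
        long n = 1_000_000L;
        long quadratic = n * (n - 1) / 2;                      // pairwise comparisons, worst case
        double linearithmic = n * (Math.log(n) / Math.log(2)); // ~ n * log2(n)
        System.out.printf("O(n^2):      ~%d comparisons%n", quadratic);
        System.out.printf("O(n log n):  ~%.0f comparisons%n", linearithmic);
        System.out.printf("Ratio:       ~%.0f x%n", quadratic / linearithmic);
    }
}
```

The roughly four-orders-of-magnitude gap in operation counts is why the optimized version dominates on large inputs, even though constant factors differ between the two algorithms.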
Performance Comparison Experiment
Experimental comparison of performance differences between original and optimized algorithms:
import java.util.Arrays;

public class PerformanceComparison {
    public static void measurePerformance(int[] array, String algorithmName) {
        long startTime = System.nanoTime();
        if ("optimized".equals(algorithmName)) {
            OptimizedDuplicateRemoval.removeDuplicates(array.clone());
        } else {
            // Original nested-loop implementation
            originalRemoveDuplicates(array.clone());
        }
        long endTime = System.nanoTime();
        System.out.println(algorithmName + " algorithm execution time: " +
                (endTime - startTime) / 1_000_000 + "ms");
    }

    private static int[] originalRemoveDuplicates(int[] arr) {
        int end = arr.length;
        for (int i = 0; i < end; i++) {
            for (int j = i + 1; j < end; j++) {
                if (arr[i] == arr[j]) {
                    // Shift subsequent elements forward to overwrite the duplicate
                    for (int k = j + 1; k < end; k++) {
                        arr[k - 1] = arr[k];
                    }
                    end--;
                    j--; // Re-examine the element just shifted into position j
                }
            }
        }
        return Arrays.copyOf(arr, end);
    }
}
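A possible driver for this comparison is sketched below, with both algorithms inlined so it runs standalone; the array size, value range, and random seed are arbitrary choices for illustration, and a single timing pass like this ignores JVM warm-up effects:

```java
import java.util.Arrays;
import java.util.Random;

public class BenchmarkDriver {
    // Optimized: sort + two-pointer, as in OptimizedDuplicateRemoval
    static int[] optimized(int[] arr) {
        if (arr.length == 0) return arr;
        Arrays.sort(arr);
        int dest = 0;
        for (int src = 1; src < arr.length; src++) {
            if (arr[src] != arr[dest]) arr[++dest] = arr[src];
        }
        return Arrays.copyOf(arr, dest + 1);
    }

    // Original nested-loop version, as in PerformanceComparison
    static int[] original(int[] arr) {
        int end = arr.length;
        for (int i = 0; i < end; i++) {
            for (int j = i + 1; j < end; j++) {
                if (arr[i] == arr[j]) {
                    for (int k = j + 1; k < end; k++) arr[k - 1] = arr[k];
                    end--;
                    j--;
                }
            }
        }
        return Arrays.copyOf(arr, end);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int[] data = new int[10_000];
        for (int i = 0; i < data.length; i++) data[i] = rnd.nextInt(3_000); // guarantees duplicates

        long t0 = System.nanoTime();
        int[] a = original(data.clone());
        long t1 = System.nanoTime();
        int[] b = optimized(data.clone());
        long t2 = System.nanoTime();

        System.out.println("original:  " + (t1 - t0) / 1_000_000 + " ms, " + a.length + " unique");
        System.out.println("optimized: " + (t2 - t1) / 1_000_000 + " ms, " + b.length + " unique");
        // Sanity check: both algorithms must keep the same set of unique values
        Arrays.sort(a); // the original version preserves input order, so sort before comparing
        System.out.println("results agree: " + Arrays.equals(a, b));
    }
}
```

Absolute timings will vary by machine and JVM; for stable numbers a harness such as JMH would be more appropriate than raw System.nanoTime differences.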
Edge Case Handling
In practical applications, we need to consider various edge cases:
import java.util.Arrays;

public class EdgeCaseHandler {
    public static int[] robustRemoveDuplicates(int[] arr) {
        // Handle null, empty, and single-element arrays
        if (arr == null || arr.length <= 1) {
            return arr != null ? arr.clone() : new int[0];
        }
        // Clone the array to avoid modifying the original data
        int[] workingArray = arr.clone();
        Arrays.sort(workingArray);
        int uniqueIndex = 0;
        for (int i = 1; i < workingArray.length; i++) {
            if (workingArray[i] != workingArray[uniqueIndex]) {
                uniqueIndex++;
                workingArray[uniqueIndex] = workingArray[i];
            }
        }
        return Arrays.copyOf(workingArray, uniqueIndex + 1);
    }
}
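These edge cases can be exercised with a small harness; the robust variant's logic is restated inline here so the snippet compiles on its own:

```java
import java.util.Arrays;

public class EdgeCaseDemo {
    // Same logic as EdgeCaseHandler.robustRemoveDuplicates above
    static int[] robust(int[] arr) {
        if (arr == null || arr.length <= 1) {
            return arr != null ? arr.clone() : new int[0];
        }
        int[] work = arr.clone();
        Arrays.sort(work);
        int unique = 0;
        for (int i = 1; i < work.length; i++) {
            if (work[i] != work[unique]) work[++unique] = work[i];
        }
        return Arrays.copyOf(work, unique + 1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(robust(null)));               // []
        System.out.println(Arrays.toString(robust(new int[0])));         // []
        System.out.println(Arrays.toString(robust(new int[]{7})));       // [7]
        System.out.println(Arrays.toString(robust(new int[]{5, 5, 5}))); // [5]
        System.out.println(Arrays.toString(robust(new int[]{2, 1, 2}))); // [1, 2]
    }
}
```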
Practical Application Scenarios
This optimized algorithm is suitable for various practical scenarios:
- Duplicate data cleansing in big data processing
- Deduplication of database query results
- Unique event statistics in log analysis
- Feature deduplication in machine learning feature engineering
Further Optimization Suggestions
For specific scenarios, consider the following optimization strategies:
- If the array is partially sorted, consider using adaptive sorting algorithms
- For integer arrays with a bounded value range, use a counting-sort-style approach to achieve O(n) time complexity
- In memory-constrained environments, consider external sorting techniques
- For streaming data, use Bloom filters for approximate deduplication
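The counting-sort suggestion can be sketched as follows. This assumes non-negative values in a known small range 0..maxValue (an assumption of this sketch, not a general property); outside that assumption the O(n + range) bound degrades and the boolean array becomes wasteful:

```java
import java.util.Arrays;

public class CountingDedup {
    // O(n + range) deduplication for non-negative ints with a bounded value range.
    // 'maxValue' is a hypothetical parameter introduced for this sketch.
    static int[] dedup(int[] arr, int maxValue) {
        boolean[] seen = new boolean[maxValue + 1];
        int uniqueCount = 0;
        for (int v : arr) {
            if (!seen[v]) {
                seen[v] = true;
                uniqueCount++;
            }
        }
        // Emit unique values in ascending order, matching the sort-based version
        int[] result = new int[uniqueCount];
        int idx = 0;
        for (int v = 0; v <= maxValue; v++) {
            if (seen[v]) result[idx++] = v;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 2, 2, 3, 3, 4, 5, 5, 6, 1};
        System.out.println(Arrays.toString(dedup(data, 6))); // [1, 2, 3, 4, 5, 6]
    }
}
```

The trade-off is the O(range) auxiliary array: this wins when range is comparable to n, but the sort-based approach remains preferable for sparse or unbounded value domains.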
Through the optimization methods introduced in this paper, we successfully reduced the time complexity of array deduplication from O(n²) to O(n log n), achieving significant performance improvements when processing large-scale data. This approach is not only applicable to Java but its core concepts can also be applied to similar problems in other programming languages.