Efficient Implementation of Merging Two ArrayLists with Deduplication and Sorting in Java

Keywords: Java | ArrayList | Collection Merging | Deduplication Sorting | Algorithm Optimization

Abstract: This article explores efficient methods for merging two ArrayLists in Java while removing duplicate elements and keeping the result sorted. By combining ArrayList.addAll(), Collections.sort(), and a single deduplication pass, we obtain a general solution with O(n*log(n)) time complexity; when both inputs are already sorted, a two-pointer merge reduces this to linear time. The article explains the algorithm principles, compares the approaches, and provides complete code examples and optimization suggestions.

Problem Background and Challenges

In Java programming, there is often a need to merge two collections while ensuring the result contains no duplicate elements and remains ordered. ArrayList, the most commonly used dynamic array implementation in Java's Collections Framework, provides flexible data manipulation capabilities. However, merging two ArrayLists and handling deduplication and sorting by hand often leads to complex code and poor efficiency. With large datasets in particular, an inappropriate algorithm can cause performance problems or even an OutOfMemoryError.

Core Solution Analysis

We can implement efficient merging using a three-step approach: first merge the two lists using ArrayList.addAll(), then sort using Collections.sort(), and finally traverse the sorted list once to remove duplicates. With n as the total number of elements in both lists, the three steps cost O(n) + O(n*log(n)) + O(n). The sort dominates, so the overall time complexity is O(n*log(n)), which performs well in most practical applications.

The specific implementation code is as follows:

import java.util.ArrayList;
import java.util.Collections;

public class ArrayListMerger {
    public static ArrayList<Integer> mergeAndDeduplicate(ArrayList<Integer> list1, ArrayList<Integer> list2) {
        // Create new list to avoid modifying original data
        ArrayList<Integer> result = new ArrayList<>(list1);
        
        // Step 1: Merge lists
        result.addAll(list2);
        
        // Step 2: Sort
        Collections.sort(result);
        
        // Step 3: Deduplicate
        ArrayList<Integer> deduplicated = new ArrayList<>();
        for (int i = 0; i < result.size(); i++) {
            if (i == 0 || !result.get(i).equals(result.get(i - 1))) {
                deduplicated.add(result.get(i));
            }
        }
        
        return deduplicated;
    }
}
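
To make the behavior concrete, the following is a minimal usage sketch. It assumes Java 9 or later (for List.of) and repeats the three steps in compact form so that it compiles on its own; the class name MergeUsageSketch and the sample values are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MergeUsageSketch {
    // Compact copy of the three-step approach so this example is self-contained.
    static ArrayList<Integer> mergeAndDeduplicate(List<Integer> list1, List<Integer> list2) {
        ArrayList<Integer> merged = new ArrayList<>(list1); // copy, inputs stay untouched
        merged.addAll(list2);                               // step 1: merge
        Collections.sort(merged);                           // step 2: sort
        ArrayList<Integer> deduplicated = new ArrayList<>();
        for (Integer value : merged) {                      // step 3: drop adjacent duplicates
            if (deduplicated.isEmpty()
                    || !deduplicated.get(deduplicated.size() - 1).equals(value)) {
                deduplicated.add(value);
            }
        }
        return deduplicated;
    }

    public static void main(String[] args) {
        ArrayList<Integer> list1 = new ArrayList<>(List.of(5, 1, 3, 3));
        ArrayList<Integer> list2 = new ArrayList<>(List.of(4, 1, 2));
        System.out.println(mergeAndDeduplicate(list1, list2)); // prints [1, 2, 3, 4, 5]
        System.out.println(list1);                             // prints [5, 1, 3, 3]
    }
}
```

Note that neither input needs to be sorted, and that duplicates within a single list are removed as well as duplicates across the two lists. Null elements are not supported, because sorting by natural ordering throws a NullPointerException when it compares them.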

In-depth Algorithm Analysis

The ArrayList.addAll() method has a time complexity of O(m), where m is the number of elements being added. It copies the new elements into the backing array in bulk with System.arraycopy, growing the array at most once, which avoids the overhead of adding elements individually. Collections.sort() delegates to List.sort(), which for object elements uses TimSort, an optimized and stable merge sort. Its average and worst-case time complexity is O(n*log(n)), and it approaches O(n) on input that is already partially sorted. The deduplication pass has O(n) time complexity: because equal elements are adjacent after sorting, comparing each element with its predecessor is enough to identify duplicates.

Compared with the nested-loop approach in the original problem, this method avoids O(n²) time complexity. The original code could insert into a list during each comparison, causing frequent array copying and memory reallocation; this is identified as the root cause of the OutOfMemoryError reported there.

Performance Optimization and Variants

For pre-sorted input lists, we can further optimize the algorithm:

public static ArrayList<Integer> mergeSortedLists(ArrayList<Integer> list1, ArrayList<Integer> list2) {
    ArrayList<Integer> result = new ArrayList<>();
    int i = 0, j = 0;
    
    while (i < list1.size() && j < list2.size()) {
        if (list1.get(i) < list2.get(j)) {
            if (result.isEmpty() || !result.get(result.size() - 1).equals(list1.get(i))) {
                result.add(list1.get(i));
            }
            i++;
        } else if (list1.get(i) > list2.get(j)) {
            if (result.isEmpty() || !result.get(result.size() - 1).equals(list2.get(j))) {
                result.add(list2.get(j));
            }
            j++;
        } else {
            // Elements equal, add only once
            if (result.isEmpty() || !result.get(result.size() - 1).equals(list1.get(i))) {
                result.add(list1.get(i));
            }
            i++;
            j++;
        }
    }
    
    // Handle remaining elements
    while (i < list1.size()) {
        if (result.isEmpty() || !result.get(result.size() - 1).equals(list1.get(i))) {
            result.add(list1.get(i));
        }
        i++;
    }
    
    while (j < list2.size()) {
        if (result.isEmpty() || !result.get(result.size() - 1).equals(list2.get(j))) {
            result.add(list2.get(j));
        }
        j++;
    }
    
    return result;
}

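This two-pointer approach runs in O(n + m) time, where n and m are the sizes of the two lists, because each element is visited exactly once. It requires both input lists to be sorted in ascending order. If the lists are unsorted, they must be sorted first, which brings the overall complexity back to O(n*log(n)).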
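
The repeated "add only if different from the last element" check in the method above can be factored into a helper. The following is a condensed sketch of the same two-pointer idea, not a replacement for the original: it assumes Java 9 or later (for List.of), inputs sorted in ascending natural order, and lists with constant-time get() such as ArrayList. The names SortedMergeSketch, mergeSorted, and addIfNew are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class SortedMergeSketch {
    // Appends value only if it differs from the last element already in result.
    private static <T> void addIfNew(List<T> result, T value) {
        if (result.isEmpty() || !result.get(result.size() - 1).equals(value)) {
            result.add(value);
        }
    }

    // Assumes both inputs are sorted ascending; the inputs are not modified.
    static <T extends Comparable<? super T>> List<T> mergeSorted(List<T> a, List<T> b) {
        List<T> result = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp <= 0) {
                addIfNew(result, a.get(i));
                i++;
                if (cmp == 0) {
                    j++; // equal heads: consume both, keep one
                }
            } else {
                addIfNew(result, b.get(j));
                j++;
            }
        }
        // At most one of these loops runs: drain whichever list has elements left.
        while (i < a.size()) addIfNew(result, a.get(i++));
        while (j < b.size()) addIfNew(result, b.get(j++));
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2, 2, 5);
        List<Integer> b = List.of(2, 3, 5, 8);
        System.out.println(mergeSorted(a, b)); // prints [1, 2, 3, 5, 8]
    }
}
```

Using compareTo() instead of the < and > operators lets the same code handle any Comparable type, such as String, rather than only numeric wrappers.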

Practical Applications and Considerations

In actual development, the choice of method depends on specific requirements: if input data may be unsorted, the general three-step approach is more reliable; if data is known to be sorted, the two-pointer method offers better performance. Additionally, memory usage must be considered, especially with large datasets, to avoid creating excessive intermediate collections.
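
One way to reduce intermediate collections in the three-step approach is to deduplicate the sorted list in place instead of copying the unique elements into a second list. The following is a minimal sketch under that assumption; the class and method names (InPlaceDedupSketch, mergeSortDedup) are illustrative, and List.of requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class InPlaceDedupSketch {
    static <T extends Comparable<? super T>> ArrayList<T> mergeSortDedup(List<T> a, List<T> b) {
        // Pre-size the backing array so that neither addAll call triggers a resize.
        ArrayList<T> result = new ArrayList<>(a.size() + b.size());
        result.addAll(a);
        result.addAll(b);
        Collections.sort(result);
        if (result.size() < 2) {
            return result;
        }
        // Compact unique elements toward the front: 'write' is the next free slot,
        // and result[write - 1] is the last unique element kept so far.
        int write = 1;
        for (int read = 1; read < result.size(); read++) {
            if (!result.get(read).equals(result.get(write - 1))) {
                result.set(write++, result.get(read));
            }
        }
        // Drop the leftover tail in one bulk operation.
        result.subList(write, result.size()).clear();
        return result;
    }

    public static void main(String[] args) {
        List<Integer> merged = mergeSortDedup(List.of(5, 1, 3, 3), List.of(4, 1, 2));
        System.out.println(merged); // prints [1, 2, 3, 4, 5]
    }
}
```

This version allocates a single list for the whole operation. If the result is long-lived and many duplicates were removed, calling trimToSize() on it releases the unused capacity.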

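Another important consideration is element comparison. Sorting relies on compareTo() (or a supplied Comparator), while the deduplication pass relies on equals(). For custom objects, implement both so that they agree: two elements should compare as 0 exactly when equals() returns true. Otherwise, elements that equals() treats as duplicates may not end up adjacent after sorting and will survive the deduplication pass.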
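
When a class does not implement Comparable, or when duplicates should be defined by a specific key, one option is to pass a Comparator and use it for both sorting and duplicate detection, so the two steps cannot disagree. The following is a sketch of that design choice, not the only valid one; the Product class, its fields, and the bySku comparator are hypothetical and exist only for illustration. It assumes Java 9 or later (for List.of).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ComparatorMergeSketch {
    // Hypothetical value type used only for illustration.
    static final class Product {
        final String sku;
        final String label;

        Product(String sku, String label) {
            this.sku = sku;
            this.label = label;
        }

        @Override
        public String toString() {
            return sku + ":" + label;
        }
    }

    static <T> List<T> mergeAndDeduplicate(List<T> a, List<T> b, Comparator<? super T> order) {
        List<T> merged = new ArrayList<>(a.size() + b.size());
        merged.addAll(a);
        merged.addAll(b);
        merged.sort(order); // stable: elements that compare equal keep their relative order
        List<T> result = new ArrayList<>();
        for (T item : merged) {
            // Two elements count as duplicates when the comparator reports 0.
            if (result.isEmpty() || order.compare(result.get(result.size() - 1), item) != 0) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Product> a = List.of(new Product("B2", "bolt"), new Product("A1", "nut"));
        List<Product> b = List.of(new Product("A1", "nut (dup)"), new Product("C3", "washer"));
        Comparator<Product> bySku = Comparator.comparing(p -> p.sku);
        System.out.println(mergeAndDeduplicate(a, b, bySku));
        // prints [A1:nut, B2:bolt, C3:washer]
    }
}
```

Because the sort is stable, the first occurrence in merge order is the one that is kept, which is why the entry from the first list wins for the duplicated key A1.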

Conclusion

By properly utilizing methods provided by Java's Collections Framework, we can efficiently solve ArrayList merging, deduplication, and sorting problems. The key is understanding the time complexity and applicable scenarios of various operations to avoid common performance pitfalls. The methods introduced in this article not only solve specific technical problems but, more importantly, demonstrate how to improve code quality through algorithm analysis and optimization.
