Research on Object List Deduplication Methods Based on Java 8 Stream API

Nov 22, 2025 · Programming

Keywords: Java 8 | List Deduplication | Stream API | Object Properties | TreeSet | Wrapper Pattern

Abstract: This article explores several implementation schemes for removing duplicate elements from object lists based on specific properties in a Java 8 environment. It analyzes the core approaches, including TreeSet with a custom comparator, wrapper classes, and HashSet-based state tracking, and compares their application scenarios, performance characteristics, and implementation details. Concrete code examples demonstrate how to deduplicate object lists efficiently, offering practical reference material for developers.

Introduction

In Java programming practice, processing object lists containing duplicate elements is a common task. Particularly in data processing, cache management, and business logic implementation, ensuring data uniqueness is crucial for system performance and correctness. The Stream API introduced in Java 8 provides more elegant and functional solutions for such problems.

Custom Comparator Method Based on TreeSet

A TreeSet with a custom comparator can deduplicate based on object properties: the comparator defines the uniqueness criterion, and any two elements for which it returns 0 are treated as duplicates, so only the first one is kept.
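
The examples throughout this article operate on an Employee class that the original text never shows. A minimal sketch consistent with the sample output might look like this (the field names and toString format are assumptions inferred from the examples):

```java
// Minimal Employee class assumed by the examples in this article.
class Employee {
    private final int id;
    private final String name;

    public Employee(int id, String name) {
        this.id = id;
        this.name = name;
    }

    public int getId() { return id; }
    public String getName() { return name; }

    @Override
    public String toString() {
        return "Employee{id=" + id + ", name='" + name + "'}";
    }
}
```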

import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.toCollection;

List<Employee> uniqueEmployees = employee.stream()
    .collect(collectingAndThen(
        toCollection(() -> new TreeSet<>(comparingInt(Employee::getId))),
        ArrayList::new
    ));

Analysis of the implementation mechanism: the list is first converted to a stream via stream(), and collectingAndThen combines two steps. The downstream collector toCollection creates a TreeSet whose comparator, comparingInt(Employee::getId), compares elements by their id property; when the set encounters an object whose id it already contains, the duplicate is silently dropped. The finisher ArrayList::new then converts the result back into a List.

Consider the following test case:

List<Employee> employee = Arrays.asList(
    new Employee(1, "John"), 
    new Employee(1, "Bob"), 
    new Employee(2, "Alice")
);

After executing the deduplication operation, the output is: [Employee{id=1, name='John'}, Employee{id=2, name='Alice'}]. Note that when several objects share the same id, the TreeSet retains the first element in encounter order (for a sequential stream), and subsequent duplicates are discarded; with a parallel stream, which of the duplicates survives is not guaranteed.

Wrapper Class Solution

Another approach involves creating wrapper classes to redefine object equality judgment logic, which is particularly suitable for situations where modifying the original class's equals method is not possible or desirable.

Basic Wrapper implementation:

class WrapperEmployee {
    private Employee e;

    public WrapperEmployee(Employee e) {
        this.e = e;
    }

    public Employee unwrap() {
        return this.e;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        WrapperEmployee that = (WrapperEmployee) o;
        return Objects.equals(e.getId(), that.e.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(e.getId());
    }
}

Using wrapper classes for deduplication:

List<Employee> unique = employee.stream()
    .map(WrapperEmployee::new)
    .distinct()
    .map(WrapperEmployee::unwrap)
    .collect(Collectors.toList());

Execution flow of this solution: First, wrap each Employee object as a WrapperEmployee instance through the map operation, then utilize Stream's distinct() method for deduplication based on the wrapper class's equals and hashCode implementation, and finally unwrap and restore to original objects through another map operation.

Generalized Wrapper Design

To enhance code reusability, a generic Wrapper class can be designed, dynamically specifying equality judgment criteria through functional interfaces.

public class Wrapper<T, U> {
    private T t;
    private Function<T, U> equalityFunction;

    public Wrapper(T t, Function<T, U> equalityFunction) {
        this.t = t;
        this.equalityFunction = equalityFunction;
    }

    public T unwrap() {
        return this.t;
    }

    @Override
    @SuppressWarnings("unchecked") // unchecked cast below; the getClass() check guards the raw type
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Wrapper<T, U> that = (Wrapper<T, U>) o;
        return Objects.equals(
            equalityFunction.apply(this.t), 
            equalityFunction.apply(that.t)
        );
    }

    @Override
    public int hashCode() {
        return Objects.hash(equalityFunction.apply(this.t));
    }
}

Usage of generic Wrapper:

.map(e -> new Wrapper<>(e, Employee::getId))

This design works with objects of any type: to deduplicate on an arbitrary property, simply supply the corresponding key-extraction function.
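
Putting it together, a complete pipeline with the generic Wrapper could look like the sketch below. The Wrapper class from above is reproduced so the snippet compiles on its own, and the minimal Employee shape is an assumption:

```java
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

class GenericWrapperDemo {
    // Generic wrapper from above, reproduced so this snippet is self-contained.
    static class Wrapper<T, U> {
        private final T t;
        private final Function<T, U> equalityFunction;

        Wrapper(T t, Function<T, U> equalityFunction) {
            this.t = t;
            this.equalityFunction = equalityFunction;
        }

        T unwrap() { return t; }

        @Override
        @SuppressWarnings("unchecked")
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Wrapper<T, U> that = (Wrapper<T, U>) o;
            return Objects.equals(equalityFunction.apply(t),
                                  equalityFunction.apply(that.t));
        }

        @Override
        public int hashCode() {
            return Objects.hash(equalityFunction.apply(t));
        }
    }

    // Assumed minimal Employee shape for the demo.
    static class Employee {
        final int id;
        final String name;
        Employee(int id, String name) { this.id = id; this.name = name; }
        int getId() { return id; }
    }

    static List<Employee> dedupeById(List<Employee> employees) {
        return employees.stream()
            .map(e -> new Wrapper<>(e, Employee::getId)) // wrap, keying equality on id
            .distinct()                                  // uses Wrapper.equals/hashCode
            .map(Wrapper::unwrap)                        // restore the original objects
            .collect(Collectors.toList());
    }
}
```

Because distinct() only ever sees Wrapper instances, the Employee class's own equals and hashCode remain untouched.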

State Tracking Method Based on HashSet

In addition to Stream API solutions, traditional collection operations also provide efficient solutions. By tracking processed element states through HashSet and combining with the removeIf method, in-place deduplication can be achieved.

Set<Object> seen = new HashSet<>();
employee.removeIf(e -> !seen.add(e.getId()));

Core mechanism of this method: The HashSet.add() method returns true when addition is successful, and returns false if the element already exists. removeIf determines whether to remove elements based on the predicate's return value. When !seen.add(e.getId()) is true (meaning the id already exists), the corresponding element will be removed.

It should be noted that this method directly modifies the original list and requires the list to support element removal operations. For immutable lists or situations requiring preservation of original data, solutions that create new collections should be chosen.
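
A self-contained sketch of the in-place approach (the minimal Employee shape is assumed; the list passed in must be mutable, so a list produced by Arrays.asList should first be copied into an ArrayList):

```java
import java.util.*;

class RemoveIfDemo {
    // Assumed minimal Employee shape for the demo.
    static class Employee {
        final int id;
        final String name;
        Employee(int id, String name) { this.id = id; this.name = name; }
        int getId() { return id; }
    }

    static void dedupeInPlace(List<Employee> employees) {
        Set<Object> seen = new HashSet<>();
        // add() returns false for an id already in the set, so the predicate is
        // true exactly for the duplicates, which removeIf then deletes in place.
        employees.removeIf(e -> !seen.add(e.getId()));
    }
}
```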

Performance Analysis and Selection Recommendations

The deduplication methods differ in time and space complexity. The TreeSet approach costs O(n log n) time for comparator-based insertion plus O(n) extra space. The Wrapper-plus-distinct() approach and the HashSet approach both run in O(n) expected time with O(n) extra space for the tracked keys, though the Wrapper variant additionally allocates one wrapper object per element.

In practice, the method should match the requirements at hand. If performance is the priority and mutating the original list is acceptable, the HashSet solution is the best choice. If the code should remain free of side effects, the Wrapper solution is a better fit. If the result must be both deduplicated and sorted, the TreeSet solution achieves both goals at once.

Extended Application Scenarios

The aforementioned methods can be further extended to more complex scenarios:

Deduplication based on multiple properties: Create composite keys by combining multiple properties, for example using person -> person.getName() + "-" + person.getAge() as unique identifiers.
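
For instance, reusing the HashSet technique from the previous section with a composite name-age key (the minimal Person shape is an assumption; a plain string concatenation can collide if a field contains the separator, in which case a key object such as Arrays.asList(name, age) is safer):

```java
import java.util.*;

class CompositeKeyDemo {
    // Assumed minimal Person shape for the demo.
    static class Person {
        final String name;
        final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    static void dedupeByNameAndAge(List<Person> people) {
        Set<String> seen = new HashSet<>();
        // Combine both properties into one composite uniqueness key.
        people.removeIf(p -> !seen.add(p.getName() + "-" + p.getAge()));
    }
}
```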

Preserving specific order: If maintaining element insertion order is required, LinkedHashSet can be used instead of HashSet, or the Wrapper solution can be combined with Collectors.toCollection(LinkedHashSet::new).

Custom conflict resolution strategies: When duplicates occur, retention strategies can be specified through the third parameter of Collectors.toMap, such as retaining the newest element: (existing, replacement) -> replacement.
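
The toMap variant could be sketched as follows, keeping the last occurrence of each id while a LinkedHashMap preserves first-encounter key order (the minimal Employee shape is assumed):

```java
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapDemo {
    // Assumed minimal Employee shape for the demo.
    static class Employee {
        final int id;
        final String name;
        Employee(int id, String name) { this.id = id; this.name = name; }
        int getId() { return id; }
    }

    static List<Employee> keepLastById(List<Employee> employees) {
        return new ArrayList<>(employees.stream()
            .collect(Collectors.toMap(
                Employee::getId,                        // key: the dedup property
                Function.identity(),                    // value: the employee itself
                (existing, replacement) -> replacement, // on conflict, keep the newest
                LinkedHashMap::new))                    // preserve key insertion order
            .values());
    }
}
```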

Conclusion

Java 8's Stream API provides rich and flexible solutions for object list deduplication. From TreeSet-based custom comparators to the generic Wrapper pattern, each method has its own strengths and applicable scenarios. Developers should choose a deduplication strategy based on performance requirements, code maintainability, and business needs. Mastering these techniques not only solves the immediate problem but also lays a foundation for handling more complex data processing tasks.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.