Practical Methods and Performance Analysis for Avoiding Duplicate Elements in C# Lists

Keywords: C# | List Deduplication | LINQ | HashSet | Collection Operations

Abstract: This article provides an in-depth exploration of how to effectively prevent adding duplicate elements to List collections in C# programming. By analyzing a common error case, it explains the pitfalls of using List.Contains() to check array objects and presents multiple solutions including foreach loop item-by-item checking, LINQ's Distinct() method, Except() method, and HashSet alternatives. The article compares different approaches from three dimensions: code implementation, performance characteristics, and applicable scenarios, helping developers choose optimal strategies based on actual requirements.

Problem Background and Error Analysis

In C# development, ensuring element uniqueness is a common requirement when processing collection data. A typical scenario involves adding the elements of a string array (lines3) to a List<string> (lines2) while avoiding duplicates. The code under analysis contains a critical error: if (!lines2.Contains(lines3.ToString())). Here, lines3.ToString() returns the type name of the array object ("System.String[]"), not the elements within the array. Because the list will not normally contain that string, the condition is effectively always true, so the check does nothing to prevent duplicates.
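The pitfall can be reproduced in a few lines. The following is a minimal sketch written with top-level statements (C# 9 or later); the sample values are hypothetical, and only the names lines2 and lines3 come from the scenario described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data; lines2 and lines3 follow the article's naming.
var lines2 = new List<string> { "alpha" };
string[] lines3 = { "alpha", "beta" };

// ToString() on an array returns its type name, not its contents.
string arrayText = lines3.ToString();
Console.WriteLine(arrayText); // System.String[]

// The list does not contain that type-name string, so the negated check
// passes even though "alpha" is already present in lines2.
bool checkPasses = !lines2.Contains(arrayText);
Console.WriteLine(checkPasses); // True
```

Since the guard never inspects individual elements, whatever code it protects runs unconditionally.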

Core Solutions

The most straightforward fix is to iterate through the array and check each item individually:

foreach (string str in lines3)
{
    if (!lines2.Contains(str))
        lines2.Add(str);
}

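This approach is easy to follow and works whether or not lines2 already contains elements; it also skips duplicates that occur within lines3 itself. If lines2 is initially empty, a more concise LINQ method can be used: lines2.AddRange(lines3.Distinct()). Here, Distinct() returns a deduplicated sequence, so only unique elements are added. It does not compare against elements already in the list, which is why the list must start empty.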
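A short sketch of the Distinct() approach, including the case it does not cover. The sample data and the preFilled list are hypothetical and exist only to illustrate the behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] lines3 = { "b", "a", "b", "c", "a" };

// Case 1: the target list starts empty, so Distinct() is sufficient.
var lines2 = new List<string>();
lines2.AddRange(lines3.Distinct());
Console.WriteLine(lines2.Count); // 3

// Case 2: the target list already holds "a". Distinct() only removes
// duplicates within lines3, so "a" ends up in the list twice.
var preFilled = new List<string> { "a" };
preFilled.AddRange(lines3.Distinct());
Console.WriteLine(preFilled.Count(s => s == "a")); // 2
```

The second case is the reason the foreach loop, or one of the set-based options discussed later, is still needed when the list is not empty.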

Performance Optimization and Alternative Approaches

For large-scale data processing, List.Contains() may become a performance bottleneck: each call is a linear O(n) search, so checking every incoming element against the list costs roughly O(n·m) overall. In such cases, consider:

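Using lines2.AddRange(lines3.Except(lines2)). Except() builds a hash-based set from lines2 internally, so each membership check is O(1) on average, and because it produces a set difference, each remaining element of lines3 is returned only once.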
Directly replacing List<string> with HashSet<string>, as HashSet inherently guarantees element uniqueness without manual checks.
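Both options can be sketched as follows. The sample data is hypothetical. The ToList() call is an optional precaution rather than a requirement: it makes it explicit that the difference is computed before lines2 is modified.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var lines2 = new List<string> { "a", "b" };
string[] lines3 = { "b", "c", "c", "d" };

// Except yields the set difference: elements of lines3 that are not in
// lines2, each at most once. ToList() materializes the result up front.
List<string> missing = lines3.Except(lines2).ToList();
lines2.AddRange(missing);
Console.WriteLine(lines2.Count); // 4

// HashSet<T> rejects duplicates by itself; Add reports whether the
// value was actually inserted.
var unique = new HashSet<string>(lines2);
bool addedNew = unique.Add("e"); // true: "e" was not present
bool addedDup = unique.Add("a"); // false: "a" was already present
unique.UnionWith(lines3);        // bulk add, duplicates ignored
Console.WriteLine(unique.Count); // 5
```

HashSet<T> offers no index-based access, so it fits best when only membership and uniqueness matter.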

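An extension method can wrap the check so that call sites stay tidy, although it performs the same linear Contains() search:

public static class CollectionExtensions
{
    // Adds item only if the list does not already contain it.
    // List<T>.Contains is a linear search using the default equality comparer.
    public static void AddItem<T>(this List<T> list, T item)
    {
        if (!list.Contains(item))
        {
            list.Add(item);
        }
    }
}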

Method Comparison and Selection Recommendations

<table border="1"><tr><th>Method</th><th>Advantages</th><th>Disadvantages</th><th>Applicable Scenarios</th></tr><tr><td>foreach+Contains</td><td>Simple logic, easy to understand</td><td>Poor performance (O(n²))</td><td>Small datasets or learning examples</td></tr><tr><td>Distinct()</td><td>Concise code, LINQ style</td><td>Requires empty list or additional handling</td><td>Adding deduplicated data during initialization</td></tr><tr><td>Except()</td><td>Better performance, utilizes HashSet</td><td>Requires understanding of LINQ operations</td><td>Deduplicating and merging large datasets</td></tr><tr><td>HashSet</td><td>Inherent deduplication, optimal performance</td><td>Loses List's indexing features</td><td>Scenarios purely requiring uniqueness</td></tr></table>

In practical development, the choice should weigh data scale, performance requirements, and code maintainability. For most application scenarios, lines2.AddRange(lines3.Distinct()) (when lines2 starts empty) or lines2.AddRange(lines3.Except(lines2)) (when it may already contain elements) offers a good balance.
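When both index access and fast duplicate checks are needed, one common pattern is to keep a HashSet alongside the List. This is a sketch of that idea rather than a method taken from the approaches above; the names items, seen, and incoming are hypothetical.

```csharp
using System;
using System.Collections.Generic;

var items = new List<string>();   // keeps insertion order and index access
var seen = new HashSet<string>(); // answers "already added?" in O(1) on average

string[] incoming = { "x", "y", "x", "z", "y" };
foreach (string s in incoming)
{
    // HashSet<T>.Add returns true only when the value was not yet present.
    if (seen.Add(s))
    {
        items.Add(s);
    }
}

Console.WriteLine(string.Join(",", items)); // x,y,z
Console.WriteLine(items[1]);                // y
```

The trade-off is extra memory for the second collection, plus the need to route every addition or removal through both so they stay in sync.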

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.
