Comprehensive Guide to LINQ Distinct Operations: From Basic to Advanced Scenarios

Nov 01, 2025 · Programming

Keywords: LINQ | Distinct | C# | GroupBy | Deduplication

Abstract: This article explores the LINQ Distinct method in C#, focusing on filtering unique elements by a specific property. Using code examples, it covers several approaches, including the GroupBy + First combination, custom comparers, anonymous types, and DistinctBy extension methods, and discusses performance considerations and the trade-offs between deferred and immediate execution.

Overview of LINQ Distinct Method

LINQ (Language Integrated Query) is a query feature of .NET that offers rich collection-manipulation capabilities. The Distinct method removes duplicate elements from a sequence and returns a sequence containing only unique values. Deduplication is a common requirement in C# development, and Distinct provides a concise way to handle it.

Basic Usage of Distinct Method

For simple value types, the Distinct method can directly utilize default comparers for deduplication operations. The following example demonstrates how to remove duplicates from an integer collection:

List<int> numbers = new List<int> { 1, 2, 2, 3, 4, 4, 5 };
IEnumerable<int> distinctNumbers = numbers.Distinct();

foreach (int num in distinctNumbers)
{
    Console.WriteLine(num);
}
// Output: 1, 2, 3, 4, 5

Similarly, for string types, the Distinct method employs case-sensitive comparison by default:

string[] names = { "Alice", "alice", "Bob", "bob", "Alice" };
var distinctNames = names.Distinct();
// Result: "Alice", "alice", "Bob", "bob"
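
When the two spellings should count as the same name, Distinct accepts an IEqualityComparer<string>. The following is a minimal sketch using the built-in StringComparer.OrdinalIgnoreCase; it assumes a project that supports top-level statements (C# 9 or later), and the variable names are illustrative only.

```csharp
using System;
using System.Linq;

string[] names = { "Alice", "alice", "Bob", "bob", "Alice" };

// Default comparer: ordinal and case-sensitive.
string[] caseSensitive = names.Distinct().ToArray();

// Case-insensitive comparer: in the current LINQ to Objects implementation
// the first spelling encountered for each name is the one that is kept.
string[] caseInsensitive = names.Distinct(StringComparer.OrdinalIgnoreCase).ToArray();

Console.WriteLine(string.Join(", ", caseSensitive));   // Alice, alice, Bob, bob
Console.WriteLine(string.Join(", ", caseInsensitive)); // Alice, Bob
```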

Deduplication Based on Specific Properties

In practical development, there's often a need to perform deduplication based on specific object properties. Consider this scenario: a list containing multiple objects where unique records need to be filtered by ID property, preserving the first occurrence for each ID.

public class DataItem
{
    public int Id { get; set; }
    public string Value { get; set; }
}

List<DataItem> dataList = new List<DataItem>
{
    new DataItem { Id = 1, Value = "A" },
    new DataItem { Id = 1, Value = "B" },
    new DataItem { Id = 2, Value = "C" },
    new DataItem { Id = 3, Value = "D" },
    new DataItem { Id = 3, Value = "E" }
};

GroupBy and First Combination Approach

A straightforward solution that works on every .NET version that includes LINQ is to group by ID with GroupBy and then select the first element of each group:

IEnumerable<DataItem> distinctItems = dataList
    .GroupBy(item => item.Id)
    .Select(group => group.First());

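This method's execution involves three steps: first, GroupBy groups objects with the same ID; second, Select chooses the first element from each group; finally, the query returns a sequence containing one object per unique ID. In LINQ to Objects, groups are yielded in the order in which each key first appears in the source, and elements within a group keep their source order, so First() returns the earliest occurrence for each ID.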

If a list is required, append a call to ToList:

List<DataItem> resultList = dataList
    .GroupBy(item => item.Id)
    .Select(group => group.First())
    .ToList();
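
To make the effect concrete, here is a self-contained sketch that runs the GroupBy approach against the sample data and prints which item survives for each ID. The Last() variant is an illustrative addition, not part of the original approach; it shows that the element chosen per group is a free choice. The top-level-statement layout assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var dataList = new List<DataItem>
{
    new DataItem { Id = 1, Value = "A" },
    new DataItem { Id = 1, Value = "B" },
    new DataItem { Id = 2, Value = "C" },
    new DataItem { Id = 3, Value = "D" },
    new DataItem { Id = 3, Value = "E" }
};

// Keep the first occurrence for each Id.
List<DataItem> firstPerId = dataList
    .GroupBy(item => item.Id)
    .Select(group => group.First())
    .ToList();

// Illustrative variation: keep the last occurrence instead.
List<DataItem> lastPerId = dataList
    .GroupBy(item => item.Id)
    .Select(group => group.Last())
    .ToList();

Console.WriteLine(string.Join(", ", firstPerId.Select(i => $"{i.Id}:{i.Value}"))); // 1:A, 2:C, 3:D
Console.WriteLine(string.Join(", ", lastPerId.Select(i => $"{i.Id}:{i.Value}")));  // 1:B, 2:C, 3:E

public class DataItem
{
    public int Id { get; set; }
    public string Value { get; set; } = "";
}
```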

Execution Strategy Selection

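LINQ queries support two execution strategies: deferred and immediate. A deferred query (an IEnumerable<T>) computes its results only when it is iterated, and recomputes them on every iteration; streaming operators such as Where and Select can therefore work with very large or even unbounded sequences. Note that GroupBy, although deferred, reads its entire source as soon as iteration begins. Immediate execution, for example calling ToList, computes all results right away and stores them in memory.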

// Deferred execution - query executes during iteration
IEnumerable<DataItem> deferredQuery = dataList
    .GroupBy(item => item.Id)
    .Select(group => group.First());

// Immediate execution - query executes immediately and stores results
List<DataItem> immediateResult = deferredQuery.ToList();
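
The practical difference shows up when the source changes after the query has been defined. The following sketch, with hypothetical variable names, assumes the same DataItem shape as above: the deferred query sees the added item, while the list produced earlier by ToList does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var dataList = new List<DataItem>
{
    new DataItem { Id = 1, Value = "A" },
    new DataItem { Id = 1, Value = "B" },
    new DataItem { Id = 2, Value = "C" }
};

// Deferred: nothing is grouped yet, the query is only a description.
IEnumerable<DataItem> deferredQuery = dataList
    .GroupBy(item => item.Id)
    .Select(group => group.First());

// Immediate: the query runs now and the result is a snapshot.
List<DataItem> snapshot = deferredQuery.ToList();

// Change the source after the snapshot was taken.
dataList.Add(new DataItem { Id = 3, Value = "D" });

int deferredCount = deferredQuery.Count(); // re-runs the query over the updated list
int snapshotCount = snapshot.Count;        // unchanged

Console.WriteLine(deferredCount); // 3
Console.WriteLine(snapshotCount); // 2

public class DataItem
{
    public int Id { get; set; }
    public string Value { get; set; } = "";
}
```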

Custom Comparer Implementation

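For complex objects, custom comparison logic can be defined by implementing the IEqualityComparer<T> interface and passing an instance to Distinct. Equals and GetHashCode must agree: two items considered equal must return the same hash code, which is why both methods below use only Id: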

public class DataItemComparer : IEqualityComparer<DataItem>
{
    public bool Equals(DataItem x, DataItem y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(DataItem obj)
    {
        return obj?.Id.GetHashCode() ?? 0;
    }
}

// Using custom comparer
var distinctWithComparer = dataList.Distinct(new DataItemComparer());
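
If several properties or types need this treatment, writing one comparer class per key becomes repetitive. One possible generalization is a small comparer that takes a key selector. KeyEqualityComparer below is a hypothetical helper written for this illustration, not a framework type, and the sketch assumes C# 9 or later with nullable reference types enabled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var dataList = new List<DataItem>
{
    new DataItem { Id = 1, Value = "A" },
    new DataItem { Id = 1, Value = "B" },
    new DataItem { Id = 2, Value = "C" },
    new DataItem { Id = 3, Value = "D" },
    new DataItem { Id = 3, Value = "E" }
};

// Two items are treated as equal when their Id values are equal.
var byId = new KeyEqualityComparer<DataItem, int>(item => item.Id);
List<DataItem> distinctById = dataList.Distinct(byId).ToList();

Console.WriteLine(distinctById.Count); // 3

public class DataItem
{
    public int Id { get; set; }
    public string Value { get; set; } = "";
}

// Hypothetical helper: compares objects by a key extracted from them.
public sealed class KeyEqualityComparer<T, TKey> : IEqualityComparer<T> where T : class
{
    private readonly Func<T, TKey> _keySelector;

    public KeyEqualityComparer(Func<T, TKey> keySelector)
    {
        _keySelector = keySelector ?? throw new ArgumentNullException(nameof(keySelector));
    }

    public bool Equals(T? x, T? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return EqualityComparer<TKey>.Default.Equals(_keySelector(x), _keySelector(y));
    }

    public int GetHashCode(T obj)
    {
        if (obj is null) return 0;
        TKey key = _keySelector(obj);
        // Hash only the key so that equal items always share a hash code.
        return key is null ? 0 : EqualityComparer<TKey>.Default.GetHashCode(key);
    }
}
```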

Application of Anonymous Types

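Another concise approach projects to an anonymous type and relies on its compiler-generated Equals and GetHashCode methods, which compare all projected properties. Note that the query below therefore removes only items whose Id and Value both match; it does not deduplicate by Id alone, and with the sample data above (where every Id/Value pair is unique) all five items are kept. The approach fits cases where the uniqueness key is a combination of properties: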

var distinctAnonymous = dataList
    .Select(item => new { item.Id, item.Value })
    .Distinct()
    .Select(anon => new DataItem { Id = anon.Id, Value = anon.Value });
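
The following sketch illustrates what the anonymous-type projection does and does not remove. The input data is made up for this example and differs from the sample list above: it contains one exact duplicate and one item that shares only its Id.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var dataList = new List<DataItem>
{
    new DataItem { Id = 1, Value = "A" },
    new DataItem { Id = 1, Value = "A" }, // same Id and Value as the previous item
    new DataItem { Id = 1, Value = "B" }, // same Id, different Value
    new DataItem { Id = 2, Value = "C" }
};

// Anonymous types compare by all of their properties, so only the
// exact (Id, Value) duplicate is removed.
List<DataItem> distinctPairs = dataList
    .Select(item => new { item.Id, item.Value })
    .Distinct()
    .Select(anon => new DataItem { Id = anon.Id, Value = anon.Value })
    .ToList();

// Without the projection, DataItem uses reference equality,
// so Distinct() removes nothing here.
int withoutProjection = dataList.Distinct().Count();

Console.WriteLine(string.Join(", ", distinctPairs.Select(i => $"{i.Id}:{i.Value}"))); // 1:A, 1:B, 2:C
Console.WriteLine(withoutProjection); // 4

public class DataItem
{
    public int Id { get; set; }
    public string Value { get; set; } = "";
}
```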

Third-Party Library Extension Methods

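Third-party libraries like MoreLINQ provide a DistinctBy extension method for more intuitive property-based deduplication. Since .NET 6, an equivalent Enumerable.DistinctBy method is also built into System.Linq, so the extra package is mainly needed when targeting older frameworks: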

// Requires MoreLINQ NuGet package installation
using MoreLinq;

var distinctByExtension = dataList.DistinctBy(item => item.Id);
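
For comparison, here is a sketch of the same query using the DistinctBy method that ships with System.Linq. It assumes a project targeting .NET 6 or later and does not reference MoreLINQ; the variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq; // Enumerable.DistinctBy is available here on .NET 6 and later

var dataList = new List<DataItem>
{
    new DataItem { Id = 1, Value = "A" },
    new DataItem { Id = 1, Value = "B" },
    new DataItem { Id = 2, Value = "C" },
    new DataItem { Id = 3, Value = "D" },
    new DataItem { Id = 3, Value = "E" }
};

// One item per Id; in the current implementation the first item
// seen for each key is the one that is returned.
List<DataItem> distinctById = dataList
    .DistinctBy(item => item.Id)
    .ToList();

Console.WriteLine(string.Join(", ", distinctById.Select(i => $"{i.Id}:{i.Value}"))); // 1:A, 2:C, 3:D

public class DataItem
{
    public int Id { get; set; }
    public string Value { get; set; } = "";
}
```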

Performance Analysis and Best Practices

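When selecting a deduplication approach, performance matters. The GroupBy + First combination is hash-based and runs in expected O(n) time, but GroupBy must read the entire source and hold every element in its groups before yielding the first result, so its memory use is O(n) as well. Deferred execution postpones that work until the query is iterated; it does not avoid the buffering. Distinct and DistinctBy instead stream their results and only need to remember the keys seen so far, which makes them the better fit for large datasets.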

Key performance factors include dataset size, memory constraints, and query execution frequency. For frequently executed queries, consider materializing the result once (for example with ToList) and reusing it; for extremely large datasets, consider processing the data in chunks.
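
To illustrate the memory argument, the sketch below hand-writes a streaming, HashSet-based deduplication operator. DistinctByKey and Naturals are hypothetical names chosen for this example (DistinctByKey avoids clashing with the built-in DistinctBy). The operator stores only the keys it has seen, so it can be combined with Take on an unbounded source, which a GroupBy-based query cannot do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// An unbounded source: 0, 1, 2, 3, ...
static IEnumerable<int> Naturals()
{
    for (int i = 0; ; i++)
    {
        yield return i;
    }
}

// Keep the first number seen for each remainder modulo 5, then stop
// after five results. Take ends the enumeration, so the loop above
// never runs further than needed.
List<int> firstPerRemainder = Naturals()
    .DistinctByKey(n => n % 5)
    .Take(5)
    .ToList();

Console.WriteLine(string.Join(", ", firstPerRemainder)); // 0, 1, 2, 3, 4

public static class StreamingDistinctExtensions
{
    // Hypothetical helper: yields an item only when its key is new.
    // Expected O(1) work per item; memory grows with the number of distinct keys.
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (T item in source)
        {
            if (seenKeys.Add(keySelector(item)))
            {
                yield return item;
            }
        }
    }
}
```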

Practical Application Scenarios

Property-based deduplication is useful in many scenarios, such as data cleaning, report generation, cache management, and user session handling. Understanding the strengths and weaknesses of the different approaches helps in selecting the most suitable one for a specific context.

By appropriately utilizing LINQ's Distinct-related operations, developers can write code that is both concise and efficient, enhancing application performance and maintainability.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.