Keywords: LINQ | Distinct | Property_Distinct | C# | Extension_Methods
Abstract: This article provides an in-depth exploration of implementing distinct operations based on one or more object properties in C# LINQ. By analyzing the limitations of the default Distinct() method, it details two primary solutions: combining the GroupBy operator with First, and writing a custom DistinctBy extension method. The article includes concrete code examples, explains the use of anonymous types in multi-property distinct operations, and discusses the implementation principles of custom comparers. Practical recommendations on performance considerations and EF Core compatibility issues in different scenarios are also provided to help developers handle complex data deduplication requirements effectively.
Introduction
In practical applications of Language Integrated Query (LINQ) in C#, developers frequently need to perform distinct operations based on specific properties of objects. While LINQ provides the standard Distinct() method, its default implementation relies on overall object equality comparison, which often fails to meet the requirements for distinct operations based on specific properties when dealing with complex objects. This article provides multiple practical solutions through comprehensive analysis.
Problem Context and Challenges
Consider a typical scenario: we have a list of Person objects, each containing Id and Name properties. When multiple objects share the same Id value, the standard Distinct() method cannot perform deduplication based on the Id property because it compares either reference equality or value equality of the entire object.
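To see why, recall that Distinct() falls back on EqualityComparer&lt;T&gt;.Default, which for a class that does not override Equals and GetHashCode means reference equality. The following minimal sketch (the Box class is purely illustrative) contrasts value-based and reference-based comparison:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Box // hypothetical class with no Equals/GetHashCode override
{
    public int Value { get; set; }
}

public static class EqualityDemo
{
    public static void Main()
    {
        // Strings compare by content, so Distinct() deduplicates as expected.
        var words = new List<string> { "a", "a", "b" };
        Console.WriteLine(words.Distinct().Count()); // 2

        // Two Box instances with the same Value are distinct references,
        // so EqualityComparer<Box>.Default treats them as different.
        var boxes = new List<Box> { new Box { Value = 1 }, new Box { Value = 1 } };
        Console.WriteLine(boxes.Distinct().Count()); // 2, no deduplication
    }
}
```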
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

List<Person> people = new List<Person>
{
    new Person { Id = 1, Name = "Test1" },
    new Person { Id = 1, Name = "Test1" },
    new Person { Id = 2, Name = "Test2" }
};

// Standard Distinct() cannot achieve deduplication based on Id
var result = people.Distinct(); // Still returns 3 elements
Solution One: GroupBy and First Combination
The first solution utilizes LINQ's GroupBy operator to group elements by specified properties, then selects the first element from each group as the representative.
// Distinct by single property
List<Person> distinctPeople = people
    .GroupBy(p => p.Id)
    .Select(g => g.First())
    .ToList();

// Distinct by multiple properties
List<Person> distinctPeopleMulti = people
    .GroupBy(p => new { p.Id, p.Name })
    .Select(g => g.First())
    .ToList();
The advantage of this approach lies in its simplicity and readability. Through anonymous types, composite-key grouping based on multiple properties can be achieved easily. It's important to note that with some query providers (such as earlier versions of Entity Framework Core), FirstOrDefault() may be needed in place of First() for the query to execute properly.
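One detail worth knowing: with LINQ to Objects, GroupBy yields groups in the order their keys are first encountered, and First() takes the earliest element of each group, so the original occurrence order determines which duplicate survives. A small self-contained check (class and sample data are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class OrderDemo
{
    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Id = 1, Name = "First" },
            new Person { Id = 2, Name = "Other" },
            new Person { Id = 1, Name = "Second" } // duplicate Id, seen later
        };

        // GroupBy preserves key encounter order; First() keeps the
        // earliest element of each group.
        var distinct = people
            .GroupBy(p => p.Id)
            .Select(g => g.First())
            .ToList();

        Console.WriteLine(distinct.Count);   // 2
        Console.WriteLine(distinct[0].Name); // First
    }
}
```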
Solution Two: Custom DistinctBy Extension Method
The second solution provides a more elegant and efficient approach by creating a custom DistinctBy extension method. The core idea of this method is to maintain a hash set of seen keys, using a key selector function to determine element uniqueness.
// Note: extension methods must be declared inside a non-generic static class.
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
    this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));

    HashSet<TKey> seenKeys = new HashSet<TKey>();
    foreach (TSource element in source)
    {
        if (seenKeys.Add(keySelector(element)))
        {
            yield return element;
        }
    }
}
Usage examples:
// By single property
var distinctById = people.DistinctBy(p => p.Id);

// By multiple properties
var distinctByMultiple = people.DistinctBy(p => new { p.Id, p.Name });
Advanced Features and Custom Comparison
To handle more complex comparison requirements, the DistinctBy method can be extended to support custom equality comparers:
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IEqualityComparer<TKey> comparer)
{
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));

    // A null comparer falls back to EqualityComparer<TKey>.Default.
    HashSet<TKey> seenKeys = new HashSet<TKey>(comparer);
    foreach (TSource element in source)
    {
        if (seenKeys.Add(keySelector(element)))
        {
            yield return element;
        }
    }
}
Performance Analysis and Comparison
From a performance perspective, the DistinctBy method generally outperforms the GroupBy approach because:
- DistinctBy uses a single HashSet for deduplication, giving near O(n) time complexity.
- GroupBy must build a grouping dictionary and then iterate through each group, resulting in higher memory overhead.
- DistinctBy demonstrates better memory efficiency on large datasets.
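These claims can be checked with a rough micro-benchmark sketch; absolute Stopwatch timings depend on runtime and hardware, so they are indicative only. The extension below is the same algorithm as the article's DistinctBy, renamed DistinctByKey to avoid clashing with the built-in Enumerable.DistinctBy on .NET 6+:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class Benchmark
{
    // Same algorithm as the article's DistinctBy; renamed so it cannot be
    // ambiguous with System.Linq's Enumerable.DistinctBy on .NET 6+.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (var element in source)
        {
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }

    public static void Main()
    {
        // 1,000,000 items with 10,000 unique keys.
        var data = Enumerable.Range(0, 1_000_000).Select(i => i % 10_000).ToList();

        var sw = Stopwatch.StartNew();
        int a = data.DistinctByKey(x => x).Count();
        sw.Stop();
        Console.WriteLine($"DistinctByKey: {a} unique in {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        int b = data.GroupBy(x => x).Select(g => g.First()).Count();
        sw.Stop();
        Console.WriteLine($"GroupBy+First: {b} unique in {sw.ElapsedMilliseconds} ms");
    }
}
```

Both approaches should report the same number of unique items; the difference shows up in elapsed time and allocations.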
Practical Application Scenarios
In real-world development, these techniques can be applied to various scenarios:
// Database query result deduplication
// (note: a custom IEnumerable extension runs client-side, not in SQL)
var uniqueCustomers = dbContext.Customers
    .DistinctBy(c => c.Email);

// Log data deduplication
var uniqueErrors = errorLogs
    .DistinctBy(e => new { e.ErrorCode, e.Timestamp.Date });

// Product catalog processing
var uniqueProducts = productList
    .DistinctBy(p => p.SKU);
Compatibility Considerations
It's worth noting that the MoreLINQ library has long shipped a well-tested DistinctBy implementation, and since .NET 6 an equivalent Enumerable.DistinctBy is built into System.Linq. For production code on earlier runtimes, referencing MoreLINQ is recommended rather than maintaining a hand-rolled version. Additionally, while Entity Framework Core 6 and later versions provide better support for certain query patterns, DistinctBy is generally not translated to SQL, so developers should still be aware of query provider limitations in complex scenarios.
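On .NET 6 or later, no custom extension is needed at all for in-memory collections; the built-in Enumerable.DistinctBy (which also has an overload taking an IEqualityComparer&lt;TKey&gt;) covers the same cases:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class BuiltInDemo
{
    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Id = 1, Name = "Test1" },
            new Person { Id = 1, Name = "Test1" },
            new Person { Id = 2, Name = "Test2" }
        };

        // Built into System.Linq since .NET 6.
        var unique = people.DistinctBy(p => p.Id).ToList();
        Console.WriteLine(unique.Count); // 2
    }
}
```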
Conclusion
Through the comprehensive analysis presented in this article, we have demonstrated multiple effective methods for performing distinct operations based on object properties in C# LINQ. Whether using the GroupBy combination or custom DistinctBy extension methods, developers can choose the most suitable solution based on specific requirements. These techniques not only enhance code readability and maintainability but also provide powerful flexibility when handling complex data deduplication scenarios.