Keywords: LINQ Queries | Partial Match Filtering | C# List Operations
Abstract: This article explores how to use LINQ queries in C# to implement partial match filtering between two lists. Working from the original problem's code, it explains why the Contains method alone falls short and presents a solution that combines the Any and Contains methods. Drawing on reference material about how clearly the Any method expresses intent, the article compares implementation approaches in terms of performance and readability, and concludes with a complete code example and best practice recommendations.
Problem Background and Requirements Analysis
In practical development, filtering between lists based on matching criteria is a common requirement. The original problem involves two string lists: test1 containing domain suffixes "@bob.com" and "@tom.com", and test2 containing complete email addresses. The requirement is to remove from test2 any items that contain the domain suffixes from test1.
Limitations of Initial Attempt
The developer initially attempted to use the Contains method:
bool bContained1 = test1.Contains(test2);
bool bContained2 = test2.Contains(test1);
This approach suffers from two main issues. First, the list's Contains method expects a single element as its argument rather than an entire list, so neither call compiles. Second, and more importantly, the list's Contains performs exact matching on whole elements rather than partial matching. Therefore, even though "joe@bob.com" in test2 contains "@bob.com", an exact comparison between the two strings fails.
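The difference between the two kinds of Contains can be seen in a minimal sketch. It assumes C# 9 or later (top-level statements) and reuses the test1 values from the problem; the variable names exactHit, exactMiss, and partialHit are illustrative only.

```csharp
using System;
using System.Collections.Generic;

var test1 = new List<string> { "@bob.com", "@tom.com" };

// List<string>.Contains compares whole elements for equality.
bool exactHit = test1.Contains("@bob.com");      // an identical element exists
bool exactMiss = test1.Contains("joe@bob.com");  // no element equals this string

// string.Contains checks for a substring, which is what the problem needs.
bool partialHit = "joe@bob.com".Contains("@bob.com");

Console.WriteLine($"{exactHit} {exactMiss} {partialHit}"); // True False True
```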
Core Principles of LINQ Solution
The best answer provides a solution combining Any and Contains methods:
var test2NotInTest1 = test2.Where(t2 => !test1.Any(t1 => t2.Contains(t1)));
The logic of this query is: for each element t2 in test2, check if there does not exist any element t1 in test1 such that t2 contains t1. Here, Contains is the string method performing partial matching.
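To make that logic explicit, the query can be spelled out as nested loops. This is only an illustrative sketch, assuming C# 9 or later; the names kept and matchesAnySuffix are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;

var test1 = new List<string> { "@bob.com", "@tom.com" };
var test2 = new List<string> { "joe@bob.com", "test@sam.com" };

// Loop form of: test2.Where(t2 => !test1.Any(t1 => t2.Contains(t1)))
var kept = new List<string>();
foreach (string t2 in test2)
{
    bool matchesAnySuffix = false;
    foreach (string t1 in test1)
    {
        if (t2.Contains(t1))   // string.Contains: substring check
        {
            matchesAnySuffix = true;
            break;             // Any also stops at the first match
        }
    }

    if (!matchesAnySuffix)
    {
        kept.Add(t2);          // no suffix matched, so the item survives
    }
}

Console.WriteLine(string.Join(", ", kept)); // test@sam.com
```

One difference worth noting: the loop builds its result immediately, whereas the Where query is evaluated lazily, only when it is enumerated.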
Performance Optimization and Code Readability
The reference article emphasizes the advantage of Any method in expressing intent clearly. Compared to the approach using Count:
var test2NotInTest1 = test2.Where(t2 => test1.Count(t1 => t2.Contains(t1)) == 0);
The Any method returns immediately upon finding the first match, while Count needs to iterate through the entire collection. For large datasets, this difference can significantly impact performance. Additionally, Any has clearer semantics, directly expressing the query intent of "whether any exists".
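The short-circuit behavior can be observed by counting how often the predicate runs. The sketch below assumes C# 9 or later and adds a third, hypothetical suffix "@sam.com" so that the difference is visible; the counters anyCalls and countCalls exist only for this demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var test1 = new List<string> { "@bob.com", "@tom.com", "@sam.com" };
string email = "joe@bob.com";

// Any stops as soon as the predicate returns true.
int anyCalls = 0;
bool foundWithAny = test1.Any(t1 => { anyCalls++; return email.Contains(t1); });

// Count evaluates the predicate for every element before comparing.
int countCalls = 0;
bool foundWithCount = test1.Count(t1 => { countCalls++; return email.Contains(t1); }) > 0;

Console.WriteLine($"Any:   found={foundWithAny}, predicate calls={anyCalls}");     // 1 call
Console.WriteLine($"Count: found={foundWithCount}, predicate calls={countCalls}"); // 3 calls
```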
Complete Code Example and Implementation Details
Below is the complete implementation code:
using System;
using System.Collections.Generic;
using System.Linq;

List<string> test1 = new List<string> { "@bob.com", "@tom.com" };
List<string> test2 = new List<string> { "joe@bob.com", "test@sam.com" };

// Keep only the addresses that contain none of the suffixes in test1
var filteredList = test2.Where(email => !test1.Any(domain => email.Contains(domain))).ToList();

// Output result: only contains "test@sam.com"
foreach (var email in filteredList)
{
    Console.WriteLine(email);
}
Extended Applications and Considerations
This approach is not limited to partial string matching but can be extended to other scenarios requiring complex conditional filtering. Important considerations include:
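- When test1 contains an empty string, all elements in test2 will be filtered out, because every string contains the empty string
- A HashSet speeds up exact-match lookups only and does not accelerate substring checks; for performance-sensitive scenarios it helps when the comparison can be reduced to an exact lookup, for example on the domain part of each address
- If case-insensitive matching is required, pass StringComparison.OrdinalIgnoreCase to the string Contains overload that accepts it; on older frameworks without that overload, IndexOf with the same argument can be used instead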
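The considerations above can be sketched as follows. This is a hedged illustration rather than a definitive implementation: it assumes C# 9 or later, the sample data (including the empty entry and the mixed-case values) is invented for the demonstration, and the HashSet variant is only valid under the stated assumption that every pattern is a complete "@domain" suffix, which turns the substring test into an exact lookup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var test1 = new List<string> { "@bob.com", "", "@TOM.com" };
var test2 = new List<string> { "joe@BOB.com", "amy@tom.com", "test@sam.com" };

// Drop empty patterns first: every string contains "", so a single empty
// entry would otherwise filter out all of test2.
var patterns = test1.Where(p => !string.IsNullOrEmpty(p)).ToList();

// IndexOf with StringComparison.OrdinalIgnoreCase is a case-insensitive
// substring check that is also available on .NET Framework.
var filtered = test2
    .Where(email => !patterns.Any(p =>
        email.IndexOf(p, StringComparison.OrdinalIgnoreCase) >= 0))
    .ToList();

Console.WriteLine(string.Join(", ", filtered)); // test@sam.com

// HashSet variant. ASSUMPTION: each pattern is an exact "@domain" suffix,
// so the part of the address starting at the last '@' can be looked up directly.
var domains = new HashSet<string>(patterns, StringComparer.OrdinalIgnoreCase);
var filteredByDomain = test2
    .Where(email =>
    {
        int at = email.LastIndexOf('@');
        // Addresses without '@' cannot match a domain suffix and are kept.
        return at < 0 || !domains.Contains(email.Substring(at));
    })
    .ToList();

Console.WriteLine(string.Join(", ", filteredByDomain)); // test@sam.com
```

The HashSet variant replaces the inner scan over test1 with a single hashed lookup per address, but it no longer matches a pattern that appears elsewhere in the string, so it is not a drop-in replacement for the substring query.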
Conclusion
By combining LINQ's Where and Any with the string Contains method, partial match filtering between lists can be expressed in a single query. The result is concise and readable, and because Any stops at the first match it avoids the unnecessary comparisons that a Count-based version performs, making it a sound default for similar scenarios.