Keywords: C# | HashSet | multithreading | uniqueness | device management
Abstract: This paper explores how to effectively avoid adding duplicate devices to a list in C# multithreaded environments. By analyzing the limitations of traditional lock mechanisms combined with LINQ queries, it focuses on the solution using the HashSet<T> collection. The article explains in detail how HashSet works, including its hash table-based internal implementation, the return value mechanism of the Add method, and how to define the uniqueness of device objects by overriding Equals and GetHashCode methods or using custom equality comparers. Additionally, it compares the differences of other collection types like Dictionary in handling uniqueness and provides complete code examples and performance optimization suggestions, helping developers build efficient, thread-safe device management modules in asynchronous network communication scenarios.
Problem Background and Challenges
In distributed systems or IoT applications, devices announcing themselves via asynchronous network communication (e.g., UDP broadcasts or TCP connections) is a common pattern. For instance, remote devices may periodically send notification messages containing unique identifiers (such as UUIDs), and receivers need to add these devices to a centralized list for management. However, due to the asynchronous nature of network communication, multiple threads may process different announcement messages simultaneously, leading to data races and duplicate additions in traditional list operations.
Limitations of Traditional Approaches
In the provided code snippet, the developer uses a lock mechanism to protect the shared resource _remoteDevices list and checks for device existence via a LINQ query:
lock (_remoteDevicesLock)
{
    RemoteDevice rDevice = (from d in _remoteDevices
                            where d.UUID.Trim().Equals(notifyMessage.UUID.Trim(), StringComparison.OrdinalIgnoreCase)
                            select d).FirstOrDefault();
    if (rDevice != null)
    {
        // Update device information
    }
    else
    {
        // Create a new device and add to the list
        rDevice = new RemoteDevice(notifyMessage.UUID);
        _remoteDevices.Add(rDevice);
    }
}
This method is correct as long as every access to _remoteDevices takes _remoteDevicesLock: the lock makes the query and the add a single atomic step, so two threads cannot both be inside the block and both conclude that the device is missing. Duplicates appear only when some code path reads or writes the list without the lock, or when the uniqueness rule (trimmed, case-insensitive UUID) is applied inconsistently between code paths. The real weaknesses lie elsewhere. The LINQ query is an O(n) scan that runs while the lock is held, so lock hold time grows with the list and contention reduces throughput under frequent announcements. In addition, uniqueness is a convention that each caller must remember to enforce rather than a property of the collection itself.
Solution with HashSet<T>
To make uniqueness a property of the collection, .NET provides the HashSet<T> class. HashSet<T> is implemented on top of a hash table and stores each element at most once; its Add method returns true when the element was added and false when an equal element is already present, in which case the set is left unchanged. This removes the separate existence query. Here is a refactored code example using HashSet:
// Assumes RemoteDevice implements Equals/GetHashCode based on UUID (see below)
private readonly HashSet<RemoteDevice> _remoteDevices = new HashSet<RemoteDevice>();
// In the asynchronous message handler
lock (_remoteDevicesLock)
{
    RemoteDevice newDevice = new RemoteDevice(notifyMessage.UUID);
    if (_remoteDevices.Add(newDevice))
    {
        // Successfully added a new device
    }
    else if (_remoteDevices.TryGetValue(newDevice, out RemoteDevice existingDevice))
    {
        // Device already exists: update properties of existingDevice.
        // TryGetValue is an O(1) lookup (.NET Core 2.0+ / .NET Framework 4.7.2+);
        // on older frameworks, First(d => d.Equals(newDevice)) works but is O(n).
    }
}
HashSet's Add method hashes the element to locate its bucket, so the existence check and the insertion together take O(1) time on average instead of the O(n) scan required by a list. HashSet<T> is not thread-safe, so the external lock is still required; what changes is that the lock is held only for a constant-time operation and that duplicates are rejected by the collection itself.
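The return value of Add can be observed in isolation with a minimal sketch. It assumes a HashSet<string> with StringComparer.OrdinalIgnoreCase standing in for device UUIDs; the AddDemo class and its Run method are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: shows what HashSet<T>.Add reports for a duplicate.
public static class AddDemo
{
    public static (bool First, bool Second, int Count) Run()
    {
        // Case-insensitive set of identifiers (strings stand in for devices here).
        var ids = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        bool first = ids.Add("uuid-1234");   // new element: Add returns true
        bool second = ids.Add("UUID-1234");  // equal under the comparer: Add returns false

        return (first, second, ids.Count);
    }
}
```

The second call leaves the set unchanged, so the caller can branch on the boolean instead of querying the collection first.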
Implementing Custom Equality Comparison
For HashSet to work correctly, the equality of RemoteDevice objects must be defined. There are two main approaches:
- Override Equals and GetHashCode methods: Override these methods in the RemoteDevice class, basing the comparison on UUID. Example code:
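public class RemoteDevice
{
    // The UUID is the identity of the device and must not change while the
    // object is stored in a hash-based collection.
    public string UUID { get; private set; }
    public RemoteDevice(string uuid)
    {
        // Normalize once here so comparisons do not need to trim again
        UUID = uuid?.Trim() ?? throw new ArgumentNullException(nameof(uuid));
    }
    public override bool Equals(object obj)
    {
        if (obj is RemoteDevice other)
            return string.Equals(UUID, other.UUID, StringComparison.OrdinalIgnoreCase);
        return false;
    }
    public override int GetHashCode()
    {
        // Use the same case-insensitive rule as Equals, without allocating an upper-cased copy
        return StringComparer.OrdinalIgnoreCase.GetHashCode(UUID);
    }
}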
- Use a custom equality comparer (IEqualityComparer<T>): Suitable when the RemoteDevice class cannot be modified. Example:
public class RemoteDeviceComparer : IEqualityComparer<RemoteDevice>
{
    public bool Equals(RemoteDevice x, RemoteDevice y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.UUID?.Trim(), y.UUID?.Trim(), StringComparison.OrdinalIgnoreCase);
    }
    public int GetHashCode(RemoteDevice obj)
    {
        // Must agree with Equals: trim first, then hash case-insensitively
        string uuid = obj?.UUID?.Trim();
        return uuid is null ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(uuid);
    }
}
// Pass the comparer when initializing HashSet
private readonly HashSet<RemoteDevice> _remoteDevices = new HashSet<RemoteDevice>(new RemoteDeviceComparer());
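The effect of a comparer on a set can be checked with a short, self-contained sketch. SampleDevice, SampleDeviceComparer, and ComparerDemo are hypothetical names; SampleDevice is an assumed stand-in for a device type that stores the UUID exactly as received and cannot be modified.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an unmodifiable device type with no equality overrides.
public sealed class SampleDevice
{
    public string UUID { get; }
    public SampleDevice(string uuid) { UUID = uuid; }
}

// Defines identity as the trimmed, case-insensitive UUID.
public sealed class SampleDeviceComparer : IEqualityComparer<SampleDevice>
{
    private static string Normalize(SampleDevice d) => d?.UUID?.Trim() ?? string.Empty;

    public bool Equals(SampleDevice x, SampleDevice y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(Normalize(x), Normalize(y), StringComparison.OrdinalIgnoreCase);
    }

    // Hashes the same normalized value that Equals compares.
    public int GetHashCode(SampleDevice obj) =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(Normalize(obj));
}

public static class ComparerDemo
{
    public static (bool A, bool B, bool C, int Count) Run()
    {
        var set = new HashSet<SampleDevice>(new SampleDeviceComparer());
        bool a = set.Add(new SampleDevice("abc-1"));     // new device
        bool b = set.Add(new SampleDevice("  ABC-1 "));  // same device, different spelling
        bool c = set.Add(new SampleDevice("abc-2"));     // different device
        return (a, b, c, set.Count);
    }
}
```

Whichever approach is chosen, Equals and GetHashCode have to apply the same normalization; if they disagree, two equal devices can land in different buckets and the set will accept both.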
Comparison with Other Collection Types
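When discussing uniqueness management, other collections like Dictionary<TKey, TValue> are often mentioned. Dictionary ensures uniqueness through keys, but its Add method throws an ArgumentException when the key already exists, which terminates the application if the exception is not handled. Checking first with ContainsKey or TryGetValue, or assigning through the indexer (which overwrites the existing value), avoids the exception. The failure case is shown in the example: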
Dictionary<int, string> dict = new Dictionary<int, string>();
dict.Add(1, "Happy");
dict.Add(2, "Smile");
dict.Add(2, "Sad"); // Throws ArgumentException: an item with the same key has already been added
In contrast, HashSet's Add method returns a boolean value, making it more suitable for scenarios that require silent handling of duplicates. However, Dictionary has advantages when key-value pair mappings are needed. The choice depends on specific requirements: if only storing unique objects, HashSet is a cleaner option; if associating additional data is necessary, Dictionary might be more appropriate.
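When the existing device has to be looked up and updated, a dictionary keyed by UUID can be sketched as follows. DeviceInfo, DeviceRegistry, Announce, and AnnouncementCount are hypothetical names used for illustration, not part of the original code; TryGetValue is used so that a duplicate key never reaches Add.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical device record carrying data associated with the UUID.
public sealed class DeviceInfo
{
    public string UUID { get; }
    public int AnnouncementCount { get; set; }
    public DeviceInfo(string uuid) { UUID = uuid; }
}

// Hypothetical registry: a lock-protected dictionary keyed by UUID.
public sealed class DeviceRegistry
{
    private readonly object _gate = new object();
    private readonly Dictionary<string, DeviceInfo> _devices =
        new Dictionary<string, DeviceInfo>(StringComparer.OrdinalIgnoreCase);

    // Returns true if the announcement created a new entry, false if it updated one.
    public bool Announce(string uuid)
    {
        if (uuid is null) throw new ArgumentNullException(nameof(uuid));
        string key = uuid.Trim();
        lock (_gate)
        {
            if (_devices.TryGetValue(key, out var existing))
            {
                existing.AnnouncementCount++;   // update path, no exception
                return false;
            }
            _devices.Add(key, new DeviceInfo(key) { AnnouncementCount = 1 });
            return true;
        }
    }

    public int Count
    {
        get { lock (_gate) { return _devices.Count; } }
    }

    // Returns the number of announcements seen for a UUID, or 0 if unknown.
    public int AnnouncementsFor(string uuid)
    {
        lock (_gate)
        {
            return _devices.TryGetValue(uuid.Trim(), out var d) ? d.AnnouncementCount : 0;
        }
    }
}
```

Compared with a set, this design gives direct access to the stored device by its identifier, at the cost of keeping the key and the object's own UUID in sync.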
Performance and Thread Safety Considerations
In multithreaded environments, even with HashSet, thread safety must be considered. HashSet<T> is not thread-safe, so every access, reads included, must be guarded by the same lock, or the collection must be replaced with a concurrent one such as ConcurrentDictionary<TKey, TValue> keyed by UUID. ConcurrentDictionary provides atomic operations like GetOrAdd and AddOrUpdate without an explicit lock, at the cost of higher per-operation and memory overhead. Note that the value factory passed to GetOrAdd runs outside the dictionary's internal locks and may be invoked more than once under contention, although only one value is stored. For large device populations, supplying an initial capacity where the constructor supports it avoids repeated resizing, and a hash function that distributes UUIDs evenly keeps lookups close to O(1).
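A lock-free variant based on GetOrAdd might look like the sketch below. TrackedDevice and ConcurrentDemo are hypothetical names, and Parallel.For merely simulates concurrent announcement handlers; a real receiver would call GetOrAdd from its own message callbacks.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical device type used only for this illustration.
public sealed class TrackedDevice
{
    public string UUID { get; }
    public TrackedDevice(string uuid) { UUID = uuid; }
}

public static class ConcurrentDemo
{
    // Simulates 1000 announcements for 10 devices arriving on many threads.
    public static (int Count, bool SameInstance) Run()
    {
        var devices = new ConcurrentDictionary<string, TrackedDevice>(
            StringComparer.OrdinalIgnoreCase);
        var seen = new TrackedDevice[1000];

        Parallel.For(0, 1000, i =>
        {
            // Alternate the casing to exercise the case-insensitive comparer.
            string prefix = ((i / 10) % 2 == 0) ? "device-" : "DEVICE-";
            string uuid = prefix + (i % 10);

            // The factory may run more than once under contention, but every
            // caller receives the single instance that was actually stored.
            seen[i] = devices.GetOrAdd(uuid, key => new TrackedDevice(key));
        });

        bool same = true;
        for (int i = 0; i < seen.Length; i++)
        {
            same &= ReferenceEquals(seen[i], devices["device-" + (i % 10)]);
        }
        return (devices.Count, same);
    }
}
```

Because the factory can run for a value that is later discarded, it should be cheap and free of side effects such as opening connections.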
Conclusion and Best Practices
In C#, managing unique device lists is efficiently and reliably addressed by HashSet<T>, especially in asynchronous and multithreaded environments. Key steps include: correctly implementing equality comparison (via overriding methods or using comparers), ensuring thread safety (e.g., with locks), and selecting appropriate collection types based on performance needs. By following these practices, developers can build robust systems that effectively avoid duplicate data issues, enhancing application stability and efficiency. In real-world projects, it is advisable to combine unit tests to validate collection behavior and focus on concurrency handling logic during code reviews to prevent potential defects.
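As one possible shape for such a test, the sketch below drives a lock-protected HashSet from many threads and reports how many additions succeeded. LockedSetCheck and its parameters are hypothetical, and strings again stand in for device objects.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical self-check: concurrent announcements against a locked HashSet.
public static class LockedSetCheck
{
    public static (int Added, int Count) Run(int announcements, int distinctDevices)
    {
        var gate = new object();
        var uuids = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        int added = 0;

        Parallel.For(0, announcements, i =>
        {
            string uuid = "dev-" + (i % distinctDevices);
            lock (gate)
            {
                // Add and the counter update happen under the same lock.
                if (uuids.Add(uuid)) added++;
            }
        });

        return (added, uuids.Count);
    }
}
```

A test can then assert that both numbers equal the count of distinct devices, regardless of how many announcements were sent.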