Keywords: Swift array deduplication | Hashable protocol | Set conversion | Algorithm performance | Order preservation
Abstract: This article provides an in-depth exploration of various methods for removing duplicate elements from arrays in Swift, focusing on linear time complexity algorithms based on the Hashable protocol. It compares the advantages and disadvantages of Set conversion versus custom extensions, offering complete code examples and performance analysis to help developers choose the most appropriate deduplication strategy based on specific requirements.
Problem Background and Requirements Analysis
In Swift programming practice, handling arrays containing duplicate elements is a common requirement. For example, given an array [1, 4, 2, 2, 6, 24, 15, 2, 60, 15, 6], the expected deduplicated result is [1, 4, 2, 6, 24, 15, 60] while maintaining the original element order. The Swift standard library does not directly provide array deduplication methods, requiring developers to implement their own solutions.
Basic Method: Set Conversion Approach
The most intuitive deduplication method utilizes Swift's Set type characteristics:
let originals = [1, 4, 2, 2, 6, 24, 15, 2, 60, 15, 6]
let unique = Array(Set(originals))
print(unique) // Output order is uncertain
This method is concise and efficient with O(n) time complexity. However, it has a significant drawback: it cannot maintain the original element order. Sets are implemented based on hash tables, and element storage order is determined by hash values, not insertion order.
Order-Preserving Generic Algorithm Implementation
To maintain element order, custom deduplication algorithms need to be implemented. The core idea is to use a Set to record encountered elements and only add unseen elements to the result array:
func unique<S: Sequence, T: Hashable>(source: S) -> [T] where S.Iterator.Element == T {
var buffer = [T]()
var added = Set<T>()
for elem in source {
if !added.contains(elem) {
buffer.append(elem)
added.insert(elem)
}
}
return buffer
}
let vals = [1, 4, 2, 2, 6, 24, 15, 2, 60, 15, 6]
let uniqueVals = unique(source: vals)
print(uniqueVals) // [1, 4, 2, 6, 24, 15, 60]
This algorithm also has O(n) time complexity and O(n) space complexity, ensuring performance efficiency while maintaining order.
Elegant Array Extension Implementation
To improve code usability, the deduplication functionality can be encapsulated as an array extension:
extension Array where Element: Hashable {
func uniqued() -> Array {
var buffer = Array()
var added = Set<Element>()
for elem in self {
if !added.contains(elem) {
buffer.append(elem)
added.insert(elem)
}
}
return buffer
}
}
let result = [1, 2, 4, 2, 1].uniqued()
print(result) // [1, 2, 4]
Modern Swift 4/5 Implementation
Utilizing modern Swift syntax features, more concise implementations can be written:
extension Sequence where Element: Hashable {
func uniqued() -> [Element] {
var set = Set<Element>()
return filter { set.insert($0).inserted }
}
}
let modernResult = [1, 2, 4, 2, 1].uniqued()
print(modernResult) // [1, 2, 4]
This version uses the inserted boolean value from the tuple returned by Set.insert(_:) method directly as the filter condition, resulting in more functional and readable code.
Deduplication Based on Specific Properties
In actual development, deduplication based on specific object properties is sometimes needed. For this purpose, generic methods can be extended:
extension Sequence {
func unique<T: Hashable>(by keyForValue: (Iterator.Element) throws -> T) rethrows -> [Iterator.Element] {
var seen: Set<T> = []
return try filter { try seen.insert(keyForValue($0)).inserted }
}
}
struct Employee {
let name: String
let title: String
}
let employees = [
Employee(name: "Antoine", title: "Swift Developer"),
Employee(name: "Dorian", title: "Swift Developer"),
Employee(name: "Ralph", title: "Head of Sales")
]
let uniqueByTitle = employees.unique(by: { $0.title })
// Result contains only employees with different titles
Performance Analysis and Algorithm Comparison
All implementations based on Hashable have O(n) time complexity, superior to O(n²) nested loop solutions. Set's contains and insert operations have average O(1) time complexity, ensuring overall linear complexity.
Application Scenario Summary
Choose Set conversion approach when:
- Element order is not important
- Pursuing simplest implementation
- Extremely high performance requirements
Choose custom extension when:
- Must maintain original element order
- Need deduplication based on specific properties
- Want better code readability and maintainability
Best Practice Recommendations
In actual projects, using the Sequence extension's uniqued() method is recommended, as it balances performance, readability, and generality. For complex objects, combine with the unique(by:) method to implement property-based deduplication logic.