Multiple Approaches for Removing Duplicate Elements from Arrays in Swift

Keywords: Swift array deduplication | Hashable protocol | Set conversion | Algorithm performance | Order preservation

Abstract: This article provides an in-depth exploration of various methods for removing duplicate elements from arrays in Swift, focusing on linear time complexity algorithms based on the Hashable protocol. It compares the advantages and disadvantages of Set conversion versus custom extensions, offering complete code examples and performance analysis to help developers choose the most appropriate deduplication strategy based on specific requirements.

Problem Background and Requirements Analysis

In Swift programming practice, handling arrays containing duplicate elements is a common requirement. For example, given an array [1, 4, 2, 2, 6, 24, 15, 2, 60, 15, 6], the expected deduplicated result is [1, 4, 2, 6, 24, 15, 60] while maintaining the original element order. The Swift standard library does not directly provide array deduplication methods, requiring developers to implement their own solutions.

Basic Method: Set Conversion Approach

The most intuitive deduplication method utilizes Swift's Set type characteristics:

let originals = [1, 4, 2, 2, 6, 24, 15, 2, 60, 15, 6]
let unique = Array(Set(originals))
print(unique) // Output order is uncertain

This method is concise and efficient with O(n) time complexity. However, it has a significant drawback: it cannot maintain the original element order. Sets are implemented based on hash tables, and element storage order is determined by hash values, not insertion order.

Order-Preserving Generic Algorithm Implementation

To maintain element order, custom deduplication algorithms need to be implemented. The core idea is to use a Set to record encountered elements and only add unseen elements to the result array:

func unique<S: Sequence, T: Hashable>(source: S) -> [T] where S.Iterator.Element == T {
    var buffer = [T]()
    var added = Set<T>()
    for elem in source {
        if !added.contains(elem) {
            buffer.append(elem)
            added.insert(elem)
        }
    }
    return buffer
}

let vals = [1, 4, 2, 2, 6, 24, 15, 2, 60, 15, 6]
let uniqueVals = unique(source: vals)
print(uniqueVals) // [1, 4, 2, 6, 24, 15, 60]

This algorithm also has O(n) time complexity and O(n) space complexity, ensuring performance efficiency while maintaining order.

Elegant Array Extension Implementation

To improve code usability, the deduplication functionality can be encapsulated as an array extension:

extension Array where Element: Hashable {
    func uniqued() -> Array {
        var buffer = Array()
        var added = Set<Element>()
        for elem in self {
            if !added.contains(elem) {
                buffer.append(elem)
                added.insert(elem)
            }
        }
        return buffer
    }
}

let result = [1, 2, 4, 2, 1].uniqued()
print(result) // [1, 2, 4]

Modern Swift 4/5 Implementation

Utilizing modern Swift syntax features, more concise implementations can be written:

extension Sequence where Element: Hashable {
    func uniqued() -> [Element] {
        var set = Set<Element>()
        return filter { set.insert($0).inserted }
    }
}

let modernResult = [1, 2, 4, 2, 1].uniqued()
print(modernResult) // [1, 2, 4]

This version uses the inserted boolean value from the tuple returned by Set.insert(_:) method directly as the filter condition, resulting in more functional and readable code.

Deduplication Based on Specific Properties

In actual development, deduplication based on specific object properties is sometimes needed. For this purpose, generic methods can be extended:

extension Sequence {
    func unique<T: Hashable>(by keyForValue: (Iterator.Element) throws -> T) rethrows -> [Iterator.Element] {
        var seen: Set<T> = []
        return try filter { try seen.insert(keyForValue($0)).inserted }
    }
}

struct Employee {
    let name: String
    let title: String
}

let employees = [
    Employee(name: "Antoine", title: "Swift Developer"),
    Employee(name: "Dorian", title: "Swift Developer"),
    Employee(name: "Ralph", title: "Head of Sales")
]

let uniqueByTitle = employees.unique(by: { $0.title })
// Result contains only employees with different titles

Performance Analysis and Algorithm Comparison

All implementations based on Hashable have O(n) time complexity, superior to O(n²) nested loop solutions. Set's contains and insert operations have average O(1) time complexity, ensuring overall linear complexity.

Application Scenario Summary

Choose Set conversion approach when:

Element order is not important
Pursuing simplest implementation
Extremely high performance requirements

Choose custom extension when:

Must maintain original element order
Need deduplication based on specific properties
Want better code readability and maintainability

Best Practice Recommendations

In actual projects, using the Sequence extension's uniqued() method is recommended, as it balances performance, readability, and generality. For complex objects, combine with the unique(by:) method to implement property-based deduplication logic.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.