Keywords: PostgreSQL | ON CONFLICT | Unique Constraints | UPSERT | Stored Functions
Abstract: This paper examines the limitations of PostgreSQL's INSERT ... ON CONFLICT ... DO UPDATE syntax when dealing with multiple independently unique columns. Through analysis of official documentation and practical examples, it reveals why ON CONFLICT (col1, col2) cannot directly detect conflicts on separately unique columns. The article presents a stored function solution that combines traditional UPSERT logic with exception handling, enabling safe data merging while maintaining individual uniqueness constraints. Alternative approaches using composite unique indexes are also discussed, along with their implications and trade-offs.
Problem Context and Core Challenge
In PostgreSQL database design, tables often contain multiple columns with independent uniqueness constraints. For instance, a user table might require both username and email fields to be unique individually. When using the INSERT ... ON CONFLICT ... DO UPDATE syntax introduced in PostgreSQL 9.5 (commonly called UPSERT), developers expect to detect conflicts on multiple unique columns simultaneously, but the syntax does not directly support this operation.
Analysis of Syntax Limitations
According to PostgreSQL documentation, the conflict_target parameter in the ON CONFLICT clause performs conflict detection through unique index inference. This requires an existing unique index as an arbiter index. When specifying ON CONFLICT (col1, col2), the system looks for a composite unique index containing both columns, rather than detecting uniqueness violations on col1 and col2 separately.
Consider the following example table structure:
CREATE TABLE dupes(
col1 int PRIMARY KEY,
col2 int,
col3 text,
CONSTRAINT col2_unique UNIQUE (col2)
);
Attempting to execute this query:
INSERT INTO dupes VALUES(3,2,'c')
ON CONFLICT (col1) DO UPDATE SET col3 = 'c', col2 = 2;
When col2=2 already exists, even though col1=3 doesn't conflict, the operation fails due to violation of the col2_unique constraint. Attempting to use ON CONFLICT (col1,col2) results in a syntax error due to the absence of a corresponding composite unique index.
Technical Principles Deep Dive
PostgreSQL's design stems from semantic ambiguity in conflict resolution. When multiple columns have independent uniqueness constraints, the system cannot determine:
- If
col2conflicts, shouldcol1be updated? - If updating
col1causes new conflicts, how should they be handled? - When multiple conflicts occur simultaneously, what should be the priority?
These undefined semantics make direct support for multiple independent unique conflict detection complex. In contrast, composite unique indexes provide clear conflict definitions—only considering it a conflict when the combination of all specified columns is duplicated.
Stored Function Solution
By combining traditional UPSERT logic, we can create a stored function to handle this complex scenario:
CREATE OR REPLACE FUNCTION merge_db(key1 INT, key2 INT, data TEXT)
RETURNS VOID AS $$
BEGIN
LOOP
-- First attempt to update existing record
UPDATE dupes SET col3 = data WHERE col1 = key1 AND col2 = key2;
IF found THEN
RETURN;
END IF;
-- Update failed, attempt to insert new record
BEGIN
INSERT INTO dupes VALUES (key1, key2, data)
ON CONFLICT (col1) DO UPDATE SET col3 = data;
RETURN;
EXCEPTION WHEN unique_violation THEN
BEGIN
INSERT INTO dupes VALUES (key1, key2, data)
ON CONFLICT (col2) DO UPDATE SET col3 = data;
RETURN;
EXCEPTION WHEN unique_violation THEN
-- Double conflict, retry the update loop
END;
END;
END LOOP;
END;
$$ LANGUAGE plpgsql;
This function implements the following logic:
- First attempt to update existing records based on both column conditions
- If update fails (record doesn't exist), attempt to insert a new record
- During insertion, first check for
col1conflicts, thencol2conflicts if needed - If both columns conflict, restart the loop (other transactions may have updated data)
Alternative Approach: Composite Unique Index
As mentioned in other answers, creating a composite unique index is another solution:
CREATE UNIQUE INDEX idx_table_col1_col2 ON table_name (col1, col2);
Then you can use:
INSERT INTO table_name ...
ON CONFLICT (col1, col2) DO UPDATE SET ...;
However, this approach changes the semantics of data constraints—it allows col1 or col2 to repeat individually, only prohibiting the combination from being duplicated, which may not meet original business requirements.
Performance and Concurrency Considerations
The stored function solution performs well in concurrent environments because:
- It uses a retry loop mechanism to handle race conditions
- Exception handling ensures transaction integrity
- The update-first strategy reduces lock contention
However, in high-concurrency scenarios, you may need to adjust retry logic or add exponential backoff. The composite unique index approach generally offers better performance, but as noted, may not satisfy business constraint requirements.
Version Compatibility and Upgrade Recommendations
This discussion is based on PostgreSQL 9.5. In newer versions (9.6+), the ON CONFLICT syntax remains compatible with performance optimizations. For scenarios requiring handling of multiple independent unique constraints, we recommend:
- Evaluate business requirements to determine if multiple independent uniqueness is truly needed
- Consider using composite unique indexes to simplify design
- If independent uniqueness must be maintained, adopt the stored function solution presented here
- In PostgreSQL 12+, explore advanced features like partial indexes and conditional unique constraints
Conclusion
PostgreSQL's ON CONFLICT syntax provides powerful support for data merging but has inherent limitations when dealing with multiple independent unique constraints. By deeply understanding unique index inference mechanisms and conflict detection principles, developers can choose appropriate solutions. The stored function solution offers maximum flexibility, while the composite unique index approach is more concise and efficient when semantic requirements are met. In practical applications, make choices based on specific business needs, performance requirements, and concurrency characteristics.