Keywords: PostgreSQL | UPSERT | ON CONFLICT | RETURNING | Database Optimization
Abstract: This technical article provides an in-depth exploration of combining INSERT...ON CONFLICT statements with RETURNING clauses in PostgreSQL, focusing on how to ensure existing row identifiers are returned during conflicts by using DO UPDATE instead of DO NOTHING. The paper thoroughly explains the implementation principles, performance advantages, and practical considerations, including handling strategies in concurrent environments and the importance of avoiding unnecessary updates. By comparing the strengths and weaknesses of different solutions, it offers developers efficient and reliable UPSERT implementation approaches.
Problem Background and Core Challenge
In PostgreSQL database operations, UPSERT (UPDATE or INSERT) is a common requirement, particularly when dealing with scenarios that may involve duplicate data. A specific problem developers frequently encounter is: when using the INSERT ... ON CONFLICT DO NOTHING statement, if a conflict occurs, the RETURNING clause does not return any rows, making it impossible to retrieve identifiers (such as id) of existing records.
Consider this typical example:
INSERT INTO chats ("user", "contact", "name")
VALUES ($1, $2, $3),
($2, $1, NULL)
ON CONFLICT("user", "contact") DO NOTHING
RETURNING id;
When no conflicts exist, this statement normally returns the id values of newly inserted rows. However, once a conflict occurs, due to the effect of the DO NOTHING clause, RETURNING produces no output, preventing applications from determining which records already exist and their corresponding id values.
Solution: Using DO UPDATE to Ensure Row Return
To address this issue, the most direct and effective approach is to replace DO NOTHING with DO UPDATE, even when no actual data update is needed. The modified statement appears as follows:
INSERT INTO chats ("user", "contact", "name")
VALUES ($1, $2, $3),
($2, $1, NULL)
ON CONFLICT("user", "contact")
DO UPDATE SET
name = EXCLUDED.name
RETURNING id;
The key aspects here are:
ON CONFLICT DO UPDATEensures that an update operation is performed when conflicts occurSET name = EXCLUDED.namesets the field to the value provided during insertion, which may not actually change existing dataRETURNING idnow returns identifiers for all affected rows, including both newly inserted and pre-existing ones
In-Depth Technical Principles
PostgreSQL's ON CONFLICT mechanism detects conflicts based on unique constraints or unique indexes. When DO UPDATE is specified, the system:
- First attempts to insert new rows
- If a unique constraint is violated, it instead executes an UPDATE operation
- Regardless of whether INSERT or UPDATE is performed, the
RETURNINGclause returns affected rows
The EXCLUDED pseudotable plays a crucial role in this process, containing data that was originally intended for insertion but excluded due to conflicts. By referencing EXCLUDED.name, we essentially state: "If a conflict occurs, update this field to the value that was originally going to be inserted." In many cases, this equates to performing no actual change but ensures row return.
Performance Considerations and Optimization Recommendations
While this method resolves the row return issue, its performance impact must be considered:
Avoiding Unnecessary Update Operations
In certain scenarios, even "no-op" updates like SET name = EXCLUDED.name can trigger the following side effects:
- Execution of triggers, even when data hasn't actually changed
- Acquisition of write locks, potentially affecting concurrency performance
- Creation of new MVCC (Multi-Version Concurrency Control) versions, leading to table and index bloat
- Degraded performance in subsequent operations due to handling more row versions
Therefore, this approach is suitable in scenarios with low duplicate rates. In high-duplication scenarios, more complex solutions may need consideration.
Alternative Approach: Using CTE with UNION
For situations requiring avoidance of unnecessary updates, a solution using Common Table Expressions (CTE) combined with UNION can be employed:
WITH input_rows(usr, contact, name) AS (
VALUES
(text 'foo1', text 'bar1', text 'bob1'),
('foo2', 'bar2', 'bob2')
)
, ins AS (
INSERT INTO chats (usr, contact, name)
SELECT * FROM input_rows
ON CONFLICT (usr, contact) DO NOTHING
RETURNING id, usr, contact
)
SELECT 'i' AS source, id FROM ins
UNION ALL
SELECT 's' AS source, c.id
FROM input_rows
JOIN chats c USING (usr, contact);
This method ensures row return while avoiding unnecessary updates by separating insertion and query operations.
Considerations in Concurrent Environments
In concurrent write scenarios, the following issues must be addressed:
Handling Concurrent Conflicts
When multiple transactions attempt to insert the same data simultaneously:
- Later transactions wait for earlier ones to complete
- If the earlier transaction rolls back, the later transaction can proceed normally
- If the earlier transaction commits, the later transaction detects the conflict and executes the appropriate action
Row Locking Strategies
To ensure data consistency in concurrent environments, existing rows can be locked during conflicts:
ON CONFLICT (usr, contact) DO UPDATE
SET name = name WHERE FALSE
Although this formulation doesn't actually update data, it acquires row locks, preventing other transactions from modifying these rows until the current transaction completes.
Best Practices Summary
Based on the above analysis, we summarize the following best practices:
- Low Conflict Rate Scenarios: Directly use
ON CONFLICT DO UPDATE SET column = EXCLUDED.columnfor simplicity and effectiveness - High Conflict Rate Scenarios: Consider CTE-based approaches to avoid unnecessary update overhead
- Concurrency-Sensitive Scenarios: Appropriately use row locking to ensure data consistency
- Identifier Design: Avoid reserved keywords as column names, such as
"user"; consider alternatives likeusr - Type Safety: Ensure proper data type conversions in complex queries
By judiciously selecting and applying these techniques, developers can implement efficient and reliable UPSERT operations in PostgreSQL, ensuring complete row return information under all circumstances.