Implementing Complete Row Return in PostgreSQL UPSERT Operations Using ON CONFLICT with RETURNING

Keywords: PostgreSQL | UPSERT | ON CONFLICT | RETURNING | Database Optimization

Abstract: This technical article provides an in-depth exploration of combining INSERT...ON CONFLICT statements with RETURNING clauses in PostgreSQL, focusing on how to ensure existing row identifiers are returned during conflicts by using DO UPDATE instead of DO NOTHING. The paper thoroughly explains the implementation principles, performance advantages, and practical considerations, including handling strategies in concurrent environments and the importance of avoiding unnecessary updates. By comparing the strengths and weaknesses of different solutions, it offers developers efficient and reliable UPSERT implementation approaches.

Problem Background and Core Challenge

In PostgreSQL database operations, UPSERT (UPDATE or INSERT) is a common requirement, particularly when dealing with scenarios that may involve duplicate data. A specific problem developers frequently encounter is: when using the INSERT ... ON CONFLICT DO NOTHING statement, if a conflict occurs, the RETURNING clause does not return any rows, making it impossible to retrieve identifiers (such as id) of existing records.

Consider this typical example:

INSERT INTO chats ("user", "contact", "name") 
       VALUES ($1, $2, $3), 
              ($2, $1, NULL) 
ON CONFLICT("user", "contact") DO NOTHING
RETURNING id;

When no conflicts exist, this statement normally returns the id values of newly inserted rows. However, once a conflict occurs, due to the effect of the DO NOTHING clause, RETURNING produces no output, preventing applications from determining which records already exist and their corresponding id values.

Solution: Using DO UPDATE to Ensure Row Return

To address this issue, the most direct and effective approach is to replace DO NOTHING with DO UPDATE, even when no actual data update is needed. The modified statement appears as follows:

INSERT INTO chats ("user", "contact", "name") 
       VALUES ($1, $2, $3), 
              ($2, $1, NULL) 
ON CONFLICT("user", "contact") 
DO UPDATE SET 
    name = EXCLUDED.name 
RETURNING id;

The key aspects here are:

ON CONFLICT DO UPDATE ensures that an update operation is performed when conflicts occur
SET name = EXCLUDED.name sets the field to the value provided during insertion, which may not actually change existing data
RETURNING id now returns identifiers for all affected rows, including both newly inserted and pre-existing ones

In-Depth Technical Principles

PostgreSQL's ON CONFLICT mechanism detects conflicts based on unique constraints or unique indexes. When DO UPDATE is specified, the system:

First attempts to insert new rows
If a unique constraint is violated, it instead executes an UPDATE operation
Regardless of whether INSERT or UPDATE is performed, the RETURNING clause returns affected rows

The EXCLUDED pseudotable plays a crucial role in this process, containing data that was originally intended for insertion but excluded due to conflicts. By referencing EXCLUDED.name, we essentially state: "If a conflict occurs, update this field to the value that was originally going to be inserted." In many cases, this equates to performing no actual change but ensures row return.

Performance Considerations and Optimization Recommendations

While this method resolves the row return issue, its performance impact must be considered:

Avoiding Unnecessary Update Operations

In certain scenarios, even "no-op" updates like SET name = EXCLUDED.name can trigger the following side effects:

Execution of triggers, even when data hasn't actually changed
Acquisition of write locks, potentially affecting concurrency performance
Creation of new MVCC (Multi-Version Concurrency Control) versions, leading to table and index bloat
Degraded performance in subsequent operations due to handling more row versions

Therefore, this approach is suitable in scenarios with low duplicate rates. In high-duplication scenarios, more complex solutions may need consideration.

Alternative Approach: Using CTE with UNION

For situations requiring avoidance of unnecessary updates, a solution using Common Table Expressions (CTE) combined with UNION can be employed:

WITH input_rows(usr, contact, name) AS (
   VALUES
      (text 'foo1', text 'bar1', text 'bob1'),
      ('foo2', 'bar2', 'bob2')
   )
, ins AS (
   INSERT INTO chats (usr, contact, name) 
   SELECT * FROM input_rows
   ON CONFLICT (usr, contact) DO NOTHING
   RETURNING id, usr, contact
   )
SELECT 'i' AS source, id FROM ins
UNION ALL
SELECT 's' AS source, c.id 
FROM input_rows
JOIN chats c USING (usr, contact);

This method ensures row return while avoiding unnecessary updates by separating insertion and query operations.

Considerations in Concurrent Environments

In concurrent write scenarios, the following issues must be addressed:

Handling Concurrent Conflicts

When multiple transactions attempt to insert the same data simultaneously:

Later transactions wait for earlier ones to complete
If the earlier transaction rolls back, the later transaction can proceed normally
If the earlier transaction commits, the later transaction detects the conflict and executes the appropriate action

Row Locking Strategies

To ensure data consistency in concurrent environments, existing rows can be locked during conflicts:

ON CONFLICT (usr, contact) DO UPDATE
SET name = name WHERE FALSE

Although this formulation doesn't actually update data, it acquires row locks, preventing other transactions from modifying these rows until the current transaction completes.

Best Practices Summary

Based on the above analysis, we summarize the following best practices:

Low Conflict Rate Scenarios: Directly use ON CONFLICT DO UPDATE SET column = EXCLUDED.column for simplicity and effectiveness
High Conflict Rate Scenarios: Consider CTE-based approaches to avoid unnecessary update overhead
Concurrency-Sensitive Scenarios: Appropriately use row locking to ensure data consistency
Identifier Design: Avoid reserved keywords as column names, such as "user"; consider alternatives like usr
Type Safety: Ensure proper data type conversions in complex queries

By judiciously selecting and applying these techniques, developers can implement efficient and reliable UPSERT operations in PostgreSQL, ensuring complete row return information under all circumstances.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.