Keywords: PostgreSQL | UPSERT | Concurrency Safety
Abstract: This article provides an in-depth exploration of UPSERT operations in PostgreSQL, focusing on the INSERT...ON CONFLICT syntax introduced in version 9.5 and its advantages. It compares traditional approaches, including retry loops and bulk locking updates, with modern methods, explaining race condition issues and solutions in concurrent environments. Practical code examples illustrate various implementations, offering technical guidance for PostgreSQL users across different versions.
Introduction
In database operations, UPSERT (update or insert) is a common requirement that combines the functionalities of INSERT and UPDATE. It updates a record if it exists or inserts a new one if it does not. MySQL supports this via INSERT ... ON DUPLICATE KEY UPDATE, while the SQL standard uses the MERGE statement. Before PostgreSQL 9.5, native support was lacking, requiring developers to adopt alternative methods. This article systematically covers UPSERT implementations in PostgreSQL, with a focus on concurrency safety.
PostgreSQL 9.5 and Newer: ON CONFLICT Syntax
PostgreSQL 9.5 introduced the INSERT ... ON CONFLICT syntax, directly supporting UPSERT operations. This syntax handles unique constraint violations through conflict detection, offering DO UPDATE and DO NOTHING behaviors. For example, for a table testtable with a primary key id and a field somedata, the UPSERT code is as follows:
INSERT INTO testtable (id, somedata) VALUES
(2, 'Joe'),
(3, 'Alan')
ON CONFLICT (id)
DO UPDATE SET somedata = EXCLUDED.somedata;
This statement attempts to insert records with id 2 and 3. If id=2 already exists, it updates its somedata to 'Joe'; if id=3 does not exist, it inserts a new record. The keyword EXCLUDED references the conflicting insert values. This method supports multiple rows, requires no explicit locking or retry loops, and significantly improves efficiency. Compared to MySQL's ON DUPLICATE KEY UPDATE, PostgreSQL's syntax is more flexible, allowing specification of any unique constraint.
Traditional Methods for PostgreSQL 9.4 and Older
Before version 9.5, PostgreSQL had no built-in UPSERT support, necessitating custom implementations. The core challenge is ensuring concurrency safety and avoiding race conditions. Common approaches include individual row retry loops and bulk locking updates.
Individual Row Retry Loop
This method is suitable for high-concurrency scenarios, where each UPSERT operation retries in a loop until success. The PostgreSQL documentation provides an example function using a BEGIN...EXCEPTION block to handle duplicate key errors:
CREATE OR REPLACE FUNCTION upsert_testtable(p_id integer, p_somedata text)
RETURNS void AS $$
BEGIN
LOOP
BEGIN
UPDATE testtable SET somedata = p_somedata WHERE id = p_id;
IF found THEN
RETURN;
END IF;
INSERT INTO testtable (id, somedata) VALUES (p_id, p_somedata);
RETURN;
EXCEPTION WHEN unique_violation THEN
-- Retry
END;
END LOOP;
END;
$$ LANGUAGE plpgsql;
This function first attempts an update; if no row is found, it inserts. If a unique violation occurs during insertion, it catches the exception and retries. This approach is safe under the READ COMMITTED isolation level but inefficient, as the loop may execute multiple times. Note that if triggers or other unique constraints exist, it may not handle all conflicts correctly.
Bulk Locking Update
For bulk data processing, the locking method is recommended. It involves storing new data in a temporary table, locking the target table, and then performing updates and inserts. For example:
BEGIN;
CREATE TEMPORARY TABLE newvals(id integer, somedata text);
INSERT INTO newvals(id, somedata) VALUES (2, 'Joe'), (3, 'Alan');
LOCK TABLE testtable IN EXCLUSIVE MODE;
UPDATE testtable
SET somedata = newvals.somedata
FROM newvals
WHERE newvals.id = testtable.id;
INSERT INTO testtable
SELECT newvals.id, newvals.somedata
FROM newvals
LEFT OUTER JOIN testtable ON (testtable.id = newvals.id)
WHERE testtable.id IS NULL;
COMMIT;
This process starts by creating a temporary table newvals and inserting data. It then locks testtable in EXCLUSIVE mode to prevent modifications by other transactions. Next, it uses UPDATE ... FROM to update existing records and inserts new records via a left join. This method is efficient and concurrency-safe but may impact other operations during the lock.
Concurrency Safety and Race Condition Analysis
UPSERT operations are prone to race conditions in concurrent environments. For instance, a common erroneous approach is:
-- Incorrect example: prone to race conditions
BEGIN;
UPDATE testtable SET somedata = 'blah' WHERE id = 2;
INSERT INTO testtable (id, somedata)
SELECT 2, 'blah'
WHERE NOT EXISTS (SELECT 1 FROM testtable WHERE testtable.id = 2);
COMMIT;
If two transactions execute this simultaneously, both might update zero rows (due to no initial matches), then both pass the EXISTS check and attempt inserts, leading to a duplicate key error in one transaction. Retry loops or locking methods avoid this issue. The ON CONFLICT syntax handles conflicts at the database kernel level, requiring no application-layer intervention.
Comparison with Other Databases
MySQL's INSERT ... ON DUPLICATE KEY UPDATE is similar to PostgreSQL's ON CONFLICT, but the latter is more versatile, supporting any unique constraint. The SQL standard's MERGE statement is not widely used for UPSERT in PostgreSQL due to ambiguous concurrency semantics, often requiring additional locking. In other databases like SQL Server and Oracle, MERGE is common but also faces concurrency challenges.
Conclusion
PostgreSQL's UPSERT implementation has evolved from traditional methods to native ON CONFLICT support. For versions 9.5 and above, INSERT ... ON CONFLICT is recommended for its simplicity, efficiency, and concurrency safety. Users of older versions can opt for retry loops or bulk locking methods but should consider performance and complexity. In practice, the choice of method should depend on data volume, concurrency needs, and the PostgreSQL version.