A Comprehensive Guide to unnest() with Element Numbers in PostgreSQL

Keywords: PostgreSQL | unnest function | WITH ORDINALITY | array processing | element numbering

Abstract: This article provides an in-depth exploration of how to add original position numbers to array elements generated by the unnest() function in PostgreSQL. By analyzing solutions for different PostgreSQL versions, including key technologies such as WITH ORDINALITY, LATERAL JOIN, and generate_subscripts(), it offers a complete implementation approach from basic to advanced levels. The article also discusses the differences between array subscripts and ordinal numbers, and provides best practice recommendations for practical applications.

Introduction

In database design, there are situations where multiple values are stored in a single field, which typically violates the first normal form. While this design is not ideal, in practical work we often need to handle such legacy data. PostgreSQL provides powerful array processing capabilities, with the unnest() function being a key tool for expanding arrays into multiple rows. However, when there is a need to preserve the positional information of elements in the original array, more advanced techniques are required.

PostgreSQL 14 and Later: The string_to_table() Method

Starting from PostgreSQL 14, the string_to_table() function was introduced, which can directly convert delimited strings into table form and supports the WITH ORDINALITY clause to add an ordinal number column. This method is more concise and efficient than the traditional unnest(string_to_array()) combination.

SELECT t.id, a.elem, a.nr
FROM   tbl t
LEFT   JOIN LATERAL string_to_table(t.elements, ',')
                    WITH ORDINALITY AS a(elem, nr) ON true;

Here, LEFT JOIN LATERAL ... ON true ensures that all rows from the left table are preserved even if the right-hand expression returns empty rows. WITH ORDINALITY automatically adds a bigint ordinal number column starting from 1 and incrementing by 1.

PostgreSQL 9.4 and Later: WITH ORDINALITY with unnest()

For PostgreSQL versions 9.4 through 13, although the string_to_table() function is not available, the same functionality can be achieved using unnest() with WITH ORDINALITY.

SELECT t.id, a.elem, a.nr
FROM   tbl AS t
LEFT   JOIN LATERAL unnest(string_to_array(t.elements, ','))
                    WITH ORDINALITY AS a(elem, nr) ON true;

If there is no need to preserve potentially empty rows, a more concise implicit CROSS JOIN LATERAL syntax can be used:

SELECT t.id, a.elem, a.nr
FROM   tbl t, unnest(string_to_array(t.elements, ',')) WITH ORDINALITY a(elem, nr);

For columns that are already of array type, the syntax becomes even simpler:

SELECT t.id, a.elem, a.nr
FROM   tbl t, unnest(t.arr) WITH ORDINALITY a(elem, nr);

PostgreSQL's unnest() function guarantees that elements are output in storage order, so the ordinal numbers generated by WITH ORDINALITY accurately reflect the original positions of elements.

PostgreSQL 8.4 to 9.3: Using generate_subscripts()

In earlier versions where WITH ORDINALITY is not available, the generate_subscripts() function can be used to obtain array subscripts and thereby derive element ordinal numbers.

SELECT id, arr[nr] AS elem, nr
FROM  (
   SELECT *, generate_subscripts(arr, 1) AS nr
   FROM  (SELECT id, string_to_array(elements, ' ') AS arr FROM tbl) t
   ) sub;

If the data is already in array format, this can be further simplified:

SELECT id, arr[nr] AS elem, nr
FROM  (SELECT *, generate_subscripts(arr, 1) AS nr FROM tbl) t;

This method works by generating the valid subscript range of the array and then using array subscript access to retrieve elements and their corresponding positional information.

PostgreSQL 8.1 to 8.4: Custom Function Solutions

In even earlier versions where many modern array functions are unavailable, custom functions need to be created to achieve similar functionality.

CREATE FUNCTION f_unnest_ord(anyarray, OUT val anyelement, OUT ordinality integer)
  RETURNS SETOF record
  LANGUAGE sql IMMUTABLE AS
'SELECT $1[i], i - array_lower($1,1) + 1
 FROM   generate_series(array_lower($1,1), array_upper($1,1)) i';

This function iterates through array subscripts using generate_series() and calculates the ordinal number for each element. It is important to note that array subscripts may not start at 1, so normalization using array_lower() is necessary.

Differences Between Array Subscripts and Ordinal Numbers

In PostgreSQL, array subscripts and element ordinal numbers are two distinct concepts. Array subscripts are the actual index positions in the array, while ordinal numbers represent the relative position of elements in the array (typically starting from 1).

CREATE FUNCTION f_unnest_ord_idx(anyarray, OUT val anyelement, OUT ordinality int, OUT idx int)
  RETURNS SETOF record
  LANGUAGE sql IMMUTABLE AS
'SELECT $1[i], i - array_lower($1,1) + 1, i
 FROM   generate_series(array_lower($1,1), array_upper($1,1)) i';

With this enhanced function, element values, ordinal numbers, and original subscripts can be obtained simultaneously:

SELECT id, arr, (rec).*
FROM  (
   SELECT *, f_unnest_ord_idx(arr) AS rec
   FROM  (
      VALUES
        (1, '{a,b,c}'::text[])
      , (2, '[5:7]={a,b,c}')
      , (3, '[-9:-7]={a,b,c}')
      ) t(id, arr)
   ) sub;

This example clearly demonstrates ordinal number calculation under different array subscript systems: regardless of where array subscripts start, ordinal numbers always increment from 1.

Performance Considerations and Best Practices

1. Version Selection: Use the latest PostgreSQL version whenever possible, as newer versions provide more concise and efficient functions and syntax.

2. Data Types: If possible, store data as native array types rather than delimited strings, as this provides better performance and simpler query syntax.

3. Index Usage: For large datasets, consider creating GIN indexes on array columns to accelerate queries.

4. NULL Handling: Using LEFT JOIN LATERAL ... ON true ensures proper handling of cases involving empty arrays or NULL values.

5. Code Readability: Although PostgreSQL supports various shorthand syntaxes, in production code it is recommended to use explicit column aliases and table qualifiers to improve code maintainability.

Conclusion

PostgreSQL provides multiple methods for adding original position numbers to elements generated by the unnest() function. From the latest string_to_table() WITH ORDINALITY to traditional generate_subscripts(), and even custom functions, different PostgreSQL versions have corresponding solutions. Understanding the principles and applicable scenarios of these technologies can help us handle denormalized data more effectively. Although storing delimited values violates database design best practices, PostgreSQL's powerful array processing capabilities enable us to efficiently extract and analyze such data.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.