SQL UNION Operator: Technical Analysis of Combining Multiple SELECT Statements in a Single Query

Nov 11, 2025 · Programming · 14 views · 7.8

Keywords: SQL Query | UNION Operator | Multi-table Data Combination | Database Performance | SELECT Statement Combination

Abstract: This article provides an in-depth exploration of using the UNION operator in SQL to combine multiple independent SELECT statements. Through analysis of a practical case involving football player data queries, it详细 explains the differences between UNION and UNION ALL, applicable scenarios, and performance considerations. The article also compares other query combination methods and offers complete code examples and best practice recommendations to help developers master efficient solutions for multi-table data queries.

Introduction

In database query practices, there is often a need to retrieve information from multiple tables that have similar structures but contain independent data. Traditional JOIN operations are suitable for correlating data between tables, but when no relationships exist between tables, different technical approaches are required. This article analyzes the technical details of using the UNION operator to solve such problems, based on a typical football player data query case study.

Problem Scenario Analysis

Consider the following practical requirement: retrieving specific player statistics from two independent player data tables, tblMadrid and tblBarcelona. Both tables share the same column structure (name, games, goals) but store data for players from different teams, with no relationships between the tables.

The initial incorrect attempt used nested SELECT statements:

SELECT 
    (SELECT name, games, goals FROM tblMadrid WHERE name = 'ronaldo') AS table_a,
    (SELECT name, games, goals FROM tblBarcelona WHERE name = 'messi') AS table_b
FROM DUAL
ORDER BY goals

The fundamental issue with this approach is that it attempts to combine two independent row results into different columns of the same row, violating basic principles of relational database normalization.

Core Principles of the UNION Operator

The UNION operator is a key SQL standard keyword specifically designed for vertically combining result sets from multiple SELECT statements. Its basic syntax structure is:

SELECT column1, column2, ... FROM table1 WHERE condition1
UNION [ALL]
SELECT column1, column2, ... FROM table2 WHERE condition2
[ORDER BY column]

The correct solution should employ the UNION operator:

(SELECT name, games, goals FROM tblMadrid WHERE name = 'ronaldo')
UNION
(SELECT name, games, goals FROM tblBarcelona WHERE name = 'messi')
ORDER BY goals DESC

Technical Differences Between UNION and UNION ALL

In practical applications, UNION and UNION ALL have significant performance differences:

UNION operator automatically removes duplicate rows from the result set. This process involves sorting and deduplication operations, which can create substantial performance overhead for large datasets. Its internal implementation typically includes the following steps:

  1. Execute individual SELECT statements to obtain initial result sets
  2. Perform merge sorting on all result sets
  3. Scan the sorted results and remove duplicate rows

UNION ALL operator simply combines all result sets without deduplication, resulting in higher execution efficiency. When duplicate rows are known not to exist in the result set or when duplicates are acceptable, UNION ALL should be preferred.

Data Type Compatibility Requirements

When using the UNION operator, each SELECT statement must meet strict data type compatibility requirements:

Consider the following extended example demonstrating how to handle data type conversion:

-- Assuming tblMadrid's goals column is INT type, tblBarcelona's goals column is VARCHAR type
(SELECT name, games, CAST(goals AS VARCHAR) AS goals FROM tblMadrid WHERE name = 'ronaldo')
UNION
(SELECT name, games, goals FROM tblBarcelona WHERE name = 'messi')
ORDER BY CAST(goals AS INT) DESC

Extended Applications in Complex Query Scenarios

In actual business systems, the UNION operator can be applied to more complex query scenarios. Referencing the store sales case from supplementary materials, we can construct similar query patterns:

-- Combining query results from multiple conditions
(SELECT store_name, 'CA Store' AS category FROM stores WHERE state = 'CA')
UNION ALL
(SELECT store_id AS store_name, 'High Quantity' AS category FROM sales WHERE qty > 20)
UNION ALL
(SELECT store_id AS store_name, 'Special Terms' AS category FROM sales WHERE payterms > 'Net 30')
ORDER BY category, store_name

This pattern is particularly suitable for generating comprehensive reports or data overviews where data from different sources needs to be presented in a unified format.

Performance Optimization Considerations

In big data environments, performance optimization of UNION queries is crucial:

Performance testing example code:

-- Using EXPLAIN to analyze query execution plan
EXPLAIN 
(SELECT name, games, goals FROM tblMadrid WHERE games > 50)
UNION
(SELECT name, games, goals FROM tblBarcelona WHERE goals > 30)
ORDER BY goals DESC

Comparison with Other Combination Methods

Beyond the UNION operator, SQL provides other data combination methods:

JOIN Operations: Suitable for horizontally merging data from related tables, requiring explicit relationships between tables

-- Example with assumed relationships (not applicable to this case)
SELECT m.name, m.games, m.goals, b.team 
FROM tblMadrid m 
INNER JOIN team_info b ON m.team_id = b.id

Subqueries: Suitable for scenarios requiring nested queries but unable to achieve vertical result combination

Practical Application Recommendations

Based on the analysis in this article, the following practical application recommendations are proposed:

  1. Prioritize UNION/UNION ALL when tables are unrelated but structurally similar
  2. Clarify business requirements regarding duplicate rows and choose the appropriate operator
  3. Ensure column count and data type compatibility across all SELECT statements
  4. For large datasets, consider paginated queries or conditional filtering to reduce performance overhead
  5. Thoroughly test query performance in production environments

Conclusion

The UNION operator provides a powerful and flexible tool for vertically combining data from multiple independent sources. By deeply understanding its working principles, performance characteristics, and applicable scenarios, developers can construct efficient and reliable database query solutions. In actual projects, reasonably applying UNION and related technologies in combination with specific business requirements and data characteristics can significantly enhance data processing capabilities and system performance.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.