Keywords: SQL Joins | NULL Handling | Outer Joins
Abstract: This article provides an in-depth exploration of the different behaviors of IS NULL and IS NOT NULL in SQL join conditions versus WHERE clauses. Through theoretical explanations and code examples, it analyzes the generation logic of NULL values in outer join operations such as LEFT JOIN and RIGHT JOIN, clarifying why NULL checks in ON clauses are typically ineffective while working correctly in WHERE clauses. The article compares result differences across various query approaches using concrete database table cases, helping developers understand SQL join execution order and NULL handling logic.
NULL Handling Mechanisms in SQL Join Operations
In relational database queries, join operations are fundamental techniques for data integration. Particularly, outer joins (such as LEFT JOIN and RIGHT JOIN) allow us to preserve all records from one table while associating matching records from another. However, when dealing with NULL value checks, developers often encounter a puzzling phenomenon: using IS NULL or IS NOT NULL in join conditions (ON clauses) frequently fails to achieve expected results, while moving the same conditions to WHERE clauses works correctly. This article will thoroughly analyze the reasons behind this phenomenon through theoretical analysis and practical examples.
Fundamental Principles of Join Operations
First, we need to understand the basic execution logic of SQL join operations. Consider two simple tables:
-- Parent table
id | name
---|------
1 | Alex
2 | Bill
3 | Cath
4 | Dale
5 | Evan
-- Child table
pid | name
----|------
1 | Kate
1 | Lia
3 | Mary
NULL| Pan
When executing an inner join (INNER JOIN), the database returns only records that match the conditions from both tables:
SELECT parent.id, parent.name AS parent_name,
child.pid, child.name AS child_name
FROM parent INNER JOIN child
ON parent.id = child.pid;
The result shows all parent records with matching child records: Alex (matching Kate and Lia) and Cath (matching Mary).
Outer Joins and NULL Value Generation
When we use a left outer join (LEFT JOIN), the situation changes:
SELECT parent.id, parent.name AS parent_name,
child.pid, child.name AS child_name
FROM parent LEFT JOIN child
ON parent.id = child.pid;
Now, the result includes not only matching records but also all unmatched records from the parent table. For these unmatched records, child table fields are filled with NULL values. This is key to understanding the issue: NULL values are generated during the join operation, not pre-existing in the tables.
Example result:
id | parent_name | pid | child_name
---|-------------|------|-----------
1 | Alex | 1 | Kate
1 | Alex | 1 | Lia
3 | Cath | 3 | Mary
2 | Bill | NULL | NULL
4 | Dale | NULL | NULL
5 | Evan | NULL | NULL
Note that Bill, Dale, and Evan records have NULL values for child table fields because they have no matching records in the child table.
Execution Timing Differences: WHERE vs ON Clauses
The execution order of SQL queries determines the different effects of condition checks in WHERE and ON clauses:
- FROM and JOIN operations execute first: The database joins tables based on ON clause conditions, generating an intermediate result set. For outer joins, unmatched records are filled with NULL values.
- WHERE clause applies afterward: Filters the joined result set, where NULL values already exist, allowing
IS NULLchecks to work correctly.
Consider this query:
SELECT parent.id, parent.name AS parent_name,
child.pid, child.name AS child_name
FROM parent LEFT JOIN child
ON parent.id = child.pid
WHERE child.pid IS NULL;
This query correctly returns all parent records without child records (Bill, Dale, Evan) because the WHERE clause checks for NULL values after the join is complete.
Logical Contradiction of NULL Checks in ON Clauses
Now, moving the IS NULL condition to the ON clause:
SELECT parent.id, parent.name AS parent_name,
child.pid, child.name AS child_name
FROM parent LEFT JOIN child
ON parent.id = child.pid
AND child.pid IS NULL;
This query contains a fundamental logical contradiction:
- Condition
parent.id = child.pidrequires the two fields to be equal - Condition
child.pid IS NULLrequires the child table field to be NULL - In SQL's three-valued logic, NULL does not equal any value, including itself
Therefore, these two conditions can never be satisfied simultaneously. For left joins, when ON conditions evaluate to FALSE, the database still returns all records from the left table, but with all child table fields as NULL. This explains why using IS NULL checks in ON clauses typically yields unexpected results.
Practical Case: Shipments and Returns Query
Consider a practical e-commerce scenario with shipments and returns tables:
-- Problematic query: Using IS NULL in ON clause
SELECT *
FROM shipments s
LEFT OUTER JOIN returns r
ON s.id = r.id
AND r.id IS NULL
WHERE s.day >= CURDATE() - INTERVAL 10 DAY;
This query attempts to find shipment records without returns in the last 10 days, but due to the logical contradiction mentioned above, it actually returns all shipment records regardless of return status.
The correct approach is moving the NULL check to the WHERE clause:
SELECT *
FROM shipments s
LEFT OUTER JOIN returns r
ON s.id = r.id
WHERE s.day >= CURDATE() - INTERVAL 10 DAY
AND r.id IS NULL;
This query correctly filters shipment records without returns because the WHERE clause evaluates after the join, when NULL values already exist.
Special Properties of NULL in SQL
To fully understand this issue, one must grasp the special properties of NULL in SQL:
- NULL represents "unknown" or "non-existent" values, not empty strings or zeros
- Comparisons between NULL and any value (including NULL itself) return UNKNOWN, not TRUE or FALSE
- NULL values must be checked using
IS NULLorIS NOT NULL, not= NULL
This three-valued logic (TRUE, FALSE, UNKNOWN) is a unique characteristic of SQL and the source of much confusion.
NULL Handling in Other Join Types
Right joins (RIGHT JOIN) behave similarly to left joins but preserve all records from the right table:
SELECT parent.id, parent.name AS parent_name,
child.pid, child.name AS child_name
FROM parent RIGHT JOIN child
ON parent.id = child.pid;
This query displays all child records, including Pan (with pid as NULL) who has no parent record.
Full outer joins (FULL OUTER JOIN) in database systems that support them display all records from both tables, regardless of matches. MySQL doesn't support full outer joins but can simulate them using UNION.
Best Practices and Recommendations
Based on the above analysis, we propose the following best practices:
- Avoid NULL checks in ON clauses: Unless specifically required, place
IS NULLandIS NOT NULLconditions in WHERE clauses. - Understand query execution order: Remember SQL query execution order (FROM/JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT).
- Choose join types deliberately: Select appropriate join types based on requirements, understanding each type's impact on NULL value handling.
- Test edge cases thoroughly: Especially for queries involving NULL values, test various edge cases comprehensively.
Conclusion
The behavioral differences of IS NULL and IS NOT NULL in SQL join conditions stem from join operation execution mechanisms and the special properties of NULL values. In ON clauses, these conditions evaluate during join establishment, typically causing logical contradictions; in WHERE clauses, they evaluate after join completion, correctly filtering generated NULL values. Understanding this distinction is crucial for writing correct and efficient SQL queries. By mastering join operation principles and NULL value handling logic, developers can avoid common pitfalls and write more reliable database query code.