Keywords: Teradata | Aggregate Functions | Window Functions | Error 3504 | SQL Optimization
Abstract: This technical article provides an in-depth analysis of Error 3504 encountered when mixing aggregate functions with window functions in Teradata. By examining SQL execution logic order, we present two effective solutions: using nested aggregate functions with extended GROUP BY, and employing subquery JOIN alternatives. The article details the execution timing of OLAP functions in query processing pipelines, offers complete code examples with performance comparisons, and helps developers fundamentally understand and resolve this common issue.
Problem Background and Error Analysis
In Teradata database development, developers often need to compute both grouped aggregate values and partition-based window function values simultaneously. The original query attempts to retrieve the last completion date for specific courses per employee, while also calculating the maximum completion date for each course code across all employees. The query statement is as follows:
SELECT employee_number,
MAX(course_completion_date)
OVER (PARTITION BY course_code) AS max_course_date,
MAX(course_completion_date) AS max_date
FROM employee_course_completion
WHERE course_code IN ('M910303', 'M91301R', 'M91301P')
GROUP BY employee_numberThis query execution results in Error Code 3504: "Selected non-aggregate values must be part of the associated group". The root cause of this error lies in the logical execution order of SQL queries.
SQL Execution Order and Error Root Cause
In standard SQL specification, query execution follows a specific sequence: FROM → WHERE → GROUP BY → HAVING → Window Functions → SELECT. Window functions (OLAP functions) execute after GROUP BY, meaning they can only access columns included in the GROUP BY clause or columns processed by aggregate functions.
In the original query, MAX(course_completion_date) OVER (PARTITION BY course_code) attempts to partition by course_code after grouping by employee_number. However, the course_code column is neither in the GROUP BY clause nor processed by an aggregate function, thus violating SQL's logical execution rules.
Solution 1: Nested Aggregation and GROUP BY Extension
The first solution, based on standard SQL specification, achieves the requirement through nested aggregate functions and extended GROUP BY clause:
SELECT employee_number,
MAX(MAX(course_completion_date))
OVER (PARTITION BY course_code) AS max_course_date,
MAX(course_completion_date) AS max_date
FROM employee_course_completion
WHERE course_code IN ('M910303', 'M91301R', 'M91301P')
GROUP BY employee_number, course_codeThe execution logic of this approach is: first group by employee_number and course_code to calculate the maximum completion date for each employee per course, then perform secondary aggregation in the window function to compute the maximum completion date for each course code across all employees. Teradata also supports using column aliases to simplify expressions:
SELECT employee_number,
MAX(max_date)
OVER (PARTITION BY course_code) AS max_course_date,
MAX(course_completion_date) AS max_date
FROM employee_course_completion
WHERE course_code IN ('M910303', 'M91301R', 'M91301P')
GROUP BY employee_number, course_codeSolution 2: Subquery JOIN Alternative
As the accepted best answer, the second solution employs subqueries and JOIN operations to separate computation logic, avoiding the mixing of aggregate and window functions:
SELECT employee_number,
course_code,
MAX(course_completion_date) AS max_date,
lcc.LAST_COURSE_COMPLETED
FROM employee_course_completion ecc
LEFT JOIN (
SELECT employee_number,
MAX(course_completion_date) AS LAST_COURSE_COMPLETED
FROM employee_course_completion
WHERE course_code IN ('M910303', 'M91301R', 'M91301P')
) lcc
ON lcc.employee_number = ecc.employee_number
WHERE course_code IN ('M910303', 'M91301R', 'M91301P')
GROUP BY employee_number, course_code, lcc.LAST_COURSE_COMPLETEDThe advantage of this approach lies in its clear logic, ease of understanding and maintenance. The subquery independently calculates the last completion date for each employee across all target courses, while the main query focuses on computing maximum completion dates grouped by employee and course.
Performance and Applicability Analysis
Both solutions have their respective strengths and weaknesses. The nested aggregation approach offers concise code but may incur additional computational overhead with large datasets. The subquery JOIN approach, while slightly more verbose, typically yields more optimized execution plans, particularly within Teradata's parallel processing architecture.
In practical applications, it's recommended to select the appropriate solution based on data volume, query frequency, and system resources. For frequently executed queries, the subquery JOIN approach generally provides better stability and predictable performance.
Best Practice Recommendations
To avoid similar errors, developers writing queries that mix aggregate and window functions should always: clearly understand SQL execution order, ensure columns accessed by window functions are either in GROUP BY or processed by aggregates, and prioritize using subqueries to separate computation logic in complex scenarios. These practices apply not only to Teradata but also to other database systems adhering to SQL standards.