Keywords: SQL Query | NOT LIKE | Pattern Matching | String Filtering | Database Optimization
Abstract: This article provides an in-depth analysis of combining NOT LIKE and IN operators in SQL queries, explaining common errors and presenting correct solutions. Through detailed code examples, it demonstrates how to use multiple NOT LIKE conditions to exclude multiple pattern matches, while discussing implementation differences across database systems. The comparison between SQL Server and Power Query approaches to pattern matching offers valuable insights for effective string filtering in data queries.
Problem Background and Common Errors
During SQL query development, developers frequently need to filter data based on string patterns. A common requirement involves excluding records containing specific substrings. Many developers attempt to use syntax combinations like NOT LIKE IN, but this is not supported in standard SQL.
For example, the following query statement will generate a syntax error:
SELECT *
FROM Table1
WHERE EmpPU NOT Like IN ('%CSE%', '%ECE%', '%EEE%');
This error occurs because in SQL language specifications, the LIKE operator is designed for pattern matching with single strings, while the IN operator checks whether a value exists within a specified list of values. These two operators have different syntactic structures and cannot be directly combined.
Correct Solution
To achieve the requirement of excluding multiple pattern matches, the correct approach is to use multiple NOT LIKE conditions connected by the AND logical operator:
SELECT *
FROM Table1
WHERE EmpPU NOT LIKE '%CSE%'
AND EmpPU NOT LIKE '%ECE%'
AND EmpPU NOT LIKE '%EEE%';
This query will return all records from Table1 where the EmpPU column does not contain the substrings "CSE", "ECE", or "EEE". Each NOT LIKE condition independently checks one pattern, and a record is included in the result set only when all conditions are satisfied.
Detailed Explanation of Pattern Matching Wildcards
In SQL's LIKE operator, the percent symbol (%) serves as a placeholder for zero or more arbitrary characters. Therefore:
%CSE%matches any value containing the "CSE" substring%ECE%matches any value containing the "ECE" substring%EEE%matches any value containing the "EEE" substring
By combining multiple NOT LIKE conditions, we implement a logical "AND" operation for multiple exclusion patterns.
Comparison with Other Technologies
The reference article discusses similar requirements in the Power Query environment. Unlike SQL, Power Query does not have a direct LIKE operator but provides more specific string matching functions:
Table.SelectRows(TableName, each not Text.StartsWith([Description], "") and not Text.StartsWith([Product], "CMI"))
In Power Query, developers need to use functions like Text.Contains, Text.StartsWith, or Text.EndsWith to achieve similar pattern matching functionality, using the lowercase not operator for logical negation.
Performance Considerations and Best Practices
When dealing with large datasets, multiple LIKE operations may impact query performance. Here are some optimization recommendations:
- Minimize the number of pattern matches, including only necessary exclusion conditions
- Consider using full-text indexes or other string search optimization techniques
- When possible, use more precise matching conditions instead of wildcard patterns
- Be mindful of query folding issues, particularly when using tools like Power Query
Extended Application Scenarios
This multi-condition exclusion pattern has wide-ranging applications in practical scenarios:
- Data cleaning: Excluding records containing specific error codes or markers
- Category filtering: Excluding specific categories in classification systems
- Content screening: Filtering out records containing sensitive vocabulary in text content
- Log analysis: Filtering out specific types of system-generated log entries
By deeply understanding how SQL pattern matching works and mastering the correct syntax usage, developers can more effectively handle complex string filtering requirements, improving both the accuracy and efficiency of data queries.