Using Parentheses for Logical OR Matching in Regular Expressions: A Case Study with Numbers Followed by Time Units

Dec 02, 2025 · Programming

Keywords: regular expression | parentheses | logical OR

Abstract: This article explores a common regular expression issue—matching strings with numbers followed by "seconds" or "minutes"—by analyzing the role of parentheses. It explains why the original expression fails, details the correct use of parentheses for logical OR matching, and provides an improved expression. Additionally, it discusses alternative optimizations, such as simplified grouping and non-capturing groups, to offer a comprehensive understanding of parentheses usage and best practices in regex.

In regular expressions, the logical OR operator (|) is used to match one of multiple patterns, but without proper use of parentheses, it can lead to unintended matching results. This article delves into how to correctly use parentheses for logical OR matching through a specific case study.

Problem Description and Analysis of the Original Expression

Suppose we need to match strings consisting of an integer followed by "seconds" or "minutes", such as "5 seconds" or "10 minutes". The original expression is: ([0-9]+)\s+(\bseconds\b)|(\bminutes\b). For "5 seconds" it captures the number in group 1 and "seconds" in group 2, as intended. For "5 minutes", however, groups 1 and 2 are empty and only group 3 holds "minutes" (";;minutes" when the groups are joined with semicolons). The number is not part of the match at all.

Root Cause: Missing Parentheses Leading to Incorrect Logical OR Scope

The issue with the original expression lies in the low precedence of the logical OR operator (|) and the lack of parentheses to define its scope. Alternation binds more loosely than concatenation, so ([0-9]+)\s+(\bseconds\b)|(\bminutes\b) is parsed as two complete alternatives: ([0-9]+)\s+(\bseconds\b) OR (\bminutes\b). It therefore matches either a number, whitespace, and "seconds", or the word "minutes" on its own. It never requires a number before "minutes". For the input "5 minutes", the first alternative fails at every position because "seconds" never appears, and the second alternative succeeds once the engine reaches the "m". The overall match is just "minutes", and groups 1 and 2 stay empty.
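The behavior described above can be reproduced with a short sketch. It assumes PHP's preg_match with default flags, where unmatched groups before the last matched group are returned as empty strings; the variable names are illustrative only.

```php
<?php
// Original pattern from the article: the | splits the WHOLE expression in two.
$original = '/([0-9]+)\s+(\bseconds\b)|(\bminutes\b)/';

preg_match($original, '5 seconds', $a);
preg_match($original, '5 minutes', $b);

// Left alternative matched: full match plus groups 1 and 2.
var_dump($a === ['5 seconds', '5', 'seconds']); // bool(true)

// Right alternative matched: only "minutes"; groups 1 and 2 are empty strings.
var_dump($b === ['minutes', '', '', 'minutes']); // bool(true)

// Joining the groups with ';' reproduces the ";;minutes" result.
echo implode(';', array_slice($b, 1)), PHP_EOL; // ;;minutes
```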

Solution: Using Parentheses to Define Logical OR Scope

To fix this, wrap the alternation in parentheses so that its scope is explicit. The improved expression is: ([0-9]+)\s+((\bseconds\b)|(\bminutes\b)). The outer parentheses treat (\bseconds\b)|(\bminutes\b) as a single unit, so the logical OR applies only to "seconds" and "minutes" rather than to the entire expression. The number and whitespace are now required in both cases: group 1 captures the number and group 2 captures whichever unit matched. Groups 3 and 4 are the inner "seconds" and "minutes" groups.

Code Example and Explanation

Below is an example using PHP's preg_match function to demonstrate the improved expression:

<?php
// Outer group 2 scopes the alternation, so the number is required for both units.
$pattern = '/([0-9]+)\s+((\bseconds\b)|(\bminutes\b))/';
$string1 = "5 seconds";
$string2 = "10 minutes";

// Index 1 of the match array holds the number, index 2 the matched unit.
if (preg_match($pattern, $string1, $matches1)) {
    echo "Matching '5 seconds': " . print_r($matches1, true);
}

if (preg_match($pattern, $string2, $matches2)) {
    echo "Matching '10 minutes': " . print_r($matches2, true);
}
?>

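The output shows that for "5 seconds", index 1 of the match array is "5" and index 2 is "seconds"; for "10 minutes", index 1 is "10" and index 2 is "minutes". The inner groups appear as well: in the first case index 3 holds "seconds", while in the second case index 3 is an empty string and index 4 holds "minutes". This confirms that the improved expression requires the number for both units.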
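For reference, the exact array contents can be checked with a minimal sketch. It assumes default preg_match flags, under which trailing unmatched groups are dropped and earlier unmatched groups are reported as empty strings; $m1 and $m2 are illustrative names.

```php
<?php
// Improved pattern from the article; default preg_match flags are assumed.
$pattern = '/([0-9]+)\s+((\bseconds\b)|(\bminutes\b))/';

preg_match($pattern, '5 seconds', $m1);
preg_match($pattern, '10 minutes', $m2);

// Index 0 is the full match, 1 the number, 2 the unit chosen by the alternation.
// Indexes 3 and 4 belong to the inner "seconds" and "minutes" groups.
var_dump($m1 === ['5 seconds', '5', 'seconds', 'seconds']);        // bool(true)
var_dump($m2 === ['10 minutes', '10', 'minutes', '', 'minutes']);  // bool(true)
```

Because index 2 always holds the matched unit, code that consumes the result can ignore indexes 3 and 4.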

Reference to Other Optimization Approaches

Beyond the primary solution, other optimizations can be considered. For example, a single group simplifies the expression: ([0-9]+)\s*(seconds|minutes). Here, (seconds|minutes) captures the time unit directly without extra inner groups, and \s* allows zero or more whitespace characters. Note two consequences of this looser form: it matches "5seconds" (no space), and because the \b anchors were dropped it also matches the start of a longer word, as in "5 minutesX". Keep \s+ or add a trailing \b if the requirements call for stricter matching.
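The trade-offs of the simplified form can be sketched as follows. The $strict pattern and the non-capturing pattern are assumptions added for comparison, not expressions from the original article, and the test inputs are made up for illustration.

```php
<?php
// Simplified pattern from the article: one capturing group holds the unit.
$loose = '/([0-9]+)\s*(seconds|minutes)/';

// Assumed stricter variant (not from the article): \s+ requires whitespace and
// the trailing \b rejects units that continue into a longer word.
$strict = '/([0-9]+)\s+(seconds|minutes)\b/';

// preg_match returns 1 on a match and 0 otherwise.
foreach (['5 seconds', '5seconds', '5 minutesX'] as $input) {
    printf("%-11s loose=%d strict=%d\n", $input,
        preg_match($loose, $input), preg_match($strict, $input));
}

// Non-capturing variant: (?:...) still scopes the |, but only the number is captured.
preg_match('/([0-9]+)\s+(?:seconds|minutes)\b/', '10 minutes', $m);
var_dump($m === ['10 minutes', '10']); // bool(true)
```

The non-capturing form is worth considering when only the number is needed, since it keeps the match array short and the group numbering stable.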

Summary and Best Practices

This case study shows why parentheses matter when defining the scope of a logical OR in regular expressions. Key takeaways: because | has lower precedence than concatenation, wrap an alternation in parentheses whenever it should apply to only part of the expression; and choose the grouping style based on need, using a capturing group when the matched alternative is required and a non-capturing group (?:...) when it is not, which avoids storing unused captures and keeps group numbering simple. In practice, tools like regex101.com are useful for testing expressions against sample inputs to confirm they match as expected.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.