Keywords: Python | Mathematical Expression Evaluation | Pyparsing | Secure Parsing | String Processing
Abstract: This paper explores effective methods for securely evaluating mathematical expressions stored as strings in Python. Addressing the limitations of int() and the security risks of calling eval() directly, it focuses on the NumericStringParser implementation based on the Pyparsing library. The article details the parser's grammar definition, operator mapping, and recursive evaluation mechanism, demonstrating support for arithmetic expressions and built-in functions through examples. It also compares an alternative approach using the ast module and discusses security enhancements such as operation limits and result range controls. Finally, it summarizes core principles and practical recommendations for developing secure mathematical computation tools.
Background and Challenges
In Python programming, evaluating mathematical expressions supplied as strings is a common requirement. For example, given the string "2^4", the expected result is the numerical value 16. Calling int("2^4") raises ValueError: invalid literal for int() with base 10: '2^4', because int() only converts strings that represent a single integer literal. The eval() function can execute string code and return results, but it is unsafe on untrusted input: a malicious string such as "__import__('os').remove('important file')" executes arbitrary code, and an expression like "9**9**9**9**9**9**9**9" can exhaust computational resources. A secure and controllable method for expression evaluation is therefore needed.
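The int() failure described above can be reproduced in a few lines. This is only an illustration, and the helper name try_int is hypothetical. It also shows a related detail: in Python's own syntax the caret is bitwise XOR, so "2^4" cannot simply be handed to Python's expression syntax and be expected to yield 16.

```python
def try_int(text):
    """Return int(text), or the error message if the conversion fails."""
    try:
        return int(text)
    except ValueError as exc:
        return f"ValueError: {exc}"


print(try_int("16"))   # 16
print(try_int("2^4"))  # ValueError: invalid literal for int() with base 10: '2^4'

# In Python's own syntax '^' is bitwise XOR, not exponentiation.
print(2 ^ 4)   # 6
print(2 ** 4)  # 16
```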
Solution with Pyparsing Library
Pyparsing is a powerful parsing library suitable for building custom syntax analyzers. Based on its example fourFn.py, a NumericStringParser class can be encapsulated to parse and evaluate mathematical expressions. This approach enhances security by defining strict grammar rules and operator mappings, preventing the execution of arbitrary code.
Grammar Definition and Parsing Structure
The core of NumericStringParser involves using Pyparsing components to define the grammar of mathematical expressions. The grammar rules are based on context-free grammar, including:
- Atomic elements: Numbers, constants (e.g., PI, E), function calls, or parenthesized expressions.
- Operator precedence: Exponentiation (^) has the highest priority, followed by multiplication and division (*, /), then addition and subtraction (+, -). By layering the definitions of factor, term, and expr, the correct order of operations is achieved.
- Parse actions: The setParseAction method pushes parsed elements onto a stack for subsequent evaluation.
For instance, numbers are defined via Combine and Word to support scientific notation; function identifiers consist of letters, digits, and specific characters. This design ensures only predefined mathematical elements are parsed, excluding unsafe code.
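A minimal sketch of such a layered grammar is shown below. It is a simplified reading of the design described here, not the actual NumericStringParser source: the function name to_postfix and the helper push_first are assumptions, and unary signs are omitted for brevity. It uses the camelCase Pyparsing names mentioned in this article; Pyparsing 3 also offers snake_case equivalents.

```python
from pyparsing import (CaselessKeyword, Combine, Forward, Literal, Optional,
                       ParseException, Suppress, Word, ZeroOrMore,
                       alphanums, alphas, nums, oneOf)


def to_postfix(text):
    """Parse `text` and return its tokens in postfix (stack) order."""
    stack = []

    def push_first(tokens):
        # Parse action: record the first token of the matched sub-expression.
        stack.append(tokens[0])

    # Numbers: digits, optional fraction, optional exponent (e.g. 1.5e-3).
    number = Combine(
        Word(nums)
        + Optional(Literal(".") + Optional(Word(nums)))
        + Optional(oneOf("e E") + Optional(oneOf("+ -")) + Word(nums))
    )
    constant = CaselessKeyword("PI") | CaselessKeyword("E")
    ident = Word(alphas, alphanums + "_$")
    lpar, rpar = Suppress("("), Suppress(")")

    expr = Forward()
    factor = Forward()
    # Atom: function call, constant, number, or parenthesized expression.
    atom = (
        (ident + lpar + expr + rpar | constant | number).setParseAction(push_first)
        | lpar + expr + rpar
    )
    # '^' binds tightest; recursion on `factor` makes it right-associative.
    factor <<= atom + ZeroOrMore((Literal("^") + factor).setParseAction(push_first))
    term = factor + ZeroOrMore((oneOf("* /") + factor).setParseAction(push_first))
    expr <<= term + ZeroOrMore((oneOf("+ -") + term).setParseAction(push_first))

    expr.parseString(text, parseAll=True)
    return stack


print(to_postfix("2^4"))
print(to_postfix("1 + 2*3"))
try:
    to_postfix("__import__('os')")
except ParseException as exc:
    print("rejected:", exc)
```

Because identifiers must start with a letter and every token must fit the grammar, input that is not a mathematical expression fails at parse time instead of reaching any evaluator.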
Operator and Function Mapping
The parser maintains two mapping dictionaries internally:
- opn dictionary: Maps operator symbols to functions from Python's operator module, such as "^" to operator.pow for exponentiation.
- fn dictionary: Supports built-in mathematical functions like sin, cos, exp, etc., calling the corresponding functions from the math module.
This mapping mechanism restricts executable operations, preventing users from injecting custom functions or dangerous calls.
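The two dictionaries might look roughly like this. The exact set of entries is an assumption chosen for illustration, and apply_operator is a hypothetical helper; only the names opn and fn come from the description above.

```python
import math
import operator

# Binary operators the evaluator is willing to apply.
opn = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}

# Single-argument functions exposed to expressions (illustrative selection).
fn = {
    "sin": math.sin,
    "cos": math.cos,
    "tan": math.tan,
    "exp": math.exp,
    "abs": abs,
}


def apply_operator(symbol, left, right):
    """Look up `symbol` in the whitelist; unknown symbols raise KeyError."""
    return opn[symbol](left, right)


print(apply_operator("^", 2.0, 4.0))  # 16.0
print("__import__" in fn)             # False
```

Anything absent from the dictionaries simply has no callable to dispatch to, which is what keeps the set of executable operations closed.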
Recursive Evaluation Algorithm
The evaluation process is implemented via the evaluateStack method, using a stack-based recursive approach:
- Pop elements from the expression stack.
- If an operator, recursively evaluate operands and apply the mapped function.
- If a constant or function, return the corresponding value or call the function.
- The final result is returned through the eval method, supporting floating-point output.
For example, the expression "2^4" is parsed into the stack [2, 4, '^'], yielding 16.0 after evaluation. This process isolates expression logic from the Python execution environment, enhancing security.
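The recursive steps above can be sketched as a standalone function operating on a postfix token list. This is an approximation under stated assumptions: the names OPN, FN, CONSTANTS, and evaluate_stack are illustrative, tokens are assumed to be strings, and functions are assumed to take a single argument.

```python
import math
import operator

OPN = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "^": operator.pow}
FN = {"sin": math.sin, "cos": math.cos, "exp": math.exp}
CONSTANTS = {"PI": math.pi, "E": math.e}


def evaluate_stack(stack):
    """Consume tokens from the end of `stack` and return the float result."""
    token = stack.pop()
    if token in OPN:
        # The right operand was pushed last, so it is evaluated first.
        right = evaluate_stack(stack)
        left = evaluate_stack(stack)
        return OPN[token](left, right)
    if token in CONSTANTS:
        return CONSTANTS[token]
    if token in FN:
        return FN[token](evaluate_stack(stack))
    # Anything else must be a numeric literal; float() rejects other text.
    return float(token)


print(evaluate_stack(["2", "4", "^"]))            # 16.0
print(evaluate_stack(["1", "2", "3", "*", "+"]))  # 7.0
```

Note that the function mutates the list it is given, so a caller that wants to keep the parsed stack should pass a copy.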
Usage Examples and Performance Analysis
After instantiating NumericStringParser, string expressions can be evaluated directly:
nsp = NumericStringParser()
result = nsp.eval('2^4')
print(result) # Output: 16.0
result = nsp.eval('exp(2^4)')
print(result) # Output: 8886110.520507872
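The grammar accepts nested expressions like "1 + 2*3^(4^5) / (6 + -7)", resolving precedence and parentheses as expected. Note, however, that with ^ meaning exponentiation, 3^(4^5) is 3^1024, which exceeds the range of a Python float; if operands are converted to floats, as the 16.0 output above indicates, evaluating this particular expression raises OverflowError rather than returning a number. Performance-wise, Pyparsing has higher parsing overhead than direct eval, but because the grammar is built once and reused for every call, it is adequate for most application scenarios. Malicious inputs like "__import__('os').remove('file')" do not match the grammar and raise a parse exception, so no code is executed.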
Alternative Approaches and Security Enhancements
Beyond Pyparsing, the ast module offers another secure evaluation method: parse the expression into an Abstract Syntax Tree (AST) and walk it with a custom evaluation function that permits only whitelisted operation types:
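import ast
import operator as op

operators = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
             ast.Div: op.truediv, ast.Pow: op.pow}

def eval_node(node):
    # Recursively evaluate AST nodes, allowing only predefined operations
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in operators:
        return operators[type(node.op)](eval_node(node.left), eval_node(node.right))
    raise TypeError(f"unsupported expression: {ast.dump(node)}")

def eval_expr(expr):
    return eval_node(ast.parse(expr, mode='eval').body)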
This method also avoids the security risks of eval but requires handling more node types. To enhance security, additional measures can be implemented:
- Operation limits: Override operator functions to check parameter ranges. For example, limit the base and exponent sizes in pow operations to prevent resource exhaustion.
- Result range controls: Use decorators to limit the magnitude of intermediate results, avoiding numerical overflow or excessive computation.
These measures are applicable in both Pyparsing and AST approaches, further improving system robustness.
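One way the two measures might look in practice is sketched below. The thresholds (MAX_BASE, MAX_EXPONENT, the 1e100 cap) and the names limited_power, limit_result, and checked_mul are illustrative assumptions, not values taken from either implementation.

```python
import functools
import operator

MAX_BASE = 100      # hypothetical limit on |base|
MAX_EXPONENT = 100  # hypothetical limit on |exponent|


def limited_power(base, exponent):
    """Drop-in replacement for operator.pow that checks its arguments first."""
    if abs(base) > MAX_BASE or abs(exponent) > MAX_EXPONENT:
        raise ValueError(f"operands too large for pow: {base!r}, {exponent!r}")
    return operator.pow(base, exponent)


def limit_result(max_abs):
    """Decorator factory: reject results whose magnitude exceeds `max_abs`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if abs(result) > max_abs:
                raise ValueError(f"result too large: {result!r}")
            return result
        return wrapper
    return decorator


@limit_result(1e100)
def checked_mul(a, b):
    return operator.mul(a, b)


print(limited_power(2, 4))  # 16
print(checked_mul(3, 4))    # 12
```

In the Pyparsing design, limited_power would be registered for "^" in place of operator.pow; in the AST design, it would replace the ast.Pow entry. The argument check runs before the power is computed, so an oversized exponent is refused without doing the expensive work.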
Conclusion and Best Practices
Securely evaluating mathematical expressions in strings requires balancing functionality and risk. The NumericStringParser based on Pyparsing provides a structured solution, effectively defending against code injection attacks through strict grammar definitions and restricted operation mappings. Key practices include:
- Avoid using eval: Prefer parser-based solutions unless in fully controlled environments.
- Define clear grammar: Limit expression elements to mathematical constructs, excluding potentially dangerous structures.
- Implement runtime checks: Add parameter validation and result monitoring to prevent abuse.
- Consider performance and scalability: For high-performance needs, optimize parsing logic or cache results.
By combining Pyparsing's flexibility with custom security strategies, developers can build reliable and secure mathematical expression evaluation tools, suitable for applications in educational software, calculator apps, or data analysis systems.