Getting Started with ANTLR: A Step-by-Step Calculator Example from Grammar to Java Code

Keywords: ANTLR | Grammar Parsing | Java Programming | Arithmetic Calculator | Compiler Construction

Abstract: This article provides a comprehensive guide to building a four-operation calculator using ANTLR3. It details the complete process from grammar definition to Java code implementation, covering lexer and parser rule design, code generation, test program development, and semantic action integration. Through this practical example, readers will gain a solid understanding of ANTLR's core mechanisms and learn how to transform language specifications into executable programs.

Understanding ANTLR Grammar File Structure

An ANTLR grammar file defines a language with two kinds of rules. Lexer rules, whose names start with an uppercase letter, group input characters into tokens such as numbers. Parser rules, whose names start with a lowercase letter, describe how those tokens combine into larger structures such as expressions. A combined grammar, declared with grammar Exp;, holds both kinds of rules in one file.

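Here's a basic grammar for a four-operation calculator. Save it as Exp.g, because ANTLR expects the file name to match the grammar name declared on the first line: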

grammar Exp;

eval returns [double value]
    :    exp=additionExp EOF {$value = $exp.value;}   // EOF: reject trailing input
    ;

additionExp returns [double value]
    :    m1=multiplyExp       {$value =  $m1.value;} 
         ( '+' m2=multiplyExp {$value += $m2.value;} 
         | '-' m2=multiplyExp {$value -= $m2.value;}
         )* 
    ;

multiplyExp returns [double value]
    :    a1=atomExp       {$value =  $a1.value;}
         ( '*' a2=atomExp {$value *= $a2.value;} 
         | '/' a2=atomExp {$value /= $a2.value;}
         )* 
    ;

atomExp returns [double value]
    :    n=Number                {$value = Double.parseDouble($n.text);}
    |    '(' exp=additionExp ')' {$value = $exp.value;}
    ;

Number
    :    ('0'..'9')+ ('.' ('0'..'9')+)?
    ;

WS
    :   (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;}
    ;
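To make the split between lexer and parser concrete, the following hand-written sketch mimics what the Number and WS lexer rules and the single-character literals do: it turns a character string into a list of token texts. It is an illustration only, not the code ANTLR generates. The class name ExpTokenizer is hypothetical, and whitespace is simply dropped here, whereas the ANTLR rule places it on the hidden channel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written tokenizer mirroring the Exp grammar's lexer rules.
public class ExpTokenizer {

    // Returns the token texts for the input, in order.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                // WS: dropped here; the ANTLR rule sends it to the HIDDEN channel instead
                i++;
            } else if (isDigit(c)) {
                // Number : ('0'..'9')+ ('.' ('0'..'9')+)?
                int start = i;
                while (i < input.length() && isDigit(input.charAt(i))) {
                    i++;
                }
                // Take the fraction only if at least one digit follows the '.'
                if (i + 1 < input.length() && input.charAt(i) == '.' && isDigit(input.charAt(i + 1))) {
                    i++;
                    while (i < input.length() && isDigit(input.charAt(i))) {
                        i++;
                    }
                }
                tokens.add(input.substring(start, i));
            } else if ("+-*/()".indexOf(c) >= 0) {
                // Literals such as '+' and '(' used in parser rules become one-character tokens
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12 * (5.5-6)")); // [12, *, (, 5.5, -, 6, )]
    }
}
```

Each number becomes a single token regardless of its length, and the spaces never reach the parser, which only sees the sequence of token texts.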

Grammar Rule Design Principles

The key to designing a calculator grammar is handling operator precedence and associativity correctly. ANTLR's hierarchical rule structure expresses both naturally.

Precedence handling: The rules are chained from lowest to highest precedence, each one delegating to the next. additionExp handles addition and subtraction (lowest precedence), multiplyExp handles multiplication and division (higher precedence), and atomExp handles numbers and parenthesized expressions (highest precedence). Because additionExp can only combine complete multiplyExp results, the expression 2+3*4 is parsed as 2+(3*4), giving 14, rather than (2+3)*4, which would give 20.

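Associativity handling: ANTLR 3 does not accept left-recursive rules such as additionExp : additionExp '+' multiplyExp, so the grammar uses the * quantifier (zero or more repetitions) instead. The loop matches the operands as a flat sequence; left associativity comes from the actions, which fold each new operand into $value from left to right. As a result, 1+2+3 is evaluated as ((1+2)+3), and 10-4-3 yields 3 rather than 9.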
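The precedence and associativity behavior can be reproduced in a hand-written recursive-descent evaluator that mirrors the grammar: one method per rule, with the semantic actions inlined. ANTLR 3 also generates a recursive-descent parser with one method per rule, but this class (ExpEvaluator, a hypothetical name) is a simplified sketch rather than generated code. It reads characters directly instead of tokens, and it rejects any trailing input after the expression.

```java
// Hypothetical hand-written evaluator: one method per grammar rule, actions inlined.
public class ExpEvaluator {
    private final String src;
    private int pos = 0;

    private ExpEvaluator(String src) {
        this.src = src;
    }

    // eval : additionExp, then require that only whitespace remains
    public static double eval(String input) {
        ExpEvaluator p = new ExpEvaluator(input);
        double value = p.additionExp();
        if (p.peek() != '\0') {
            throw new IllegalArgumentException("unexpected '" + p.src.charAt(p.pos) + "' at index " + p.pos);
        }
        return value;
    }

    // additionExp : multiplyExp ('+' multiplyExp | '-' multiplyExp)*
    private double additionExp() {
        double value = multiplyExp();
        for (char op = peek(); op == '+' || op == '-'; op = peek()) {
            pos++; // consume the operator
            double rhs = multiplyExp();
            value = (op == '+') ? value + rhs : value - rhs; // fold left to right
        }
        return value;
    }

    // multiplyExp : atomExp ('*' atomExp | '/' atomExp)*
    private double multiplyExp() {
        double value = atomExp();
        for (char op = peek(); op == '*' || op == '/'; op = peek()) {
            pos++;
            double rhs = atomExp();
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    // atomExp : Number | '(' additionExp ')'
    private double atomExp() {
        if (peek() == '(') {
            pos++;
            double value = additionExp();
            if (peek() != ')') {
                throw new IllegalArgumentException("expecting ')' at index " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < src.length() && isDigit(src.charAt(pos))) {
            pos++;
        }
        if (pos == start) {
            throw new IllegalArgumentException("expecting a number or '(' at index " + pos);
        }
        if (pos + 1 < src.length() && src.charAt(pos) == '.' && isDigit(src.charAt(pos + 1))) {
            pos++;
            while (pos < src.length() && isDigit(src.charAt(pos))) {
                pos++;
            }
        }
        return Double.parseDouble(src.substring(start, pos)); // like Double.parseDouble($n.text)
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < src.length() && " \t\r\n".indexOf(src.charAt(pos)) >= 0) {
            pos++;
        }
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println(eval("2+3*4"));    // 14.0
        System.out.println(eval("10-4-3"));   // 3.0
        System.out.println(eval("12*(5-6)")); // -12.0
    }
}
```

Because each loop folds operands into value as soon as they are read, changing the fold (for example, collecting operands and combining them from the right) would change the associativity without touching the rule structure.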

Code Generation and Compilation Process

After the grammar is complete, use the ANTLR tool to generate Java code. First download antlr-3.2.jar, then run the following in the directory that contains Exp.g:

java -cp antlr-3.2.jar org.antlr.Tool Exp.g

This command generates three files: ExpLexer.java (the lexer), ExpParser.java (the parser), and Exp.tokens (which maps token names to their integer token types). The two Java files contain the generated recognizer code, including the semantic actions copied from the grammar.

Test Program Implementation

To verify the generated parser, create a test program. Here's a complete example:

import org.antlr.runtime.*;

public class ANTLRDemo {
    public static void main(String[] args) throws Exception {
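        // Wrap the input text in a character stream
        ANTLRStringStream in = new ANTLRStringStream("12*(5-6)");
        // The lexer turns characters into tokens
        ExpLexer lexer = new ExpLexer(in);
        // Buffers tokens for the parser; tokens on the HIDDEN channel (WS) are not passed on
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        ExpParser parser = new ExpParser(tokens);
        // eval is the start rule; its double return value is the result
        System.out.println(parser.eval()); // prints -12.0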
    }
}

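Compile the test program together with the generated ExpLexer.java and ExpParser.java, keeping the ANTLR jar on the classpath (on Windows, use ; instead of : as the classpath separator):

javac -cp .:antlr-3.2.jar *.java
java -cp .:antlr-3.2.jar ANTLRDemo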

Semantic Action Integration

ANTLR lets you embed Java code, called semantic actions, inside grammar rules; each action runs when the parser reaches that point in the rule. In the calculator example, the actions compute the value of the expression while it is being parsed.

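Each rule declares a return value with returns [double value], that is, a double named value. Inside an action, $value refers to the current rule's return value, $label.value to the value returned by a labeled rule reference (such as $exp.value), and $label.text to the text matched by a labeled token. For example, in the atomExp rule: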

atomExp returns [double value]
    :    n=Number {$value = Double.parseDouble($n.text);}
    |    '(' exp=additionExp ')' {$value = $exp.value;}
    ;

Here $n.text retrieves the text content of the Number token, and Double.parseDouble() converts it to a double-precision floating-point number.

Error Handling Mechanisms

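ANTLR 3 detects and reports syntax errors automatically. When the input does not match the grammar, the generated parser raises a RecognitionException internally; by default, each rule catches it, prints an error message to standard error, and attempts to recover instead of propagating the exception to the caller.

For example, given the incomplete expression "12*(5-6" (missing its closing parenthesis), the parser prints:

line 0:-1 mismatched input '<EOF>' expecting ')'

The message says that end-of-file was reached where a closing parenthesis was expected. Recovery lets the parser continue past an error and report further errors in the same run; here the error occurs at end-of-file, so nothing remains to parse. Because the parser recovers rather than throwing, eval() still returns normally after reporting the error, so an application that must reject malformed input needs to change this behavior. One common approach is to override reportError(RecognitionException) in the grammar's @members section so that it throws an exception.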

Practical Application Extensions

The basic calculator can be extended in several directions. Variable support can be added by passing the parser a Map<String, Double> of variable values and adding a lexer rule for identifiers. Unary operators such as the negative sign in -5, as well as function calls, can be added as further grammar rules.
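As a sketch of the first two extensions, the evaluator below looks identifiers up in a Map<String, Double> and adds a new precedence level, unaryExp, between multiplyExp and atomExp so that unary minus binds more tightly than * and /. It is a hypothetical, hand-written class (CalcWithVariables), not ANTLR output; in an ANTLR grammar the equivalent change would be an extra parser rule plus an identifier lexer rule. The identifier syntax used here (a letter or underscore followed by letters, digits or underscores) is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical hand-written evaluator with variables and unary minus.
public class CalcWithVariables {
    private final String src;
    private final Map<String, Double> vars;
    private int pos = 0;

    private CalcWithVariables(String src, Map<String, Double> vars) {
        this.src = src;
        this.vars = vars;
    }

    public static double eval(String input, Map<String, Double> vars) {
        CalcWithVariables p = new CalcWithVariables(input, vars);
        double value = p.additionExp();
        if (p.peek() != '\0') {
            throw new IllegalArgumentException("unexpected input at index " + p.pos);
        }
        return value;
    }

    private double additionExp() {
        double value = multiplyExp();
        for (char op = peek(); op == '+' || op == '-'; op = peek()) {
            pos++;
            double rhs = multiplyExp();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    private double multiplyExp() {
        double value = unaryExp();
        for (char op = peek(); op == '*' || op == '/'; op = peek()) {
            pos++;
            double rhs = unaryExp();
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    // New level: unaryExp : '-' unaryExp | atomExp
    private double unaryExp() {
        if (peek() == '-') {
            pos++;
            return -unaryExp();
        }
        return atomExp();
    }

    // atomExp : Number | Identifier | '(' additionExp ')'
    private double atomExp() {
        char c = peek();
        int start = pos;
        if (c == '(') {
            pos++;
            double value = additionExp();
            if (peek() != ')') {
                throw new IllegalArgumentException("expecting ')' at index " + pos);
            }
            pos++;
            return value;
        }
        if (Character.isLetter(c) || c == '_') {
            while (pos < src.length() && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '_')) {
                pos++;
            }
            String name = src.substring(start, pos);
            Double value = vars.get(name);
            if (value == null) {
                throw new IllegalArgumentException("undefined variable '" + name + "'");
            }
            return value;
        }
        while (pos < src.length() && isDigit(src.charAt(pos))) {
            pos++;
        }
        if (pos == start) {
            throw new IllegalArgumentException("expecting a number, variable or '(' at index " + pos);
        }
        if (pos + 1 < src.length() && src.charAt(pos) == '.' && isDigit(src.charAt(pos + 1))) {
            pos++;
            while (pos < src.length() && isDigit(src.charAt(pos))) {
                pos++;
            }
        }
        return Double.parseDouble(src.substring(start, pos));
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < src.length() && " \t\r\n".indexOf(src.charAt(pos)) >= 0) {
            pos++;
        }
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        Map<String, Double> vars = new HashMap<>();
        vars.put("x", 3.0);
        vars.put("rate", 0.5);
        System.out.println(eval("-x * 2", vars));          // -6.0
        System.out.println(eval("2 * -(x + 1)", vars));    // -8.0
        System.out.println(eval("100 * rate - -1", vars)); // 51.0
    }
}
```

An unknown name raises an IllegalArgumentException instead of silently defaulting to zero; whether that is the right policy depends on the application.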

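Newer versions such as ANTLR 4 change the API. ANTLRStringStream is replaced by ANTLRInputStream (itself deprecated since ANTLR 4.7 in favor of CharStreams.fromString), and a rule declared with returns no longer returns the value directly: parser.eval() returns an EvalContext object whose value field holds the result. Lexer actions are also simpler: {$channel=HIDDEN;} becomes the lexer command -> channel(HIDDEN), or -> skip to discard whitespace entirely.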

Summary and Best Practices

This four-operation calculator illustrates ANTLR's basic development workflow: define the grammar rules and their semantic actions, generate the parser code, and drive it from a test program. Key points include:

Clearly distinguish lexer and parser rules
Properly design rule hierarchies for precedence and associativity
Effectively use semantic actions to implement language semantics
Thoroughly test various input scenarios

While this example is relatively simple, it demonstrates ANTLR's core concepts and working principles. For more complex languages, build upon this foundation by gradually adding more grammar rules and semantic processing logic.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.