Project #7 - Parse infix expression P.J. Drongowski COMP15 Fall 2004 ********************************************* More on the grammar ********************************************* The original expression grammar is: E ::= E '+' T | E '-' T | T T ::= T '*' F | T '/' F | F F ::= '(' E ')' | IDENTIFIER | NUMBER The first production says that an expression (E) is either: * An expression (E) followed by a '+' followed by a term (T), OR * An expression (E) followed by a '-' followed by a term (T), OR * A term (T.) When the grammar is rewritten to eliminate left recursion, each production is replaced by two productions (e.g., the original E is replaced by two new production rules, E and E'.) E ::= T E' E' ::= '+' T E' | '-' T E' | EMPTY T ::= F T' T' ::= '*' F T' | '/' F T' | EMPTY F ::= NUMBER | IDENTIFIER | '(' E ')' EMPTY is a new non-terminal that replaces the terminal on the left hand side of the BNF definition operator when the current input symbol (token) does not match a legal lookahead symbol. So, the use of E' in the production rules: E ::= T E' E' ::= '+' T E' | '-' T E' | EMPTY makes an E' *optional*. In the first rule, an expression can be just a term (T), OR it can be a term followed by an E'. Notice that the second rule is now right recursive, which is conducive for recursive descent (top-down) parsing. The rule handles expressions in which a '+' is followed by a term (T) which may *optionally* be followed by an E'. These rules handle expressions of the form: a + b + c The identifier 'a' matches the term (T) in the first rule. The rest of the expression is handled by the second rule. The first '+' matches the '+' in rule for E', 'b' matches the term (T.) Then the E' rule is applied again and the second '+' matches the '+' in the E' rule and 'c' matches a term (T.) The third time an attempt is made to apply the E' rule, the parse succeeds by matching the EMPTY string and a fourth try is not attempted. ********************************************* Building the parser ********************************************* The transformed rules are easily translated to code as shown in the on-line note on recursive descent. http://www.cs.tufts.edu/comp/15/notes/recur_descent Also, be sure to check out the on-line example that shows how to parse prefix expressions. http://www.cs.tufts.edu/comp/15/notes/prefix ********************************************* Building the tree ********************************************* If we could directly apply the left recursive rules, it would be easy to build the tree. When processing an operator, pointers to both subtrees would be immediately available and it would be easy to create a node for the operator, then link the left and right subtrees in. Unfortunately, the rewritten rules complicate tree building a little bit. Each private parse function (expr, expr_prime, term, term_prime, factor, number, identifier) is still a little factory that creates a node when the parse is successful. However, when calling the "prime" parse functions, it is necessary to pass forward (downwards) a pointer to the left subtree. Again, consider the expression: a + b + c. + / \ a + b + c => ((a + b) + c) => + c / \ a b When applying the rules: E ::= T E' E' ::= '+' T E' | '-' T E' | EMPTY The 'a' matches the term (T) in the first rule. This will become the left subtree of the final expression tree. A pointer to the node representing 'a' must be passed to the parse function for the E' rule. The rest of the expression is handled by the E' rule which parses and creates nodes in the usual recursive manner. Your code simply needs to test for the non-EMPTY case in which another '+' T sequence has been detected. For the non-EMPTY case, you will need to link in another subtree and return a pointer to the larger subtree. ********************************************* Incremental development style ********************************************* I strongly urge the use of an incremental development style. Do not try to write the whole program and debug it in its entirety from the start. Here is a suggested development sequence. 1. Build a parser that correctly recognizes well-formed expressions. Do not try to build the tree in this initial version. 2. Use the "success" function as shown in the prefix expression example to trace execution of your parser. This will give you insight as to how the recursive parse functions are called. 3. Implement the preorder expression display function. Use this function for debugging. Remember, it should be able to display any expression tree, so it should be able to display your intermediate results. 4. Add code to generate nodes for the factor (F) rule. Producing nodes for an identifier, number and parenthesized subexpression should be trivial. Test your code with simple expressions consisting of an identifier, a number, an identifier in parens, and a number in parens. 5. Attack the T (term) and T' (term_prime) rules next. Test with the expressions, a+b and a+b+c. Get this case right and you will be able to copy and modify your solution for the E (expr) and E' (expr_prime) rules. ********************************************* Expressions for debugging ********************************************* Here are the expressions that I used for debugging. They show how I used an incremental development process to build up to the full program. 80 var (90) (var) 3*4 a*b*c a+b a+b+c a+b*c (a + b) * c (a - b) / c ((x2 - x1) * (x2 - x1)) + ((y2 - y1) * (y2 - y1)) (x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1)