CS 410 Programming Assignment 2

Fall 2006 [Bono]
Due date: Thursday, October 5, 11:59pm

Note: updated 9/22/06 to correct the link to Java precedence/associativity summary.

Introduction

In this assignment you will write a parser for the Espresso language, using the tool bison. The bison documentation in available here. There is also a pdf version linked from the Documentation section of the course web page. You will also need to refer to two other documents to complete this assignment: (1) the Espresso Language Reference, which has a grammar for Espresso (using an extended grammar notation) and (2) the Espresso Abstract Syntax Trees document. These are both also linked from the Project section of the course web page.

There will be no face-to-face grading meeting (i.e., demo) required for this assignment. Students are required to work individually on this assignment (no groups). Save your solution to this assignment, because you will need it for later assignments.

Getting the assignment files

To be able to work on the assignment, create a new directory, PA2, for your assignment (distinct from your PA1 directory), cd into it, and then give the following Unix command.
 gmake -f ~csci410/espresso/parser/Makefile getfiles 
This Makefile copies the files that you will need to modify for the assignment to your directory, but also symbolically links the files you will need access to, but that you should not modify.

Also, copy your solution to PA1, that is, your espresso.flex file, into your PA2 directory. Do not copy any other files from your PA1 directory: all the necessary files for compiling your parser (testparser) and stand-alone lexical analyzer (testlexer) should already have "appeared" in your PA2 directory.

The assignment files

Most of these files are described in more detail in later parts of this document. The files in bold below are ones you create and/or modify and submit. The files are:

Parser

The parser generated by bison will produce an AST for the Espresso program parsed, or if there are errors, it will report the first one it finds and exits. You are not required to do any error recovery, but if you'd like to add it, you can read more about bison facilities for this in the section of the bison manual called Error Recovery.

You will be writing an LALR(1) grammar and actions for each grammar rule. The actions will build an AST for the program parsed. The start symbol is program; when we reduce to this symbol, we put the resulting AST in the global variable ast (the rule for program has already been written for you). The test driver (testparser.cc) calls dump on the ast after yyparse returns (if it exited successfully). Note that the dump functions print no_type for all of the expressions: this is fine; we will be adding type information to our trees in PA3.

The output of the parser when there are errors are just the error messages from the parser. We have defined a function yyerror for you that automatically gets called when there is an error. You should not modify this function, nor should you call it yourself.

A grammar for Espresso (using an extended grammar notation) is given in the Espresso Language Reference. As with other parts of this project, you may also need to refer to the Java Language Specification. That document, however, has no single easy place (that I could find) where the precedence and associativity rules for Java are summarized. For your convenience we provide a link to a page from a work-in-progress manuscript by Bill Venners called Objects and Java found on the artima.com web site that does have such a summary. Once you follow the link you should tell your browser to search for the term precedence, and you'll get to the relevant section on Operator Precedence and Associativity.

The AST classes are documented in the Espresso Abstract Syntax Trees document.

A note on function initParser. This function is called from main before the call to yyparse, but is defined in espresso.y. This is a "hook" for you to add any necessary initialization code, without having to change testparser.cc. It is currently empty, and you probably won't need to add to it, but see comments in the function for a way to use it to temporarily generate more debugging output.

Compiling and testing the Parser

The command
gmake testparser
compiles the parser. That is, it feeds espresso.y to bison, then compiles the resulting partial C++ program (which is in espresso-parse.cc) with the test driver (in testparser.cc) to make the executable, testparser. As mentioned earlier, please look at the Makefile to see exactly what happens.

The command

testparser inputfile
runs the resulting parser on the file inputfile, which does not have to have the .esp suffix. When your parser is complete, you need to verify that it accepts all syntactically correct Espresso programs and rejects all others. Syntactically correct does not mean correct, however: your complete parser will accept some invalid Espresso programs, because some errors will be handled by the semantic analyzer (PA3). To give just two examples: (1) any ID can be used in places where a type can go, because we haven't done type-checking; (2) the source file name prefix does not have to match the name of the class containing main (e.g., source file "foo.esp" has main class "Test").

Development strategy

Once again, we strongly encourage you to use incremental development. We have given you a small working subset to start with. For the subsets you create you will need to develop the grammar and the associated actions at the same time. If you develop a grammar first, you may find your rules don't easily generate the required trees and you will have to rewrite the grammar anyway. For example, we use STL vectors to represent list ASTs. You must add new elements to the end of STL vectors; additionally you need the list elements to appear in the vector in the same order as they were in the input file (because this order is significant): this will constrain how you can write the parts of your grammar that generate lists of things.

For each subset you build, once all your bison errors are out (see next section), you will need to test the parser it constructs. You can see if the parser is doing the right thing by examining the tree that gets printed, checking that it matches the structure of the Espresso source code.

We gave you the rules for the main method, in part, because it's a little bit tricky (has a hard-coded value for type "String[]" added to the string table). Some of the actions in the espresso.y we gave you involve local C++ variables, but for the most part, your actions will not need to define additional variables. In fact, most of the actions will be just one line looking something like:

 { $$ = new SomeTreeType(. . .); } 

There are a few articles with some general debugging hints from the PA2 clarifications file from past semesters that we kept in the clarifications file this semester.

Shift-reduce and reduce-reduce errors

Your submitted program is not allowed to have any shift-reduce (S-R) or reduce-reduce (R-R) errors, except for the if vs. if-else ambiguity (i.e., the dangling-else problem we discussed in lecture). That one will generate exactly one shift-reduce error.

To figure out why you got a S-R or R-R error, it's helpful to look at the file espresso.output, which bison produces when run in verbose mode (that's how we're running it in our Makefile). In that output file you can look at the exact rules and the states where the error occurred to figure out why your grammar has a conflict.

Using the typed value stack

The types of the values for $$, $1, etc., are determined by %type declarations of your non-terminals. Unlike the examples using semantic attributes we have done in lecture, in our parser, different nonterminals will need to have different types. Thus, we are going to use a union to represent the type for elements on the value stack. Because some of these semantic values also come from tokens, the types for the values from the lexical analyzer are also part of this union. This union is defined with the %union bison statement given to you in espresso.y (this is what YYSTYPE comes from).

The %token and %type statements associate a value type with symbols used in the grammar by giving the name (not the type) of the component of the union that is the desired type. All the ones for tokens have already been done for you, but as you add more nonterminals to your grammar, you will have to add %type declarations for them. Here is an example from espresso.y:

 %type <program> program 
says that the nonterminal program (the one on the right) uses the type of the component named program (in the brackets) from the union, which is type Program*, one of the AST node types.

The type you use for a particular nonterminal depends on what kind of tree you will need to store there. If you use the wrong type, the g++ compiler will complain (either with an error or a warning) when you try to pass the value in as a parameter to a constructor for other nodes as you build up your tree.

Expressions

You are allowed to use bison's precedence declarations (%left, %right, etc.), but only for expressions. This allows your grammar to have some ambiguities, but makes it easier to read. For example, you may have productions like E-->E+E, rather than E-->E+T, etc. When you use the precedence declarations, it should show no additional S-R conflicts in the bison output that goes to the screen, but it will show lines in the espresso.output file saying there were a bunch of S-R conflicts that were resolved. See the Operator Precedence section of the Bison manual for more information on this feature.

Hard-coded strings in the string table

In a few places in the parser you will add hard-coded strings to the string table. You can see an example of this in the mainMethod production, where we added the string "String[]", to be used as the type of the parameter to main. The two other places we need to do this are: (1) for the built-in types of Espresso, since type names are going to generally be represented as Symbol objects and (2) for this, which is a "special" variable. All this specialness will be handled in the semantic analyzer.

We don't need to worry about these special variables and types conflicting with the names of any programmer-defined variables and types because they were treated as reserved words during lexical analysis.

What to turn in

The Makefile includes a rule for submitting the assignment. You run it by typing gmake submit. It copies your espresso.y, README, and Test.esp to the course account using the local submit utility. The last version submitted is the one that will be graded: each submit overwrites the previous one.

The University of Southern California does not screen or control the content on this website and thus does not guarantee the accuracy, integrity, or quality of such content. All content on this website is provided by and is the sole responsibility of the person from which such content originated, and such content does not necessarily reflect the opinions of the University administration or the Board of Trustees