CS 410 Programming Assignment 2
Fall 2006 [Bono]
Due date: Thursday, October 5, 11:59pm
Note: updated 9/22/06 to correct the link to Java
precedence/associativity summary.
Introduction
In this assignment you will write a parser for the Espresso language,
using the tool bison. The bison documentation
in available
here.
There is also a pdf version linked from the Documentation section of
the course web page. You will also need to refer to two other
documents to complete this assignment: (1) the Espresso Language Reference, which has a
grammar for Espresso (using an extended grammar notation) and (2)
the Espresso Abstract Syntax Trees
document. These are both also linked from the Project section of the
course web page.
There will be no face-to-face grading meeting
(i.e., demo) required for this assignment. Students are required to
work individually on this assignment (no groups). Save your solution
to this assignment, because you will need it for later assignments.
Getting the assignment files
To be able to work on
the assignment, create a new directory, PA2, for your assignment (distinct from
your PA1 directory), cd into it, and then give the
following Unix command.
gmake -f ~csci410/espresso/parser/Makefile getfiles
This Makefile copies the files that you will need to modify for the
assignment to your directory, but also symbolically links the files
you will need access to, but that you should not modify.
Also, copy your solution to PA1, that is, your
espresso.flex file, into your PA2 directory. Do not copy
any other files from your PA1 directory: all the necessary files for
compiling your parser (testparser) and stand-alone lexical analyzer
(testlexer) should already have "appeared" in your PA2
directory.
The assignment files
Most of these files are described in
more detail in later parts of this document. The files in bold below
are ones you create and/or modify and submit. The files are:
-
espresso.y. The bison input file. This
is where you will write all of your code for the assignment. As with
PA1, we gave you a starter version that recognizes a small subset of
Espresso when used with a working lexical analyzer.
-
README.
See directions in the file itself.
-
Test.esp. The current contents is a
valid Espresso program that can be parsed by the version of espresso.y
we gave you. Add to this file so that it tests every legal construct
in Espresso. Thus, the version you create will be a syntactically
correct Espresso program (will not generate a parse error).
-
TestBad.esp. This file contains an invalid Espresso
program. You can run this on the parser produced by the current
espresso.y, to see what happens when you get parse errors
in a bison-generated parser.
- ast.h.
The header file for the abstract syntax tree
package, which is documented in the separate document,
Espresso ASTs.
- astDump.cc. The AST member functions
for printing out ASTs.
-
espresso.flex. You need your solution to
PA1 as described above.
-
testparser.cc.
The test driver for the parser. We
have written this for you. You will find it useful to look at to see
what the program should do.
-
espresso.tab.h. The declarations that need to be known to
both the lexical analyzer and the parser. This file will be
automatically created from the espresso.y we gave you when
you run bison on espresso.y. Do not modify this
file.
-
parser.h. Forward declarations for types used in espresso.tab.h
-
Makefile. This has a rule for getting the files
(getfiles, described previously), for compiling the program,
and for submitting the program. See comments at the top of the file
for details. You should not need to modify this file unless you add
additional source files (for which you would need to add make rules
and change the submit rule). We recommend you familiarize yourself
with the workings of this Makefile.
-
StringTab.h
and
StringTab.cc. A string table class (StringTab) and
associated Symbol class. You will be using these classes in your
parser in a few places. They are documented in the .h
file.
-
printToken.h.
and
printToken.cc. A function for printing out the name of a
token when given its number. It knows about the #defines
in espresso.tab.h. This is used by the error-reporting routine,
yyerror, defined in espresso.y.
-
lexer.h.
Used by flex input file testlexer.cc.
-
testlexer.cc. Provided in case you want to run your lexer
stand-alone in this directory.
Parser
The parser generated by bison will produce an AST for the Espresso program
parsed, or if there are errors, it will report the first one it finds
and exits. You are not required to do any error recovery, but if
you'd like to add it, you can read more about bison facilities for this
in the section of the bison manual called Error Recovery.
You will be writing an LALR(1) grammar and actions for each grammar
rule. The actions will build an AST for the program parsed. The
start symbol is program; when we reduce to this symbol, we
put the resulting AST in the global variable ast (the rule
for program has already been written for you). The test
driver (testparser.cc) calls
dump on the ast after yyparse
returns (if it exited successfully). Note that the dump
functions print no_type for all of the expressions: this
is fine; we will be adding type information to our trees in PA3.
The output of the parser when there are errors are just the error
messages from the parser. We have defined a function
yyerror for you that automatically gets called when there
is an error. You should not modify this function, nor should you call
it yourself.
A grammar for Espresso (using an extended grammar notation) is given
in the Espresso Language Reference.
As with other parts of this project, you may also need to refer to the
Java Language Specification. That document, however, has no
single easy place (that I could find) where the precedence and
associativity rules for Java are summarized.
For your convenience
we provide a link to
a page from a work-in-progress manuscript by Bill Venners called
Objects and Java found on the artima.com web site
that does have such a summary. Once you follow the link
you should tell your browser to search for the term
precedence, and you'll get to the relevant section on Operator
Precedence and Associativity.
The AST classes are documented in
the Espresso Abstract Syntax
Trees document.
A note on function initParser. This function is called
from main before the call to yyparse, but is
defined in espresso.y. This is a "hook" for you to add any
necessary initialization code, without having to change
testparser.cc. It is currently empty, and you probably
won't need to add to it, but see comments in the function for a way
to use it to temporarily generate more debugging output.
Compiling and testing the Parser
The command gmake testparser
compiles the parser.
That is, it feeds
espresso.y
to bison, then
compiles the resulting partial C++ program (which is in
espresso-parse.cc) with the test driver (in
testparser.cc) to make the executable, testparser.
As mentioned earlier, please look at the
Makefile
to see exactly what happens.
The command
testparser inputfile
runs the
resulting parser on the file inputfile, which does not have
to have the .esp suffix. When your parser is complete, you
need to verify that it accepts all syntactically correct Espresso
programs and rejects all others. Syntactically correct does not mean
correct, however: your complete parser will accept some invalid
Espresso programs, because some errors will be handled by the semantic
analyzer (PA3). To give just two examples: (1) any
ID can be used in places where a type can go, because we
haven't done type-checking; (2) the source file name prefix does not
have to match the name of the class containing main (e.g.,
source file "foo.esp" has main class "Test").
Development strategy
Once again, we strongly encourage you to use incremental development.
We have given you a small working subset to start with. For the
subsets you create you will need to develop the grammar and the
associated actions at the same time. If you develop a grammar first,
you may find your rules don't easily generate the required trees and
you will have to rewrite the grammar anyway. For example, we use STL
vectors to represent list ASTs. You must add new elements to the
end of STL vectors; additionally you need the list elements to appear
in the vector in the same order as they were in the input file
(because this order is significant): this will constrain how you can
write the parts of your grammar that generate lists of things.
For each subset you build, once all your bison errors are out (see
next section), you will need to test the parser it constructs. You
can see if the parser is doing the right thing by examining the tree
that gets printed, checking that it matches the structure of the Espresso
source code.
We gave you the rules for the main method, in part, because
it's a little bit tricky (has a hard-coded value for type "String[]"
added to the string table). Some of the actions in the
espresso.y we gave you involve local C++ variables, but for
the most part, your actions will not need to define additional
variables. In fact, most of the actions will be just one line looking
something like:
{ $$ = new SomeTreeType(. . .); }
There are a few articles with some general debugging hints from the
PA2 clarifications file from past semesters that we kept in the
clarifications file this semester.
Shift-reduce and reduce-reduce errors
Your submitted program is not allowed to have any shift-reduce (S-R)
or reduce-reduce (R-R) errors, except for the if
vs. if-else ambiguity (i.e., the dangling-else problem we
discussed in lecture). That one will generate exactly one
shift-reduce error.
To figure out why you got a S-R or R-R error, it's helpful to look at
the file espresso.output, which bison produces when run in
verbose mode (that's how we're running it in our Makefile). In that
output file you can look at the exact rules and the states where the
error occurred to figure out why your grammar has a conflict.
Using the typed value stack
The types of the values for $$, $1, etc., are
determined by %type declarations of your non-terminals.
Unlike the examples using semantic attributes we have done in lecture,
in our parser, different nonterminals will need to have different
types. Thus, we are going to use a union to represent the
type for elements on the value stack. Because some of these semantic
values also come from tokens, the types for the values from the
lexical analyzer are also part of this union. This union is defined
with the %union bison statement given to you in espresso.y (this is what
YYSTYPE comes from).
The %token and %type statements associate a value type with symbols
used in the grammar by giving the name (not the type) of the component
of the union that is the desired type. All the ones for tokens have
already been done for you, but as you add more nonterminals to your
grammar, you will have to add %type declarations for them. Here is an
example from espresso.y:
%type <program> program
says that the nonterminal program (the one on the right)
uses the type of the component named program (in the
brackets) from the union, which is type Program*, one of
the AST node types.
The type you use for a particular nonterminal depends on what kind of
tree you will need to store there. If you use the wrong type, the g++
compiler will complain (either with an error or a warning) when you
try to pass the value in as a parameter to a constructor for other
nodes as you build up your tree.
Expressions
You are allowed to use bison's precedence declarations (%left, %right,
etc.), but only for expressions. This allows your grammar to have
some ambiguities, but makes it easier to read. For example, you may
have productions like E-->E+E, rather than E-->E+T, etc. When you
use the precedence declarations, it should show no additional S-R
conflicts in the bison output that goes to the screen, but it will
show lines in the espresso.output file saying there were a
bunch of S-R conflicts that were resolved.
See the Operator Precedence section of the Bison manual for more
information on this feature.
Hard-coded strings in the string table
In a few places in the parser you will add hard-coded strings to the
string table. You can see an example of this in the
mainMethod production, where we added the string
"String[]", to be used as the type of the parameter to
main. The two other places we need to do this are: (1) for
the built-in types of Espresso, since type names are going to
generally be represented as Symbol objects and (2) for
this, which is a "special" variable. All this specialness
will be handled in the semantic analyzer.
We don't need to worry about these special variables and types
conflicting with the names of any programmer-defined variables and
types because they were treated as reserved words during lexical analysis.
What to turn in
The
Makefile
includes a rule for submitting the assignment. You run
it by typing gmake submit. It copies your
espresso.y,
README,
and
Test.esp
to the course account using the local submit utility. The
last version submitted is the one that will be graded: each submit
overwrites the previous one.
The University of Southern California does not screen or control the content on this website and thus does not guarantee the accuracy, integrity, or quality of such content. All content on this website is provided by and is the sole responsibility of the person from which such content originated, and such content does not necessarily reflect the opinions of the University administration or the Board of Trustees