To complete this program you'll need your completed lexical analyzer and parser.
The Espresso Language Reference describes some of the semantics of Espresso, and we will discuss some more about semantic rules here. For anything not described in these places, Espresso follows the rules for Java, which are described in the Java Language Reference. Both of these documents are also linked from the course web page.
There will be a face-to-face grading meeting (i.e., demo) required for this assignment; you'll be getting more information about that as deadline approaches. Students are required to work individually on this assignment (no groups). Save your solution to this assignment, because you will need it for later assignments.
This assignment is much
larger than the first two, so you'll need to start early.
Getting the assignment files
To be able to work on the assignment, create a new directory for your assignment, cd into it, and then give the following Unix command:
gmake -f ~csci410/espresso/semant/Makefile getfilesThis Makefile copies the files that you will need to modify for the assignment to your directory, but also symbolically links the files you will need access to, but that you should not modify.
Also, copy your espresso.flex and espresso.y files from your PA2 directory. You should not copy other files from your PA1 or PA2 directories. The "getfiles" command, shown above, links all the other files necessary to compile testsemant, as well as testparser and testlexer.
The assignment files
The files in bold below are ones you
create and/or modify and submit. The files are:
Semantic Rules
The beginning of this document gives links to the
two documents you need to refer to for Espresso semantic rules. For
instance, we recommend you reread the sections of the Espresso
reference on Espresso Semantics and on the "main" method. As mentioned
in the Espresso Language Reference, if you are unsure of some Java
rule, try out a small test program on the Java compiler. In some
cases this will be easier than trying to find it in the Java Language
Specification. Although in other parts of this document we discuss in
detail the Java scoping rules that apply to Espresso, there are many
other semantic rules that are not enumerated here, primarily related
to type-checking the various types of statements and expressions in
the Espresso subset of Java.
As mentioned in the Espresso Language Reference, methods in Espresso may not be overloaded. This means that each class may have at most one method with a given name. In contrast, overloading is allowed in Java (and C++). Only one version of a method per class means we can identify a method uniquely by knowing its class and name -- we don't have to examine its parameter list to identify it.
Two "special" rules you will need to check are the restriction on the filename of a Java program, and the other is related to this in a static routine. Espresso doesn't have static routines in general, but main is static. Java does not allow the use of this in static functions, since there is no associated object instance.
The SymTab template class
The Symtab template class is a lookup table that maps keys
-- represented as Symbol* objects -- to values whose type is
specified by the template parameter; it can also maintain an arbitrary
number of nested scopes. When you instantiate the template parameter
with a type, it actually is saying that the values will be
pointers to objects of that type. Please see the header file,
SymTab.h, for complete documentation
of this class, and the testSymTab.cc file for a small
example of its use. We also went over this class in lecture.
In the next section, you will see that you will need multiple symbol tables, some of them needing a different value type (i.e., template instantiation) than others. This class is general enough to use in all the cases mentioned.
Scoping Rules and using the symbol table
The section on semantic analysis passes
described the part of the scoping rules related to whether something
needs to be declared before it is used. We'll summarize some of the
other relevant Java scoping rules here:
Because of the scoping rules, you'll need to save the classSymTab, fieldSymTab (per class), and methodSymTab (per class) in the appropriate tree node for later use in type-checking (i.e., these get created in passes (1) and (2) and used in pass (3) -- see previous section on Major passes for information on what these passes are). You should also save the varSymTab variable environment for each method (i.e., don't throw it out after you type-check a method, but save it in the method node). The reason we need to save them is we will be using these symbol tables again when we do code generation (PA4) -- if we don't save them, the code generation pass will have to rebuild them.
In the later section on Type-Checking, the type environment, which we pass down to the type checking routines, refers to all the names in scope in a portion of the program. For a statement, for example, the names visible include the classes, all the methods of the classes, and all the variables that are in scope. This can be represented as two symbol tables: (1) the classSymTab and (2) the varSymTab.
This type environment gets passed around a lot, but where does it get used? The class table gets used any time we encounter a type name, to make sure that type is defined, including every time we process a declaration, and in new expressions. The var table is used for type-checking variable references and assignment expressions. We also need the type environment to type-check method calls. Recall that a method is identified by its class and name. To look up the method for a class, first you look up the class in the classSymTab, and part of its value will be its methodSymTab, from which you can look up the method you are interested in.
Please see the section on Output of the Semantic Analyzer
for more on printing the contents of the symbol tables you create.
The VarInfo classes
We created a set of classes for you, defined in semant.h, to use as the types for values
associated with symbols in your variable symbol tables. For other
symbol tables, you will probably be using some AST node type for the
value type. However, for variables, there is no one AST node that is used
for all the declarations. Remember, when you refer to a variable, it
might have been declared as a local, a formal, a field, or it might be
this, which has no declaration in the tree. That means
there are four kinds of variables. The inheritance hierarchy has a
base class called VarInfo and four subclasses, one for each
of the kinds of variables mentioned: LocalInfo,
FormalInfo, FieldInfo, ThisInfo,
Since its Espresso type is the only piece of information we need to know about a variable during type-checking, we could have used Symbol as the value type to associate with any of the kinds of variables in the symbol table. However, this will require us to rebuild these symbol tables during code generation, since during code generation we need to store additional information about the variable, namely its location; and we can't modify the Symbol class. (Note: the example program testSymTab.cc uses Symbol as the value type.)
The reason why there is not just one type for VarInfo, but an inheritance hierarchy, is that the way you generate code for assignments to and references to the four different kinds of variables is slightly different, because they will be found in different places in memory. Thus, this hierarchy will allow us to use polymorphic function(s) to do this task. You don't have to understand how to do this now, but by using these classes now, you will save yourself some work in PA4.
The idea of using these classes is you want to be able to have entries of any of the different kinds of variable in the same variable symbol table. But, when you create an single entry it needs to be of the particular kind you are declaring. For example, if you are adding a local variable declaration to a symbol table, you will create the value by doing
new LocalInfo(type)
You may be wondering why we aren't just using the AST nodes related to the variable declarations (e.g., Field, Local, etc.) for this. It's because they aren't related by inheritance in the AST class hierarchy, so we wouldn't be able to stick instances of them in the same symbol table in a type-safe way.
The only place this is used in the AST is as a
VarRef. To type-check such nodes with it we need to know
what type this is. One way to handle the issue is to add a
special symbol table entry for this in our variable symbol
table. By "special", we mean that it won't correspond to any
declaration in the AST. Since this is in scope and has the
same type for any method of a particular class, you could add it to
your symbol table for fields for the class (even though it's not
really a field). Note: the VarInfo inheritance hierarchy
described in the previous section has a special subtype
ThisInfo for for entries for this.
The general form of a typeCheck routine on an expression is:
Dealing with this
Type-checking
Type-checking is done as a traversal of the AST. Usually each
statement and expression has its own rule for doing type-checking.
This will be implemented as polymorphic typeCheck functions
for the various node types. That is, most AST node types will have
their own typeCheck method to typeCheck that
particular construct of Espresso. They won't all necessarily have the
same interface, however. But all Stmts, for example, will
need to have the same polymorphic typeCheck interface,
and similar with Exprs.
Symbol *myexpr::typeCheck(theTypeEnvironment) // this is just pseudo-code
{
// type-check subexpressions
Symbol *type1 = subtree1->typeCheck(theTypeEnvironment);
Symbol *type2 = subtree2->typeCheck(theTypeEnvironment);
do type check rule for this kind of expression, which probably
involves looking at types of subtrees (type1, and type2) to figure out type
of this instance (call it mytype). this part will print out error messages
if the type rules are not followed.
setType(mytype); // set type field in node
return mytype;
}
So we pass the type environment (see Scoping
rules) down the AST, but compute the inferred types in a
bottom-up manner (i.e., typecheck subtrees of node before
typechecking node). We can see the advantage of polymorphism here,
because we never have to check what kind of expression is
actually in the subtrees, because it will automatically use the
correct version of typeCheck based on the type of the
object that the pointer is pointing to.
You need to use the type field we have already defined for you to decorate the AST with types as you infer them. This is accessed via the getType and setType methods in the Expr class (i.e., all expression nodes inherit this field). The AST dump routines already print this field for us, so we will be able to see the results of type-checking via the existing call that dumps the whole tree (in testsemant.cc). Note: If we have an invariant that all typeCheck routines for Expr subclasses set the type field, then we don't have to actually return the type too, because the caller can find it in the subtrees.
Type-checking for non-Expr constructs will still involve making the calls on the subtrees, passing the correct environment down. For statements, we still need to check that the types of the subparts are correct although statements themselves have no type. For methods (i.e., definitions, not calls) the method-scope variables will be added to the type environment before calling typeCheck on the statements and return expression in that method.
The dump routines declared and implemented ast.h and astDump.cc, respectively, will be useful for
you to study as a model for how to do a polymorphic tree traversal.
One aspect you can look at is how how common code between subclasses
is put into superclasses to avoid repetition, and how that code gets
called. The diagrams of the AST class hierarchy given in the AST
document may also be helpful when you want a big picture of the
classes involved in type-checking and the relationships between them
The diagram does not show the part-subpart relationships between
objects in an AST; but the section describing each of the the
constructors does.
Error reporting and recovery
Error output should be sent to the stream cerr. Unlike in
the parser, here it will be easy to give informative error messages,
because you will report the error at the point that you detect it. If
you are unsure of how to describe an error, or whether something
should even be an error, for things that would also be errors in a Java
compiler you can try out a sample Espresso program in the Java compiler. You
are not required to give the same messages as a Java compiler, and in
some instances your message might be very different than Java's,
because it might arise from restrictions in the Espresso grammar. For
example, the expression
this=xwould produce a parse error from the Espresso compiler (since we treat this as a keyword), whereas the Java compiler would generate a message about not being able to use this on the left-hand side of an assignment.
We have provided you with a convenient class that has a routine to help format error messages (it will print the line number where the error occurred). See SError.h for details of how to use it. Your error messages should include any relevant identifying information, such as the identifier involved. Here is an example error message:
line 9: error: type 'Blob' for field 'y' is undefined.
Unlike in the parser, here you should be able to recover from most errors and continue processing. Of course, this can sometimes lead to many cascading errors, since you are making some kind of assumption about what was intended. You will have to decide how you want to do this for the various major categories of errors. For example, if something is doubly-defined, you can just ignore the second instance of it, and use the first one. If it is a class that is defined twice, you could still process the innards of the second class, but any other reference to that class will still mean the first one defined. If it's something like an invalid operand to a + expression, you will generate an error message, but you should just set the type of the plus expression to int, so higher parts of the tree will consider it legal.
Recovering from reference to an undefined type, you can again use int as a substitute, although as you may guess, this is not a panacea, since int, unlike all user-defined types, isn't even a reference type (i.e., you aren't allowed to call methods on int).
The general idea of error recovery is to put things in such a way that the program won't crash if you continue (e.g., always assign a valid type value in the AST while type-checking), and such that you avoid generating spurious errors. The second part of that is much more difficult to do; we will mainly be concerned here that you do the first. You are going to be documenting your your error recovery methods in the README.
Output of the Semantic Analyzer
Besides any error output, described above, your program must print
the AST for the program and, when requested by the user, the
symbol tables as well.
We have already provided the call to dump the AST, but you need to
write the calls to dump the symbol tables. This output, which will
only be generated when the debug flag is set, should be sent to the
stream cout. (As mentioned previously, the error output
should be sent to the stream cerr.) This way, if error
output interleaves with other output, it will be easy to separate them
by sending the streams two different places.
The output of the symbol tables is controlled by an optional debug option to testsemant. Running it as
testsemant -d Test.espsets the global flag semantDebug. You will only generate your symbol table output when this flag is set. The symbol table output will appear before the AST output.
Note: debug output is to help you debug! Don't wait until you have completed your program to add the code to print out the symbol tables. This output will be helpful to verify that subsets work as you are incrementally developing the program. More on this in the next section.
For each symbol table you print, it should begin with a header line describing what it is a symbol table for, for example:
Methods of class Foo:Then it should use the provided dump or dumpKeys functions of Symtab to do the actual printing. Please see SymTab.h for more about using these functions. These functions print the key (i.e., name) for each entry; in addition, for each of the kinds of symbol tables you must print the following information about the value of each entry:
You can either print the symbol tables as you create them, or else, after you are done with the rest of a pass, make another pass over the tree to print the symbol tables created in that pass. The advantage to the latter is it helps you check that you actually saved the symbol tables (as opposed to putting them in a local var that disappears when you go out of scope for this pass). Either way all the symbol table printing needs to happen before the return from Semant::analyzeProg.
The main program we gave you calls dump on the AST after it returns from the semantic analyzer module. This will print out the type you added to each expression node during the type-checking pass (the part that came out as no_type during PA2).
Development strategy
Once again, we strongly encourage you to use incremental development.
It's especially important here, because there is a lot more code to
write than in the previous assignments in this class. In the grading,
as usual, we will favor a working subset over a large mess that
doesn't compile or doesn't work. Fortunately this program (unlike
some) lends itself to building testable subsets.
Some natural subsets here are each of the different major passes mentioned near the beginning of the document. When apportioning your time, do not think of these as three equal pieces: the first pass is a small amount of code, the second pass is a little larger, and the third pass is a lot more code than the first two. After you develop each of the first two passes you can check your results by dumping the appropriate symbol tables: this output is required for the assignment anyway.
The type-checking pass (third pass) itself can be broken into separately testable subsets. Adding the formal/local variables for each method to symbol tables can be the first subset. You can test this subset by printing the resulting symbol tables. And, of course, you can incrementally build the type-checking routines for different constructs without doing all of them at once, testing the current subset by running it with an Espresso program that only uses the constructs you have implemented.
A few other hints for successful development: