To complete this program you'll need your completed lexical analyzer, parser, and semantic analyzer. The semantic analyzer and code generator are much more closely connected in code than the previous parts were, in that they both operate on the same data structures (the AST and the symbol tables). In fact, doing this assignment may involve making small modifications to your semantic analyzer code, so when you submit, you will be resubmitting your code for the semantic analyzer too. We will not be testing the error-checking parts of the previous phases, so if you didn't have time to get those implemented, you do not need to do so now. Put another way, we will only test your Espresso compiler with correct Espresso programs, since they are the only ones for which code gets generated.
In the last assignment we were concerned with static semantics, i.e., the parts of the semantic rules that can be checked at compile-time. In this assignment you will be concerned with operational semantics, or how an Espresso program behaves in execution. While the Espresso Language Reference describes the subset we are using, it doesn't expressly describe the run-time semantics. For anything not described in the Espresso document, Espresso follows the rules for Java, which are described in the Java Language Reference. Both of these documents are also linked from the course web page. Recall also that you can write little test Java programs if you are unsure of the behavior of some feature.
There will be a face-to-face grading meeting (i.e., demo) required for this assignment; you'll be getting more information about that as deadline approaches. Students are required to work individually on this assignment (no groups). This assignment is at least as big as PA3, so start right away.
Oh, yeah, the early bonus deadline. This 10% bonus is encouragement
for you to complete the assignment early.
Setting up your development environment
First, you will need to modify your path variable so that you can
use the spim MIPS simulator.
Put the following line at the end of your .login file:
set path=(~csci410/espresso/bin $path)To have the change take effect immediately, do the Unix command source .login.
Getting the assignment files. To be able to work on the assignment, create a new directory for your assignment, cd into it, and then give the following Unix command:
gmake -f ~csci410/espresso/codegen/Makefile getfilesThis Makefile copies the files that you will need to modify for the assignment to your directory, but also symbolically links the files you will need access to, but that you should not modify.
You will need to copy the following files from your PA3 directory to your PA4 directory:
The assignment files
The files in bold below are ones you
create and/or modify and submit. The files are:
The "rest" includes assigning offsets for formals and locals, but this can be combined with the rest of code generation in a single pass, since the declarations come before any uses.
Entry point for the code generator
The main program in espc.cc calls the
other parts of the compiler, then, if there were not errors in the
other parts, it opens the ".s" file that the MIPS instructions will get
written to, it creates a CodeGen object, and then calls the
member function genCode on this object. You may add code
to the CodeGen constructor as well, but the main entry
point to your code-generation code is CodeGen::genCode.
The CodeGen constructor gets three things from the driver program:
In PA3 we mentioned that you were allowed to change what the valuetype
was for entries in classSymTab. If you did that, i.e., if your
classSymTab is not type
SymTab<Class>,
you will have to make your own version
of esp.cc that passes in the correct variable for the
class table, and you will need to change the Makefile so that it
submits your version of esp.cc.
Use of the symbol tables in code generation
We will need to add additional information to our symbol table entries
to do code generation. What that code looks like depends on how you
instantiated the symbol table class for the various symbol tables.
In PA4 we are using symbol tables to get information about generating code for symbols. That means we need to store additional information to associate with a name. For example, when we generate code for the expression new Foo() we need to get information about the size of a Foo object. When we generate code for a reference to a variable (AST node VarRef), we need to know whether this variable is on the heap (fields) or on the stack (formals and locals), and what its offset is (depending on what kind of variable it is, the offset will be relative to the this object or in the stack frame). These are not the only uses of the symbol tables, just a few examples.
So what does this imply for the symbol tables we already created? In code generation we will need to traverse over all the declarations of symbols (in the AST) to add the additional information needed to their symbol table entries for generating code. When we generate code for expressions, such as the ones mentioned above, is where we have to use the information added when we processed the associated declaration.
The implementation issue that comes up is: is the value-type used in a given symbol table amenable to adding fields to store this additional information? For a class symbol table with value type Class, i.e., symbol table is type SymTab<Class>, the answer is yes; and similarly if the value type for a method table is type Method. We can just add any fields or methods needed for code generation to the Class or Method AST classes.
If you used VarInfo and associated subclass objects for entries in your variable symbol tables, then this will not be a problem (more on this in the next section). However, if you used SymTab<Symbol>, you will not be able to add additional C++ fields for code generation, because the Symbol represents the name of a type: it's not an object specific to one symbol table entry (e.g., all variable table entries with type Foo use the same Symbol object for Foo).
If your variable table values are Symbols, we recommend you just leave the semantic analysis module as is, and just build new variable symbol tables for use in code generation. The VarInfo class hierarchy was discussed in more detail in a section on it in the PA3 assignment description, and in comments in semant.h, where the classes are defined.
Using the VarInfo hierarchy in code generation
As we explained in PA3, the reason we have different classes for the
symbol table entries for different kinds of variables is that the way
you generate code for assignments to and references to the
four different kinds of variables may be slightly different, because they
will be found in different places in memory. Thus, this hierarchy
will allow us to use polymorphic function(s) to do this task; e.g.,
you could have polymorphic genVarRef and
genVarUpdate function which behaves differently for each of
the four subclasses. Just like with any such hierarchy, if there is
code which is common to all or some of the subclasses, you can push it
up to the superclass. One style of doing this is to make a
protected superclass function to do the common part; this
function may be called by the virtual function whose implementation is
class-specific: defaultDump in the VarRef
hierarchy is an example of this.
MIPS and the spim MIPS Simulator
Your code generator is going to generate instructions for the MIPS
machine. We are going to run that generated code under the
MIPS simulator called spim.
For information on the MIPS architecture and instruction set, please refer to the documentation on spim that is accessible in the Documentation section of the web page. This documentation also explains how to use the spim and xspim tools.
In an earlier section we described how to set up your environment so you can run spim. There are actually two versions of this tool, a command-line version called spim, and a GUI version called xspim. The first version is probably easier for just running a MIPS program generated by your compiler, but the second one might be easier for debugging a MIPS program generated by your compiler (although both have the same basic debugging commands available).
The program testemit.cc generates a small MIPS program, which you can run on the simulator to try it out. The next section spells out how to compile and run testemit.
Emit functions and NameGenerator
The files emit.h and emit.cc define a few useful functions
for formatting MIPS instructions. A few of these functions don't
actually emit code to a file, but instead generate a string
that can be passed to an emit function. See testemit.cc for an example of how
to use given the functions together. You can also compile and run it
to see the code it generates. You can then run the MIPS program it
generates under spim. Here is a sample command sequence to
do this. (Note that, unlike your Espresso compiler,
testemit generates its code to cout.)
gmake testemit testemit > testemit.s spim testemit.s
Feel free to define similar functions and constants (there are some of the latter defined in testemit.cc) to help simplify your code generation code. E.g., you could write a function for generating the code to push something on the stack. You should put these definitions in your espc.cc file.
The NameGenerator class is the other thing defined in emit.h. It is a class for generating unique names for labels when you generate code. When you generate code for control structures, you will need labels for locations that your code jumps to. If, for example, there are 10 loops in the generated code they can't all use the same label names, so this class gives you a new unique label name every time you call gen(), e.g., the first three calls to gen() on a NameGenerator return the strings "label0", "label1", "label2". We created a NameGenerator object for you: the global variable, lgen is defined for you near the top of codegen.cc.
For more details on the tools provided in the emit package, see the
comments in emit.h
The Run-Time System
The run-time system consists of some hand-coded MIPS code that is
automatically loaded when you run spim (or xspim). We will document
the interface to that code here, but if you want to look at this code
it is in ~csci410/espresso/lib/exceptions.s
The code you generate will have to conform to the interface of the run-time system, which we will describe here. The run-time system has these main parts: startup code, which invokes your main method; routines for doing I/O and for allocating dynamic data; one routine for a runtime error you need to catch; and default exception handling code for MIPS. The rest of this section describes what the run-time system assumes about the generated code, and what the run-time system provides for the generated code.
Your code generator must define the global label main.
As discussed in lecture, you will have to design an activation record layout and a call and return sequence for methods, since all methods have to be in agreement as to how this proceeds. For function calls and other expressions you will need to be concerned with Java requirements for the order of evaluation of subexpressions: the rules are generally that things must be evaluated from left to right.
Since every method call has the one parameter that appears before the dot (i.e., the object we're calling the method on), you may want to pass it in a register, such as $a0, rather than putting it in the activation record. That value corresponds to this in the called function. You may also want to keep the current value of this in a register during method execution, because any reference to a field is going to involve its value, since it holds the starting address of the object containing the field. This is an example of a long-lasting value that would be preserved across function calls, so you would want to use a callee-saved register, such as $s0. What that means is that the callee saves the old value of the register (here, the "old $s0", that is, the caller's this pointer), before writing the new value of the register (the current this pointer), and then restores the old value before returning. Note that storing this in a register implies the way you generate code for references to this will be different from how you generate code for uses of other variables.
Naming conventions
We recommend you use a logical naming scheme for labels for
methods so that it is easy to reference them. For method
<method> in class <class> , you should
use the name
<class>.<method>One exception to this is the previously mentioned main method, which has to have that exact name, since the run-time system is going to call it.
You won't be able to use fixed names for labels related to control structures. Handling this is discussed in more detail in the section on the name generator.
Your generated code will not need any static data, so you won't need any names for addresses in the data segment.
In Java (and Espresso) when two reference type values (i.e., anything that's not int or boolean) are compared for equality (== or !=), a pointer compare is done. Put another way, we are comparing if the two operands refer to the same object (same identity), not whether the two objects have the same value. This has implications for the way we create objects. In the case of creating instances of a class that has no fields, it means we still need to create an object on the heap for it, so it has a different identity other instances of the same class. You can achieve this by allocating a (dummy) word for those objects. Alternatively you can have an extra header field at the top off all of your objects; you could use it to store the size of the object, although the generated code for Espresso won't ever need to use the size information (but it could be useful if we were recovering unused memory via garbage collection).
Also, don't forget that && is a conditional and (i.e., a
control structure). See Java manual for details.
Running the Compiler
Once you compile the compiler, you can run it with a command such as,
espc Test.espwhich will produce the file Test.s. You can then run the resulting MIPS program with the command
spim Test.sIn addition you can run the compiler in two kinds of debug modes, controlled with the following compiler options:
Development strategy
You can develop the code for the first pass (see section on passes) as a testable subset, where you check your
results by dumping the symbol tables, including additional information
you add to them for code generation (with printing controlled by
codegenDebug flag described above). Note: the
VarInfo class hierarchy has dump functions
defined for each class that you can modify to print additional
information.
But to get a subset that generates code that will run in spim you will need to do somewhat more. In particular, you'll need to design your activation record and call and return sequence before you'll be able to generate much testable MIPS code. This is because you need the activation record to access local variables or parameters.
Once you design your activation record, you can build subsets that dump more symbol tables as you modify them, or that generate code for part of the call and return sequence (i.e., code that we probably can't run in spim yet).
Once you get calls and returns working, you can build code generation for the various language features incrementally.
If you do not complete the project, a desirable working subset is a working compiler for a subset of Espresso (i.e., generates code that runs). In contrast, generating correct instructions for individual features is not so great if the instructions aren't put together correctly to make a MIPS program that is equivalent to the given Espresso program (e.g., if you generate the correct code to access a parameter, but your code to set up the parameters doesn't work).
What to turn in
The Makefile includes a rule for
submitting the assignment. You run it by typing gmake
submit. It copies your ast.h,
semant.h, semant.cc, codegen.h,
codegen.cc, README, and
Test.esp to the course account
using the local submit utility. The last version submitted
is the one that will be graded: each submit overwrites the previous
one.