Interpreter-based Projects for a

Traditional Programming Languages Course

 

John Barr and L. A. Smith King

Deptartment of Computer Science, Ithaca College

 

Introduction

The Programming Languages course is one of a handful of seminal courses in the undergraduate curriculum that transforms students into computer scientists. Hopefully the course starts the student down the path from just writing programs to thinking about programming. Students are encouraged to understand how the elements of programming languages can be used to create powerful abstractions and how these very same elements simultaneously constrain programming styles. After this course, students should no longer approach a problem as a program to be written, but as a solution to built from multiple abstractions.

Most programming languages courses convey the essential elements of languages by comparing and contrasting the semantic and syntactic features of a number of distinct programming languages. This comparative approach broadens the students' experience with the spectrum of language choices and reveals the common concepts that underlie most languages. This approach is also on firm philosophical foundations, following Aristotle and Locke both of whom argued that specific experience is required in order to recognize general principles.

Language features, however, cannot be explored in isolation. In particular, every construct provided in a language raises implementation considerations. This concern is characteristic of Computer Science as a discipline in which theoretical and empirical science cohabitate. Thus programming language design (theory) drives implementation, just as experience and computer architectures (empirical evidence) drive language design.

Accordingly, the programming languages course presents a pedagogical challenge: how best to expose a diversity of languages in a unified way while fulfilling implementation considerations. The comparative approach exposes the student to language diversity but often the relationship between implementation and language features is not satisfactorily explained.

An attractive approach is to take an interpreter-based perspective. An interpreter acts as a direct, complete and unambiguous explanation of the run-time semantics of the language. Secondly, building and examining an interpreter gives direct, concrete experience with the implemented language. Furthermore, examining interpreters for several languages exhibits the common principles shared between languages. And finally, the interpreter-based approach exposes implementation concerns in a natural way.

A difficulty with the interpreter-based approach is the danger of becoming bogged down in the minutiae of the implementation, and thus miss the larger issues. The languages and interpreters must illustrate essential programming features but retain the simplicity needed to ease implementation. Interpreters for most actual languages are generally too complicated for use in the classroom. As a consequence, interpreter-based programming language courses often implement syntactically and semantically simplified subsets of actual languages. The challenge is to find a suitable language interpreter and access to the source code for modifications/extensions.

In this paper, we consider interpreter-based programming languages projects using SOOP, a simple, object oriented programming language that was originally designed for teaching programming concepts in Computer Literacy. Although only a few very basic constructs are provided in SOOP, it is expressive enough to solve a wide variety of problems and has some novel features. And of course, one has complete access to the interpreter source code. The motivation is to have a simple language worth studying combined with an element of fun to hold students' interest.

We envision a interpreter-based project using SOOP as most applicable within a traditional programming languages course. In this context, students undertake the comparative study of a number of programming languages, augmented with an interpreter-based project using SOOP. The project has two goals: to reinforce an understanding of some of the language features presented in the course and to give students experience relating language syntax and semantics with an implementation. The first goal is accomplished by the interpreter study, the second by modifying the interpreter. This approach offers the insights associated with an interpreter-based course without sacrificing the robust understanding of diverse languages found in a traditional programming languages course.

In the remainder of this paper, we outline the SOOP interpreter, present a complete programming languages project, and sketch some other project possibilities. The source code for SOOP interpreter is available from either author.

SOOP Description

SOOP and its interpreter have several novel aspects which make it attractive for interpreter-based projects. First, the interpreted language, SOOP, is interesting. SOOP has object-oriented features such as objects, messages, and polymorphism but lacks inheritance. To support data structures, SOOP defines objects for variables, records, lists, stacks, and queues. SOOP also includes iterative and selector (if-statement) objects which control execution. Additionally, SOOP also has a visual dimension with each object graphically represented as a window.

Second, the SOOP interpreter is written in a functional language, Scheme. Scheme's excellent abstraction facilities enable the creation a substantial interpreter that is still compact and comprehensible (with a reasonable amount of effort). This provides an experimental environment in which students have to master a variety of programming language concepts to successfully complete projects, but the actual amount of code involved is small. Furthermore, Scheme's support of first-class procedures simplified the implementation of SOOP objects. To understand the interpreter, students must combine techniques from the functional and object-oriented paradigms and see connections between the two.

The interpreter binds objects to names and handles message passing to the objects. The language allows only two types of actions, binding a name to an object and passing a message to an object. Objects are implemented as functions by taking advantage of the functional notion of first-class procedures (e.g. procedures can be assigned to variables and procedures can be the result returned from a function).

For example, the function representing a stack object includes a local data structure implementing the stack, and local procedures like push and pop. Arguments to the stack function are "message names," so that, for example, when the function is given a pop argument, the function evaluates the appropriate local procedure to change the data structure. Thus, the encapsulation of state (objects) is implemented through the encapsulation of control (functional abstraction).

Every construct in SOOP is an object and every object is represented on the screen as a window. When an object is instantiated, its corresponding window appears on the screen. The user interacts with the SOOP system via a transcript window where SOOP statements are typed. This arrangement is similar to the interface for Smalltalk, and Scheme. For example, a variable (called a box in SOOP) is created and assigned the name x by the call (define x (make-box)). When this call is made in the transcript, the window in Figure X1 appears on the screen.

Figure X1. Sample Screen

As in any object oriented system, objects are manipulated by sending messages. The syntax for passing a message to an object is (obj message args) where obj is the name of the object, message is a message, and args is a list consisting of zero or more arguments. To give the variable x defined above a value, for example, we would have to pass the message set-val and an argument, e.g. 5. The message would take the form (x 'set-val 5) and the value 5 would appear in the box.

There are numerous constructs available in SOOP. A list of the available classes and their constructor procedures are given in Figure X2. A selector (similar to a cond statement in Scheme) and a iterator (a simple while loop) are both included as are other common data structures such as unordered lists, stacks and queues.

Class Creator Procedure

box (variable) make-box

selector (if) make-select

iterator (while) make-loop

lists make-list

ordered lists (stacks, queues) make-stack, make-queue

Figure X2. Classes available in SOOP

For the purposes of this paper, only box (or variable) objects and the SOOP interpreter are presented.

SOOP Interpreter Architecture

The architecture for this software system consists of three parts, a base interpreter, an object system and an expression interpreter. The base interpreter is the Scheme interpreter itself. It is used to recognize simple assignment statements of the form var = (make-obj) where var is a variable name and make-obj creates an object of class obj. The base interpreter also recognizes message-passing statements of the form (obj message args) where obj is an object, message is a legal message, and args are zero or more arguments. All other expressions are handled through messages passed to the objects.

The object system consists of a number of class definitions. Objects are implemented as functions. Methods are modeled through local procedures and instance variables are merely local variables. As an example, the "make-box" procedure is given in Figure X3. This procedure receives no arguments and returns a function that represents a box object. In this way, messages to a box object are interpreted as arguments to the function that represents the box object. The msg is an argument of a varying number of arguments, i.e. all of the arguments passed to the procedure are bundled into a list of arbitrary length.

The procedure returned by make-box consists of the procedure body, its locally defined variables and procedures and the environment in which it was defined. A Scheme letrec expression is used to maintain instance variables as a local part of the environment associated with a box object. In the case of a box object, these variables are a value named "box", a window named "win" and the name of the window, called "box-name". These variables are local to the box object and can be modified by the box instantiation.

The box object understands a variety of messages shown in Figure X4. The value contained in the box and the title of the window can be modified using the set-name and set-val messages respectively. Additionally, simple arithmetic expressions (addition, subtraction, multiplication and division) can be passed as messages as can a few boolean expressions (< and >). Only simple expressions involving a single operator and two operands are understood. Extensions to multiple operators/operands are used for projects. These are parsed by examining the first item in the parameter "msg".

(define make-box

(lambda () (letrec ( (box 0) (box-name '()) (win (make-text-window)) ) (lambda msg (case (car msg) ((type) "box") ((empty?) (eq? box '())) ((set-name) (begin (window-select win) (set! box-name (cadr msg)) (if (string? (cadr msg)) (window-set-title (cadr msg)) (window-set-title (symbol->string (cadr msg)))) (window-select transcript))) ((set-val) (if (null? (cdr msg)) (display "Need to send value also") (begin (if (pair? (cadr msg)) (set! box (eval-pair (cadr msg))) (set! box (cadr msg))) (window-select win) (text-clean win) (display box win) (window-select transcript)))) ((get) (if (null? box) (display "box is empty" win) box)) ((read) (begin (display "enter value: " win) (set! box (read)))) ((print) (display box win)) ((+) (+ box (find-val (cadr msg)))) ((-) (- box (find-val (cadr msg)))) ((*) (* box (find-val (cadr msg)))) ((/) (/ box (find-val (cadr msg)))) ((>) (if (eq? (cadr msg) '=) (>= box (find-val (caddr msg))) (> box (find-val (cadr msg))))) ((<) (if (eq? (cadr msg) '=) (<= box (find-val (caddr msg))) (if (eq? (cadr msg) '>) (<> box (find-val (caddr msg))) (< box (find-val (cadr msg)))))) (else (begin (display box-name) (display " Invalid message:") (display msg) (newline)))) ))))

Figure X3. The box class

empty?
set-name
set-val
get
read
print
+
-
*
/
>
<

Figure X4. Box messages

This implementation mechanism makes inheritance difficult, but has the advantage of demonstrating the relationship between state and objects. To understand how objects are created, students must understand that an object is similar to a persistent procedure that captures state locally. A direct relationship is made between the concept of classes and objects and the implementation mechanism.

The last part of the system architecture is an expression interpreter. This interpreter is used within the "loop" and "selector" objects to parse and execute the expressions in their bodies. Arithmetic messages that are sent directly to the objects are interpreted by the objects themselves. SOOP provides only the outline of the interpreter, details are left to students as a project.

Project

Last semester, students were asked to implement a recursive-descent parser for arithmetic expressions in order to extend SOOP's expression interpreter. This is a standard programming languages project, but has the benefit of requiring the student to understand the underlying object system and the command interpreter. Concepts from several different paradigms must be mastered in addition to the details of the interpreter. The project embodies a number of educational goals, namely:

1) To give experience with language implementation issues and practical experience modifying a language interpreter.

2) To present an object-oriented programming language implementation which can be compared and contrasted with other object-oriented language implementations.

3) To offer programming experience with a functional programming language, Scheme.

4) To present the role of parsing in language implementations.

To allow the students to successfully complete their project, several lectures were devoted to presenting background material concerning SOOP and its implementation in Scheme. The lectures emphasized general programming language implementations issues while presenting SOOP specific implementation details. The lectures were often accompanied by interactive demonstrations of SOOP, including tracing execution-path for SOOP functions in the Scheme code. These demonstrations were made possible using a laptop computer attached to flat display screen suitable for use with an overhead projector. In this way, the students were given an overview of SOOP and its implementation. In addition, some lecture time was devoted to recursive-descent parsing and some recursive-descent parsing examples were developed in Scheme. Consequently, when asked to extend SOOP's expression evaluator, the students were prepared to code a Scheme function that was a parser for arithmetic expressions that included SOOP objects, and to find the right place to substitute their function in SOOP's implementation.

A grammar was used to express the class of arithmetic expressions to be parsed. The grammar used for the Project is given below:

E ::= T + E | T - E | T

T ::= F * T | F / F | F

F ::= number | object_id | < E >

The grammar is quite ordinary, with the exception that an "object_id" can appear in the expression. In their implementation, students must interact with the existing SOOP implementation to find the object represented by object id and use object methods to get the "value" of the object to be used to compute the arithmetic expression. A completed recursive-descent parser written in Scheme appears in Appendix A.

Other Project Possibilities

Extending SOOP's expression evaluator offered an opportunity to experiment with parsing within a working interpreter. This is a more traditional programming languages project, which emphasized the interpreter over other parts of SOOP. However there are many other project possibilities within this multi-paradigm system. For example, another project would be to add more objects to the system. For example, after studying a stack object, students could add the queue or list objects to SOOP. These projects would not require the student to understand much about the interpreter's parser, since a new object would require a change of only a few lines of code, but would require a more sophisticated understanding of objects, their creation, and the relationship between first class procedures and objects.

A second project would be to add sequential execution to allow the SOOP user to send messages to several objects in succession without entering each individually and waiting for the results. This introduces a syntactic notion of block, for example indicated by the begin keyword, followed by several message dispatches, followed by the end keyword. An extension of this project would be to allow multiple nesting of blocks with static scoping. Thus, objects could be declared within different scopes and would be visible only within the proper static scope. This would reinforce various scoping strategies through their implementation.

A third project would be to introduce some notion of type into SOOP. This would requiring adding some mechanism for type declaration and then modifying the SOOP interpreter to do type-checking on operations. This project would provide a vehicle for comparing typed and non-typed languages, and the concept of a dynamically typed language.

Finally, students could be required to implement a more substantial project which would require making changes that effect all three parts of the system. For example, students could implement a nested if object. The student would have to first implement the object and secondly, ensure that the command interpreter successfully decodes messages for the object. Lastly, the expression evaluator would have to be expanded to recognize the if-else expressions that a nested-if would contain in its body. To complete the project, the student would have to master a variety of programming language concepts , even though the actual amount of code that she would have to write is small.

Results

Our experience with extending SOOP's expression evaluator as a project was quite positive. The students enjoyed the graphical representation of objects as windows and grasped the connection between the object abstraction and the Scheme code used to implement objects. In particular, the implementation emphasized the dynamic binding of message names with object methods, which is usually a difficult concept to convey. The use of a functional language to implement objects was a good vehicle for discussing the functional concept of first class procedures and object-oriented programming.

Secondly, most students seemed comfortable with recursive-descent parsing and its implementation in Scheme. In fact, we actually presented recursive-descent parsers in both Pascal and Scheme, and a number of students preferred the Scheme implementation.

Finally, using a laptop computer and overhead projection to present the Scheme interpreter interactively during lecture was indispensable. As a result, the students had little difficulty finding the correct place to modify the interpreter, and were able to focus their efforts on writing the expression parser.

However, we discovered that the current SOOP implementation has some idiosyncrasies which hinder its use for interpreter-based programming languages project. This is not surprising since SOOP was originally intended for a completely different purpose, namely teaching programming concepts in Computer Literacy. Consequently, the SOOP architecture was not designed for readability nor as a model interpreter implementation. For example the implementation does not separate SOOP's expression evaluator from the underlying Scheme interpreter in a clean way. Although we were able to overcome these limitations through classroom lectures, rewriting parts of the SOOP interpreter would improve its utility for interpreter-based programming languages projects.

Nonetheless, the SOOP language and its interpreter were useful as a basis for interpreter-based programming language projects, particularly because of the visual representation of objects and functional language implementation of object-oriented language features. The SOOP implementation was simple enough for students to modify without becoming overly bogged down in SOOP-specific detail. Although we found deficiencies in SOOP's implementation, our positive experience seems to justify the effort required to rewrite portions of the implementation for future use.

The Programming Languages course should expose students to a diversity of languages, explore programming language similarities and differences, and discuss language implementation issues. In our experience, interpreter-based projects can be a useful in meeting all these educational goals, and provide a particularly natural context for considering language implementation concerns.

Appendix A.

This appendix contains the Scheme code for a recursive-descent parser for arithmetic expressions. It is included here to illustrate exactly what students are expected to code for the project described previously. Also, it might be a useful code fragment for other purposes, for example as a programming project in a Scheme course.

;;
;;  Recursive descent parser for the following grammar:
;;
;;  E ::= T + E | T - E | T
;;  T ::= F * T | F / T |F
;;  F ::= no | obj | < E >
;;
;;  The approach taken was to write a parse-function for each
;;  non-terminal.  The parse-functions are mutually recursive
;;  (e.g. parse-factor calls parse-expression.  Each parse-function
;;  returns a "variant record" containing the "value" of the non-terminal
;;  and the remainder of the expression left to be parsed.
;;
;;  I found it easiest to begin by writing the parse-function for
;;  the "bottom" non-terminal, and work my way up, but theoretically
;;  the order makes no difference.
;;

;;
;; These are the "variant record" definitions used in the parser
;;
(define-record EXPR (val remainder))
(define-record TERM (val remainder))
(define-record FACTOR (val remainder))
(define-record ERROR (err_str remainder))
;;  remainder is the tokens left t be parsed

;;
;;  Parse expression: E ::= T + E | T - E | T;
;;
;;  Parse a term.  If the next token is + or -, 
;;  parse an expression and return an expression which contains the 
;;  sum/diff of the term and expression.  Otherwise, just return an
;;  expression whose value is the same as a term
;;
(define parse-expr
  (lambda (tokens)
    (cond
      ((not (list? tokens)) (make-ERROR "illegal tokens" tokens))
      (else (variant-case (parse-term tokens)
              [TERM (val remainder)
                      ;; save the values returned in TERM over the recursive call
                 (letrec ([save_val val] [save_rem remainder])
                   
                 (cond
                   ((null? remainder) (make-EXPR val remainder))  ;; the T rule
                   
                   ((eq? (car remainder) '+)
                     (variant-case (parse-expr (cdr remainder))
                       [EXPR (val remainder) (make-EXPR (+ val save_val) remainder) ]
                       [ERROR (err_str remainder) (make-ERROR err_str remainder) ]
                     ))                                           ;; the T + E rule
                   
                   ((eq? (car remainder) '-)
                     (variant-case (parse-expr (cdr remainder))
                       [EXPR (val remainder) (make-EXPR (- save_val val) remainder) ]
                       [ERROR (err_str remainder) (make-ERROR err_str remainder) ]
                     ))                                           ;; the T - E rule
                   
                   (else (make-EXPR val remainder)))) ]
              [ERROR (err_str remainder) (make-ERROR err_str remainder) ]
            ))
     )
  )
)

;;
;;  Parse term: T ::= F * T | F / T | F;
;;
;;  Parse a factor.  If the next token is * or /, 
;;  parse a term and return a term which contains the 
;;  mult/div of the factor and term.  Otherwise, just return a
;;  term whose value is the same as parsed factor
;;
(define parse-term
  (lambda (tokens)
    (cond
      ((not (list? tokens)) (make-ERROR "illegal tokens" tokens))
      (else (variant-case (parse-factor tokens)
              [FACTOR (val remainder)
                 (letrec ([save_val val] [save_rem remainder])
                 (cond
                   ((null? remainder) (make-TERM val remainder))  ;; the F rule
                   
                   ((eq? (car remainder) '*)
                     (variant-case (parse-term (cdr remainder))
                       [TERM (val remainder) (make-TERM (* val save_val) remainder) ]
                       [ERROR (err_str remainder) (make-ERROR err_str remainder) ]
                     ))                                           ;; the F * E rule
                   
                   ((eq? (car remainder) '/)
                     (variant-case (parse-term (cdr remainder))
                       [TERM (val remainder) (make-TERM (/ save_val val) remainder) ]
                       [ERROR (err_str remainder) (make-ERROR err_str remainder) ]
                     ))                                           ;; the F / E rule
                   
                   (else (make-TERM val remainder)))) ]
              [ERROR (err_str remainder) (make-ERROR err_str remainder) ]
            ))
     )
  )
)

;;
;;  Parse factor: F ::= no | obj | < E > 
;;
;;  If first token is number, return factor containing value of number.
;;  If first token is obj, return factor containing obj value.
;;  If first token is < then
;;      strip it off, parse an expression
;;      if expression succeeds look for >
;;      if all well, return factor with value of expression otherwise error.
;;
(define parse-factor
  (lambda (tokens)
    (cond
      ((not (list? tokens)) (make-ERROR "illegal factor" tokens))
      
      ((number? (car tokens)) (make-FACTOR (car tokens) (cdr tokens)))
      
                ;; in MAKE-factor clause, get is a message to the obj to get its value
      ((not (null? (find-obj (car tokens)))) 
                              (make-FACTOR ((find-obj (car tokens)) 'get) (cdr tokens)))
      
      ((eq? (car tokens) '<)
               (variant-case (parse-expr (cdr tokens))    ;; case on what parse-expr returns
                  [EXPR (val remainder) 
                     (if (eq? (car remainder) '>)         ;; match ">"
                        (make-FACTOR val (cdr remainder))
                                                       ;; return val & rest of expr after ">"
                         (make-ERROR "missing right >" remainder)) ]
                  [ERROR (err_str remainder) (make-ERROR err_str remainder) ]
                           ;; expression not good -- propagate the error that expr made
                ))
       (else (make-ERROR "illegal factor" tokens))
     )
   )
)
                          
;;
;;  Not part of the parser.  This is an example of a wrapper 
;;  function returning a box object which contains the result of computing 
;;  an arithmetic expression as its value.
;;
(define box-expr
  (lambda (expr)
    (define abox (make-box))
    (abox 'set-val (variant-case (parse-expr expr)
                     [EXPR (val remainder) val]
                     [ERROR (err_str remainder) (make-ERROR "uh oh" remainder) ]
                     ))
    abox
  )
)