CS2 data structures

Welcome to my source material for a one semester undergraduate course about data structures. This course roughly corresponds to the old ACM CS2 curriculum. The programming language is C++ although the examples were originally derived from Java. Therefore, it shouldn’t be too difficult to translate the examples back to Java, again! The course covers the most standard data structures and a few basic topics in complexity, compiling and software engineering. As such, it is a good fit with the new ACM 2013 curriculum which does not rigidly assign topics to specific courses.

This page has links to three kinds of course material:

  1. General information about the course including grading policy,
  2. Detailed lecture notes, and
  3. Projects.

Teachers and students alike are welcome to browse and borrow.

General information

The course design is driven by several beliefs and principles.

  • The course must prepare students for advanced topics in computer science.
  • Students need to learn to think and work like professional software engineers.
  • We use what we build. Projects are not throwaway. Students need to experience what it’s like to reuse their code for better and for worse!
  • Students must learn to build and test to an interface specification.
  • Projects and problems are drawn from system programming (compilers, operating systems, etc.) Projects and problems must be relevant, not make work.
  • Performance (elapsed time and complexity) is an important design issue.

One semester is a short period of time and data structures is a broad area with many subtopics. Plus, early courses need to build a basis for future work in computing, such as software engineering. Rather than cover all varieties of trees and so forth, I chose to cover a set of high priority data structures: array, C-style string, C/C++ struct/class, linked list, tree and hash table. A good system programmer should be competent with all of these data structures.

Another temporal issue is the amount of time available for projects. Seven or eight non-trivial projects make for a very busy semester! The first few projects are standalone. Other projects build on one another. Code from earlier projects is reused in later projects and students have a much higher stake in code quality, reusability and meeting schedule. An instructor must be prepared to handle the inevitable train wrecks and disasters wrought through procrastination and carelessness!

The course was offered twice (Fall and Spring) at Tufts University. Fortunately, I had excellent students who could (usually) handle the work load. Student background and depth varied as is the case in most introductory CS courses. Students were generally first year and sophomore level. Most students knew the rudiments of either C++ or Java. Some students had completed the introductory CS1 course; some students entered with Advanced Placement credit. Most were computer science majors and the remainder were computing engineering students from the electrical engineering department. The EE department requested C++ and the computer science department obliged.

If teaching the course again, I would strongly consider a short section on either the Standard Template Library (STL) or Boost. This would complement the current material. Most contemporary software is constructed with the aid of class libraries. The existing course shows students how the most common data structures are implemented and let’s them assess the space/time complexity of the implementations. Without this look “under the hood,” students do not appreciate the performance implications of STL or Boost classes/objects.

I asked students to fill out a course planning survey on the first day of class. The survey helped me to learn about my students: their CS background and goals. It always helps to know your students!

The course overview page is an introduction to the course, including values and ethics. The computer lab information page is specific to Tufts and is out-of-date. (Please excuse any broken links!)

Lecture notes

This course was offered twice with roughly the same topical structure each semester. The lectures occasionally diverged because projects were somewhat different for each semester. (Please see the section below on “Projects” for more detail.)

The following comments and tables are the course roadmap. The comments provide a little bit more of the rationale behind the course design. The links lead to lecture and supplementary notes. The notes are raw ASCII text or HTML.

The concept and application of abstraction is essentially to modern computer science and problem solving. The first lectures cover the software development process and the role of abstraction and encapsulation. I assume that CS2 students are ready to commit to computer science as a field and career. Thus, the course provides essential, basic grounding in software engineering as well as a start in algorithms and complexity.

Week #1
Topics Course overview
Software development process
Abstraction and encapsulation
Notes Data structures
Deconstructing iTunes
Functional decomposition

The second week continues with abstraction and introduces the notion of an abstract datatype. I assume that students are familiar with mathematical sets and use the bit set as the first example. Students are also learning Linux and the mechanics of the text editor, the compiler and so forth. They must also learn the rudiments of C++, building on their experience with either C++ (from the preceding CS1 course) or Java (Advanced Placement).

Week #2
Topics Abstract datatypes (ADT)
Mathematical sets
Set abstraction and implementation
Notes Review of set theory
Tour of the BitSet skeleton
Testing BitSet
C++ standard I/O streams
Hexadecimal and bit twiddling
GNU Emacs reference card
Compiling C++ programs
Project #1 Mathematical set ADT

The third week introduces basic testing terminology and techniques. Students need to learn about and respect the need for thorough testing as early as possible. Students must exercise their projects during development. C/C++ bit twiddling is presented in order to help students with the bit set assignment.

Week #3
Topics Testing
Manipulating bit fields
Notes Bit field functions
More information about BitSet
Typedef, enum, struct, union
Debugging
Project #2 Robot parade (scene drawing)

The fourth week continues with abstraction and introduces classes, objects and the relationships between classes/objects. The concepts are illustrated through scene graphs from computer graphics. Students render simple scene graphs in PostScript. They must organize and code their scene graphs in C++ classes/objects. Nested geometric transformations sneak in the concept of stack.

Week #4
Topics Classes and objects
Relationships between objects
Using make to build modular programs
PostScript file ADT
Scene graphs and coordinate transformation stack
Notes PostScript file API
Scene graphs
Using make to build programs
PostScript basics
Advanced PostScript
Software architecture: Import/export
Quiz #1 In class, closed book

The fifth week moves into traditional CS2 territory with the implementation of simple stacks. The first stack implementation uses a statically allocated array. Runtime stacks are presented as a vital application. Students start to see that the underlying software/hardware machinery beneath their programs is not trivial.

Week #5
Topics Stacks and their uses
Stack implementation (static array)
Execution/function call stack
Notes Array stack
Algebraic datatypes and testing
Execution (call) stack
Reading x86 assembler code
Project #3 Stack abstract datatype (ADT)

The sixth week dives into pointers starting with C-style character arrays. I believe that it’s easier to grok the equivalence of pointers and addresses without thinking about list structure, too. It’s time for dynamic storage allocation, too, before coding gets too complicated.

Week #6
Topics Pointers, arrays, C-style strings
Dynamic storage allocation
Notes C-style strings
Pointers, arrays
Dynamic storage allocation
Command line arguments
C/C++ background

Now the course begins to build on what is learned. The projects are more complicated and some projects use code that is developed in earlier projects. A major course theme is “We use what we build”. Students should not look at assignments as one-off, throw-away projects. Functional correctness is important for future success, too.

Parsing plays an important role in computer science and in real-world applications. Languages are defined in terms of tokens and syntax. C-style strings are applied to lexical analysis which produces tokens. Reverse Polish Notation (RPN) expressions are evaluated on stacks.

Week #7
Topics Lexical analysis (token parsing)
RPN and expression evaluation
Notes C-style strings
Lexical analysis
C++ file streams
Standard C library
Project #4 Read and evaluate RPN
Lexical analyzer class will be reused

The linked list is an essential CS2 topic. Students learn to use pointers to represent relationships and links, not just a fast way to walk through a C-style character string. Students must rewrite the simple stack ADT using linked lists and dynamically allocated storage. The stack ADT interface must be honored and preserved. The project reuses the lexical analyzer and the RPN evaluator. The new stack ADT implementation should plug right in.

Week #8
Topics Linked lists
Stack implementation (linked list)
Notes Function call (execution stack)
Dynamic storage allocation
Stack using dynamic storage
Linked lists
Project #5 Rewrite stack ADT to use linked list and dynamic storage
Reuse lexical analyzer and RPN evaluation
Midterm examination In class, closed book

The mid-term examination usually has one coding problem, one or more essay questions and one or more “short answer” problems. I struck on one technique which drew a positive response from my students. Students must write an English paragraph describing the behavior of their code. This activity clarified their thinking and led to better solutions (under pressure).

Trees are a high priority data structure topic. Recursion and trees go hand-in-hand, so recursion is discussed in depth. (No pun intended.) Depth-first and breadth-first traversal are presented. The eventual goal is to represent a simple arithmetic expression as a tree and to walk the expression tree. Recursive descent parsing is used, providing a more relevant example of recursion in action than Fibonacci numbers!

Week #9
Topics Recursion
Trees and their uses
Notes Tree data structures
Recursion
Recursive descent parsing
Prefix expression parsing
Project #6 Parse, build and traverse expression tree

Recursion and trees are complicated enough to require two full weeks of discussion. It’s a lot for students to get their heads around! Recursion is too important for future study to just blow through it.

Week #10
Topics Trees and recursion
Tree implementation (array and pointer)
Recursive descent parsing
Notes Tree data structures
Recursion
Recursive descent parsing
Prefix expression parsing
Infix to RPN/postfix
Infix expression parsing

Search is a natural co-topic with linked lists and trees. Space/time complexity and big-O notation are introduced. The projects are fairly involved at this point and much class time is devoted to project-related problem definition, design, student questions and so forth. I enjoyed interactively talking through problems, design and pseudo-code. One simply cannot sling complex projects at beginning students and let them sink beneath the waves!

Week #11
Topics Searching and complexity
Big-O notation
Simple search algorithms
Notes Simple search
Binary search trees
Big-O notation

Half of a lecture period is spent taking the quiz and another half period is spent discussing the solution to the quiz. My quizzes require students to write code to solve a simple problem using one of the data structures. If students are allowed to collaborate on projects, then you really need examples of individual work for evaluation/assessment.

Week #12
Topics Searching and complexity
Binary search trees
Notes Simple search
Binary search trees
Big-O notation
Project #7 Binary search tree (BST)
Quiz #2 In class, closed book

Hash tables are a high priority topic. They are used for symbol storage and look up when compiling. They are a fast an alternative for searching (compared to a binary search tree, for example.) Hashing function design is important. Discuss ways to assess a hash function.

Week #13
Topics Searching and complexity (continued)
Hash tables
Notes Hash tables

Discrete event simulation (DES) was the capstone project at the end of the first semester. The second semester’s capstone project was a dictionary ADT implemented via hash table. The students also participated in a contest to see who had the fastest dictionary look-up. I presented DES at the end of the second semester, too. DES demonstrates the use of queues for scheduling.

Week #14
Topics Queues (arrays and linked list)
Discrete event simulation (DES)
Notes Queues
Discrete event simulation
Project #8 Hash table search
Week #15
Topics Queues (arrays and linked list)
Discrete event simulation (DES)
Notes Queues
Discrete event simulation
Final examination Scheduled by university, closed book

Projects

The projects for each semester follow a similar progression. The first few projects illustrate abstraction and encapsulation. The set ADT is self-contained and is relatively easy to carry out. The robot parade project is a fun project that students enjoy. Both projects give students a sense of confidence while they are learning basic software development on Linux.

Project Title
1 Bit set abstract datatype
2 PostScript graphics
3 Robot parade
4 Stack ADTs
5 RPN evaluation
6 Stack ADT using linked list
7 Parsing infix expressions
8 Draw expression tree
9 Discrete event simulation
First semester projects
Project Title
1 Mathematical set abstract datatype
2 Robot parade
3 Stack ADT
4 RPN evaluation
5 Stack ADT using linked list
6 Parse, build, traverse expression tree
7 Dictionary ADT (binary search tree)
8 Dictionary ADT (hash table)
Second semester projects

Nine projects are too many for a 15 to 16 week course, so the PostScript graphics assignment was eliminated for the second term. The PostScript graphic project developed a class to emit simple PostScript operations to a file, providing a rudimentary interface to generate a scene expressed in PostScript. The scene could then be submitted to a PostScript engine like Ghostscript to render the scene. A PostScript graphics class was provided to students during the second term in order to reduce the number of projects and save time. Eight projects in one semester is still quite aggressive!

Students were required to implement stack ADTs during both terms. The first implementation uses a static array representation and the second implementation uses linked lists and dynamic memory allocation. The ADTs have the same external interface and the exercises demonstrate the value of transparency and encapsulation.

Also common to both semesters, students implemented a lexical analyzer and a recursive descent parser for expressions. The parser uses the lexical analyzer. The details of the lexical analyzer and expression language were different each term to eliminate reuse (AKA cheating) from semester to semester.

The advanced projects were different for the two semesters.

  • During the first semester, students wrote code to traverse an expression tree (built by their parsers) and draw a simple representation of the tree using the PostScript graphics module. The final project was a discrete event simulator where the students provide the queue implementation.
  • During the second semester, students wrote code to traverse an expression tree and print the expression in prefix, infix and postfix form. PostScript graphics was dropped to save time and effort. The final projects implemented a dictionary ADT using a binary search tree and a hash table. Students were required to measure the execution time for each implementation, demonstrating the much faster look-up time for the hash table implementation.

The dictionary ADT projects are more satisfying from the CS perspective. Discrete event simulation is a more practical problem and solution.

One or more C++ source files were given to students to jump-start design and implementation. At the very least, the include (.h) file for the key abstract datatype or module was provided. Several files were provided for the more substantial projects. For discrete event simulation, students were given a robust implementation of a simulator with a hole in the middle — the place for their priority queue ADT. This approach lets students attack significant problems and raises their interest level.

Looking to the future, I would draw on problems in “big data” especially for projects that measure performance or space/time complexity. Three possible application areas are genetics, climate analysis/forecasting and medical imaging. Many kinds of large data sets are available on the Web. For example, fully sequenced genomes are readily available on the WEB and would provide large real-world data for experiments in sequencing.

Copyright © 2005-2013 Paul J. Drongowski