Computer Design: A computer aided design and VLSI approach
Paul J. Drongowski

Chapter 2 - Computer architecture.

In the early 1960's, computer vendors recognized that software compatibility had become a crucial issue for the user community. Customers had invested considerable sums of money in application programs only to find that the new machines had incompatible instruction sets and their investment was lost. (Remember that much of the software of that era was written in assembly language.) The designers of the IBM 360 sought to alleviate the compatibility problem by defining and developing a whole series of machines with the same instruction set, but varying levels of performance. Thus, the notion of a computer family or architecture was born.

The contemporary idea of computer architecture has expanded beyond the definition of common instruction sets to include standard communication protocols, resource management schemes and even languages. We will discuss these aspects of computer architecture in this chapter. How a machine is structured and built to realize a particular architecture is called "computer organization," the primary topic of this book.

Section 1 - Instruction set architecture.

The instruction set architecture (ISA) is a "programmer's eye view" of the operation of a machine. All members of an architectural family must operate in accordance with the abstract specification of the ISA to guarantee software compatibility at the machine code level. The ISA consists of a description of the registers, instructions (operations and addressing modes), and special programming features for input/output (I/O), memory and processor management.

The registers provide a relatively small amount of fast memory for scratchpad computations and control. They are used to store information that has a short lifetime or whose use is very frequent or urgent. The programmer ordinarily has one or more registers available for direct use by the application program. (The exception is a stack architecture in which all intermediate results are carried in a push-down stack.) Other registers are used primarily for control, and access to them may be limited or restricted in some way. The PDP-11 architecture, for example, provides eight 16-bit general registers. However, two of the registers are special purpose: the stack pointer and the program counter. The other six registers are truly general purpose and may be used for arithmetic and logical computations, memory addressing (indirect and indexed memory access), and subroutine linkage.

The instruction set itself is often separated into operations and addressing modes. It is not unusual to find a common set of memory access modes which may be used in a regular manner across a large subset of the operations. Machines that employ this concept are said to be "orthogonal." Conceivably, the operations and addressing modes could be arranged along two orthogonal axes, and any combination of operation and addressing mode in that space would be meaningful. It is generally easier to construct a compiler for an orthogonal machine as fewer peculiarities or special cases must be handled when generating code.

Architectures may be classified by the number of addresses permitted within a single instruction. The common types of machines are listed below.

Zero address. Stack architectures are zero address machines. Operands and results are maintained on a push-down stack. To perform an addition, for example, the two operands are popped from the stack and added together. The sum is then pushed onto the stack where it will await the next operation. The locations of the operands are implicit in the instruction (the top of the stack) and hence, no explicit address is necessary. Compiler code generation for a stack machine is easy because the code for expression evaluation is identical to the reverse Polish (postfix) form of the expression.
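To make the zero address idea concrete, here is a minimal sketch in C of a stack machine evaluating the postfix form of (3 + 4) * 2. It is an illustration, not an example from the text; the opcode names, instruction format and fixed stack depth are all invented for this sketch.

    #include <stdio.h>

    /* A tiny zero address (stack) machine: all operand locations
       are implicit in the instructions. */
    enum op { PUSH, ADD, MUL, HALT };

    struct insn { enum op op; int value; };   /* value used only by PUSH */

    int main(void)
    {
        /* Postfix form of (3 + 4) * 2 is: 3 4 + 2 * */
        struct insn program[] = {
            { PUSH, 3 }, { PUSH, 4 }, { ADD, 0 },
            { PUSH, 2 }, { MUL, 0 }, { HALT, 0 }
        };
        int stack[16], sp = 0;                /* push-down stack */

        for (int pc = 0; ; pc++) {
            struct insn i = program[pc];
            switch (i.op) {
            case PUSH: stack[sp++] = i.value;            break;
            case ADD:  sp--; stack[sp - 1] += stack[sp]; break;
            case MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
            case HALT: printf("result = %d\n", stack[sp - 1]); return 0;
            }
        }
    }

Note that no instruction names a source or destination address; the top of the stack is always implied, which is exactly why code generation from the postfix form of an expression is so direct.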
One address. One address architectures are sometimes called "load and store" machines. Each operand is loaded from primary memory into a separate register using a one address load instruction. The operation is performed and a one address store instruction moves the result to memory. As an alternative, dyadic operations may be performed on a register operand and a single memory value with the result stored in the register.

Two address. In a two address machine, both operands are explicitly identified. The result is always stored to the location of the second operand. The double operand instructions of the PDP-11 are a good example.

Three address. Three address instructions specify the locations of the two operands and the destination of the result. Code generation for a three address machine is also straightforward since operands may be easily selected from, and results written to, any address in primary memory.

The choice of a zero, one, two or three address architecture will depend upon the ease and efficiency of compiling, program size and memory utilization requirements, and the desired frequency of relatively slow primary memory accesses.

At least eight kinds of memory addressing modes are in common use. General register architectures that use a file (vector) of word registers have a register (or register to register) mode for selecting a register operand from the file. In direct or absolute addressing, the identity of a memory location is part of the instruction itself. An indirect address specifies a register or memory location which contains the absolute address of the operand or destination. Constants can be introduced into a program through "immediate" mode operands. Here the constant value is stored with the instruction and is used during the execution of the instruction. Indexed addressing is used to access the elements of a vector or array in memory. The base address of the array is stored explicitly as part of the instruction and is added to an index value from a general register. A special case of indexed addressing is "program counter relative" addressing. In this case, the program counter register is added to an offset stored with the instruction. Code which uses program counter relative addressing exclusively is easily relocated (shifted about) in memory. An index value may be automatically increased or decreased by a constant (one or two in the PDP-11) as a side effect of the instruction. This is sometimes called "auto increment" or "auto decrement" addressing. The increment or decrement operation may either precede or follow the actual access to the memory location for "pre" and "post" increment (decrement) addressing, respectively. The PDP-11 uses pre decrement and post increment register indirect addressing. This naturally yields stack push and pop operations when applied to a stack pointer.
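Each of these addressing modes has a rough analogue in C pointer and array notation. The sketch below is illustrative only (the little "memory," register variables and addresses are invented); it mimics immediate, direct, indirect, indexed and auto increment/decrement access.

    #include <stdio.h>

    int main(void)
    {
        int mem[8] = { 3, 20, 30, 40, 50, 60, 70, 80 };  /* primary memory   */
        int index = 2;                                   /* index register   */
        int sp = 8;                                      /* stack pointer    */
        int r;                                           /* general register */

        r = 42;             /* immediate: the constant travels with the instruction */
        r = mem[5];         /* direct: address 5 is part of the instruction         */
        r = mem[mem[0]];    /* indirect: location 0 holds the operand's address     */
        r = mem[4 + index]; /* indexed: base address 4 plus an index register       */

        /* Pre decrement and post increment register indirect addressing
           applied to a stack pointer give push and pop, as on the PDP-11. */
        mem[--sp] = r;      /* push: decrement, then store */
        r = mem[sp++];      /* pop: load, then increment   */

        printf("r = %d\n", r);
        return 0;
    }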
Instruction set operations include such arithmetic and logical operations as addition, subtraction, two's and one's complement (logical NOT), bit-wise AND, bit-wise OR, and shifts (logical, arithmetic and rotating.)

Machines which are tailored for a particular market segment often include specialized instructions that increase the speed of important customer applications in that segment. For example, an architecture for the scientific market might have an instruction to evaluate polynomials, an operation that occurs during the computation of sines, cosines and other transcendental functions. A business machine might have instructions for editing character strings. As a final example, queue and process management instructions could be built into a computer for the real-time processing market where the scheduling of activities is important and frequent.

The increasing complexity of machine instruction sets created a backlash in the early 1980's. A return to "reduced instruction set computers" (RISC) was proposed with the intent of increasing processor speed through the reduction of hardware complexity. The early machines of von Neumann and others were RISC's since the cost and unreliability of active vacuum tube circuitry in the 1940's and 50's dictated the minimization of hardware and instruction set size. RISC's employ very elementary instructions that can be evaluated during one clock period. Increased speed is gained by pipelined parallelism, simple datapath circuitry and a short clock period. As a side benefit, the functional density of a very large scale integrated (VLSI) RISC is substantially increased. Complex instruction set computers (CISC) use large control stores to retain the low level (micro) programming needed to implement a large set of complicated (macro) instructions. RISC proponents argue that most CISC instructions are under (or never) utilized and that large control stores are fundamentally incompatible with VLSI circuit technology.

As virtually all interesting programs make decisions and have one or more loops, instructions that alter program flow are available. Conditional goto's test the current state of the registers or condition flags and switch the execution flow to a new location. Unconditional goto's merely force instruction selection to a new address. Common or frequently used instruction sequences may be stored and called as subroutines. The instruction set usually includes an instruction to call a subroutine by saving the current value of the program counter register and jumping to the subroutine entry location. When the subroutine completes its work, it executes a return instruction which restores the return address to the program counter register. The use of programming languages with complex subroutine calling sequences has led to specialized call instructions in complex instruction set computers. The VAX CALLS instruction, for example, builds an entire call frame on the execution stack.

As we will discuss in Section 3, an architecture often includes features for the management of computing resources. The ISA must provide instructions to assist the resource management task. One such task, input/output between primary memory and secondary storage devices, may be accommodated in one of two ways: through programmed or memory-mapped I/O. With programmed I/O, special instructions are provided to control and communicate with I/O devices over the I/O bus. These instructions send and receive data and control information, initiate device operations, and sense error and device completion signals. In memory-mapped I/O, devices are manipulated through special registers which are accessible as standard primary memory locations. No specialized instructions are required as the standard memory access instructions may be used for device control and communication.
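The following C fragment sketches how memory-mapped I/O looks to a programmer. It is a hedged illustration, not a real device driver: the base address, register layout and status bit are invented. The point is that ordinary loads and stores, expressed here through a volatile pointer, are sufficient to command the device.

    #include <stdint.h>

    /* Hypothetical memory-mapped device registers; the address and
       bit assignments below are invented for illustration. */
    #define DEV_BASE 0xFFFF0000u

    typedef struct {
        volatile uint16_t control;  /* write 1 to start an operation     */
        volatile uint16_t status;   /* bit 0 set when the device is done */
        volatile uint16_t data;     /* data in/out register              */
    } dev_regs;

    #define DEV ((dev_regs *) DEV_BASE)

    /* Write one word to the device using only ordinary memory accesses. */
    void dev_put(uint16_t word)
    {
        DEV->data = word;               /* store the datum            */
        DEV->control = 1;               /* start the device           */
        while ((DEV->status & 1) == 0)  /* busy-wait until completion */
            ;
    }

With programmed I/O, the stores and loads above would instead be special I/O instructions addressing the device over the I/O bus.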
Section 2 - Communications.

As computer vendors began to use a common ISA across members of a product family, they came to recognize the importance of well-defined, standard interfaces. If the I/O bus can be standardized, for example, peripherals can be shared between family members. Further, the peripherals do not need to be discarded when upgrading to a faster processing unit if the new processing unit is I/O bus compatible with the old unit. This protects a customer's investment in peripheral equipment and makes a product family more attractive.

Busses may be proprietary or they may be an industry standard. Proprietary busses may protect the technological advantage of a vendor by hiding details about the operation of the bus. The use of a proprietary bus may also limit the size and number of "third party" vendors who provide an alternative source of equipment for a vendor's product line. Sometimes a proprietary bus becomes popular enough to be standardized. For example, the Shugart Associates System Interface (SASI) is the basis for the Small Computer System Interface (SCSI) for personal computer equipment. The Intel Multibus became IEEE standard 796.

Section 3 - Resource management.

Operating systems have also had a substantial impact on computer architecture. The primary function performed by an operating system is resource management -- the allocation and control of computing resources. Three kinds of physical resources are central to any computing system: the processor, primary memory and the I/O devices (including secondary storage.) It is the job of the operating system to transform these physical resources into programs (processes or activities), files and virtual I/O channels.

The processor is the locus of all computational activity in the system. In a modern multiprogramming system where a number of user jobs share a computing resource, the operating system must schedule those jobs for execution. Scheduling is usually broken into two levels: strategic resource management and physical scheduling. Strategic level scheduling uses process or job queues to arrange user jobs by priority or need. The physical scheduler actually moves a job onto the physical processor for execution in accordance with the strategic policy. Moving idle or waiting jobs off and ready-to-run jobs onto the processor is a time consuming task. Special instructions for queue management and physical scheduling ("context switching") can be provided to speed this task.

Primary memory holds the program code and data during and between executions on the processor. To accommodate large programs and to better utilize the primary store, a scheme called "virtual memory" is employed. In virtual memory, only the portion of a program which is essential to its execution, its "working set," needs to be in primary memory for execution. Thus, many more jobs may be in memory waiting to run and the total memory requirements of a job may exceed the physical size of the primary store. When an essential piece of the program is found missing (a "memory fault"), the operating system must bring that piece in from secondary storage. If another job is ready to run, it may be physically scheduled onto the processor while the missing piece is brought in.

Virtual memory requires a means to partition a program into "pieces." The two most common schemes are "paging," which uses fixed size pieces called "pages," and "segmentation," which uses variable size pieces called "segments." The architecture must define the procedure for mapping virtual program addresses into physical memory addresses such that a program may be relocated anywhere in memory.
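To show what such a mapping procedure might look like, here is a short C sketch of paged address translation. The 4 KB page size, the toy page table and the absence of valid and protection bits are simplifying assumptions made for the example.

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_BITS 12                     /* assume 4 KB pages */
    #define PAGE_SIZE (1u << PAGE_BITS)

    /* Toy page table: entry i maps virtual page i to a physical frame.
       A real table would also carry valid and protection bits, and a
       missing entry would raise a memory (page) fault. */
    static uint32_t page_table[] = { 7, 3, 0, 5 };

    uint32_t translate(uint32_t vaddr)
    {
        uint32_t page   = vaddr >> PAGE_BITS;       /* virtual page number */
        uint32_t offset = vaddr & (PAGE_SIZE - 1);  /* offset within page  */
        uint32_t frame  = page_table[page];         /* physical frame      */
        return (frame << PAGE_BITS) | offset;
    }

    int main(void)
    {
        /* Virtual address 0x1234 lies in page 1 at offset 0x234; page 1
           maps to frame 3, so the physical address is 0x3234. */
        printf("0x%x\n", translate(0x1234));
        return 0;
    }

Because only the frame number changes during translation, a program's pages may be scattered anywhere in physical memory, which is precisely what permits relocation.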
Special instructions may be needed to set up and control the mapping procedure and to move information between virtual memory spaces.

An important task in any operating system is the protection of information from either programming errors or malicious use. Protection is accomplished by controlling access to disk files, pages or segments. Again, special instructions or memory address mapping operations must be included in the architectural specification.

Application programmers are optimists. System programmers and operating system designers are pessimists because they must deal with the errors and unusual circumstances created by the optimists! Unanticipated conditions and events are called "exceptions" and the operating system must cleanly handle exceptions without interfering with the other users or, worse, crashing the system. Exceptions cover a broad range of conditions including virtual memory faults, arithmetic errors (e.g., divide by zero), attempted execution of undefined (illegal) instructions, I/O device errors, unauthorized attempts to access protected information and power failures. Exceptions are reported by "traps" or "interrupts" which disturb the normal program flow. Execution is switched to an operating system routine that was provided to handle the exception. The mechanisms for detecting an exception, switching to the operating system, and resuming normal execution must be specified for the architecture.

Section 4 - Architecture and technology.

The discussion in this chapter has centered upon a particular kind of architecture known as the "von Neumann machine." In the late 1940's, von Neumann, Burks and Goldstine first defined what has become the most popular and prevalent kind of computing engine. The von Neumann architecture has several distinguishing features or principles. First, primary memory is organized into a linearly addressable array of fixed length binary words. Both instructions and data are stored in primary memory and are indistinguishable from one another (both data and instructions are binary values.) A special register known as an instruction pointer or program counter is used to select and fetch instructions from memory for execution. Goto instructions are available to conditionally and unconditionally alter the selection of instructions by changing the value of the instruction pointer. For ordinary instructions, the program counter is advanced instruction by instruction in sequential order through memory. The hardware is synchronous and is controlled by a central clock. Finally, all arithmetic is performed in parallel rather than bit-serial fashion using relatively primitive data operations.

Several of the von Neumann principles are self-reinforcing. For example, the sequential selection of instructions and the linear addressing of the memory are consistent. The binary encoding of instructions and data permits the mechanical manipulation of instructions as data, leading to assemblers, compilers and higher level languages. It is probably the self-consistency of the von Neumann engine which has given it beauty and longevity.
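The essential von Neumann principles fit in a few lines of C: a single linear memory holding both instructions and data, and a program counter that steps sequentially unless a goto changes it. The sketch below is illustrative only; the accumulator organization and the opcode encoding are invented.

    #include <stdio.h>

    enum { LOAD, ADD, STORE, HALT };   /* invented opcodes */

    int main(void)
    {
        /* One linear memory holds both the program (words 0..7) and the
           data (words 20..22): instructions and data are just bits. */
        int mem[32] = { LOAD, 20, ADD, 21, STORE, 22, HALT, 0 };
        int acc = 0, pc = 0, running = 1;

        mem[20] = 17; mem[21] = 25;    /* data */

        while (running) {
            int op   = mem[pc++];      /* fetch the opcode, advance the PC */
            int addr = mem[pc++];      /* fetch the operand address        */
            switch (op) {              /* decode and execute               */
            case LOAD:  acc = mem[addr];  break;
            case ADD:   acc += mem[addr]; break;
            case STORE: mem[addr] = acc;  break;
            case HALT:  running = 0;      break;
            }
        }
        printf("mem[22] = %d\n", mem[22]);   /* prints 42 */
        return 0;
    }

A goto would simply assign a new value to pc, and because instructions live in the same memory as data, a program can compute and store instructions -- the property that makes assemblers and compilers possible.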
The von Neumann architecture, however, has certain drawbacks. All instructions and data must move through the single channel between the processor and memory. Due to the synchronous, sequential control of the machine, operations can only be performed one at a time. Therefore, several computations may not proceed concurrently even if the data for those computations is available. Thus, the opportunities for exploiting program parallelism are limited. All data structures (records, linked lists, arrays, etc.) must be mapped onto the linearly addressable store. Certain kinds of data structures like sets and bags have notably inefficient implementations on von Neumann machines. Newer programming styles, especially declarative languages like functional or logic programming, are incompatible with the von Neumann architecture. Even the lowly goto, whose use is considered harmful by structured programmers, is fundamental to the von Neumann style.

Although we will not cover them in depth here, several modifications to the von Neumann architecture have been proposed. Vector machines have instructions that operate on arrays instead of single binary words. By providing many channels between the processor and memory and by "pipelining" the processor, much higher throughput can be obtained. The use of vector instructions reduces the overhead on the processor-memory channel through a reduction in the number of instructions fetched per useful operation performed. Massive parallelism may be employed by connecting tens, hundreds or thousands of processing units together. Thus, a very large number of operations can be performed concurrently by the processors. Operands and results are exchanged by the processors when necessary. The notion of a linear program representation and sequential instruction execution can be eliminated. This is the approach pursued in "dataflow machines" that use data dependency graphs as their program representation. A data dependency graph shows the producer-consumer relationships between program operations. If a sufficient number of physical processing elements are available, many producers can be executing at once.

This chapter is only a brief introduction to computer architecture. As architecture moves away from the von Neumann paradigm, the field poses many exciting engineering challenges for the future.

Copyright (c) 1987-2013 Paul J. Drongowski