Computer Design
A computer aided design and VLSI approach

Paul J. Drongowski

Chapter 14 - Modeling the organization.

This chapter discusses the modeling of the machine organization.
Again we will construct a C language program to simulate and
analyze the machine behavior. In order for the analysis to be
meaningful, we must simulate the behavior of the hardware and
microcode as closely as possible to the real thing. The simulation
must account for all of the physical registers (ISA and non-ISA) in
the design, the behavior of the combinational logic and the operation
of the controller.

The organization level model is an opportunity to evaluate the
proposed system implementation before committing resources to
detailed design at the switching (logic) and geometric levels of
abstraction. The model is a good testbed for debugging the
microcode as observability and controllability are very high.
Indeed, it also makes sense to develop "microdiagnostics" at
this time to check out the actual hardware once it has been
fabricated. Detailed timing information can be added to the
model based upon desk calculations for clock period and expected
microinstruction execution times. Thus, the execution speed of
complex system functions can be estimated through simulation.
Finally, the level of modeling detail is still sufficiently
"coarse" that it is feasible to run the operating system (or
some other ISA-level program) on the organization level model --
the ratio of wall clock time to simulated time is low enough
to obtain good turnaround. It is not presently possible to
run the operating system on a gate or transistor level model.

Section 1 - Representing storage.

As in the ISA model, storage is represented by C language
variables. The designer may have included non-ISA registers
and memory elements into the organization for performance
or other pragmatic reasons. Individual registers may have
been combined into a register file as well. Both ISA and
non-ISA registers must be modeled using simple and
one-dimensional array variables as before.

A simple variable should be declared for each bus in the
datapath. Bus variables will be used to set up storage
values during the first phase of the two-phase non-overlapping
clock. The values will be stored during the second phase.
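
The declarations might look like the following minimal sketch. The
register names are borrowed from the example datapath in Figure 9. The
register file "RegFile," the memory array "Memory," their sizes and the
helper "ResetStorage" are assumptions made only for illustration.

```c
#define REGS     8      /* assumed register file size */
#define MEMSIZE  256    /* assumed memory size in words */

/* Simple variables model individual registers (ISA and non-ISA). */
int AC, MD, CR, SR, B, IR, Z ;

/* One-dimensional arrays model a register file and main memory. */
int RegFile[REGS] ;
int Memory[MEMSIZE] ;

/* One simple variable per datapath bus. */
int ReadBus, WriteBus ;

/* Put every storage element into a known state before simulation. */
void ResetStorage(void)
  {
  int i ;

  AC = MD = CR = SR = B = IR = Z = 0 ;
  ReadBus = WriteBus = 0 ;
  for (i = 0; i < REGS; i = i + 1) RegFile[i] = 0 ;
  for (i = 0; i < MEMSIZE; i = i + 1) Memory[i] = 0 ;
  }
```

Calling "ResetStorage" once before the main simulation loop gives every
run the same starting point, which makes traces easier to compare.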

Section 2 - Combinational logic.

Combinational logic can be modeled by C language functions
that accept a group of arguments (the inputs to the combinational
block) and return a result. A two to one multiplexer, for example,
is modeled by the function shown in Figure 1. The three inputs
to the multiplexer are the two data inputs and the selection
value. The switch statement selects between the two data inputs
and returns the requested value. The default case catches possible
errors. The C assignment statement:
    Z = Mux21(X, Y, XYSelect) ;
simulates the behavior of the datapath segment appearing in Figure 2.


        #include <stdio.h>

        int Mux21(int A, int B, int Select)

          {
          switch ( Select )
            {
            case 0: return(A) ;
            case 1: return(B) ;
            default:
              {
              /* Illegal selection: report it and return a known value. */
              printf("* Error * Improper selection value.\n");
              return(0) ;
              }
            }
          }

               Figure 1 - Two to one multiplexer. 


      ---     ---
     | X |   | Y |
      ---     ---
       |       |
       V       V
      -----------
     |   Mux21   |<---- XYSelect
      -----------
           |
           V
          ---
         | Z |
          ---

    Figure 2 - Multiplexing example.
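
The same pattern extends to wider multiplexers. The sketch below is one
possible four to one multiplexer of the kind that Figure 10 calls as
"Mux41." The argument order and the choice to return zero on an illegal
selection are assumptions, not part of the original design.

```c
#include <stdio.h>

/* Four to one multiplexer: Select chooses among inputs A, B, C and D. */
int Mux41(int A, int B, int C, int D, int Select)
  {
  switch ( Select )
    {
    case 0: return(A) ;
    case 1: return(B) ;
    case 2: return(C) ;
    case 3: return(D) ;
    default:
      printf("* Error * Improper selection value.\n") ;
      return(0) ;   /* assumed: return 0 on an illegal selection */
    }
  }
```

For example, the call Mux41(10, 20, 30, 40, 2) selects the third data
input.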

Figure 3 is the C language code for an ALU block. It has four
inputs: the left and right operands, the operation code and
the carry in bit. The function performs one of eight arithmetic
and logical operations. If a block produces more than one output
as is often the case with an ALU, we cannot directly model the
block behavior in C due to the limited number of values (one)
which can be returned from a function. It may be necessary to
introduce a global variable to hold the value of the ALU result
and another C function to compute the value of the carry out,
zero or overflow flag. The function in Figure 4, for example,
computes the carry-out from the ALU block.


    #include <stdio.h>

    int ALU(int A, int B, int Op, int Cin)

      {
      switch( Op )
        {
        case 0: return( A + B + Cin );  /* Addition */
        case 1: return( A - B - Cin );  /* Subtraction */
        case 2: return( A & B );        /* Logical AND */
        case 3: return( A | B );        /* Logical OR */
        case 4: return( A ^ B );        /* Exclusive OR */
        case 5: return( ~ A );          /* Logical NOT */
        case 6: return( - A );          /* Two's complement */
        case 7: return( A );            /* Pass A unmodified */
        default:                        /* Illegal operation code */
          printf("* Error * Improper operation code.\n");
          return( 0 );
        }
      }

                Figure 3 - ALU example.


    int CarryOut(int A, int B, int Op, int Cin)

      {
      int Out ;

      /* Bit 8 of the untruncated result is taken as the carry out. */
      switch( Op )
        {
        case 0: Out = (A + B + Cin) >> 8 ; break ;
        case 1: Out = (A - B - Cin) >> 8 ; break ;
        case 6: Out = (- A) >> 8 ; break ;
        default: Out = 0 ;
        }
      return( Out & 1 ) ;
      }

         Figure 4 - ALU carry out example.
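
A zero flag can be produced in the same way. The sketch below is one
possible form of the "ALUZ" function that Figure 10 calls. It assumes
an 8-bit data width (suggested by the shift count in Figure 4) and
repeats the ALU of Figure 3 so that the example stands alone. The name
"MASK" is hypothetical.

```c
#define MASK 0xFF   /* assumed 8-bit data width */

/* ALU as in Figure 3; an illegal operation code returns 0 here. */
int ALU(int A, int B, int Op, int Cin)
  {
  switch( Op )
    {
    case 0: return( A + B + Cin ) ;  /* Addition */
    case 1: return( A - B - Cin ) ;  /* Subtraction */
    case 2: return( A & B ) ;        /* Logical AND */
    case 3: return( A | B ) ;        /* Logical OR */
    case 4: return( A ^ B ) ;        /* Exclusive OR */
    case 5: return( ~ A ) ;          /* Logical NOT */
    case 6: return( - A ) ;          /* Two's complement */
    case 7: return( A ) ;            /* Pass A unmodified */
    default: return( 0 ) ;
    }
  }

/* Zero flag: 1 when the low eight bits of the ALU result are all 0. */
int ALUZ(int A, int B, int Op, int Cin)
  {
  return( (ALU(A, B, Op, Cin) & MASK) == 0 ) ;
  }
```

Because the flag is computed from the masked result, an addition that
overflows to exactly 256 also sets the zero flag, as it would in an
8-bit datapath.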


Section 3 - Simulating the controller.

The two most prevalent forms of control are the hardwired finite
state machine (FSM) and microprogrammed styles. In either case,
the use of a table-based control simulation is natural. The
control design is easily modified because it is centrally located
in a table and not "hard coded" into the C simulation program.

The general structure of a C program for FSM controllers is given
in Figure 5. The variable "ThisState" keeps track of the current
execution state of the machine. The two dimensional tables
"NextState" and "Action" are indexed by the current state and the
condition inputs to be sensed by the controller. The switch
statement within the non-terminating for loop dispatches to a
set of C language statements that implement the operations to
be performed during that machine cycle. The "Action" table provides
the dispatch value. Once the action has completed, the simulation
proceeds to the next machine state as selected from the "NextState"
array.


    int ThisState ;

    int NextState[STATES][INPUTS] ,
        Action[STATES][INPUTS] ;

    for(;;)
      {
      switch ( Action[ThisState][Input] )
        {
        case 0: ...
          ...
        default: ...
        }
      ThisState = NextState[ThisState][Input] ;
      }

         Figure 5 - FSM simulation.
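
To make the table-driven structure concrete, the following sketch
fills in the tables for a small, invented three state controller with
one condition input. The states (IDLE, RUN, DONE), the action codes
and the counter "Count" are all hypothetical. Only the indexing scheme
is taken from Figure 5. The loop body is packaged as the function
"Cycle" so that one machine cycle can be simulated per call.

```c
#include <stdio.h>

#define STATES 3
#define INPUTS 2   /* one condition bit: values 0 and 1 */

enum { IDLE, RUN, DONE } ;

/* Rows are indexed by current state, columns by the condition input. */
static const int NextState[STATES][INPUTS] =
  {
  { IDLE, RUN  },   /* IDLE: wait until the input is 1 */
  { DONE, RUN  },   /* RUN:  continue while the input is 1 */
  { IDLE, IDLE }    /* DONE: always return to IDLE */
  } ;

/* Action codes: 0 = no operation, 1 = clear Count, 2 = increment. */
static const int Action[STATES][INPUTS] =
  {
  { 0, 1 },
  { 0, 2 },
  { 0, 0 }
  } ;

int ThisState = IDLE ;
int Count ;

/* Simulate one machine cycle for the given condition input. */
void Cycle(int Input)
  {
  switch ( Action[ThisState][Input] )
    {
    case 0: break ;
    case 1: Count = 0 ; break ;
    case 2: Count = Count + 1 ; break ;
    default: printf("* Error * Improper action code.\n") ;
    }
  ThisState = NextState[ThisState][Input] ;
  }
```

Changing the controller now means editing the two tables. The code in
"Cycle" stays the same.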


The structure of a PLA-based, microprogrammed control simulation
is similar. Again, a simple variable retains the current machine
state. Instead of action and next state tables, two tables are
allocated to store the AND and OR planes of the PLA. The number of
rows in each table is determined by the number of product terms
(and microinstructions) to be stored in the PLA. Each element of
the array contains a '0', '1', or 'X' character value. (The OR plane
only contains '0' and '1' values, an observation that can be exploited
for execution efficiency.) Enough columns must be allocated to
accommodate the condition inputs, next state values and control
outputs.


    int ThisState, ThisTerm ;

    char Inputs[INPUTS] ;

    char AndPlane[TERMS][NEXT+INPUTS] ,
         OrPlane[TERMS][NEXT+OUTPUTS] ;

    for(;;)
      {
      ThisTerm = Match(ThisState, Inputs) ;

      ... assignments ...

      ThisState = OrPlane[ThisTerm][NEXTFIELD] ;
      }

     Figure 6 - PLA-based, microprogrammed style.

The PLA hardware is fast because all product terms are evaluated
in parallel; it is combinational logic. In C language, however,
the individual product terms must be evaluated one at a time
until the one true term is found. The search for the true term
is a pattern matching operation and in Figure 6, this operation
is performed by the function "Match." A naive implementation of
"Match" is given in Figure 7. This function will search each
row of the AND-plane and compare the current state and the
current values of the condition inputs against
the corresponding '0' or '1' values in the table. (Notice that
don't care 'X' inputs are ignored.) A negative value is returned
whenever a matching row is not found.

  int Match(int ThisState, char Inputs[])

    {
    int Row, Column, In, Found ;

    for (Row = 0; Row < TERMS; Row = Row + 1)
      {
      if (AndPlane[Row][NEXT] == ThisState)
        {
        Found = 1 ;
        for(Column = 0; Column < INPUTS; Column = Column + 1)
          {
          if ((AndPlane[Row][Column] != 'X') &&
              (AndPlane[Row][Column] != Inputs[Column]))
            {
            Found = 0 ;
            }
          }
        if (Found) return( Row ) ;
        }
      }
    return( -1 ) ;
    }

         Figure 7 - Naive implementation of "Match."
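
Figure 7 leaves the column layout of the AND-plane implicit. The
sketch below is a self-contained variant under one assumed layout:
each row holds STATEBITS state columns (most significant bit first)
followed by INPUTS condition columns, all stored as '0', '1' or 'X'
characters. The constant STATEBITS, the table contents and the layout
are assumptions chosen for illustration.

```c
#define TERMS     4
#define STATEBITS 2
#define INPUTS    1

/* Each row: state bits, then condition inputs (plus a string NUL). */
static const char AndPlane[TERMS][STATEBITS + INPUTS + 1] =
  {
  "00X",   /* state 0, any input */
  "010",   /* state 1, input 0   */
  "011",   /* state 1, input 1   */
  "10X"    /* state 2, any input */
  } ;

/* Return the first row whose pattern matches, or -1 if none does. */
int Match(int ThisState, const char Inputs[])
  {
  int Row, Column, Found ;
  char Want, Bit ;

  for (Row = 0; Row < TERMS; Row = Row + 1)
    {
    Found = 1 ;
    /* Compare the state field one bit at a time. */
    for (Column = 0; Column < STATEBITS && Found; Column = Column + 1)
      {
      Want = AndPlane[Row][Column] ;
      Bit = ((ThisState >> (STATEBITS - 1 - Column)) & 1) ? '1' : '0' ;
      if ((Want != 'X') && (Want != Bit)) Found = 0 ;
      }
    /* Compare the condition inputs; 'X' columns are ignored. */
    for (Column = 0; Column < INPUTS && Found; Column = Column + 1)
      {
      Want = AndPlane[Row][STATEBITS + Column] ;
      if ((Want != 'X') && (Want != Inputs[Column])) Found = 0 ;
      }
    if (Found) return( Row ) ;
    }
  return( -1 ) ;
  }
```

With this table, state 1 with the input at '1' matches row 2, and the
unused state 3 matches no row.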

The call to function "Match" returns the row number of the true term
which is stored in "ThisTerm." The row number is used to select the
OR-plane signals that control the datapath assignments. The next
state field in the same OR-plane row gives the value of the next
machine state.
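
When the OR-plane is stored as '0' and '1' characters, a field such as
the next state must be converted to an integer before it can be used
as a state value. The sketch below shows one way to do so. It assumes
that each row begins with NEXT next state columns (most significant
bit first) followed by the control outputs. The helper names "Field"
and "NextStateOf" and the table contents are hypothetical.

```c
#define TERMS   3
#define NEXT    2   /* assumed width of the next state field */
#define OUTPUTS 3   /* assumed number of control outputs */

/* Each row: next state bits, then control outputs (plus a NUL). */
static const char OrPlane[TERMS][NEXT + OUTPUTS + 1] =
  {
  "01100",
  "10010",
  "00001"
  } ;

/* Convert Width characters starting at column First to an integer. */
int Field(const char Row[], int First, int Width)
  {
  int i, Value ;

  Value = 0 ;
  for (i = 0; i < Width; i = i + 1)
    {
    Value = (Value << 1) | (Row[First + i] == '1') ;
    }
  return( Value ) ;
  }

/* Next machine state selected by the true product term. */
int NextStateOf(int Term)
  {
  return( Field(OrPlane[Term], 0, NEXT) ) ;
  }
```

The same "Field" helper can extract multi-bit control fields, such as
an ALU operation code, from the output columns.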

The implementation of "Match" in Figure 7 is relatively slow since
it performs a linear search through the AND-plane table. Better
run time performance can be gained by using an index and link look-up
technique (Figure 8). Microinstructions are stored as records
which are arranged into linked lists. A vector (one dimensional
array) of pointer variables is defined where each entry points to
the head of a list of microinstructions. The microinstructions in
each list share the same current state selection value. By indexing
the vector with the current state value, the search for the next
microinstruction is shortened since only those microinstructions
on the selected list are associated with that state value. A linear
search through the list will find the target microword.

      ---
     | 0 |---->
     |---|      ---------       ---------
     | 1 |---->| 0 | ... |---->| 1 | ... |
     |---|      ---------       ---------
     | 2 |---->
     |---|
     | 3 |---->
     |---|
      ...

         Figure 8 - Index and link representation.
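
A minimal sketch of the index and link technique follows. The record
layout ("Pattern," "NextState," "Link"), the table contents and the
function name "Lookup" are assumptions made for illustration. A real
microword would also carry the control output fields.

```c
#include <stddef.h>

#define STATES 3
#define INPUTS 1

struct MicroWord
  {
  const char *Pattern ;      /* INPUTS characters: '0', '1' or 'X' */
  int NextState ;
  struct MicroWord *Link ;   /* next word for the same state, or NULL */
  } ;

static struct MicroWord Words[4] =
  {
  { "X", 1, NULL },          /* state 0: any input  */
  { "0", 2, &Words[2] },     /* state 1: input is 0 */
  { "1", 0, NULL },          /* state 1: input is 1 */
  { "X", 0, NULL }           /* state 2: any input  */
  } ;

/* Index vector: entry s points to the head of the list for state s. */
static struct MicroWord *Index[STATES] =
  { &Words[0], &Words[1], &Words[3] } ;

/* Search only the list for ThisState; return NULL if nothing matches. */
struct MicroWord *Lookup(int ThisState, const char Inputs[])
  {
  struct MicroWord *W ;
  int Column, Found ;

  if (ThisState < 0 || ThisState >= STATES) return( NULL ) ;
  for (W = Index[ThisState]; W != NULL; W = W->Link)
    {
    Found = 1 ;
    for (Column = 0; Column < INPUTS; Column = Column + 1)
      {
      if ((W->Pattern[Column] != 'X') &&
          (W->Pattern[Column] != Inputs[Column])) Found = 0 ;
      }
    if (Found) return( W ) ;
    }
  return( NULL ) ;
  }
```

The state comparison disappears from the inner search entirely. Each
list contains only the microwords for one state, so the remaining
work is proportional to the length of that list.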

Section 4 - Clocks.

Once the appropriate microinstruction has been found, it must be
executed. Because we will employ a two-phase, non-overlapping clock
in later chapters, we discuss simulation in that context.

To use a two-phase, non-overlapping clock properly, certain system
operations must be consistently assigned to the different clock
phases. We will assign operations in the following way.
  * During phase one (Phi-1), data inputs will be enabled into
    the combinational logic. New results (the next state) will be
    computed and set-up to the inputs of the storage elements.
  * The new data values will be enabled into the storage elements
    and stored during Phi-2.
The effect of 2-phase clocking can be simulated through a two
step process.
  * In the first step, bus values are computed and stored in the
    bus variables. This step will probably involve several function
    calls as combinational blocks compute new data values.
  * Bus values are transferred to destination storage elements
    during step 2.
Computations and storage operations are controlled by the current
microinstruction.
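
A register exchange shows why the two steps must be kept apart. In the
sketch below, two hypothetical registers X and Y are each loaded from
the other in a single cycle. "NaiveCycle" assigns the registers
directly and loses a value. "TwoStepCycle" first computes the bus
values and only then stores them, as the real hardware would. All of
the names are invented for this illustration.

```c
int X, Y ;         /* two registers */
int XBus, YBus ;   /* bus variables feeding the inputs of X and Y */

/* Incorrect: X is overwritten before Y has sampled its old value. */
void NaiveCycle(void)
  {
  X = Y ;
  Y = X ;
  }

/* Correct: compute every bus value first, then store. */
void TwoStepCycle(void)
  {
  /* Step 1 (Phi-1): set up new values on the buses. */
  XBus = Y ;
  YBus = X ;

  /* Step 2 (Phi-2): transfer bus values into the registers. */
  X = XBus ;
  Y = YBus ;
  }
```

Starting from X = 1 and Y = 2, the naive version leaves both registers
holding 2, while the two step version exchanges them.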


  WriteBus  ----
       --->| IR |
      |     ----
      |     ----
      |--->| AC |-----------
      |     ----            |
      |     ----            |
      |--->| MD |--------   |
      |     ----         |  |
      |     ----         |  |
      |--->| CR |-----   |  |
      |     ----      |  |  |
      |     ----      |  |  |
      |--->| SR |--   |  |  |
      |     ----   |  |  |  |
      |            V  V  V  V
      |     ----   ----------
      |--->|  B | |   Mux41  | ReadBus
      |     ----   ----------
      |       |         |
      |       V         V
   -------  ---------------      ---
  | Mux21 ||      ALU      |--->| Z |
   -------  ---------------      ---
    ^   ^          |
    |   |          |
  --     ----------

     Figure 9 - Example SP.1 datapath.


An example two bus SP.1 datapath appears in Figure 9. The AC, MD,
CR and SR registers can be routed to the right input of the ALU
via the "ReadBus." This bus is implemented with a four to one
multiplexer. The ALU result and an external input value can be
transferred to any of the registers via "WriteBus." The ALU
result and input are selected for WriteBus by a two to one
multiplexer.

     /* Step 1 */

     ReadBus  = Mux41(AC, MD, CR, SR, ReadSelect) ;
     WriteBus = Mux21(In, ALU(B, ReadBus, Op, Cin), WriteSelect) ;
     ZBit     = ALUZ(B, ReadBus, Op, Cin) ;

     /* Step 2 */

     if (WRAC) AC = WriteBus ;
     if (WRMD) MD = WriteBus ;
     if (WRCR) CR = WriteBus ;
     if (WRSR) SR = WriteBus ;
     if (WRB)   B = WriteBus ;
     if (WRIR) IR = WriteBus ;
     if (WRZ)   Z = ZBit ;

        Figure 10 - Two step datapath simulation.

The C code for simulating the example datapath is given in Figure 10.
During the first step, the bus values "ReadBus," "WriteBus" and "ZBit"
are computed. Four functions are called which correspond to the ALU
and multiplexer blocks in the datapath. Their operation is controlled
by the microinstruction fields "ReadSelect," "WriteSelect," "Op" and
"Cin." Step one values are conditionally transferred to the destination
registers during step two. Each of the conditional expressions is a
microinstruction field.
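
The fragment in Figure 10 can be packaged as one function that is
driven by a microinstruction record. The sketch below is a simplified,
self-contained version. The record "MicroInst," the function "Cycle,"
the 8-bit "MASK" and the reduced three operation ALU are assumptions
made to keep the example short. Error checking is omitted.

```c
#define MASK 0xFF   /* assumed 8-bit data width */

/* Hypothetical microinstruction record: one member per field. */
struct MicroInst
  {
  int ReadSelect, WriteSelect, Op, Cin ;
  int WRAC, WRMD, WRCR, WRSR, WRB, WRIR, WRZ ;
  } ;

int AC, MD, CR, SR, B, IR, Z, In ;   /* storage and external input */
int ReadBus, WriteBus, ZBit ;        /* bus variables */

int Mux21(int A, int B, int Select)
  {
  return( Select ? B : A ) ;
  }

int Mux41(int A, int B, int C, int D, int Select)
  {
  switch ( Select )
    {
    case 0: return(A) ;
    case 1: return(B) ;
    case 2: return(C) ;
    default: return(D) ;
    }
  }

/* Reduced ALU: add, subtract, and pass A for any other code. */
int ALU(int A, int B, int Op, int Cin)
  {
  switch( Op )
    {
    case 0: return( (A + B + Cin) & MASK ) ;
    case 1: return( (A - B - Cin) & MASK ) ;
    default: return( A & MASK ) ;
    }
  }

int ALUZ(int A, int B, int Op, int Cin)
  {
  return( ALU(A, B, Op, Cin) == 0 ) ;
  }

/* Simulate one clock cycle under control of microinstruction M. */
void Cycle(const struct MicroInst *M)
  {
  /* Step 1: compute bus values. */
  ReadBus  = Mux41(AC, MD, CR, SR, M->ReadSelect) ;
  WriteBus = Mux21(In, ALU(B, ReadBus, M->Op, M->Cin), M->WriteSelect) ;
  ZBit     = ALUZ(B, ReadBus, M->Op, M->Cin) ;

  /* Step 2: store bus values in the enabled registers. */
  if (M->WRAC) AC = WriteBus ;
  if (M->WRMD) MD = WriteBus ;
  if (M->WRCR) CR = WriteBus ;
  if (M->WRSR) SR = WriteBus ;
  if (M->WRB)   B = WriteBus ;
  if (M->WRIR) IR = WriteBus ;
  if (M->WRZ)   Z = ZBit ;
  }
```

In a complete simulator the record passed to "Cycle" would be filled
in from the OR-plane row selected by "Match."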

Section 5 - Design project 3.

Construct a C program which simulates your paper design of the SP.1
at the organization level. The program should use C language
functions to simulate combinational blocks and should use the
two step assignment process outlined in Section 4. The simulated
SP.1 must successfully execute the SP.1 diagnostic program which
was developed in the first project.

To complete this assignment, you must encode the flow graph into
a PLA-based microprogram. You should store the PLA programming
information in a separate file such that the information may also
be used to generate a switch level model for the PLA and eventually
the PLA circuit layout.

There are four products from this assignment.
  * A listing of the C simulation program.
  * A listing of the encoded SP.1 emulator (microprogram).
  * A trace of the microengine executing the SP.1 ISA
    diagnostic program.
  * An essay describing the decisions made while constructing
    the simulation, its limitations and its accuracy.
It will be much easier to debug the microcode at this level of
abstraction than at the switch (logic) level. Thus, any additional
effort here will pay off in later assignments.

Copyright (c) 1987-2013 Paul J. Drongowski