Computer Design
A computer aided design and VLSI approach
Paul J. Drongowski

Chapter 14 - Modeling the organization.

This chapter discusses the modeling of the machine organization. Again we will construct a C language program to simulate and analyze the machine behavior. In order for the analysis to be meaningful, we must simulate the behavior of the hardware and microcode as closely as possible to the real thing. The simulation must account for all of the physical registers (ISA and non-ISA) in the design, the behavior of the combinational logic and the operation of the controller.

The organization level model is an opportunity to evaluate the proposed system implementation before committing resources to detailed design at the switching (logic) and geometric levels of abstraction. The model is a good testbed for debugging the microcode because observability and controllability are very high. Indeed, it also makes sense to develop "microdiagnostics" at this time to check out the actual hardware once it has been fabricated. Detailed timing information can be added to the model based upon desk calculations for clock period and expected microinstruction execution times. Thus, the execution speed of complex system functions can be estimated through simulation. Finally, the level of modeling detail is still sufficiently "coarse" that it is feasible to run the operating system (or some other ISA-level program) on the organization level model: the ratio of wall clock time to simulated time is low enough to obtain good turnaround. It is not presently possible to run the operating system on a gate or transistor level model.

Section 1 - Representing storage.

As in the ISA model, storage is represented by C language variables. The designer may have included non-ISA registers and memory elements into the organization for performance or other pragmatic reasons. Individual registers may have been combined into a register file as well.
Both ISA and non-ISA registers must be modeled using simple and one-dimensional array variables as before. A simple variable should be declared for each bus in the datapath. Bus variables will be used to set up storage values during the first phase of the 2-phase non-overlapping clock. The values will be stored during the second phase.

Section 2 - Combinational logic.

Combinational logic can be modeled by C language functions that accept a group of arguments (the inputs to the combinational block) and return a result. A two to one multiplexer, for example, is modeled by the function shown in Figure 1. The three inputs to the multiplexer are the two data inputs and the selection value. The switch statement selects between the two data inputs and returns the requested value. The default case catches possible errors. The C assignment statement:

    Z = Mux21(X, Y, XYSelect) ;

simulates the behavior of the datapath segment appearing in Figure 2.

    int Mux21(A, B, Select)
    int A, B, Select ;
    {
        switch ( Select ) {
            case 0: return(A) ;
            case 1: return(B) ;
            default: {
                printf("* Error * Improper selection value.\n");
                return(0) ;   /* Keep the result defined after an error */
            }
        }
    }

Figure 1 - Two to one multiplexer.

     ---     ---
    | X |   | Y |
     ---     ---
      |       |
      V       V
    -------------
    |   Mux21   |<---- XYSelect
    -------------
          |
          V
         ---
        | Z |
         ---

Figure 2 - Multiplexing example.

Figure 3 is the C language code for an ALU block. It has four inputs: the left and right operands, the operation code and the carry in bit. The function performs one of eight arithmetic and logical operations.

If a block produces more than one output, as is often the case with an ALU, we cannot directly model the block behavior in C because a function can return only one value. It may be necessary to introduce a global variable to hold the value of the ALU result and another C function to compute the value of the carry out, zero or overflow flag. The function in Figure 4, for example, computes the carry-out from the ALU block.
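The global-variable approach described above might be sketched as follows. This is only an illustration under stated assumptions: the names ALUResult, RunALU, ZeroFlag and CarryFlag are hypothetical and do not appear in the text, only three of the Figure 3 operations are shown, and the datapath is assumed to be 8 bits wide (as the shift by 8 in Figure 4 suggests). ANSI-style function definitions are used here so that the fragment compiles with current compilers.

```c
/* Assumption: 8-bit datapath, suggested by the ">> 8" in Figure 4. */
#define WIDTH_MASK 0xFF

/* Hypothetical global holding the most recent, untruncated ALU result. */
int ALUResult ;

/* Compute a subset of the Figure 3 operations, latch the raw result
   in the global, and return the result truncated to the datapath width. */
int RunALU(int A, int B, int Op, int Cin)
{
    switch ( Op ) {
        case 0:  ALUResult = A + B + Cin ; break ;   /* Addition    */
        case 2:  ALUResult = A & B ;       break ;   /* Logical AND */
        default: ALUResult = A ;           break ;   /* Pass A      */
    }
    return( ALUResult & WIDTH_MASK ) ;
}

/* Second output: 1 when the truncated result is zero. */
int ZeroFlag(void)
{
    return( (ALUResult & WIDTH_MASK) == 0 ) ;
}

/* Third output: the bit carried out of the 8-bit result. */
int CarryFlag(void)
{
    return( (ALUResult >> 8) & 1 ) ;
}
```

With these definitions, a call such as RunALU(0xFF, 1, 0, 0) returns 0, after which ZeroFlag() and CarryFlag() both report 1. The flag functions must be called before the next RunALU call overwrites the global.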
    int ALU(A, B, Op, Cin)
    int A, B, Op, Cin ;
    {
        switch( Op ) {
            case 0: return( A + B + Cin );   /* Addition */
            case 1: return( A - B - Cin );   /* Subtraction */
            case 2: return( A & B );         /* Logical AND */
            case 3: return( A | B );         /* Logical OR */
            case 4: return( A ^ B );         /* Exclusive OR */
            case 5: return( ~ A );           /* Logical NOT */
            case 6: return( - A );           /* Two's complement */
            case 7: return( A );             /* Pass A unmodified */
            default: return( 0 );            /* Undefined operation code */
        }
    }

Figure 3 - ALU example.

    int CarryOut(A, B, Op, Cin)
    int A, B, Op, Cin ;
    {
        int Out ;

        /* Each case must end in a break; otherwise control falls
           through to the default case and Out is always zero. */
        switch( Op ) {
            case 0: Out = (A + B + Cin) >> 8 ; break ;
            case 1: Out = (A - B - Cin) >> 8 ; break ;
            case 6: Out = (- A) >> 8 ; break ;
            default: Out = 0 ;
        }
        return( Out & 1 ) ;
    }

Figure 4 - ALU carry out example.

Section 3 - Simulating the controller.

The two most prevalent forms of control are the hardwired finite state machine (FSM) and microprogrammed styles. In either case, the use of a table-based control simulation is natural. The control design is easily modified because it is centrally located in a table and not "hard coded" into the C simulation program.

The general structure of a C program for FSM controllers is given in Figure 5. The variable "ThisState" keeps track of the current execution state of the machine. The two dimensional tables "NextState" and "Action" are indexed by the current state and the condition inputs to be sensed by the controller. The switch statement within the non-terminating for loop dispatches to a set of C language statements that implement the operations to be performed during that machine cycle. The "Action" table provides the dispatch value. Once the action has completed, the simulation proceeds to the next machine state as selected from the "NextState" array.

    int ThisState ;
    int NextState[STATES][INPUTS] ,
        Action[STATES][INPUTS] ;

    for(;;) {
        switch ( Action[ThisState][Input] ) {
            case 0: ...
            ...
            default: ...
        }
        ThisState = NextState[ThisState][Input] ;
    }

Figure 5 - FSM simulation.

The structure of a PLA-based, microprogrammed control simulation is similar.
Again, a simple variable retains the current machine state. Instead of action and next state tables, two tables are allocated to store the AND and OR planes of the PLA. The number of rows in each table is determined by the number of product terms (and microinstructions) to be stored in the PLA. Each element of the array contains a '0', '1', or 'X' character value. (The OR plane only contains '0' and '1' values, an observation that can be exploited for execution efficiency.) Enough columns must be allocated to accommodate the condition inputs, next state values and control outputs.

    int ThisState, ThisTerm ;
    char Inputs[INPUTS] ;
    char AndPlane[TERMS][NEXT+INPUTS] ,
         OrPlane[TERMS][NEXT+OUTPUTS] ;

    for(;;) {
        ThisTerm = Match(ThisState, Inputs) ;
        ... assignments ...
        ThisState = OrPlane[ThisTerm][NEXTFIELD] ;
    }

Figure 6 - PLA-based, microprogrammed style.

The PLA hardware is fast because all product terms are evaluated in parallel; it is combinational logic. In C language, however, the individual product terms must be evaluated one at a time until the one true term is found. The search for the true term is a pattern matching operation and in Figure 6, this operation is performed by the function "Match."

A naive implementation of "Match" is given in Figure 7. This function searches each row of the AND-plane and compares the current state, and the current value of the condition inputs, against the corresponding '0' or '1' values in the table. (Notice that don't care 'X' inputs are ignored.) A negative value is returned whenever a matching row is not found.
    int Match(ThisState, Inputs)
    int ThisState ;
    char Inputs[] ;
    {
        int Row, Column, Found ;

        for (Row = 0; Row < TERMS; Row = Row + 1) {
            /* Only rows belonging to the current state can match */
            if (AndPlane[Row][NEXT] == ThisState) {
                Found = 1 ;
                for(Column = 0; Column < INPUTS; Column = Column + 1) {
                    /* An 'X' entry is a don't care and matches any input */
                    if ((AndPlane[Row][Column] != 'X') &&
                        (AndPlane[Row][Column] != Inputs[Column])) {
                        Found = 0 ;
                    }
                }
                if (Found) return( Row ) ;
            }
        }
        return( -1 ) ;   /* No matching product term */
    }

Figure 7 - Naive implementation of "Match."

The call to function "Match" returns the row number of the true term, which is stored in "ThisTerm." The row number is used to select the OR-plane signals that control the datapath assignments. The row number also selects the next state field of the OR-plane, which gives the value of the next machine state.

The implementation of "Match" in Figure 7 is relatively slow since it performs a linear search through the AND-plane table. Better run time performance can be gained by using an index and link look-up technique (Figure 8). Microinstructions are stored as records which are arranged into linked lists. A vector (one dimensional array) of pointer variables is defined where each entry points to the head of a list of microinstructions. The microinstructions in each list share the same current state selection value. By indexing the vector with the current state value, the search for the next microinstruction is shortened since only the microinstructions on the selected list are associated with that state value. A linear search through the list will find the target microword.

     ---
    | 0 |---->
    |---|      ---------      ---------
    | 1 |---->| 0 | ... |---->| 1 | ... |
    |---|      ---------      ---------
    | 2 |---->
    |---|
    | 3 |---->
    |---|
     ...

Figure 8 - Index and link representation.

Section 4 - Clocks.

Once the appropriate microinstruction has been found, it must be executed. Because we will employ a two-phase, non-overlapping clock in later chapters, we discuss simulation in that context.
To use a two-phase, non-overlapping clock properly, certain system operations must be consistently assigned to the different clock phases. We will assign operations in the following way.

* During phase one (Phi-1), data inputs will be enabled into the combinational logic. New results (the next state) will be computed and set up at the inputs of the storage elements.
* The new data values will be enabled into the storage elements and stored during Phi-2.

The effect of 2-phase clocking can be simulated through a two step process.

* In the first step, bus values are computed and stored in the bus variables. This step will probably involve several function calls as combinational blocks compute new data values.
* Bus values are transferred to destination storage elements during step 2.

Computations and storage operations are controlled by the current microinstruction.

    WriteBus
       |      ----
       |---->| IR |
       |      ----
       |      ----
       |---->| AC |--------------
       |      ----               |
       |      ----               |
       |---->| MD |-----------   |
       |      ----            |  |
       |      ----            |  |
       |---->| CR |--------   |  |
       |      ----         |  |  |
       |      ----         |  |  |
       |---->| SR |-----   |  |  |
       |      ----      |  |  |  |
       |                V  V  V  V
       |      ----    --------------
       |---->| B  |  |    Mux41     |
       |      ----    --------------
       |        |            | ReadBus
       |        V            V
     -------    ------------------     ---
    | Mux21 |<-|       ALU        |-->| Z |
     -------    ------------------     ---
       ^
       |
       In

Figure 9 - Example SP.1 datapath.

An example two bus SP.1 datapath appears in Figure 9. The AC, MD, CR and SR registers can be routed to the right input of the ALU via the "ReadBus." This bus is implemented with a four to one multiplexer. The ALU result and an external input value can be transferred to any of the registers via "WriteBus." The ALU result and input are selected for WriteBus by a two to one multiplexer.
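Figure 10 calls two functions, "Mux41" and "ALUZ," that are not listed in this chapter. A minimal sketch of how they might look is given here, patterned after Figure 1. The bodies are assumptions rather than the author's code: "ALUZ" is taken to return 1 when the ALU result is zero, the datapath is assumed to be 8 bits wide, and the ALU of Figure 3 is restated with an ANSI-style definition so that the fragment compiles on its own with current compilers.

```c
#include <stdio.h>

#define WIDTH_MASK 0xFF   /* Assumption: 8-bit datapath */

/* ALU of Figure 3, restated in ANSI form. */
int ALU(int A, int B, int Op, int Cin)
{
    switch ( Op ) {
        case 0:  return( A + B + Cin ) ;   /* Addition         */
        case 1:  return( A - B - Cin ) ;   /* Subtraction      */
        case 2:  return( A & B ) ;         /* Logical AND      */
        case 3:  return( A | B ) ;         /* Logical OR       */
        case 4:  return( A ^ B ) ;         /* Exclusive OR     */
        case 5:  return( ~ A ) ;           /* Logical NOT      */
        case 6:  return( - A ) ;           /* Two's complement */
        default: return( A ) ;             /* Op 7, or any other code: pass A */
    }
}

/* Four to one multiplexer (hypothetical body), modeled like Mux21. */
int Mux41(int A, int B, int C, int D, int Select)
{
    switch ( Select ) {
        case 0: return( A ) ;
        case 1: return( B ) ;
        case 2: return( C ) ;
        case 3: return( D ) ;
        default:
            printf("* Error * Improper selection value.\n") ;
            return( 0 ) ;
    }
}

/* Zero flag (hypothetical body): 1 when the low 8 bits of the ALU result are zero. */
int ALUZ(int A, int B, int Op, int Cin)
{
    return( (ALU(A, B, Op, Cin) & WIDTH_MASK) == 0 ) ;
}
```

Under these assumptions, Mux41(AC, MD, CR, SR, 2) selects CR, and ALUZ(5, 5, 1, 0) returns 1 because the subtraction yields zero. Recomputing the ALU inside "ALUZ" is wasteful but keeps each function free of side effects, which suits the two step process.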
    /* Step 1 - compute the bus values */
    ReadBus = Mux41(AC, MD, CR, SR, ReadSelect) ;
    WriteBus = Mux21(In, ALU(B, ReadBus, Op, Cin), WriteSelect) ;
    ZBit = ALUZ(B, ReadBus, Op, Cin) ;

    /* Step 2 - transfer the bus values into the enabled registers */
    if (WRAC) AC = WriteBus ;
    if (WRMD) MD = WriteBus ;
    if (WRCR) CR = WriteBus ;
    if (WRSR) SR = WriteBus ;
    if (WRB) B = WriteBus ;
    if (WRIR) IR = WriteBus ;
    if (WRZ) Z = ZBit ;

Figure 10 - Two step datapath simulation.

The C code for simulating the example datapath is given in Figure 10. During the first step, the bus values "ReadBus," "WriteBus" and "ZBit" are computed. Four functions are called which correspond to the ALU and multiplexer blocks in the datapath. Their operation is controlled by the microinstruction fields "ReadSelect," "WriteSelect," "Op" and "Cin." Step one values are conditionally transferred to the destination registers during step two. Each of the conditional expressions is a microinstruction field.

Section 5 - Design project 3.

Construct a C program which simulates your paper design of the SP.1 at the organization level. The program should use C language functions to simulate combinational blocks and should use the two step assignment process outlined in Section 4. The simulated SP.1 must successfully execute the SP.1 diagnostic program which was developed in the first project.

To complete this assignment, you must encode the flow graph into a PLA-based microprogram. You should store the PLA programming information in a separate file so that the information may also be used to generate a switch level model for the PLA and eventually the PLA circuit layout.

There are four products from this assignment.

* A listing of the C simulation program.
* A listing of the encoded SP.1 emulator (microprogram).
* A trace of the microengine executing the SP.1 ISA diagnostic program.
* An essay describing the decisions made while constructing the simulation, its limitations and its accuracy.

It will be much easier to debug the microcode at this level of abstraction than at the switch (logic) level.
Thus, any additional effort here will pay off in later assignments.

Copyright (c) 1987-2013 Paul J. Drongowski