Computer Design: A computer aided design and VLSI approach
Paul J. Drongowski

Chapter 13 - A design example - The PDP-11/40

Although the PDP-11 series machines do not represent the most current technology, they are an interesting and well-documented computer family. The design of a contemporary machine usually involves a proprietary design embodied in custom (or semi-custom) integrated circuits. The PDP-11/40, by contrast, was designed and manufactured in an era of standard, off-the-shelf integrated circuits. Thus, much can be learned about practical computer design by studying its maintenance manual, schematics and microcode flow diagrams. This chapter covers the salient features of the 11/40 design.

Section 1 - The PDP-11 family and the 11/40.

The PDP-11 was a planned family of software- and hardware-compatible computers. Its instruction set architecture is a classic that set the pattern for other machines such as the 6502, 6800 and 68000 microprocessors. Key features are itemized below.

* Eight 16-bit general registers are provided for arithmetic, logical and addressing operations. Two registers are set aside as the stack pointer (R6 or SP) and the program counter (R7 or PC.)
* Processor conditions are centralized in the Processor Status Word (PSW.) The PSW holds flags for a negative or zero result, overflow and carry, as well as the processor trap priority. The PSW is addressable as octal location 777776.
* Eight basic addressing modes are available: register, autoincrement, autodecrement, index, register deferred (indirect), autoincrement deferred, autodecrement deferred and index deferred. When applied to the program counter register, four more modes are obtained (immediate, absolute, relative and relative deferred.)
* The instruction set uses the addressing modes in a fairly regular way. (The instructions are summarized in Tables 1 to 4.) The double operand (two address) instructions move and modify data between a source and a destination location.
  The single operand (one address) instructions always operate upon a destination location. Note that the register addressing mode permits the general register set to be used as a fast scratchpad for computations.
* Each exception condition, I/O interrupt and operating system trap is dispatched through a unique "interrupt vector", from which the entry address of the service routine and the new processor priority are read.
* A common I/O bus called the "UNIBUS" connects the processor with memory and one or more I/O device controllers. The processor (the bus master) controls bus operation for the slaves (memory and I/O devices.) Direct memory access (DMA) transfers also take place on the UNIBUS. Device registers are addressed as memory locations in the I/O page (the last 4K words in "memory.")
* Memory management (virtual memory) and extended arithmetic processing (long integers and floating point) are available as options on some models.

These common features were generally implemented consistently across family members. The memory management and extended arithmetic options, however, varied between machines depending upon price and performance.

Mnemonic   Description
-----------------------------------------
MOV(B)     Move source to destination
CMP(B)     Compare source to destination
ADD        Add source to destination
SUB        Subtract source from destination
BIT(B)     Bit test (non-destructive AND)
BIC(B)     Bit clear (AND with complement of source)
BIS(B)     Bit set (logical OR)
XOR        Exclusive OR

Table 1 - PDP-11 double operand instructions.

The PDP-11/40 was intended to be a machine of modest capability and cost. The 11/40 was first shipped in January 1973. It was preceded by the first PDP-11 model (the 11/20), a minimal low cost model (the 11/05) and the higher performance 11/45. The PDP-11/45 was designed for low-cost scientific computation and high speed process control and communications.
The low cost 11/05 model was targeted at the original equipment manufacturer (OEM) market, where it would be designed into a third-party vendor's deliverable product (e.g., medical imaging or process control equipment). Thus, there was a need for an intermediate model.

Mnemonic   Description
-----------------------------------------
CLR(B)     Clear destination
COM(B)     Complement destination (NOT)
INC(B)     Increment destination
DEC(B)     Decrement destination
NEG(B)     Negate destination (two's complement)
TST(B)     Test destination
ASR(B)     Arithmetic shift right
ASL(B)     Arithmetic shift left
ROR(B)     Rotate destination right (with carry)
ROL(B)     Rotate destination left (with carry)
SWAB       Swap bytes
ADC(B)     Add carry
SBC(B)     Subtract carry
SXT        Extend sign bit

Table 2 - PDP-11 single operand instructions.

The PDP-11/40 is a reduced capability PDP-11/45: it has a subset of the 11/45's memory management and floating point features. The optional memory management unit does not support separate instruction and data spaces, which limits a user process to 32K words. The Extended Instruction Set (EIS) is a plug-in option that performs integer multiplication and division and multiple-position shifts of single and double words. The Floating Point option performs single precision add, subtract, multiply and divide operations. The special arithmetic features of the PDP-11/45 are far more extensive (a separate set of registers and double precision floating point), faster and more expensive.
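The instructions in Tables 1 and 2 name each operand with a six-bit specifier: a three-bit addressing mode followed by a three-bit register number. In a double operand instruction the opcode occupies bits 15 to 12, the source specifier bits 11 to 6 and the destination specifier bits 5 to 0; a single operand instruction carries only the destination specifier. The Python sketch below decodes these fields, including the four program counter modes. It models the instruction format only, not the 11/40 decode logic, and the function names are hypothetical. XOR uses a different format (a register source) and is not handled.

```python
from __future__ import annotations

# Addressing mode names in mode-number order (mode = bits 5:3 of a specifier).
MODES = (
    "register",                # 0: Rn
    "register deferred",       # 1: (Rn)
    "autoincrement",           # 2: (Rn)+
    "autoincrement deferred",  # 3: @(Rn)+
    "autodecrement",           # 4: -(Rn)
    "autodecrement deferred",  # 5: @-(Rn)
    "index",                   # 6: X(Rn)
    "index deferred",          # 7: @X(Rn)
)

# Modes 2, 3, 6 and 7 acquire special names when the register is the PC (R7).
PC_MODES = {2: "immediate", 3: "absolute", 6: "relative", 7: "relative deferred"}


def decode_specifier(spec: int) -> tuple[str, int]:
    """Split a six-bit operand specifier into (mode name, register number)."""
    mode = (spec >> 3) & 0o7
    reg = spec & 0o7
    if reg == 7 and mode in PC_MODES:
        return PC_MODES[mode], reg
    return MODES[mode], reg


def decode_double_operand(instr: int) -> tuple[int, tuple[str, int], tuple[str, int]]:
    """Decode a double operand instruction into (opcode, source, destination).

    Bits 15:12 hold the opcode (e.g. octal 01 = MOV, 06 = ADD, 11 = MOVB, 16 = SUB),
    bits 11:6 the source specifier and bits 5:0 the destination specifier.
    """
    opcode = (instr >> 12) & 0o17
    source = decode_specifier((instr >> 6) & 0o77)
    destination = decode_specifier(instr & 0o77)
    return opcode, source, destination


if __name__ == "__main__":
    # MOV (R0)+, -(R1) assembles to octal 012041.
    print(decode_double_operand(0o012041))  # (1, ('autoincrement', 0), ('autodecrement', 1))
    # MOV #5, R2 assembles to octal 012702 followed by the literal word 5.
    print(decode_double_operand(0o012702))  # (1, ('immediate', 7), ('register', 2))
```

Returning plain tuples keeps the sketch short; a real disassembler would also fetch the index, immediate or absolute words that follow the instruction word.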
Mnemonic   Description
-----------------------------------------
BR         Branch (unconditional)
BNE        Branch if not equal to zero
BEQ        Branch if equal to zero
BPL        Branch if plus
BMI        Branch if minus
BVC        Branch if overflow is clear
BVS        Branch if overflow is set
BCC        Branch if carry is clear
BCS        Branch if carry is set
BGE        Branch if greater than or equal to zero
BLT        Branch if less than zero
BGT        Branch if greater than zero
BLE        Branch if less than or equal to zero
BHI        Branch if higher (unsigned comparison)
BLOS       Branch if lower or the same (unsigned)
BHIS       Branch if higher or the same (unsigned)
BLO        Branch if lower (unsigned comparison)

Table 3 - PDP-11 branch instructions.

The PDP-11/45 processor was designed to match the speed of the newly introduced 300 nanosecond bipolar memory. A fast private processor-to-memory bus was added to keep high speed computation off the slower UNIBUS. The PDP-11/45 used instruction prefetch and a dual scratchpad register scheme to speed processing; the PC register is broken out of the general register file to support instruction prefetch. High speed (and high power) Schottky logic, newly introduced at the time, was incorporated into the 11/45 datapath.

Mnemonic         Description
-----------------------------------------
JMP              Jump
JSR              Jump to subroutine
RTS              Return from subroutine
MARK             Mark
SOB              Subtract one and branch if not zero
EMT              Emulator trap
TRAP             Trap (user)
BPT              Breakpoint trap
IOT              Input/output trap
RTI              Return from interrupt
RTT              Return from interrupt (trace)
HALT             Halt the processor
WAIT             Wait for interrupt
RESET            Reset the UNIBUS
CLC, CLV, CLZ,   Clear condition code
  CLN, CCC
SEC, SEV, SEZ,   Set condition code
  SEN, SCC

Table 4 - Jump, trap and miscellaneous instructions.

In contrast, the PDP-11/40 performs all of its memory operations on the UNIBUS. Each 11/40 memory transfer takes 1000 nanoseconds, compared with 300 nanoseconds per transfer for the PDP-11/45 with bipolar memory.
The 11/40 uses an older circuit technology (standard 74-series TTL). For example, the typical delays through the standard 74181 ALU and its Schottky counterpart, the 74S181, are 24 and 11 nanoseconds, respectively. The 11/40 also does without a tandem register file, further reducing the component count.

Table 5 summarizes the internal characteristics of the 11/05, 11/40 and 11/45 models. Performance for the PDP-11/45 is shown for both core (13) and bipolar (41) memories. The relative costs cover only the processor itself and do not include I/O devices or the extra expense of semiconductor memory. Thus, the higher speed of the PDP-11/45 comes at a much greater cost than the indicated value. The lower cost of the 11/05 derives from the smaller number of boards and IC packages needed for its implementation.

                                      Control Store       Relative   Relative
Model  Boards  IC Packages  IC Types  Words  Size (bits)  Perf       Cost
-----------------------------------------------------------------------------
11/05    2        203          60      249      40        2.5        1.0
11/40    5        417          53      251      56        3.6        3.3
11/45    8        696          78      256      64        13.0/41.0  3.8

Table 5 - Processor size. (Source: Snow and Siewiorek.)

Section 2 - Machine structure.

The design of the PDP-11/40 processor can be separated into three interacting subsystems: the datapath, the controller and the I/O subsystem (Figure 1.) The controller sends control signals to the datapath and receives condition signals from it. Memory (bus) addresses and data are passed to the I/O subsystem through the bus address (BA) and data (D) registers. Incoming memory (bus) data enters the datapath through its own input port. Conditions and control signals are also exchanged with the I/O subsystem to start bus operations and to detect external interrupts and exception conditions.

The 11/40 uses a programmable, stoppable clock. The controller can turn the clock off, and it stays off until the I/O subsystem restarts it. This provides a reliable synchronization mechanism for data transfers between the datapath, controller and I/O subsystem.
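The stoppable clock turns each bus transfer into a simple handshake: the controller stops its own clock, and the I/O subsystem restarts it only once the transfer has completed. The Python sketch below models that handshake, assuming a `threading.Event` as a stand-in for the clock enable and a thread as a stand-in for the I/O subsystem. It illustrates the synchronization idea only; the class and function names are hypothetical, and nothing here describes the actual 11/40 clock circuit.

```python
from __future__ import annotations

import threading


class StoppableClock:
    """Hypothetical model of a clock the controller stops and the I/O side restarts."""

    def __init__(self) -> None:
        self._enabled = threading.Event()
        self._enabled.set()  # the clock is running after reset

    def stop(self) -> None:
        """Controller side: suppress further clock pulses (like asserting CLKOFF)."""
        self._enabled.clear()

    def restart(self) -> None:
        """I/O side: re-enable the clock once the bus transfer is over."""
        self._enabled.set()

    def wait_for_restart(self) -> None:
        """Block the controller until the clock has been restarted."""
        self._enabled.wait()


def bus_read(clock: StoppableClock, memory: dict[int, int], address: int) -> int:
    """Read one word: stop the clock, let a model I/O subsystem fetch it, then resume."""
    latch: dict[str, int] = {}

    def io_subsystem() -> None:
        try:
            # The data is latched before the clock is restarted, so the
            # controller can never observe a transitory value.
            latch["data"] = memory[address] & 0xFFFF
        finally:
            clock.restart()  # always restart so the model cannot hang

    clock.stop()                      # no further microcycles until the restart
    slave = threading.Thread(target=io_subsystem)
    slave.start()
    clock.wait_for_restart()          # the controller is frozen here
    slave.join()
    return latch["data"]


if __name__ == "__main__":
    clock = StoppableClock()
    print(oct(bus_read(clock, {0o1000: 0o012702}, 0o1000)))  # 0o12702
```

The controller needs no polling loop or timing margin: because it simply has no clock while the transfer is in progress, it cannot sample the data too early.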
The I/O subsystem acts as the interface between the processor and the UNIBUS. It is also responsible for bus arbitration, time-out detection and similar functions. We will not discuss the I/O subsystem further in this chapter.

Figure 1 - Overall structure of the 11/40. [The datapath and controller exchange conditions and control signals; the BA and D registers and the programmable, stoppable clock connect to the I/O subsystem, which connects to the UNIBUS.]

Section 3 - Datapath.

The PDP-11/40 datapath is a two bus design, with the read bus running down the right side of Figure 2 and the write bus down the left. Both buses are brought out to the backplane, where the EIS and Floating Point options connect to them. All primary transfer paths are sixteen bits wide.

The ISA general registers are held in the sixteen-word register file at the top of the diagram. This file can be addressed by:

* Instruction bits 8 to 6 (source field),
* Instruction bits 2 to 0 (destination field),
* The register immediate field (RIF) of the microinstruction, and
* The low order four bits of the bus address (BA) register.

As in most two bus designs, two temporary registers, B and D, have been introduced to hold intermediate results. The B register holds the left argument to the ALU, while the D register holds the ALU result until a later clock phase, when the result is written back to the register file. This permits one word to be transferred around the entire datapath in a single (long) microinstruction. The left (B) input to the ALU may be one of several values:

* The unmodified contents of the B register.
* The low order byte of B with bit seven (the sign of the low order byte) extended through the high order byte.
* The high or low order byte duplicated in both halves.
* The high and low order bytes swapped (exchanged.)

Constants such as the values one and two, the switch register address and key trap addresses can also be passed to the ALU through the B multiplexer.

Figure 2 - The PDP-11/40 datapath.

The D multiplexer selects one of four values to place on the write bus. First, read bus data can be routed directly onto the write bus, avoiding the long delay through the ALU. Second, incoming UNIBUS data enters the datapath through the D multiplexer. Finally, the multiplexer can pass either the unmodified contents of D or the D value shifted left and concatenated with the carry bit. This last arrangement implements the left rotate operation.

The ALU has several choices for its carry in and carry out logic. The carry in may be forced to zero or one, or taken from the carry bit of the Processor Status Word. Choices for carry out include the carry out of either the eighth or the sixteenth stage of the ALU (for byte and word operations), the sign of the ALU result and the current value of the carry bit. Carry in and carry out values are selected with multiplexers. The PSW is implemented with individually gated (enabled) flip-flops, which can also be read and written as a group.

As mentioned earlier, UNIBUS data is transferred into the datapath through the D multiplexer. An incoming word can be stored in the register file, in the PSW, or in the sixteen-bit instruction register (IR), which holds a PDP-11 ISA instruction during its execution. IR fields are sent to the register file multiplexer for register selection and to the controller for instruction decoding. UNIBUS addresses are sent out through BA, which may be loaded with either the value on the read bus or the current ALU result. The D register holds outgoing data to be written to the UNIBUS.

Section 4 - Controller.
The PDP-11/40 controller implements nearly all of the control techniques discussed so far (Figure 3.) It is a microprogrammed controller with a 56-bit microinstruction. The address of the next microinstruction to be fetched and executed is stored in the current microinstruction. This next microaddress field may be modified by instruction decode logic and by current machine conditions (states) under the control of the branch test field in the current microinstruction.

Figure 3 - The PDP-11/40 controller.

The careful reader will note two registers in the next address calculation and fetch loop of the controller. The first, the ROM address register, holds the address of the next microinstruction to be fetched. The second, the control word register, holds the currently executing microinstruction. This arrangement lets the fetch of the next microinstruction overlap (pipeline with) the execution of the current one. As a consequence, the effect of a microbranch is delayed by one microinstruction execution cycle.

Figure 4 - Microinstruction timing. [P1: one pulse at 140 ns. P2: one pulse at 200 ns. P3: pulses at 200 ns and 300 ns.]

The PDP-11/40 employs a programmable, stoppable clock, and three different microinstruction timings can be selected. The shortest timing, P1, is used for data transfers that do not pass through the ALU. For example, a word may be transferred from the register file to B (and/or BA) or from B to the register file in a P1 cycle. P2 is long enough for an operand to be selected from the register file, combined with a B value by the ALU, and stored in either D or BA. The P3 timing has pulses at both 200 and 300 nanoseconds, so a result produced during the 200 ns phase can then be transferred to the write bus for storage.
Only an additional 100 ns is required because no second microinstruction fetch is needed. Thus, more work (a complete transfer around the datapath) can be performed in one microcycle.

Figure 5 - Clock logic. (Source: KD-11A maintenance manual.) [Clock control, driven by synchronous regeneration, asynchronous restart and maintenance control inputs, feeds a clock pulse generator; clock enable gates, controlled by the microword signals CLKOFF, CLK1 and CLK0, deliver the P1, P2 and P3 timing pulses to the datapath, I/O subsystem and control.]

The UNIBUS, peripheral devices and processor run independently at their own speeds. This independence permits I/O DMA transfers and processor instruction execution to proceed concurrently. The processor and UNIBUS must synchronize during data transfers, however. When reading or writing bus data, the controller turns the clock off and waits for the transfer to complete. The I/O subsystem restarts the clock after the data has been transferred. Input transfers are therefore reliable: the processor never attempts to read invalid or transitory bus data.

Section 5 - Microcode.

The PDP-11/40 microword is 56 bits long. Tables 6 through 12 summarize the microinstruction fields and their interpretations. The 11/40 is a horizontal machine with very little encoding of its microinstruction fields. Hence, the control fields in the microinstruction are almost a verbatim list of the signals that must be generated for datapath, clock and microprogram control. In Table 10, for example, the ALU mode and function fields are identical to the mode and function select inputs of the 74181 ALU used in the datapath.
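Because the 11/40 microword is horizontal, extracting a control field is just a shift and a mask; no decoder is needed to expand a compact code into individual signals. The Python sketch below shows this for a few of the fields listed in Tables 6 to 12. The field widths follow the value ranges in those tables, but the bit positions (and the helper names) are invented for the illustration and do not reproduce the real 11/40 microword layout.

```python
from __future__ import annotations

# Hypothetical field layout: name -> (least significant bit, width).
# Widths follow the value ranges in Tables 6 to 12; the bit positions are
# invented for this example and do not match the real 11/40 microword.
FIELDS: dict[str, tuple[int, int]] = {
    "UPF": (0, 8),      # next microaddress
    "UBF": (8, 5),      # microbranch condition select
    "CLK": (13, 2),     # P1, P2 or P3 timing
    "CLKOFF": (15, 1),  # turn the clock off
    "BUS": (16, 3),     # UNIBUS operation (DATI, DATO, ...)
    "SALU": (19, 4),    # 74181 function select inputs
    "SALUM": (23, 1),   # 74181 mode input (arithmetic or logical)
    "SDM": (24, 2),     # D multiplexer select
    "CLKD": (26, 1),    # load D with the ALU result
}


def field(word: int, name: str) -> int:
    """Extract one field; in a horizontal microword this is only a shift and a mask."""
    lsb, width = FIELDS[name]
    return (word >> lsb) & ((1 << width) - 1)


def assemble(**values: int) -> int:
    """Pack named field values into a microword using the hypothetical layout."""
    word = 0
    for name, value in values.items():
        lsb, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lsb
    return word


if __name__ == "__main__":
    # A P2 cycle (CLK=2) that loads D from the ALU and continues at octal 123.
    uword = assemble(UPF=0o123, CLK=2, CLKD=1)
    print(field(uword, "CLK"), field(uword, "CLKD"), oct(field(uword, "UPF")))  # 2 1 0o123
```

The price of this simplicity is width: every signal gets its own bits whether or not a given microinstruction uses it, which is why the 11/40 needs a 56-bit control word.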
Field   Timing    Value  Operation
-------------------------------------
CLKB    P1 & P3   0      No operation
                  1      B <- D MUX
CLKBA   P1 & P2   0      No operation
                  1      BA <- BA MUX
CLKD    P2        0      No operation
                  1      D <- ALU result
CLKIR   P1 & P3   0      No operation
                  1      IR <- D MUX
WRH     P1 & P3   0      No operation
                  1      R <- D MUX<15:8>
WRL     P1 & P3   0      No operation
                  1      R <- D MUX<7:0>

Table 6 - Register enables.

Field   Control  Value selected
--------------------------------
SBAM    0        ALU result
        1        Read bus
SDM     0        Read bus
        1        UNIBUS data
        2        D
        3        D concat D<14:0>
SBMH    0        B<15:8>
        1        B<7> (sign extend)
        2        B<7:0>
        3        Constant<15:8>
SBML    0        B<7:0>
        1        B<7:0>
        2        B<15:8>
        3        Constant<7:0>

Table 7 - Multiplexer selection.

Field   Control  Register index
--------------------------------
RIF              Register immediate field
SRBA    0        -
        1        BA<3:0> (bus address)
SRD     0        -
        1        IR<2:0> (destination field)
SRI     0        -
        1        RIF<3:0> (immediate)
SRS     0        -
        1        IR<8:6> (source field)

Table 8 - Register indexing.

Field   Timing    Value  Source
-------------------------------------
SPS     P1 & P3   0      D MUX
                  1      C
                  2      NZV
                  3      NZVC
                  6      Read bus <- PSW
                  7      PSW <- D MUX

Table 9 - Processor status word.

Tables 6, 7, 8 and 9 show the transfers that can be performed during microinstruction execution. The register enable fields (Table 6) control the storage of data into the datapath registers, and the table also gives the clock phase on which each write occurs. Note that ALU results are stored on P2, while all other registers may be clocked on either P1 or P3.

Field   Control  Operation
--------------------------------
SALUM   0        Arithmetic mode
        1        Logical mode
SALU             ALU function
DAD              Discrete alteration of data

Table 10 - Arithmetic and logic.

Field   Control  Operation
--------------------------------
BUS     0        -
        1        DATI (data in)
        2        Await bus busy
        3        DATIP (data in and pause)
        4        -
        5        DATO (data out)
        6        Restart on bus release
        7        DATOB (data out byte)

Table 11 - UNIBUS control.

UNIBUS operations are summarized in Table 11. The DATI, DATIP, DATO and DATOB operations initiate a data transfer between the processor and the UNIBUS.
DATIP begins a bus read cycle but forces the UNIBUS to pause until a data out transfer is requested. DATIP therefore permits the construction of uninterruptible "read-modify-write" transfers, which the operating system can use for process synchronization.

Field     Control  Operation
--------------------------------
CLKOFF    0        No op
          1        Turn clock off
CLK       0        P1 (140 ns)
          2        P2 (200 ns)
          3        P3 (200 and 300 ns)
UBF<4:0>           Microbranch field
UPF<7:0>           Next microaddress field

Table 12 - Clock and microcontrol.

The clock and microsequencing fields are given in Table 12. The UBF field controls the sensing of processor and I/O conditions by the microcontroller. As mentioned earlier, branches are delayed by one microcycle, and the address of the next microword is contained in the currently executing microinstruction. The 11/40 controller forms the true next address by taking the logical OR of the condition selected by the UBF field with the next address field (UPF); the result is sent to the ROM through the ROM address register. The microinstructions are carefully assigned to ROM locations so that the condition is OR'ed into bit positions that are zero in the next address field.

The PDP-11/40 documentation served as the basis for the flow graph notation introduced in Chapter 9. Some examples are included in Figures 6 through 8. Figure 6 contains the main fetch and decode routine. The branch logic is tailored for the PDP-11 instruction set and has a separate flow branch for each of the special cases. In microprogramming, it is important to minimize the number of decode and branch steps per ISA instruction, since these steps have a relatively high overhead (especially with delayed branches!) The branch control logic and flow graph are usually designed together so that the flow graph can be "compressed" as much as possible. The main branch also dispatches control for front panel (console), I/O bus (service), exception trap and expansion (EIS and floating point) operations.

Figure 6 - Fetch and decode.
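The next-address rule described above can be expressed in a few lines. The Python sketch below ORs a condition value, as selected by UBF, into the next microaddress field UPF<7:0>, and enforces the placement rule that the bits receiving the condition must be zero in UPF. The function name, the eight-way dispatch on the destination mode and the octal microaddresses in the example are hypothetical, and the one-cycle branch delay is not modeled.

```python
UPF_WIDTH = 8  # UPF<7:0> can address 256 control store locations


def next_micro_address(upf: int, condition: int, condition_bits: int) -> int:
    """OR a condition value into the next-address field to form the ROM address.

    The low `condition_bits` bits of `upf` must be zero; otherwise the OR would
    merge the condition with address bits instead of selecting one of
    2**condition_bits consecutive target microwords.
    """
    mask = (1 << condition_bits) - 1
    if upf & mask:
        raise ValueError(f"UPF {upf:o} is not aligned for a {condition_bits}-bit branch")
    if not 0 <= condition <= mask:
        raise ValueError(f"condition {condition} needs more than {condition_bits} bits")
    return (upf | condition) & ((1 << UPF_WIDTH) - 1)


if __name__ == "__main__":
    # Hypothetical eight-way dispatch on the destination mode (IR bits 5:3),
    # with the eight target microwords starting at octal 120.
    ir = 0o012041                      # MOV (R0)+, -(R1): destination mode 4
    mode = (ir >> 3) & 0o7
    print(oct(next_micro_address(0o120, mode, 3)))  # 0o124
    # Two-way branch on a single condition bit from an even base address.
    print(oct(next_micro_address(0o36, 1, 1)))      # 0o37
```

An eight-way branch therefore occupies eight consecutive microwords starting at a multiple of eight, which is why the branch logic and the ROM assignment of the flow graph have to be designed together.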
Figure 7 - Return from subroutine.

Figure 8 - Branch instruction.

Copyright (c) 1987-2013 Paul J. Drongowski