Computer Design
A computer aided design and VLSI approach

Paul J. Drongowski

Chapter 13 - A design example - The PDP-11/40

Although the PDP-11 series machines do not represent the most
current technology, they are an interesting and well-documented
computer family. The design of a contemporary machine usually
involves a proprietary design embodied in custom (or semi-custom)
integrated circuits. The PDP-11/40 was designed and manufactured
in an era of standard, off-the-shelf integrated circuits. Thus,
much can be learned about practical computer design by studying
the maintenance manual, schematics and microcode flow diagrams.
This chapter covers the salient features of the 11/40 design.

Section 1 - The PDP-11 family and the 11/40.

The PDP-11 was a planned family of software and hardware compatible
computers. The instruction set architecture is a classic that has
set the pattern for other machines such as the 6502, 6800 and
68000 microcomputers. Key features are itemized below.
  * Eight 16-bit general registers are provided for arithmetic,
    logical and addressing operations. Two registers are set aside
    as the stack pointer (R6 or SP) and the program counter (R7 or PC.)
  * Processor conditions are centralized in the Processor Status
    Word (PSW.) Conditions include flags for a negative or zero result,
    overflow and carry, as well as the processor priority level. The
    PSW is addressable as octal location 777776.
  * Eight basic addressing modes are available. They are register,
    autoincrement, autodecrement, index, register deferred (indirect),
    autoincrement deferred, autodecrement deferred and index deferred
    modes. When applied to the program counter register, four more
    modes are obtained (immediate, absolute, relative, and relative
    deferred.)
  * The instruction set uses the addressing modes in a fairly regular
    way. (The instructions are summarized in Tables 1 to 4.) The
    double operand (two address) instructions move and modify data
    between a source and destination location. The single operand
    (one address) instructions always operate upon a destination
    location. Note that the register addressing mode permits the
    use of the general register set as a fast scratchpad for
    computations.
  * Each exception condition, I/O interrupt and operating system trap
    is dispatched to a unique "interrupt vector" from which the entry
    address of the service routine and new processor priority are read.
  * A common I/O bus called the "UNIBUS" connects the processor with
    memory and one or more I/O device controllers. The processor (the
    bus master) controls bus operation for the slaves (memory and I/O
    devices.) Direct memory access (DMA) transfers take place on the
    UNIBUS. Device registers are addressed as memory locations in the
    I/O page (the last 4K words in "memory.")
  * Memory management (virtual memory) and extended arithmetic
    processing (long integers and floating point) are available as
    options on some models.
These common features were generally implemented in a consistent
fashion across family members. The memory management and extended
arithmetic options, however, varied between machines depending upon
price and performance.
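
The eight addressing modes can be made concrete with a small operand
address calculator. The following is a minimal Python sketch, not DEC's
implementation. It assumes word operands only (byte operands would step
registers by one, except SP and PC, which always step by two), models
memory as a dictionary of 16-bit words keyed by byte address, and assumes
R7 already points just past the instruction word. The function name
operand_address is hypothetical.

```python
MASK16 = 0xFFFF  # registers and addresses are 16 bits wide


def operand_address(mode, reg, regs, mem):
    """Return the operand's memory address, or None for register mode.

    regs: list of eight 16-bit registers (regs[6] is SP, regs[7] is PC).
    mem:  dict mapping even byte addresses to 16-bit words.
    Autoincrement/autodecrement side effects update regs in place.
    Word operands only, so registers always step by 2.
    """
    if mode == 0:                       # register: operand is in Rn itself
        return None
    if mode == 1:                       # register deferred: (Rn)
        return regs[reg]
    if mode in (2, 3):                  # autoincrement (Rn)+ / @(Rn)+
        addr = regs[reg]
        regs[reg] = (addr + 2) & MASK16
        return addr if mode == 2 else mem[addr]
    if mode in (4, 5):                  # autodecrement -(Rn) / @-(Rn)
        regs[reg] = (regs[reg] - 2) & MASK16
        addr = regs[reg]
        return addr if mode == 4 else mem[addr]
    if mode in (6, 7):                  # index X(Rn) / index deferred @X(Rn)
        x = mem[regs[7]]                # index word follows the instruction
        regs[7] = (regs[7] + 2) & MASK16
        addr = (x + regs[reg]) & MASK16  # uses the updated PC when reg == 7
        return addr if mode == 6 else mem[addr]
    raise ValueError("mode must be 0..7")
```

With reg = 7, modes 2, 3, 6 and 7 give the immediate, absolute, relative
and relative deferred modes listed above, without any extra logic.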

    Mnemonic   Description
    -----------------------------------------
    MOV(B)     Move source to destination
    CMP(B)     Compare source to destination
    ADD        Add source to destination
    SUB        Subtract source from destination
    BIT(B)     Bit test (non-destructive AND)
    BIC(B)     Bit clear (AND with complement of source)
    BIS(B)     Bit set (logical OR)
    XOR        Exclusive OR

    Table 1 - PDP-11 double operand instructions.

The PDP-11/40 was intended to be a machine of modest capability
and cost. The 11/40 was first shipped in January 1973 and was
preceded by the first PDP-11 model (the 11/20), a minimal low-cost
model (the 11/05), and the higher performance 11/45. The
PDP-11/45 was designed for low-cost scientific computation and
high speed process control and communications. The low-cost
11/05 model was targeted at the original equipment manufacturer
(OEM) market, where it would be designed into the deliverable
product (e.g., medical imaging, process control, etc.) of a third
party vendor. Thus, there was a need for an intermediate model
between the 11/05 and the 11/45.

    Mnemonic   Description
    -----------------------------------------
    CLR(B)     Clear destination
    COM(B)     Complement destination (NOT)
    INC(B)     Increment destination
    DEC(B)     Decrement destination
    NEG(B)     Negate destination (two's complement)
    TST(B)     Test destination
    ASR(B)     Arithmetic shift right
    ASL(B)     Arithmetic shift left
    ROR(B)     Rotate destination right (with carry)
    ROL(B)     Rotate destination left  (with carry)
    SWAB       Swap bytes
    ADC(B)     Add carry
    SBC(B)     Subtract carry
    SXT        Extend sign bit

    Table 2 - PDP-11 single operand instructions.

The PDP-11/40 is a reduced-capability PDP-11/45. The PDP-11/40
has a subset of the memory management and floating point features
of the 11/45. The optional memory management unit does not support
the separation of instructions and data, limiting the maximum size
of a user process to 32K words. The Extended Instruction Set is a
plug-in option that performs integer multiplication, division and
multiposition, single and double word shifts. The Floating Point
option performs single precision add, multiply, divide and subtract
operations. The special arithmetic features of the PDP-11/45 are
far more extensive (a separate set of registers and double precision
floating point), faster and more expensive.

    Mnemonic   Description
    -----------------------------------------
    BR         Branch (unconditional)
    BNE        Branch if not equal to zero
    BEQ        Branch if equal to zero
    BPL        Branch if plus
    BMI        Branch if minus
    BVC        Branch if overflow is clear
    BVS        Branch if overflow is set
    BCC        Branch if carry is clear
    BCS        Branch if carry is set
    BGE        Branch if greater than or equal to zero
    BLT        Branch if less than zero
    BGT        Branch if greater than zero
    BLE        Branch if less than or equal to zero
    BHI        Branch if higher (unsigned comparison)
    BLOS       Branch if lower or the same (unsigned)
    BHIS       Branch if higher or the same (unsigned)
    BLO        Branch if lower (unsigned comparison)

    Table 3 - PDP-11 branch instructions.
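
The conditional branches in Table 3 differ only in which combination
of the N, Z, V and C condition codes they test; "greater than zero"
and similar descriptions refer to the result of the last operation,
such as a CMP. The Python sketch below spells out these tests using
the standard PDP-11 definitions and the PDP-11 PSW bit positions
(C in bit 0, V in bit 1, Z in bit 2, N in bit 3). The table and
function names are illustrative only.

```python
# Branch tests on the condition codes; n, z, v, c are each 0 or 1.
BRANCH_TESTS = {
    "BR":   lambda n, z, v, c: True,
    "BNE":  lambda n, z, v, c: not z,
    "BEQ":  lambda n, z, v, c: z,
    "BPL":  lambda n, z, v, c: not n,
    "BMI":  lambda n, z, v, c: n,
    "BVC":  lambda n, z, v, c: not v,
    "BVS":  lambda n, z, v, c: v,
    "BCC":  lambda n, z, v, c: not c,
    "BCS":  lambda n, z, v, c: c,
    # signed comparisons combine N with V so that overflow is accounted for
    "BGE":  lambda n, z, v, c: (n ^ v) == 0,
    "BLT":  lambda n, z, v, c: (n ^ v) == 1,
    "BGT":  lambda n, z, v, c: not (z or (n ^ v)),
    "BLE":  lambda n, z, v, c: bool(z or (n ^ v)),
    # unsigned comparisons use C and Z
    "BHI":  lambda n, z, v, c: not (c or z),
    "BLOS": lambda n, z, v, c: bool(c or z),
    "BHIS": lambda n, z, v, c: not c,    # same test as BCC
    "BLO":  lambda n, z, v, c: bool(c),  # same test as BCS
}


def branch_taken(mnemonic, psw):
    """Return True if the branch is taken; psw holds N, Z, V, C in bits 3..0."""
    n, z, v, c = (psw >> 3) & 1, (psw >> 2) & 1, (psw >> 1) & 1, psw & 1
    return bool(BRANCH_TESTS[mnemonic](n, z, v, c))
```

Note that BHIS and BLO are simply alternate names for the BCC and BCS
tests, which is why the hardware needs no extra condition logic for them.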

The PDP-11/45 processor was designed to match the speed of
the newly introduced 300 nanosecond bipolar memory. A fast
private processor-to-memory bus was added to keep high speed
computations off the slower UNIBUS. The PDP-11/45 used
instruction prefetch and a dual scratchpad register scheme
to speed processing. The PC register is broken out from the
general register file to support instruction prefetch. High
speed (and high power) Schottky logic had just been introduced
and was incorporated into the 11/45 datapath.

    Mnemonic   Description
    -----------------------------------------
    JMP        Jump
    JSR        Jump to subroutine
    RTS        Return from subroutine
    MARK       Mark
    SOB        Subtract one and branch (if not zero)
    EMT        Emulator trap
    TRAP       Trap (user)
    BPT        Breakpoint trap
    IOT        Input/output trap
    RTI        Return from interrupt
    RTT        Return from interrupt (trace)
    HALT       Halt the processor
    WAIT       Wait for interrupt
    RESET      Reset the UNIBUS
    CLC, CLV,  Clear condition code
    CLZ, CLN,
    CCC
    SEC, SEV,  Set condition code
    SEZ, SEN,
    SCC

    Table 4 - Jump, trap and miscellaneous instructions.

In contrast, the PDP-11/40 performs all of its memory operations
on the UNIBUS. The 11/40 accesses memory at a rate of 1000 nanoseconds
per transfer as opposed to the 300 nanosecond per transfer rate of
the PDP-11/45 with bipolar memory. The 11/40 uses an older circuit
technology (standard 74-series TTL). The typical delays through the
74181 and 74S181 parts, for example, are 24 and 11 nanoseconds,
respectively. The 11/40 does not employ a tandem register file,
further reducing the component count. Table 5 summarizes the
internal characteristics of the 11/05, 11/40 and 11/45 models.
Performance for the PDP-11/45 is shown for both core (13)
and bipolar (41) memories. The relative costs cover only the
processor itself and do not include I/O devices or the
extra expense of semiconductor memory. Thus, the higher speed
of the PDP-11/45 comes at a much greater cost than the indicated
value. The lower cost of the 11/05 derives from the smaller
number of boards and IC packages needed for the implementation.

                                       Control Store Relative Relative
  Model  Boards  IC Packages  IC Types  Words  Bits    Perf     Cost
  --------------------------------------------------------------------
  11/05    2         203         60      249    40      2.5      1.0
  11/40    5         417         53      251    56      3.6      3.3
  11/45    8         696         78      256    64     13.0      3.8  (core)
                                                       41.0           (bipolar)

     Table 5 - Processor size.
               (Source: Snow and Siewiorek.)

Section 2 - Machine structure.

The design of the PDP-11/40 processor can be separated into
three interacting subsystems: the datapath, controller and I/O
subsystems (Figure 1.) The controller and datapath exchange
control and condition signals as expected. Memory (bus)
addresses and data are communicated to the I/O subsystem through
the bus address (BA) and data (D) registers. Incoming memory
(bus) data arrives at its own input port to the datapath
subsystem. Conditions and control signals are also exchanged
with the I/O subsystem to start bus operations and to detect
external interrupts and exception conditions. The 11/40 uses
a programmable, stoppable clock. The controller can turn off
the clock until it is restarted by the I/O subsystem. This
provides a reliable synchronization mechanism for data transfers
between the datapath, controller and I/O subsystem.

The I/O subsystem acts as the interface between the processor
and the UNIBUS. It is also responsible for bus arbitration,
detecting time-out conditions, etc. We will not discuss the
I/O subsystem in this chapter.

     -------------------------------------------------------
    |                           |                           |
    |      Datapath          -------> Conditions <--------  |
    |                        <------- Control signals --  | |
    |   ^      |         |      |                       | | |
    |   |      |         |      |     Controller        | | |
    |   |      |         |      |                       | | |
    |   |      V         V      |     --------------    | | |
    |   |    -----     -----    |    |  Stoppable   |   | | |
    |---|---| BA  |---|  D  |--------| Programmable |---|-|-|
    |   |    -----     -----         |    Clock     |   | | |
    |   |      |         |            --------------    | | |
    |   |      V         V                              V | |
    |                                 I/O subsystem         |
    |                                                       |
     ------- UNIBUS ----------------------------------------

            Figure 1 - Overall structure of the 11/40.

Section 3 - Datapath.

The PDP-11/40 datapath is a two bus design with the read bus
arranged down the right side of Figure 2 and the write bus on
the left. Both busses are brought out to the backplane where
connections to the EIS and Floating Point options are made.
All primary transfer paths are sixteen bits wide.

The ISA general registers are contained in the sixteen word
register file at the top of the diagram. This file can be
addressed by:
  * Instruction bits 8 to 6 (source field),
  * Instruction bits 2 through 0 (destination field),
  * The register immediate field (RIF) of the microinstruction, and
  * The low order four bits of the bus address (BA) register.
As in most two bus designs, two temporary registers, B and D,
have been introduced to hold intermediate results. The B register
contains the left argument to the ALU while the D register holds
the ALU result for a later clock phase when the result will be
written back to the register file. This permits the transfer of
one word around the entire datapath in one (long) microinstruction.
The left (B) input to the ALU may be one of several values:
  * The unmodified contents of the B register.
  * The low order byte of B with bit seven (the sign of the low
    order byte) extended through the high order byte.
  * The high or low order byte duplicated.
  * The high and low order bytes swapped (exchanged.)
Constants such as the values one and two, the switch register address
and key trap addresses can also be passed to the ALU through the
B multiplexer.
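
The B multiplexer selections above, together with the SBMH/SBML
encodings listed in Table 7, can be modeled in a few lines. This
Python sketch assumes B is a 16-bit integer and treats the constant
generator as a plain argument, since its contents are not described
here; the function name b_mux is hypothetical.

```python
def b_mux(b, sbmh, sbml, constant=0):
    """Model the B multiplexer using the SBMH/SBML encodings of Table 7."""
    hi, lo = (b >> 8) & 0xFF, b & 0xFF
    high_choices = {
        0: hi,                           # B<15:8> unchanged
        1: 0xFF if lo & 0x80 else 0x00,  # B<7> sign extended
        2: lo,                           # low byte copied into high half
        3: (constant >> 8) & 0xFF,       # Constant<15:8>
    }
    low_choices = {
        0: lo,                           # B<7:0>
        1: lo,                           # B<7:0>
        2: hi,                           # high byte copied into low half
        3: constant & 0xFF,              # Constant<7:0>
    }
    return (high_choices[sbmh] << 8) | low_choices[sbml]
```

Each of the four B variants listed above is one (SBMH, SBML) pair:
(0, 0) passes B unmodified, (1, 0) sign extends the low byte, (2, 0)
and (0, 2) duplicate a byte, and (2, 2) swaps the bytes.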

      Figure 2 - The PDP-11/40 datapath.

The D multiplexer selects one of four values to be placed onto the
write bus. First, read bus data can be routed directly onto the write
bus, thereby avoiding the long delay through the ALU. Second, incoming
UNIBUS data arrives through the D multiplexer. Finally, either the
unmodified contents of D or the D value shifted left and concatenated
with the carry bit can be placed on the write bus. The shifted path
implements the left rotate operation.

The ALU has several choices for carry in and carry out logic. The
carry in to the ALU may be forced to zero or one, or taken from the
carry bit of the Processor Status Word. Choices for carry out include
the carry out from either the eighth or the sixteenth stage of the ALU
(for byte and word operations, respectively), the sign of the ALU
result and the current value of the carry bit. Carry in and out values
are selected with multiplexers. The PSW is implemented using
individually gated (enabled) flip-flops which can also be read and
written as a group.
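
To see why both the eighth-stage and sixteenth-stage carries are
needed, consider an addition that sets the N, Z, V and C codes for
either a byte or a word operand. The Python sketch below is an
illustration under simplifying assumptions: it handles addition only
(the PDP-11 defines C as a borrow for subtraction), it ignores what
happens to the high byte of a register during byte operations, and it
does not model the other carry-out choices. The function name is
hypothetical.

```python
def add_with_flags(a, b, carry_in=0, byte=False):
    """Add two operands and return (result, N, Z, V, C).

    For byte operations the carry comes from the eighth stage (bit 7);
    for word operations it comes from the sixteenth stage (bit 15).
    """
    width = 8 if byte else 16
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a, b = a & mask, b & mask
    full = a + b + carry_in
    result = full & mask
    n = int(bool(result & sign))
    z = int(result == 0)
    # overflow: both operands have the same sign and the result's sign differs
    v = int(((a ^ result) & (b ^ result) & sign) != 0)
    c = int(full > mask)  # carry out of the most significant stage used
    return result, n, z, v, c
```

An ADC instruction would correspond to passing the PSW carry bit as
carry_in, which is one of the carry-in choices described above.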

As mentioned earlier, UNIBUS data is transferred into the datapath
through the D multiplexer. An incoming word can be stored in
the register file, the PSW, or a sixteen bit register (IR) which holds
a PDP-11 ISA instruction during its execution. IR fields are sent
to the register file multiplexer for register selection and to
the controller for instruction decoding. UNIBUS addresses are
communicated through BA which may be loaded with either the value
on the read bus or the current ALU result. The D register holds
outgoing data to be written to the UNIBUS.
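
The IR fields sent to the register file multiplexer occupy fixed bit
positions in the instruction word. The Python sketch below splits a
double operand instruction into the standard PDP-11 fields and then
picks a register file index according to the select fields of Table 8.
It assumes exactly one select field is asserted at a time; the function
names are hypothetical.

```python
def decode_double_operand(ir):
    """Split a double operand instruction word into its fields."""
    return {
        "opcode":   (ir >> 12) & 0o17,  # bits 15..12 (bit 15 is the byte flag)
        "src_mode": (ir >> 9) & 0o7,    # bits 11..9
        "src_reg":  (ir >> 6) & 0o7,    # bits 8..6, IR<8:6>
        "dst_mode": (ir >> 3) & 0o7,    # bits 5..3
        "dst_reg":  ir & 0o7,           # bits 2..0, IR<2:0>
    }


def register_index(select, ir=0, rif=0, ba=0):
    """Choose a register file index (0..15) as in Table 8."""
    if select == "SRS":
        return (ir >> 6) & 0o7          # source field
    if select == "SRD":
        return ir & 0o7                 # destination field
    if select == "SRI":
        return rif & 0o17               # register immediate field
    if select == "SRBA":
        return ba & 0o17                # low four bits of the bus address
    raise ValueError(f"unknown register select {select!r}")
```

The instruction fields can only reach registers 0 through 7, while the
four-bit RIF and BA<3:0> paths can address all sixteen words of the file.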

Section 4 - Controller.

The PDP-11/40 controller implements nearly all of the control
techniques which we have discussed so far (Figure 3.) It is a
microprogrammed controller with a 56-bit microinstruction. The
address of the next microinstruction to be fetched and executed
is stored in the current microinstruction. The next microaddress
field may be modified by instruction decode logic and current
machine conditions (states) under the control of the branch
test field in the current microinstruction.

      Figure 3 - The PDP-11/40 controller.

The careful reader will note the presence of two registers in
the next address calculation and fetch loop of the controller.
The first register holds the ROM address of the next microinstruction
to be fetched. The control word register contains the currently
executing microinstruction. This design permits microinstruction
fetch and decode to be overlapped (pipelined) in time. The effect
of a microbranch is, therefore, delayed by one microinstruction
execution cycle.

                            ___
     P1 ___________________|   |_____
        |<------- 140 ns ----->|
                                       ___
     P2 ______________________________|   |______
        |<------- 200 ns ---------------->|
                                       ___          ___
     P3 ______________________________|   |________|   |______
        |<------- 200 ns ---------------->|
        |<------- 300 ns ----------------------------->|

              Figure 4 - Microinstruction timing.

The PDP-11/40 employs a programmable, stoppable clock. Three different
microinstruction timings can be selected. The shortest timing, P1,
is used for non-ALU data transfers. For example, a word may be transferred
from the register file to B (and/or BA) or from B to the register file in
a P1 cycle. P2 is sufficiently long that an operand can be selected
from the register file, combined with a B value by the ALU, and stored
in either D or BA. The P3 timing has a pulse occurring at both 200
and 300 nanoseconds. A result latched by the 200 ns pulse can then
be transferred to the write bus and stored by the 300 ns pulse. Only
an additional 100 ns is required because the overhead of fetching a
second microinstruction is not incurred. Thus, more work (a complete
transfer around the datapath) can be performed in one microcycle.
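
The benefit of the P3 timing can be checked with simple arithmetic.
The Python sketch below assumes that each microcycle lasts until its
last clock pulse in Figure 4 (140, 200 and 300 ns) and compares a P3
microinstruction with a hypothetical two-microinstruction sequence
(a P2 ALU cycle followed by a separate P1 write-back cycle). Clock
stops for bus transfers are ignored.

```python
# Microcycle lengths in ns, taken from the pulse times in Figure 4.
CYCLE_NS = {"P1": 140, "P2": 200, "P3": 300}


def microprogram_time(clock_selects):
    """Total time of a microinstruction sequence with the given clock timings."""
    return sum(CYCLE_NS[sel] for sel in clock_selects)


two_step = microprogram_time(["P2", "P1"])  # ALU cycle, then write-back cycle
one_step = microprogram_time(["P3"])        # ALU and write-back in one cycle
print(two_step, one_step, two_step - one_step)
```

Under these assumptions the single P3 microinstruction takes 300 ns
instead of 340 ns and also saves a word of control store.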

                    Synchronous regeneration
                       -------------------
                      |                   |
                      |    -------     -------  P1   --------
                       -->|       |   |       |---->|        |--> } Timing
  Asynchronous restart    | Clock |   | Clock | P2  |  Clock |--> } pulses to
  ----------------------->|       |-->| Pulse |---->| Enable |--> } datapath,
  Maintenance control     |Control|   |  Gen  | P3  |  Gates |--> } I/O S/S,
  ----------------------->|       |   |       |---->|        |--> } and control
                           -------     -------       --------
                              |         |   |
            {   CLKOFF -------          |   |
  Microword {     CLK1 -----------------    |
   Register {     CLK0 ---------------------
    Control { Enabling
            {  Signals

      Figure 5 - Clock logic.
                 (Source: KD-11A maintenance manual.)

The UNIBUS, peripheral devices and processor run independently at
their own speeds. This independence permits I/O DMA transfers and
processor instruction execution to proceed concurrently. The
processor and UNIBUS must synchronize during data transfers, however.
When reading or writing bus data, the controller turns the clock
off and awaits the completion of the data transfer. The I/O subsystem
restarts the clock after the data has been transferred. Input
transfers are reliable because the processor never attempts to read
invalid or transitory bus data.
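
The stop and restart handshake can be imitated in software. The Python
sketch below is only an analogy, not the KD11-A clock logic: a
threading.Event stands in for the running clock, the controller clears
it (CLKOFF) before a bus read, and a separate thread playing the I/O
subsystem latches the data and then sets the event (the asynchronous
restart). Because the data is stored before the restart, the processor
side never observes it half written. The class and function names and
the 10 ms stand-in for bus delay are assumptions.

```python
import threading
import time


class StoppableClock:
    """Toy model: the controller stops the clock, the I/O side restarts it."""

    def __init__(self):
        self._running = threading.Event()
        self._running.set()

    def stop(self):                # controller asserts CLKOFF
        self._running.clear()

    def restart(self):             # I/O subsystem's asynchronous restart
        self._running.set()

    def wait_until_running(self):
        self._running.wait()


def bus_read(clock, memory, address, bus_delay_s=0.01):
    """Issue a DATI-like read: stop the clock, let the 'bus' finish, resume."""
    result = {}

    def io_subsystem():
        time.sleep(bus_delay_s)           # stand-in for UNIBUS transfer time
        result["data"] = memory[address]  # data is valid before the restart
        clock.restart()

    clock.stop()
    threading.Thread(target=io_subsystem).start()
    clock.wait_until_running()            # processor makes no progress here
    return result["data"]
```

For example, bus_read(StoppableClock(), {0o1000: 0o12345}, 0o1000)
returns only after the simulated transfer has delivered the word.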

Section 5 - Microcode.

The PDP-11/40 microword is 56 bits long. Tables 6 through 12 summarize
the microinstruction fields and their interpretations. The 11/40 is
a horizontal machine with very little instruction encoding. Hence,
the control fields in the microinstruction are almost a verbatim
list of signals that must be generated for datapath, clock and
microprogram control. In Table 10, for example, the ALU mode and
function fields correspond directly to the mode and function inputs
of the 74181 ALU employed in the datapath.

    Field   Timing   Value   Operation
    -------------------------------------
    CLKB    P1 & P3    0     No operation
                       1     B <- D MUX
    CLKBA   P1 & P2    0     No operation
                       1     BA <- BA MUX
    CLKD    P2         0     No operation
                       1     D <- ALU result
    CLKIR   P1 & P3    0     No operation
                       1     IR <- D MUX
    WRH     P1 & P3    0     No operation
                       1     R <- D MUX<15:8>
    WRL     P1 & P3    0     No operation
                       1     R <- D MUX<7:0>

       Table 6 - Register enables.

    Field   Control   Value selected
    --------------------------------
    SBAM       0      ALU result
               1      Read bus
    SDM        0      Read bus
               1      UNIBUS data
               2      D
               3      D<C> concat D<14:0>
    SBMH       0      B<15:8>
               1      B<7> (Sign extend)
               2      B<7:0>
               3      Constant<15:8>
    SBML       0      B<7:0>
               1      B<7:0>
               2      B<15:8>
               3      Constant<7:0>

     Table 7 - Multiplexer selection.

    Field   Control   Register index
    --------------------------------
    RIF               Register immediate field
    SRBA       0      -
               1      BA <3:0>  (Bus address)
    SRD        0      -
               1      IR <2:0>  (Destination field)
    SRI        0      -
               1      RIF <3:0> (Immediate)
    SRS        0      -
               1      IR <8:6>  (Source field)

     Table 8 - Register indexing.

    Field   Timing   Value   Source
    -------------------------------------
    SPS     P1 & P3    0     D MUX
                       1     C
                       2     NZV
                       3     NZVC
                       6     Read bus <- PSW
                       7     PSW <- D MUX

       Table 9 - Processor status word.

Tables 6, 7, 8 and 9 show the set of transfers which can be
performed during microinstruction execution. The register enable
fields (Table 6) control the storage of data into the datapath
registers, and the table gives the clock pulses on which each write
can occur. Note that ALU results are stored into D on P2, BA may be
loaded on P1 or P2, and all other registers are clocked on either P1 or P3.
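
Table 6 can be read as a schedule: each enable field fires only on the
pulses listed for it, so the clock selection of a microinstruction
determines which of its enabled writes actually happen. The Python
sketch below encodes Table 6 and assumes, following Figure 4, that a
P3 microcycle delivers a P2 pulse at 200 ns and a P3 pulse at 300 ns.
The dictionary and function names are hypothetical.

```python
# Pulses delivered by each clock selection, in ns from the cycle start.
# Assumption of this sketch: a P3 cycle issues a P2 pulse, then a P3 pulse.
PULSES = {
    "P1": [("P1", 140)],
    "P2": [("P2", 200)],
    "P3": [("P2", 200), ("P3", 300)],
}

# Pulses on which each register enable field may clock its register (Table 6).
ENABLE_TIMING = {
    "CLKB":  {"P1", "P3"},
    "CLKBA": {"P1", "P2"},
    "CLKD":  {"P2"},
    "CLKIR": {"P1", "P3"},
    "WRH":   {"P1", "P3"},
    "WRL":   {"P1", "P3"},
}


def register_clock_events(clock_select, enables):
    """List (time_ns, field) pairs for the enabled registers in one microcycle."""
    events = []
    for pulse, t in PULSES[clock_select]:
        for field in sorted(enables):
            if pulse in ENABLE_TIMING[field]:
                events.append((t, field))
    return events
```

In a P3 cycle with CLKD, WRH and WRL enabled, D is loaded at 200 ns and
the register file is written at 300 ns; in a P1 cycle CLKD has no effect.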

    Field   Control   Operation
    --------------------------------
    SALUM      0      Arithmetic mode
               1      Logical mode
    SALU              ALU function
    DAD               Discrete alteration of data

       Table 10 - Arithmetic and logic.

    Field   Control   Operation
    --------------------------------
    BUS        0      -
               1      DATI (data in)
               2      Await bus busy
               3      DATIP (data in and pause)
               4      -
               5      DATO (data out)
               6      Restart on bus release
               7      DATOB (data out byte)

        Table 11 - UNIBUS control.

UNIBUS operations are summarized in Table 11. The DATI, DATIP,
DATO and DATOB operations initiate a data transfer between
the processor and the UNIBUS. DATIP begins a bus read cycle,
but forces the UNIBUS to pause until a data out transfer is
requested. DATIP permits the construction of uninterruptible
"read-modify-write" transfers that can be used by the operating
system for process synchronization.
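
The effect of DATIP can be illustrated with a toy model in which a lock
stands in for control of the bus. This Python sketch is an analogy
under stated assumptions, not a description of UNIBUS signalling: the
class and method names are hypothetical, and dato_after_datip plays the
role of the data out transfer that completes the paused cycle. Each
simulated master performs its increments as DATIP/DATO pairs, so no
other master can use the bus between the read and the write.

```python
import threading


class ToyUnibus:
    """Toy bus model: a lock stands in for exclusive use of the UNIBUS."""

    def __init__(self, memory):
        self.memory = memory
        self._bus = threading.Lock()

    def dati(self, address):
        """Ordinary read: the bus is held only for the read itself."""
        with self._bus:
            return self.memory[address]

    def dato(self, address, value):
        """Ordinary write."""
        with self._bus:
            self.memory[address] = value & 0xFFFF

    def datip(self, address):
        """Read and pause: the bus stays held after the data is returned."""
        self._bus.acquire()
        return self.memory[address]

    def dato_after_datip(self, address, value):
        """The data out transfer that completes a DATIP and frees the bus."""
        self.memory[address] = value & 0xFFFF
        self._bus.release()


def increment(bus, address, times):
    """Read-modify-write a word without interference from other masters."""
    for _ in range(times):
        value = bus.datip(address)
        bus.dato_after_datip(address, value + 1)


if __name__ == "__main__":
    bus = ToyUnibus({0o1000: 0})
    masters = [threading.Thread(target=increment, args=(bus, 0o1000, 1000))
               for _ in range(2)]
    for m in masters:
        m.start()
    for m in masters:
        m.join()
    print(oct(bus.memory[0o1000]))
```

An ordinary dati followed by dato would release the bus between the
two transfers, leaving a window in which another master could modify
the word.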

    Field   Control   Operation
    --------------------------------
    CLKOFF     0      No op
               1      Turn clock off
    CLK        0      P1 (140 ns)
               2      P2 (200 ns)
               3      P3 (200 and 300 ns)
    UBF<4:0>          Microbranch field
    UPF<7:0>          Next microaddress field

      Table 12 - Clock and microcontrol.

The clock control fields are given in Table 12. The UBF field
controls the sensing of processor and I/O conditions by the
microcontroller. We mentioned earlier that branches are delayed
by one microcycle and the address of the next microword
is contained in the currently executing microinstruction. The
11/40 controller takes the logical OR of the condition selected
by the UBF field with that next address information to form
the true next address value which is sent to the ROM via the
ROM address register. The microinstructions are carefully
assigned to ROM locations such that the condition is always OR'ed
into next address bits that are zero.
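
The OR-based branch can be sketched as follows. This Python function
is an illustration, not the KD11-A logic: it assumes the condition
selected by UBF arrives as a small integer ORed into the low cond_bits
bits of the address, and it rejects a UPF value whose low bits are not
zero, which is the placement constraint described above. The eight-bit
UPF field of Table 12 covers the 251-word control store of Table 5.
The one-microcycle branch delay is not modeled, and the function name
is hypothetical.

```python
def next_micro_address(upf, condition, cond_bits):
    """Return UPF OR'ed with a branch condition occupying the low cond_bits bits.

    upf:       8-bit next address field of the current microword (UPF<7:0>).
    condition: value produced by the branch logic selected by UBF.
    cond_bits: number of low-order address bits the condition may set.
    """
    field_mask = (1 << cond_bits) - 1
    if upf & field_mask:
        raise ValueError("target block misplaced: low UPF bits must be zero")
    if condition & ~field_mask:
        raise ValueError("condition wider than cond_bits")
    return (upf | condition) & 0xFF
```

A two-way branch therefore needs its targets at an even/odd address
pair, and a four-way branch needs four consecutive words starting at a
multiple of four.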

The PDP-11/40 documentation served as the basis for the flow
graph notation introduced in Chapter 9. Some examples are included
in Figures 6 through 8. Figure 6 contains the main fetch and
decode routine. The branch logic is tailored for the PDP-11
instruction set and has a separate flow branch for each of
the special cases. In microprogramming, it is important to minimize
the number of decode and branch steps per ISA instruction since
these steps have a relatively high overhead (especially with delayed
branches!) The branch control logic and flow graph are usually
designed together so that the flow graph can be "compressed" as
much as possible. The main branch also dispatches control for front
panel (console), I/O bus (service), exception trap and expansion
(EIS and floating point) operations.

       Figure 6 - Fetch and decode.

    Figure 7 - Return from subroutine.

      Figure 8 - Branch instruction.

Copyright (c) 1987-2013 Paul J. Drongowski