Computer Design: A computer aided design and VLSI approach
Paul J. Drongowski

Chapter 4 - An introduction to testing.

Section 1 - Practical considerations.

Any design or production activity must be followed by a period of evaluation in which the functional and performance characteristics of the system are analyzed in detail. The results of that analysis will determine whether the system or product is functioning correctly with respect to its specification. Shortcomings and errors must be corrected.

Several different characteristics may be evaluated:

* Are the proper results computed?
* Are the results produced within the time allowed?
* Are engineering constraints such as maximum unit cost, maximum current consumption, maximum heat dissipation and expected mean time between failure satisfied?

Some of these characteristics can be determined through static analysis. For example, the per unit cost is the sum of the individual component costs and the expected cost of assembling the components into a working production unit.

Demonstrating the correctness of a computing system is a difficult undertaking. A computing system is usually a very complex device consisting of many subcomponents which must operate correctly both individually and together. A VLSI circuit, for example, may consist of several hundred thousand transistors, and in order for the total system to compute the correct result, each transistor must function correctly.

System correctness is demonstrated using a combination of two techniques. "Formal verification" relies upon the definition of a formal mathematical model of the system and a set of properties about that model which are of interest to the designers and must be satisfied by the system. Correctness can be rigorously demonstrated by proving that the system satisfies those properties. "Testing" is a dynamic process in which input data and programs are constructed to exercise the individual components and the interactions between those components. The input data is applied to the model (or physical system) and the results are examined and evaluated.

Formal verification demonstrates the absence of errors from a system and gives the designers a very high degree of confidence that the system will perform correctly. The cost of formal verification is relatively high because the size of the verification task often exceeds the magnitude of the design effort itself. Thus, formal verification is reserved for particularly critical subsystems.

Even for a fully verified hardware system, errors will still creep into production units due to mistakes or imperfections in the fabrication process. Thus, testing is an essential development and post-manufacturing activity. The design team must show that the system satisfies those properties or requirements which have not been rigorously verified. Post-production tests must be developed to check units as they arrive from the assembly line.

Unlike formal verification, which demonstrates the absence of errors, testing can only show the presence of a particular bug or flaw. A system must be tested exhaustively (all data values for all instructions anywhere in memory!) before it can be called "bug free." Clearly, the time and budget allocated for design and check-out impose severe limitations on the completeness of system testing.
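A little arithmetic shows why exhaustive testing is hopeless for even modest circuits. The Python sketch below is purely illustrative: the 64 input bits (a 32-bit adder with two operands) and the rate of one billion patterns per second are assumptions chosen for the example, not figures from the text.

    # Back-of-the-envelope: time to exhaustively test a combinational
    # circuit. Input width and test rate are illustrative assumptions.

    SECONDS_PER_YEAR = 365 * 24 * 60 * 60

    def exhaustive_test_years(input_bits, patterns_per_second):
        """Years needed to apply every possible input pattern once."""
        patterns = 2 ** input_bits          # all input combinations
        return patterns / patterns_per_second / SECONDS_PER_YEAR

    # A 32-bit adder has 64 primary input bits (two 32-bit operands).
    print(f"{exhaustive_test_years(64, 1e9):.0f} years")   # about 585 years

A sequential circuit is worse still, since its response depends on internal state as well as on the applied pattern.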
Bennetts gives the following definition of testability: A circuit is "testable" if a set of test patterns can be generated and applied in such a way as to satisfy pre-defined levels of performance, defined in terms of fault-detection, fault-isolation, and test application criteria, within a pre-defined cost budget and time scale. We will return to this idea in the next section.

Design engineers are optimistic individuals by nature. Why else would anyone attack the design of a system and make such personal sacrifices as 60 or 70 hour work weeks? It is their job to make the system work and to get it out the door and onto the delivery trucks. Testing, however, requires a pessimistic attitude. A good test engineer delights in making the system break. With personal and professional goals at odds, it is sensible to create separate design and test teams. The corporation should recognize and reward the efforts of the test team even though their product (quality) is intangible.

Section 2 - Kinds of testing.

The rest of this chapter introduces ideas and terminology in testing. The development and continued support of a product requires three different kinds of testing.

* Debugging. The system must be debugged during design and development.
* Quality assurance (QA). Defective production units must be found and rejected (or repaired, if possible).
* Field maintenance. Problems with installed units must be identified and repaired.

Each kind of testing has different practical requirements. Debugging requires a considerable amount of interaction with the system as the engineers try to identify and correct conceptual flaws in the design. Raw speed is not as important as the ability to probe and perturb the system state. Speed is more important in quality assurance testing, especially in high volume applications where many parts must be checked in a short period of time. The primary purpose of field maintenance testing is the eventual and timely repair of a customer system. The time to repair will be greatly affected by the field engineer's ability to quickly identify and replace a failed component (or module).

Section 3 - Faults, failure modes and experiments.

In order to formulate a test strategy, the engineering team must first identify the ways in which a system may fail, the system failure modes. Division by zero and non-terminating (infinite) loops are two ways in which a software system may fail. Example hardware failures include short and open circuits, spurious switching due to noise, and degraded component performance (e.g., amplifier gain that is too low). When a failure occurs, the error is called a "fault."

The classic models for hardware logic faults are the so-called "stuck at zero" and "stuck at one" faults. (These faults are abbreviated "s-a-0" and "s-a-1.") A stuck at zero fault occurs when the expected output of a logic gate is one and a zero is observed instead. If an output remains at one when a zero value was expected, then a stuck at one fault has occurred. Failures other than s-a-0 and s-a-1 may be expected in VLSI circuits due to timing or fabrication flaws. Stuck-at faults are the most widely used model, however, because they are easily characterized and cover a wide range of failures.

A single fault may be the cause of a system failure, or more likely, multiple faults will occur together. It is generally easier to analyze a system for single faults than for multiple faults since the interactions between multiple faults are complex. For example, one fault can hide the presence of another fault.
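The single stuck-at model is easy to express in a small simulation. The Python sketch below is a minimal illustration, not anything from the text: the two-gate circuit (an AND feeding an OR) and the node names n1 and out are invented, and a fault is injected simply by forcing one node to a constant value.

    # Minimal sketch of the single stuck-at fault model. The circuit
    # (out = (a AND b) OR c) and node names are invented for illustration.

    def evaluate(a, b, c, fault=None):
        """Evaluate the circuit, optionally forcing one node to a constant.
        fault is a (node, value) pair, e.g. ('n1', 0) models n1 s-a-0."""
        nodes = {}
        nodes['n1'] = a & b                 # AND gate
        if fault and fault[0] == 'n1':
            nodes['n1'] = fault[1]          # node is stuck at a constant
        nodes['out'] = nodes['n1'] | c      # OR gate drives the primary output
        if fault and fault[0] == 'out':
            nodes['out'] = fault[1]
        return nodes['out']

    # The pattern a=1, b=1, c=0 detects n1 s-a-0: the fault-free circuit
    # answers 1, the faulty circuit answers 0.
    print(evaluate(1, 1, 0))                  # 1
    print(evaluate(1, 1, 0, fault=('n1', 0))) # 0, so the fault is visible

Note that this same pattern says nothing about n1 s-a-1, since the fault-free value of n1 is already one; exposing that fault requires a second pattern (for example a=0, b=1, c=0).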
The testing process consists of a series of well-designed experiments. Each experiment is designed to expose a particular fault or set of faults. Two kinds of test experiments may be performed. In a "fault detection" experiment, one merely wants to test for the presence of a fault. Fault detection experiments are useful for quality assurance testing where repair is impossible (defective parts are discarded). "Fault isolation" experiments detect the presence of a fault and also identify its location. Debugging and field maintenance testing require a fault isolation capability. The design team must find and isolate conceptual design errors. The maintenance technician must isolate a fault to a particular unit which can be repaired or replaced.

Testing is performed by attaching the part (board or system) to a physical "test fixture." The fixture has probes or connectors that are compatible with the physical package of the part under test. For example, an integrated circuit tester will have a zero-insertion force socket to hold the IC under test and to drive and read the IC pins. A board tester will have an edge connector or a set of probes called a "bed of nails." The board is placed in contact with the bed of nails and signals are read and written through the nails. Inputs that can be driven and output signals that can be sensed are called the "primary inputs" and "primary outputs," respectively.

Section 4 - Test activities.

The testing process consists of three different, but closely related activities: test generation, evaluation and application.

The objective of the test generation activity is to produce a set of input data called test "patterns" or "vectors" which will exercise the circuit. Two key factors are involved: controllability and observability. To properly exercise a circuit, the primary inputs, internal logic and primary outputs must be under the control of the tester in order to supply test patterns to the circuit under test, to evoke the operation which is to be checked, and to acquire the results of the experiment. Experimental results can only be reported to the tester if they are observable. Components that cannot be directly controlled or that have subcircuits with limited observability are difficult to test. As we will see with the SP.1, a test engineer does not always have absolute controllability or observability.

Test generation is very dependent upon the availability of a good system specification. If the system specification is ambiguous, incomplete or incorrect, the test patterns themselves may not be correct or complete. Next, the volume of test data required to check a VLSI-sized computer system is very large. Computer aids are almost a necessity for testing large systems, not only to assist with the data management task, but to actually produce test patterns from the specification and a description of the system structure. Finally, there may be failure modes which are unknown to the test team and thus, erroneous behavior that will escape undetected.

Once a series of test experiments has been devised and the test patterns generated, the quality of the testing procedure must be graded. Test quality is evaluated through fault insertion experiments. Faults are artificially placed into a known good device (KGD), known good board (KGB) or simulation model and the test patterns are applied. The values at the primary outputs are compared with the correct results. If the fault is found (the outputs are different), then it is "covered" by the test data.
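In a simulation model, this grading procedure reduces to a simple loop: simulate the fault-free circuit for each pattern, re-simulate with each fault inserted, and record the faults whose primary outputs differ. The sketch below is again illustrative; it repeats the invented evaluate() circuit from the previous sketch so that the fragment runs on its own, and the fault list and test set are assumptions made for the example.

    # Fault grading sketch: count the modeled faults detected by a test
    # set. The circuit, fault list and patterns are illustrative; the
    # evaluate() helper is repeated from the previous sketch.

    def evaluate(a, b, c, fault=None):
        """Circuit out = (a AND b) OR c, with an optional stuck node."""
        nodes = {}
        nodes['n1'] = a & b
        if fault and fault[0] == 'n1':
            nodes['n1'] = fault[1]
        nodes['out'] = nodes['n1'] | c
        if fault and fault[0] == 'out':
            nodes['out'] = fault[1]
        return nodes['out']

    faults = [('n1', 0), ('n1', 1), ('out', 0), ('out', 1)]
    patterns = [(1, 1, 0), (0, 1, 0)]        # the test set being graded

    covered = set()
    for a, b, c in patterns:
        good = evaluate(a, b, c)             # fault-free response
        for f in faults:
            if evaluate(a, b, c, fault=f) != good:
                covered.add(f)               # this pattern detects fault f

    print(f"fault coverage: {100.0 * len(covered) / len(faults):.0f}%")

The ratio printed on the last line is exactly the fault coverage measure defined next.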
The percentage of modeled faults detected by the test patterns is a measure called the "fault coverage," and it is an indication of the completeness of the testing process. Common industrial practice requires at least 90 percent coverage before an integrated circuit will be accepted for fabrication (even for undebugged prototypes). Since testing is constrained by the practical concerns of schedule and budget, the ultimate goal of the test generation and evaluation activities is the minimum set of test patterns which will guarantee a particular level of fault coverage.

The test patterns must ultimately be applied to the real circuit using highly specialized automatic test equipment or "ATE." Test application is limited by the maximum rate at which tests can be performed, the ability of the ATE to catch signal pulses (very short pulses may escape the attention of the tester) and physical access to the device under test.

Copyright (c) 1987-2013 Paul J. Drongowski