Computer Design: A computer aided design and VLSI approach
Paul J. Drongowski

Chapter 4 - An introduction to testing.

Section 1 - Practical considerations.

Any design or production activity must be followed by a period of evaluation in which the functional and performance characteristics of the system are analyzed in detail. The results of that analysis will determine whether the system or product is functioning correctly with respect to its specification. Shortcomings and errors must be corrected.

Several different characteristics may be evaluated:

* Are the proper results computed?
* Are the results produced within the time allowed?
* Are engineering constraints such as maximum unit cost, maximum current consumption, maximum heat dissipation and expected mean time between failure satisfied?

Some of these characteristics can be determined through static analysis. For example, the per unit cost is the sum of the individual component costs and the expected cost of assembling the components into a working production unit.

Demonstrating the correctness of a computing system is a difficult undertaking. A computing system is usually a very complex device consisting of many subcomponents which must operate correctly both individually and together. A VLSI circuit, for example, may consist of several hundred thousand transistors, and in order for the total system to compute the correct result, each transistor must function correctly.

System correctness is demonstrated using a combination of two techniques. "Formal verification" relies upon the definition of a formal mathematical model of the system and a set of properties about that model which are of interest to the designers and must be satisfied by the system. Correctness can be rigorously demonstrated by proving that the system satisfies those properties. "Testing" is a dynamic process in which input data and programs are constructed to exercise the individual components and the interactions between those components. The input data is applied to the model (or physical system) and the results are examined and evaluated.

Formal verification demonstrates the absence of errors from a system and gives the designers a very high degree of confidence that the system will perform correctly. The cost of formal verification is relatively high because the size of the verification task often exceeds the magnitude of the design effort itself. Thus, formal verification is reserved for particularly critical subsystems.

Even for a fully verified hardware system, errors will still creep into production units due to mistakes or imperfections in the fabrication process. Thus, testing is an essential development and post-manufacturing activity. The design team must show that the system satisfies those properties or requirements which have not been rigorously verified. Post-production tests must be developed to check units as they arrive from the assembly line.

Unlike formal verification, which demonstrates the absence of errors, testing can only show the presence of a particular bug or flaw. A system must be tested exhaustively (all data values for all instructions anywhere in memory!) before it can be called "bug free." Clearly, the time and budget allocated for design and check-out impose severe limitations on the completeness of system testing.
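A little arithmetic shows why exhaustive testing is hopeless for even modest circuits. The Python sketch below is purely illustrative: the 64 input bits (a 32-bit adder with two operands) and the rate of one billion patterns per second are assumptions chosen for the example, not figures from the text.

    # Back-of-the-envelope: time to exhaustively test a combinational
    # circuit. Input width and test rate are illustrative assumptions.

    SECONDS_PER_YEAR = 365 * 24 * 60 * 60

    def exhaustive_test_years(input_bits, patterns_per_second):
        """Years needed to apply every possible input pattern once."""
        patterns = 2 ** input_bits          # all input combinations
        return patterns / patterns_per_second / SECONDS_PER_YEAR

    # A 32-bit adder has 64 primary input bits (two 32-bit operands).
    print(f"{exhaustive_test_years(64, 1e9):.0f} years")   # about 585 years

A sequential circuit is worse still, since its response depends on internal state as well as on the applied pattern.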
Bennetts gives the following definition of testability: A circuit is "testable" if a set of test patterns can be generated and applied in such a way as to satisfy pre-defined levels of performance, defined in terms of fault-detection, fault-isolation, and test application criteria, within a pre-defined cost budget and time scale. We will return to this idea in the next section.

Design engineers are optimistic individuals by nature. Why else would anyone attack the design of a system and make such personal sacrifices as 60 or 70 hour work weeks? It is their job to make the system work and to get it out the door and onto the delivery trucks. Testing, however, requires a pessimistic attitude. A good test engineer delights in making the system break. With personal and professional goals at odds, it is sensible to create separate design and test teams. The corporation should recognize and reward the efforts of the test team even though their product (quality) is intangible.

Section 2 - Kinds of testing.

The rest of this chapter introduces ideas and terminology in testing. The development and continued support of a product requires three different kinds of testing.

* Debugging. The system must be debugged during design and development.
* Quality assurance (QA). Defective production units must be found and rejected (or repaired, if possible).
* Field maintenance. Problems with installed units must be identified and repaired.

Each kind of testing has different practical requirements. Debugging requires a considerable amount of interaction with the system as the engineers try to identify and correct conceptual flaws in the design. Raw speed is not as important as the ability to probe and perturb the system state. Speed is more important in quality assurance testing, especially in high volume applications where many parts must be checked in a short period of time. The primary purpose of field maintenance testing is the eventual and timely repair of a customer system. The time to repair will be greatly affected by the field engineer's ability to quickly identify and replace a failed component (or module).

Section 3 - Faults, failure modes and experiments.

In order to formulate a test strategy, the engineering team must first identify the ways in which a system may fail, the system failure modes. Division by zero and non-terminating (infinite) loops are two ways in which a software system may fail. Example hardware failures include short and open circuits, spurious switching due to noise, and degraded component performance (e.g., amplifier gain that is too low). When a failure occurs, the error is called a "fault."

The classic models for hardware logic faults are the so-called "stuck at zero" and "stuck at one" faults. (These faults are abbreviated "s-a-0" and "s-a-1.") A stuck at zero fault occurs when the expected output of a logic gate is one and a zero is observed instead. If an output remains at one when a zero value was expected, then a stuck at one fault has occurred. Failures other than s-a-0 and s-a-1 may be expected in VLSI circuits due to timing or fabrication flaws. Stuck-at faults are the most widely used model, however, because they are easily characterized and cover a wide range of failures.

A single fault may be the cause of a system failure, or more likely, multiple faults will occur together. It is generally easier to analyze a system for single faults than for multiple faults since the interactions between multiple faults are complex. For example, one fault can hide the presence of another fault.
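The single stuck-at model is easy to express in a small simulation. The Python sketch below is a minimal illustration, not anything from the text: the two-gate circuit (an AND feeding an OR) and the node names n1 and out are invented, and a fault is injected simply by forcing one node to a constant value.

    # Minimal sketch of the single stuck-at fault model. The circuit
    # (out = (a AND b) OR c) and node names are invented for illustration.

    def evaluate(a, b, c, fault=None):
        """Evaluate the circuit, optionally forcing one node to a constant.
        fault is a (node, value) pair, e.g. ('n1', 0) models n1 s-a-0."""
        nodes = {}
        nodes['n1'] = a & b                 # AND gate
        if fault and fault[0] == 'n1':
            nodes['n1'] = fault[1]          # node is stuck at a constant
        nodes['out'] = nodes['n1'] | c      # OR gate drives the primary output
        if fault and fault[0] == 'out':
            nodes['out'] = fault[1]
        return nodes['out']

    # The pattern a=1, b=1, c=0 detects n1 s-a-0: the fault-free circuit
    # answers 1, the faulty circuit answers 0.
    print(evaluate(1, 1, 0))                  # 1
    print(evaluate(1, 1, 0, fault=('n1', 0))) # 0, so the fault is visible

Note that this same pattern says nothing about n1 s-a-1, since the fault-free value of n1 is already one; exposing that fault requires a second pattern (for example a=0, b=1, c=0).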
The testing process consists of a series of well-designed experiments. Each experiment is designed to expose a particular fault or set of faults. Two kinds of test experiments may be performed. In a "fault detection" experiment, one merely wants to test for the presence of a fault. Fault detection experiments are useful for quality assurance testing where repair is impossible (defective parts are discarded). "Fault isolation" experiments detect the presence of a fault and also identify its location. Debugging and field maintenance testing require a fault isolation capability. The design team must find and isolate conceptual design errors. The maintenance technician must isolate a fault to a particular unit which can be repaired or replaced.

Testing is performed by attaching the part (board or system) to a physical "test fixture." The fixture has probes or connectors that are compatible with the physical package of the part under test. For example, an integrated circuit tester will have a zero-insertion force socket to hold the IC under test and to drive and read the IC pins. A board tester will have an edge connector or a set of probes called a "bed of nails." The board is placed in contact with the bed of nails and signals are read and written through the nails. Inputs that can be driven and output signals that can be sensed are called the "primary inputs" and "primary outputs," respectively.

Section 4 - Test activities.

The testing process consists of three different, but closely related activities: test generation, evaluation and application.

The objective of the test generation activity is to produce a set of input data called test "patterns" or "vectors" which will exercise the circuit. Two key factors are involved: controllability and observability. To properly exercise a circuit, the primary inputs, internal logic and primary outputs must be under the control of the tester in order to supply test patterns to the circuit under test, to evoke the operation which is to be checked, and to acquire the results of the experiment. Experimental results can only be reported to the tester if they are observable. Components that cannot be directly controlled or that have subcircuits with limited observability are difficult to test. As we will see with the SP.1, a test engineer does not always have absolute controllability or observability.

Test generation is very dependent upon the availability of a good system specification. If the system specification is ambiguous, incomplete or incorrect, the test patterns themselves may not be correct or complete. Next, the volume of test data required to check a VLSI-sized computer system is very large. Computer aids are almost a necessity for testing large systems, not only to assist with the data management task, but to actually produce test patterns from the specification and a description of the system structure. Finally, there may be failure modes which are unknown to the test team and thus, erroneous behavior that will escape undetected.

Once a series of test experiments has been devised and the test patterns generated, the quality of the testing procedure must be graded. Test quality is evaluated through fault insertion experiments. Faults are artificially placed into a known good device (KGD), known good board (KGB) or simulation model and the test patterns are applied. The values at the primary outputs are compared with the correct results. If the fault is found (the outputs are different), then it is "covered" by the test data.
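In a simulation model, this grading procedure reduces to a simple loop: simulate the fault-free circuit for each pattern, re-simulate with each fault inserted, and record the faults whose primary outputs differ. The sketch below is again illustrative; it repeats the invented evaluate() circuit from the previous sketch so that the fragment runs on its own, and the fault list and test set are assumptions made for the example.

    # Fault grading sketch: count the modeled faults detected by a test
    # set. The circuit, fault list and patterns are illustrative; the
    # evaluate() helper is repeated from the previous sketch.

    def evaluate(a, b, c, fault=None):
        """Circuit out = (a AND b) OR c, with an optional stuck node."""
        nodes = {}
        nodes['n1'] = a & b
        if fault and fault[0] == 'n1':
            nodes['n1'] = fault[1]
        nodes['out'] = nodes['n1'] | c
        if fault and fault[0] == 'out':
            nodes['out'] = fault[1]
        return nodes['out']

    faults = [('n1', 0), ('n1', 1), ('out', 0), ('out', 1)]
    patterns = [(1, 1, 0), (0, 1, 0)]        # the test set being graded

    covered = set()
    for a, b, c in patterns:
        good = evaluate(a, b, c)             # fault-free response
        for f in faults:
            if evaluate(a, b, c, fault=f) != good:
                covered.add(f)               # this pattern detects fault f

    print(f"fault coverage: {100.0 * len(covered) / len(faults):.0f}%")

The ratio printed on the last line is exactly the fault coverage measure defined next.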
The percentage of modeled faults detected by the test patterns is a measure called the "fault coverage," and it is an indication of the completeness of the testing process. Common industrial practice requires at least 90 percent coverage before an integrated circuit will be accepted for fabrication (even for undebugged prototypes). Since testing is constrained by the practical concerns of schedule and budget, the ultimate goal of the test generation and evaluation activities is the minimum set of test patterns which will guarantee a particular level of fault coverage.

The test patterns must ultimately be applied to the real circuit using highly specialized automatic test equipment or "ATE." Test application is limited by the maximum rate at which tests can be performed, the ability of the ATE to catch signal pulses (very short pulses may escape the attention of the tester) and physical access to the device under test.

Copyright (c) 1987-2013 Paul J. Drongowski