The program latency.c is a clone of the pointer chasing program. It uses the ARM1176 Cycle Counter Register (CCR) to measure the execution time of individual pointer chasing operations (i.e., a reference through the next pointer in a linked list). The program traverses the linked list many times (default: 1000 traversals). On each traversal, it chooses one chase operation and measures its execution time. The chase operation consists of a single ARM1176 load instruction. The CCR is read immediately before and after the instruction, and the difference between the two readings is the execution time (plus ~6 cycles of measurement bias). Each execution time is saved in an array. Once all traversals are complete, the array is written to a file named “samples.dat”.
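The measurement step described above can be sketched in C as follows. This is a minimal illustration, not the actual latency.c source: the names struct node, read_cycles, timed_chase and traverse_and_sample, and the build flag USE_ARM1176_CCR, are all hypothetical. The CP15 encoding used to read the cycle counter (c15, c12, 1) is an assumption that should be checked against the ARM1176 reference manual, and the sketch does not say how the program chooses which chase to time. When the flag is not defined, the sketch falls back to clock() so that it still builds on other hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical list node; the real latency.c layout may differ. */
struct node {
    struct node *next;
};

/* Read the current cycle count.
 * With USE_ARM1176_CCR defined (an assumed build flag), read the ARM1176
 * cycle counter through CP15 (assumed encoding: c15, c12, 1).
 * Otherwise fall back to clock() so the sketch builds elsewhere. */
static inline uint32_t read_cycles(void)
{
#ifdef USE_ARM1176_CCR
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c15, c12, 1" : "=r"(value));
    return value;
#else
    return (uint32_t)clock();
#endif
}

/* Time one chase operation (a single load of p->next).
 * The result still includes the measurement bias noted in the text. */
static inline struct node *timed_chase(struct node *p, uint32_t *elapsed)
{
    assert(p != NULL && elapsed != NULL);
    uint32_t before = read_cycles();
    /* The volatile access keeps the compiler from moving or removing the load. */
    struct node *next = *(struct node *volatile *)&p->next;
    uint32_t after = read_cycles();
    *elapsed = after - before;  /* unsigned subtraction tolerates one wraparound */
    return next;
}

/* Perform `length` chases starting at `head`, timing only the chase at
 * position `target`. Returns the node reached after the last chase. */
static struct node *traverse_and_sample(struct node *head, size_t length,
                                        size_t target, uint32_t *sample)
{
    struct node *p = head;
    for (size_t i = 0; i < length; i++) {
        if (i == target)
            p = timed_chase(p, sample);
        else
            p = p->next;
    }
    return p;
}
```

A caller would invoke traverse_and_sample once per traversal, store each sample in an array, and write the array out after the final traversal.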
The program histogram.c reads the “samples.dat” file and generates a histogram-like table, which it writes to a file named “histogram.txt”. The table shows the distribution of chase operation execution times. Peaks in the distribution correspond to frequently occurring reads that are satisfied by a particular level of the memory hierarchy. The execution times (and thus the distribution) are also affected by MicroTLB and Main TLB misses.
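The binning step can be illustrated with a short C sketch. It is not the actual histogram.c source: the function names build_histogram and print_histogram, the one-bin-per-cycle layout, and the output row format are assumptions made for illustration. Because the on-disk format of “samples.dat” is not described here, the sketch works on samples already loaded into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Count how many samples fall on each execution time (one bin per cycle).
 * Samples at or beyond nbins are accumulated in the last bin. */
static void build_histogram(const uint32_t *samples, size_t nsamples,
                            unsigned long *bins, size_t nbins)
{
    assert(nbins > 0);
    for (size_t i = 0; i < nbins; i++)
        bins[i] = 0;
    for (size_t i = 0; i < nsamples; i++) {
        size_t b = samples[i] < nbins ? samples[i] : nbins - 1;
        bins[b]++;
    }
}

/* Write one "cycles count" row for every non-empty bin. */
static void print_histogram(FILE *out, const unsigned long *bins, size_t nbins)
{
    for (size_t i = 0; i < nbins; i++) {
        if (bins[i] != 0)
            fprintf(out, "%zu %lu\n", i, bins[i]);
    }
}
```

In a table produced this way, a row with a large count marks a peak in the distribution, while the last bin collects every sample that lies beyond the table's range.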
The ARM1176 performance counter kernel module (aprofile) must be loaded before running the latency program. The latency program needs user-space access to the performance counters in order to configure, clear and read the CCR.