Performance counter kernel module

As promised, I’ve described the design of a Linux loadable kernel module that allows user-space access to the Raspberry Pi (ARM1176) performance counters. By the way, the design of the module is not specific to Raspbian Wheezy or even to the Raspberry Pi. I believe that the kernel module could be used on the new BeagleBone Black (BBB) to enable user-space counter access on its ARM Cortex-A8 processor under Linux. I just ordered a BBB and will try out the code when possible. (Assuming quick delivery!)

The kernel module alone isn’t enough to measure performance events. In fact, the kernel module doesn’t even touch the counters. It merely flips a privileged hardware bit which lets user-space programs read and write the performance counters and control register. So, I have also written a few user-space C functions to configure, clear, start and stop the performance counters. An application program just needs to call a few functions to choose the events to be measured and start counting, to stop counting, to get the raw counts, and to print the event counts.
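To give a feel for what "configure, clear, start and stop" means at the register level, here is a minimal sketch of how the control register value might be put together. It is not the code in rpi_pmu.c. The function and macro names are hypothetical, and the bit positions are my reading of the ARM1176JZF-S Technical Reference Manual, so please check them against the manual before relying on them.

```c
#include <stdint.h>

/* Bit positions in the ARM1176 Performance Monitor Control Register.
 * ASSUMPTION: taken from the ARM1176JZF-S TRM; verify before use. */
#define PMCR_ENABLE        (1u << 0)  /* E: enable all three counters       */
#define PMCR_RESET_COUNTS  (1u << 1)  /* P: reset count registers 0 and 1   */
#define PMCR_RESET_CYCLES  (1u << 2)  /* C: reset the cycle counter         */
#define PMCR_EVT1_SHIFT    12         /* event number for count register 1  */
#define PMCR_EVT0_SHIFT    20         /* event number for count register 0  */

/* Hypothetical helper: build the value that selects two events,
 * clears all three counters, and starts counting. */
uint32_t pmcr_start_value(uint8_t event0, uint8_t event1)
{
    return ((uint32_t)event0 << PMCR_EVT0_SHIFT) |
           ((uint32_t)event1 << PMCR_EVT1_SHIFT) |
           PMCR_RESET_CYCLES | PMCR_RESET_COUNTS | PMCR_ENABLE;
}

/* Hypothetical helper: keep the event selection but stop counting,
 * without resetting the counters so they can still be read. */
uint32_t pmcr_stop_value(uint32_t running)
{
    return running & ~(PMCR_ENABLE | PMCR_RESET_COUNTS | PMCR_RESET_CYCLES);
}

/* On the Pi itself, the value would then be written to the control
 * register with a coprocessor instruction along the lines of
 *     asm volatile("mcr p15, 0, %0, c15, c12, 0" : : "r"(value));
 * (again an assumption from the TRM). The write only succeeds from user
 * space after the kernel module has enabled user-space access. */
```

For example, assuming event numbers 0x0A (data cache access) and 0x0B (data cache miss) from the TRM event table, pmcr_start_value(0x0A, 0x0B) yields 0x00A0B007. Keeping the value computation separate from the register write makes the bit twiddling easy to check on any machine.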

I have uploaded the source for both the kernel module (aprof.c) and the user-space functions (rpi_pmu.h and rpi_pmu.c). In addition, there is source for some utility functions that I like to use in benchmark programs (test_common.h and test_common.c). All of this is a work in progress and I will update the source when major enhancements or changes are made.

Speaking of source, I have found a way of organizing and storing source code through WordPress. WordPress is kind of security paranoid and doesn’t allow you to upload source code or even gzip’ed TAR files. I ran into this issue when I attempted to upload a make file and WordPress wouldn’t let me do it (with complaints about potentially malicious code and so forth). WordPress does let you post source for viewing, however.

So, I’ve added a Source menu item to the main menu. I want the menu structure below the Source item to operate like a browsable code repository. The first level of items below Source are projects, like the kernel module. The next level of menu items navigates into the source belonging to a project. Each make file and source file is a separate page. The source code is displayed using the SyntaxHighlighter plug-in in order to keep indentation. No other formatting or highlighting is done just to keep things simple. I could cut and paste code from these pages, so I hope you can, too!

An introduction to performance tuning (and counters)

My latest page is an overview of performance tuning on ARM11. The Raspberry Pi is a nifty little Linux box, but it’s kind of slow at 700 MHz. Therefore, I suspect that programmers will have an interest in tuning up application programs and making them run faster. Performance tuning is also a good opportunity to learn more about computer architecture and machine organization, especially the ARM1176 core at the heart of the Raspberry Pi and its memory subsystem.

The ARM1176 has three performance counters which can measure over 20 different microarchitectural events. One of these counters is dedicated to core clock cycles while the other two are configurable. The new performance tuning page has a brief overview of the counters and it has a table with the supported events.

The new page also describes two different use cases for the counters: caliper mode and sampling mode. Caliper mode counts the number of microarchitectural hardware events that occur between two different points in program execution. Caliper mode is good for measuring the number of data cache accesses and misses for a hot code region like a loop. The programmer inserts code to start counting at the beginning of the hot region and inserts code to stop counting at the end of the hot region. This is the easiest use case to visualize and to implement. It’s the approach that I’m taking with my first performance measurement software and experiments (a custom kernel module plus some user-space code). These experiments are almost finished and ready for write up.
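The arithmetic behind caliper mode is simple enough to sketch. The helpers below are hypothetical (they are not part of my posted source) and assume the counters are 32 bits wide, so that a reading taken at the end of the hot region may have wrapped past zero once since the reading taken at the start.

```c
#include <stdint.h>

/* Events counted between two raw 32-bit counter readings.
 * Unsigned subtraction is modulo 2^32, so the result is still correct
 * if the counter wrapped around once between the two readings. */
uint32_t caliper_delta(uint32_t start, uint32_t stop)
{
    return stop - start;
}

/* Data cache miss ratio for the measured region.
 * Returns 0.0 when no accesses were counted, to avoid dividing by zero. */
double miss_ratio(uint32_t accesses, uint32_t misses)
{
    return accesses == 0 ? 0.0 : (double)misses / (double)accesses;
}
```

In use, the program would read the counters just before the hot loop and again just after it, pass each pair of readings to caliper_delta(), and then feed the access and miss deltas to miss_ratio(). If the region runs long enough for a counter to wrap more than once, the delta is no longer trustworthy and the region should be split up.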

Sampling is a statistical technique that produces an event profile. A profile shows the distribution of events across program instructions, routines, source lines, or modules. This is a good way to find hot-spots in a program where tuning is most beneficial. Sampling does not require modification to source.
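To make the idea of a profile concrete, here is a small sketch of the bookkeeping step only. It assumes that some sampling mechanism (not shown) has already collected an array of program counter values, one per sample. The function names and the fixed-size address buckets are hypothetical simplifications; a real profiler maps addresses to routines or source lines through the symbol table.

```c
#include <stddef.h>
#include <stdint.h>

/* Tally sampled program counter values into equal-sized address buckets
 * starting at 'base'. Samples outside the covered range are ignored.
 * Returns the number of samples that landed in a bucket. */
size_t build_profile(const uint32_t *pcs, size_t n_samples,
                     uint32_t base, uint32_t bucket_size,
                     unsigned *hist, size_t n_buckets)
{
    size_t counted = 0;

    if (bucket_size == 0)
        return 0;
    for (size_t i = 0; i < n_buckets; i++)
        hist[i] = 0;

    for (size_t i = 0; i < n_samples; i++) {
        if (pcs[i] < base)
            continue;                       /* below the profiled range */
        size_t b = (pcs[i] - base) / bucket_size;
        if (b < n_buckets) {
            hist[b]++;
            counted++;
        }
    }
    return counted;
}

/* Index of the bucket with the most samples (the first one on a tie). */
size_t hottest_bucket(const unsigned *hist, size_t n_buckets)
{
    size_t best = 0;
    for (size_t i = 1; i < n_buckets; i++)
        if (hist[i] > hist[best])
            best = i;
    return best;
}
```

The bucket with the most samples is the hot-spot. Because each sample is taken after a fixed number of events, the height of a bucket estimates the share of those events caused by the code in that address range.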

Performance Events for Linux (informally called “PERF”) is the standard tool for program profiling on Linux. At the moment, PERF has a bug which prevents it from sampling hardware events on the Raspberry Pi. I’ve been looking into this problem, too, and hope to post some results. In the long run, I want to post examples using PERF in order to help people tune up their programs on Raspberry Pi.

Building a loadable kernel module

I needed to design, implement and build a loadable kernel module in order to access the ARM11 performance counters from user-space. I will slowly roll out the design and code for the kernel module. But, first, I’ve posted my notes for building a loadable kernel module. It’s easier to explain the process of building the module and the internal design of the module if I separate the two discussions.

There were a few problems along the way. The kernel source for Raspbian Wheezy 3.6.11+ is not available using Synaptic. Only 3.6.9 is available through Synaptic. I needed to download the source for 3.6.11+ from github.com in order to match the installed Linux image. Next, I needed the module version information for 3.6.11+. Usually this information is generated when the kernel is built and is stored in the file named Module.symvers. According to reports on the Web, the Raspbian Wheezy kernel takes 10+ hours to build on the Raspberry Pi, so I didn’t want to undertake a long-running kernel build just to generate the version information. Fortunately, I could download Module.symvers from github.com, too.

I hope the next Raspbian Wheezy distro has all of the essentials for a module build — headers, version information, the whole she-bang. This would really help a brother out because many people are building custom hardware for the RPi and they need to build custom device drivers, too.

I’m currently in the process of writing pages on performance monitoring on the RPi. That discussion will include the design and code for my kernel module. The kernel module is about as simple as can be and is a kind of “Hello world” example. Please stay tuned.

ARM11 microarchitecture

You probably know by now that the Raspberry Pi uses an ARM processor. In particular, the Raspberry Pi model B uses the Broadcom BCM2835 system on a chip (SoC). The processor core inside the BCM2835 is a member of the ARM11 family. Its name is the ARM1176JZF-S. (Whew!)

Like all computers, the BCM2835 has an internal processor structure called its “microarchitecture”. The word “architecture” refers to the machine features that are visible to a programmer — things like the instruction set. The microarchitecture refers to the building blocks in the guts of the machine, or more properly, in the guts of a specific implementation (BCM2835) of an architectural family (ARM11 or ARMv6).

The microarchitecture can have a big effect on program performance. Compiler writers, for example, study the microarchitecture in order to build compilers that generate the best possible code for the microarchitecture. As we’ll see in later posts, application programmers can also take steps to tune their programs for the underlying microarchitecture. Tuning is important on Raspberry Pi because at 700 MHz, this machine is running its heart out!

Today, I added a page that summarizes the characteristics of the BCM2835 (ARM11) microarchitecture. Please check out the info! We will revisit this page when I discuss profiling and tuning.

Raspberry Pi tips and tricks

If you’re new to the world of Raspberry Pi, you should check out the page about Raspberry Pi tips and tricks. The page has all of the things that I did to configure Raspberry Pi and get started with the operating system, Raspbian Wheezy.

One of the most important things you should do is to configure the operating system for your geographic location or “locale”. The Raspbian Wheezy image is configured for the United Kingdom (UK). Therefore, the operating system and other software format the date, time, and money for the UK. If you’re living somewhere else, you’ll want to set things to local convention. That’s where the locale comes into play.

Run the raspi-config program to change settings:

sudo raspi-config

The program implements a simple quasi-GUI where you navigate using the arrow keys. The space bar makes selections and the enter key confirms them.

Of course, you should set the timezone using the change_timezone option. Use the change_locale option to set your locale.

You would think that change_locale would set the keymap for your keyboard, but no! You need to change the keymap (keyboard layout) using the configure_keyboard option. If you type a key like “|” and you get a different character, you probably need to change the keymap to match your keyboard. Raspbian Wheezy will build the new keymap the next time it reboots (a step that takes a little time to execute).

Finally, you should know how to properly shut down the operating system and Raspberry Pi. I use the shutdown program:

sudo shutdown -h now

The -h option tells the system to halt after shutdown, allowing you to safely pull the adapter plug from the wall. Use the -r option if you want to reboot instead. Don’t skip this step. The OS may still need to write information to permanent storage, and a clean shutdown lets it finish, which avoids file system corruption.

Hey, don’t forget to have fun with your new system!

What’s this all about?

Electronics and computing are great hobbies. You can even make a career out of them!

I’m a computer scientist and engineer who is exploring Raspberry Pi, Arduino, Papilio and other terrific, low-cost toys. I’m hoping to share my knowledge and discoveries as I go along so that you can benefit from my experience, too. Students and educators are especially welcome! In addition to new information and activities, I will eventually post classroom material about data structures, computer architecture, and computer design which I used when teaching these subjects.

Thanks for visiting this site and I hope that you will check back often for updates. For the moment, comments are closed. Once I gain a little more expertise with WordPress, I will open the site for comments. Thanks, again.