PERF tutorial part 2 now available

Part 2 of a three-part tutorial about Linux-tools PERF is now available.

Part 1 of the series shows how to find hot execution spots in an application program. It demonstrates the basic PERF commands using software performance events such as CPU clock ticks and page faults.

Part 2 of the series — just released — introduces hardware performance counters and events. I show how to count hardware events with PERF and how to compute and apply a few basic derived measurements (e.g., instructions per cycle, cache miss rate) for analysis. Part 3 is in development and will show how to use sampling to profile a program and to isolate performance issues in code.
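To give a feel for what a derived measurement is, here is a minimal sketch of the arithmetic. The helper names (ipc, miss_rate) and the raw counts are made up for illustration; they are not taken from the tutorial or from a real PERF run. In practice the inputs would be event counts reported by perf stat.

```shell
#!/bin/bash
# Sketch only: helper names and counts below are illustrative assumptions,
# not output from a real PERF run.

# ipc INSTRUCTIONS CYCLES -> instructions per cycle
ipc() {
    awk -v ins="$1" -v cyc="$2" 'BEGIN { printf "%.2f\n", ins / cyc }'
}

# miss_rate MISSES ACCESSES -> misses as a percentage of accesses
miss_rate() {
    awk -v miss="$1" -v acc="$2" 'BEGIN { printf "%.2f\n", 100 * miss / acc }'
}

ipc 1500000 3000000        # made-up counts, prints 0.50
miss_rate 25000 1000000    # made-up counts, prints 2.50
```

A low instructions-per-cycle value or a high miss rate is only a hint; the tutorial walks through how to interpret such numbers for the two matrix multiplication programs.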

All three parts of the series use the same simple, easy-to-understand example: matrix multiplication. One version of the matrix multiplication program illustrates the impact of severe performance issues and what to look for in PERF measurements. The issues are mitigated in the second, improved version of the program. PERF measurements for the improved program are presented for comparison.

The test platform is the new second generation Raspberry Pi (Raspberry Pi 2) running Raspbian Wheezy 3.18.9-v7+. The Raspberry Pi 2 has a 900MHz quad-core ARM Cortex-A7 (ARMv7) processor with 1GByte of primary memory. Although the tutorial series demonstrates PERF on Cortex-A7, the same PERF commands and analytical techniques can be employed on other architectures like x86.

A special note for Raspberry Pi users: the current stable distribution of Raspbian Wheezy (3.18.7-v7+, February 2015) does not support PERF hardware events. Full PERF support was enabled in a later, intermediate release and should be available in the next stable release of Raspbian Wheezy. In the meantime, Raspberry Pi 2 users may profile their programs using PERF software events as shown in Part 1 of the tutorial. First generation Raspberry Pi users are also restricted to software performance events.

Brave souls may try rpi-update to upgrade to the latest and possibly unstable release. I recommend waiting for the next stable release unless you really, really know what you are doing and are willing to chance an unstable kernel with potentially catastrophic consequences.

RPi2: Work in progress 1

Here’s a quick status update on working with Raspberry Pi gen 2. The installed operating system is Raspbian Wheezy 3.18.7-v7+ built on 16 February 2015.

I’m happy to report that I could profile programs using PERF software events. I’m disappointed to report that PERF does not recognize any hardware (performance counter) events. This distro has linux-tools-3.2 installed. I uninstalled 3.2 and installed 3.18, which matches the kernel:

sudo apt-get remove linux-tools-3.2
sudo apt-get install linux-tools-3.18

Still no joy when attempting to use hardware events. If you want to profile your program using PERF software events, please see my current PERF tutorial about finding execution hot-spots. I tried all of the commands and, with the exception of one typo, everything still works!

I’m in the process of troubleshooting my loadable kernel module for user-space performance counter events. I’ve encountered many of the same old stumbling blocks (e.g., finding the correct headers and Module.symvers file). At the present time, the kernel will attempt to load the module, then die. I cannot tell at this stage if there is a problem in the module itself or if there is a bug in Raspbian Wheezy. In case you want to dive into module development yourself, I’ve started a permanent page for building kernel modules on RPi2.

Once again, after two+ years, I want to make a public plea for more open information about the underlying hardware and for guidance and support for end-user device driver development. Quite frankly, Broadcom plays this situation too close to the chest, especially for a computer that’s advertised as a vehicle for learning and education. The dearth of information is stifling. People still struggle to identify and download essential information (e.g., Module.symvers) for device driver development. This is not true of other major Linux distros and the Raspbian folks really need to take note! Broadcom, in particular, runs the risk of killing off the goose laying the golden eggs.

Before signing off, here is a quick PERF command cheat sheet. I recommend reading the tutorial, but if you really must peck away at the keyboard… All the best!

perf help
perf list
perf stat -e cpu-clock ./program
perf record -e cpu-clock ./program
perf record -e cpu-clock,faults ./program
perf report
perf report --stdio --sort comm,dso --header
perf report --stdio --dsos=program,libc-2.13.so
perf annotate --stdio --dsos=program --symbol=function
perf annotate --stdio --dsos=program --symbol=function --no-source
perf record -e cpu-clock --freq=8000 ./program
perf evlist -F

Replace “program” with the name of your application program and replace “function” with the name of a function in your program.

Second generation RPi is here

The second generation Raspberry Pi (RPi2) is now shipping in large quantities! Given the excitement on the Web, this machine should be at least as popular as its first generation parents. Although the RPi2 model B has the same overall form factor as the first generation model B+, the designers made two substantial improvements which make the RPi2 a contender for your desktop:

  1. The single core Broadcom BCM2835 is replaced by the quad core BCM2836.
  2. Primary memory is increased to 1GByte of LPDDR2 RAM.

That’s just the face of it. Not only does the BCM2836 have four processor cores instead of one core, the cores are based on the ARMv7 architecture (Cortex-A7) including the NEON single instruction, multiple data (SIMD) instructions. The clock frequency is increased to 900MHz (from 700MHz). I’ve already begun to explore the ARMv7 micro-architecture and plan to write up a short, concise summary of its performance-related characteristics.

The BCM2836 has a different memory controller. Primary memory is no longer implemented using the Package on Package (PoP) approach. The Elpida (Micron) B8132C4PB-8D-F memory chip is mounted on the bottom of the RPi2 board (instead of the PoP piggyback).

The RPi2 sold out at Sparkfun almost immediately. Fortunately, Canakit, Element14 and Microcenter have received shipments, too. Amazon advertised the Canakit Raspberry Pi 2 Ultimate Starter Kit at a very attractive price and I immediately bought a kit. Microcenter in Cambridge had a mound of RPi2s and impatience got the better of me: I bought one. Yes, after getting the mail, I now have two.

I copied the latest Raspbian Wheezy release (16 February 2015) to a 16GByte microSD card using Win32DiskImager. The Canakit ships with NOOBS on an 8GByte card and I hope to try and report about NOOBS later. There was a little drama while bringing up Raspbian Wheezy as some relatively small, but annoying problems did crop up. Once I got past the sand traps, the new RPi2 proved to be an able performer.

Today, I copied my test software over to the RPi2. Here is a quick comparison between the older RPi model B and the new RPi2.

Platform             Naïve MM     Interchange MM
RPi model B gen 1    18.67 sec    6.75 sec
RPi gen 2             3.15 sec    2.42 sec

The two test cases are the naïve matrix multiplication program and the loop nest interchange matrix multiplication program. (Get the code in the source section of the web site.) Yes, that is a 6x improvement in performance for the naïve case. It’ll be fun to explore and find the reasons behind the speed-up. Fast matrix multiplication depends upon memory bandwidth and there must be some significant improvements in the memory subsystem. Naïve matrix multiplication incurs a lot of translation lookaside buffer (TLB) misses, so improvements in TLB miss handling could also contribute to the speed-up in the naïve test case.
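For anyone who wants to check the arithmetic, here is a small sketch that computes the speed-up ratios from the run times in the table. The function name speedup is just an illustrative helper, not part of the test software.

```shell
#!/bin/bash
# speedup OLD_SECONDS NEW_SECONDS -> ratio of the two run times
# (hypothetical helper; the times are the ones reported in the table)
speedup() {
    awk -v t_old="$1" -v t_new="$2" 'BEGIN { printf "%.2f\n", t_old / t_new }'
}

speedup 18.67 3.15   # naive MM, gen 1 vs. gen 2        -> 5.93
speedup 6.75 2.42    # interchange MM, gen 1 vs. gen 2  -> 2.79
speedup 18.67 6.75   # gen 1, naive vs. interchange     -> 2.77
speedup 3.15 2.42    # gen 2, naive vs. interchange     -> 1.30
```

The naïve program gains far more from the new hardware than the interchange program does, and the gap between the two programs shrinks from about 2.8x on gen 1 to about 1.3x on gen 2. That pattern fits the idea that the new machine is much more forgiving of poor memory access behavior.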

I ditched the Epiphany Web browser as it seems to have significant bugs. The browser crashed repeatedly when loading the New York Times front page. This is unacceptable. I installed Midori, which came with the initial release of Raspbian Wheezy. The New York Times front page is a bit of a torture test. Midori loaded the page in less time than the RPi gen 1, but still felt slow and logy. I suspect that many applications will need to be compiled for ARMv7 before we end-users get the full benefit of the BCM2836. The initial result, however, is encouraging.

Well, I’ve started to reorganize the site’s menu structure in order to get ready for new content about the RPi2. I intend to retain the older articles as they remain quite relevant. More to come!

Send MIDI from USB-B to 5-pin

Please see the bottom of this page for an update.

Way back in January 2014, I outlined a way to send MIDI from a USB-B only controller to a keyboard or module with classic 5-pin MIDI using Raspberry Pi as a bridge. Finally, one year later, I got to try out this idea.

It seems like MIDI over USB has taken over the MIDI controller world!

New controllers now communicate MIDI data over USB instead of using the old 5-pin DIN interface. 5-pin MIDI is dirt simple and is just a faster form of plain old serial communication — no bus protocol, no host/client, no hassles.

The world was 5-pin MIDI for a long time and many classic synthesizers and workstations only have a 5-pin DIN interface. Most of the new controllers have only a USB-B device port and expect to be connected to a USB-A host port for power and communication. If you want to use your new controller with an old 5-pin MIDI synth, you have a communication gap to bridge. Because USB is a peripheral bus with a sophisticated protocol, USB cannot be directly connected/converted to simple 5-pin MIDI signals.

There are two ways to bridge the gap:

  1. Buy a bridge box like the Kenton MIDI USB Host (about $115USD) or iConnectivity iConnectMIDI4+ ($200USD).
  2. Use a PC-based DAW to bridge 5-pin MIDI ports and USB MIDI ports.

Both solutions involve software, a computer, a 5-pin MIDI IN/OUT interface, and a USB-A Host interface. The old synth (or whatever) is connected to the computer through the 5-pin MIDI IN/OUT interface and the controller is connected to the USB-A Host port. The software streams the MIDI data between the 5-pin and USB worlds.

The Kenton is portable, but is a little bit pricey for my taste. Also, the Kenton is not readily available in all parts of the world (e.g., the USA) and shipping is expensive. The PC-based bridge is not so portable and maybe you don’t want to take a laptop to the gig.

Hmmm, let’s see. Computer? USB Host interface? Software? Raspberry Pi!

The Raspberry Pi B+ would be the ideal model with its four USB Host (A) ports. From the hardware perspective, here’s what we need to do:

  • Connect the USB MIDI controller to one of the Raspberry Pi USB-A Host ports.
  • Connect a bog standard 5-pin MIDI to USB-A interface to one of the other USB Host ports.
  • Connect the 5-pin MIDI IN/OUT ports on the interface to the appropriate 5-pin MIDI ports on the old synth.

This is exactly how we would connect the controller and synth if we used the PC and the DAW except we have replaced the PC with the Raspberry Pi (much smaller and only $40USD).

For software, the Raspbian Linux operating system comes with ALSA audio and MIDI support. We need to use the ALSA aconnect utility to identify the incoming and outgoing MIDI ports and to connect the appropriate ports.

I wanted to try this approach without buying any new hardware. Unfortunately, my Raspberry Pi is the earlier model B with only two USB-A Host ports. I need at least one more port to connect a keyboard and mouse, so a hub has to enter the picture somewhere. I found that the ALSA software did not recognize the controller or MIDI interface through my cheapo non-powered USB hub. Please keep this possible limitation in mind during your own experiments.

Here’s my test set-up. The keyboard controller is an M-Audio Keystation Mini 32. I used an Apple keyboard and mouse for regular user I/O. The Apple keyboard has a built-in hub and adds two USB-A ports. The keyboard is connected to the Raspberry Pi and the mouse is connected to one of the keyboard USB-A ports. The Keystation is connected to the second USB-A port on the Apple keyboard through a USB-A to USB-B cable.

[Figure: USB-B to 5-pin MIDI connection diagram]

The 5-pin MIDI IN/OUT ports are provided by a Roland (Edirol) UM-2ex USB MIDI interface. This interface is connected to one of the Raspberry Pi USB-A Host ports. The UM-2ex has a switch to select either the standard driver or an advanced proprietary driver. Select the standard driver setting. You want to be “class compliant” all the way for best results. Connect the 5-pin MIDI IN/OUT ports to the synth using standard MIDI cables. For this test, the synth is a Yamaha PSR-S950 arranger workstation.

Boot Raspbian and log in. You can either run aconnect from the initial shell or you can start the X Window System. For this example, I chose to start X so I could capture output from aconnect.

Type “aconnect -i” to display a list of the readable input ports. These are the ports which provide incoming MIDI data.

$ sudo aconnect -i
client 0: 'System' [type=kernel]
    0 'Timer           '
    1 'Announce        '
client 14: 'Midi Through' [type=kernel]
    0 'Midi Through Port-0'
client 20: 'UM-2' [type=kernel]
    0 'UM-2 MIDI 1     '
client 24: 'Keystation Mini 32' [type=kernel]
    0 'Keystation Mini 32 MIDI 1'

The “$” character in the example output is the shell command prompt. The output shows the UM-2ex MIDI input port (client 20) and the Keystation (client 24). The Keystation input port is the source of the MIDI data that we want to send to the synth.

By the way, sudo is required when entering these commands through X Windows as an ordinary user. Superuser privilege is needed to set up ALSA connections.

Type “aconnect -o” to display a list of the writeable output ports. These are the ports which send outgoing MIDI data.

$ sudo aconnect -o
client 14: 'Midi Through' [type=kernel]
    0 'Midi Through Port-0'
client 20: 'UM-2' [type=kernel]
    0 'UM-2 MIDI 1     '
    1 'UM-2 MIDI 2     '
client 24: 'Keystation Mini 32' [type=kernel]
    0 'Keystation Mini 32 MIDI 1'

The output shows the two UM-2ex MIDI OUT ports (client 20 ports 0 and 1) and the Keystation (client 24).

Finally, type “aconnect 24:0 20:0” to establish a bridge between the Keystation port and the UM-2ex MIDI OUT port.

sudo aconnect 24:0 20:0

The first port (24:0) on the command line is the sender and the second port (20:0) is the receiver.

Play a few notes on the controller. If the controller and synth are communicating on the same MIDI channel (usually channel 1 by default), then you should hear some sound from the synth — assuming that the volume is turned up and the synth is connected to an amp, and so forth!

Type “aconnect -x” to remove all connections when finished. Individual connections can be removed using the -d option, e.g., “aconnect -d 24:0 20:0” removes the bridge created above. If you ever need usage information about aconnect, just type “aconnect -h” or “man aconnect”.

Just for fun, I tried using the Yamaha PSR-E443 for input instead of the Keystation. I replaced the Apple keyboard hub with a powered USB hub, too. (Apple keyboards and Linux don’t always interoperate as friends!) The PSR-E443 keyboard sends on MIDI channels 1 (main voice), 2 (dual voice) and 3 (split voice). By assigning these MIDI channels to RIGHT1, RIGHT2 and LEFT on the S950, I could play a layer in the right hand with a split bass in the left hand.

So, there you go! A simple, cheap bridge between a USB MIDI controller and an old school 5-pin MIDI synthesizer. The next step is to find a way to discover and connect MIDI ports through a boot time, start-up script. If you solve this problem, please post the solution!
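As a starting point for that next step, here is a rough, untested sketch of how a script might look up the client numbers by name and make the connection. The function names client_id and bridge are hypothetical, the client names are copied from the aconnect listings above, and port 0 on each device is an assumption.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical and the script has not been
# tested on real hardware. Client names come from the aconnect listings above.

# client_id NAME: read an "aconnect -i" or "aconnect -o" listing on stdin
# and print the ALSA client number whose quoted name equals NAME.
client_id() {
    awk -v name="$1" -F"'" '
        $1 ~ /^client [0-9]+: / && $2 == name {
            split($1, part, /[ :]/)   # "client 24: " -> part[2] is "24"
            print part[2]
            exit
        }'
}

# bridge SENDER RECEIVER: connect port 0 of SENDER to port 0 of RECEIVER.
bridge() {
    local src dst
    src=$(aconnect -i | client_id "$1")
    dst=$(aconnect -o | client_id "$2")
    if [ -n "$src" ] && [ -n "$dst" ]; then
        aconnect "$src:0" "$dst:0"
    else
        echo "MIDI port not found: $1 -> $2" >&2
        return 1
    fi
}

# Only attempt the connection where aconnect is installed.
if command -v aconnect >/dev/null 2>&1; then
    bridge 'Keystation Mini 32' 'UM-2' || echo "bridge not established" >&2
fi
```

Looking up clients by name avoids hard-coding numbers like 24 and 20, which may change between boots. One possible way to run the script at start-up is from /etc/rc.local, though that is an assumption I have not tried; USB devices may also need a few seconds to enumerate, so a retry loop around bridge could be necessary.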

Update: If you enjoyed this blog post, then you might like these articles, too:

RPi MIDI bridge

[Update: See Send MIDI from USB-B to 5-pin.]

Here’s a vexing problem that many electronic musicians face.

Let’s say that you own a lot of gear, some of which uses the old school 5-pin DIN MIDI interface. For example, there are a ton of classic (and not so classic) tone modules and keyboards that have 5-pin MIDI IN and MIDI OUT ports.

Then, you buy a new mobile MIDI controller which only does MIDI over USB through a USB B device port. The M-Audio Keystation Mini 32 is an example. This design covers the most common case — hooking the controller to a computer having a USB A host port — but you can’t connect the controller directly to the 5-pin MIDI IN port on one of your old tone modules or keyboards. USB ain’t RS-232 and class-compliant MIDI over USB has its own protocols, too. So, you can’t just whip up a simple cross-over cable or signal converter.

There are two commercial solutions to this problem: the Kenton USB MIDI host and the iConnectivity iConnectMIDI4+. Neither of these solutions is cheap and they cost more than a lot of MIDI controllers themselves!

Some people on the web have suggested an Arduino-based solution. However, here’s an easy riddle. What super low cost single-board computer has two USB host ports? Answer: The Raspberry Pi Model B.

The RPi Model B seems like a natural for this problem. It’s inexpensive, it has the necessary ports, and there are plenty of rugged cases available. Musicians will want to use this solution at the gig, so a good case is essential. There are two issues. First, the RPi can source only a limited amount of power to a USB device. Some MIDI controllers may draw too much current. Second, musicians don’t like to haul extra gear to a gig, so they won’t want to take a display and keyboard to a gig just to boot the RPi and run the software needed to bridge the two USB A ports. The solution must be stand-alone, plug-and-play, and consist only of the RPi itself, a power source, and a few cables.

Here’s what I have in mind for the hardware. The MIDI controller is connected to the RPi using a standard USB A to USB B cable. The MIDI controller draws power from the RPi. Some MIDI controllers have a dedicated power supply jack and in that case, a separate power adapter for the MIDI controller seems prudent. The other USB host port on the RPi is connected to an inexpensive commercial USB to 5-pin MIDI interface — the kind used to connect 5-pin equipment to computers. The commercial interface should be MIDI class-compliant and should not require special drivers. Knowing the state of the world such as it is, you may not easily find proprietary Linux drivers for the interface. The commercial MIDI interface provides the connection to the 5-pin DIN MIDI ports on your old piece of gear.

Musicians usually have an old USB MIDI interface like the Edirol/Roland UM-2EX in the studio. These interfaces are readily available at very low cost on the web for not much more dosh than a cable. This approach doesn’t require custom hardware or shields like an Arduino-based solution.

Here’s what I have in mind for the software. Folks already bridge PC MIDI ports using MIDI-OX. Linux has the ALSA MIDI software. The amidi -l command displays the physical and virtual MIDI ports. The aconnect command connects MIDI ports. The trick will be discovering and connecting MIDI ports after boot without manual intervention, i.e., the RPi boots and builds the bridge without a keyboard, display, a log in, etc.

So, there it is! My hardware lab is currently in disarray so I can’t easily do a proof of concept implementation. However, if you have the RPi and the pieces/parts, please give this a try.


A sweet one-liner for histograms

Here is a shell script one-liner that is just too sweet to go unrecognized.

I’ve written a program (latency.c) to measure the execution time of individual chase operations in a linked list pointer chasing loop. The program writes the execution times into a file named samples.dat. The distribution of the execution times should show the access time to different levels of the ARM1176 memory hierarchy.

I still needed a way to view the distribution in a histogram. So, I wrote a short C program (histogram.c) to postprocess the execution times. The program produces a histogram-like table — not a chart. Bummer.

A quick search of the Web brought up some rather sophisticated data visualization tools. I experimented with a few of them, but still couldn’t get what I wanted.

Then, this little gem came up from Small Labs, Inc. (Search on “command line histogram”.)

history | awk '{h[$2]++}END{for(i in h){print h[i],i|"sort -rn|head -20"}}' |awk '!max{max=$1;}{r="";i=s=60*$1/max;while(i-->0)r=r"#";printf "%15s %5d %s %s",$2,$1,r,"\n";}'

This one-liner displays a histogram of recent command lines (history) from the most to least frequently used. It displays only the 20 most frequent commands (head -20).

This one-liner looks quite promising, but it needs a few changes. First, it needs to read data from the file samples.dat. It needs to read only one item from each line of the file; history produces two items per line. Finally, we can discard some of the white space in the output and make the lines in the chart narrower.

Here is the revised one-liner written in the form of a bash shell script. We’re going to analyze several files and it’s appropriate to put the one-liner into a shell command file.

#!/bin/bash

cat samples.dat | awk '{h[$1]++}END{for(i in h){print h[i],i|"sort -rn|head -20"}}' |awk '!max{max=$1;}{r="";i=s=60*$1/max;while(i-->0)r=r"#";printf "%6s %5d %s %s",$2,$1,r,"\n";}'

Here is some sample output.

    62  517 ###########################################################
    61  301 ################################### 
     9  119 ############## 
    70   12 ## 
    63   11 ## 
    65    9 ## 
    64    4 # 
    17    4 # 
   336    2 # 
   168    2 # 
   154    2 # 
    71    1 # 
    67    1 # 
    60    1 # 
   525    1 # 
   521    1 # 
   390    1 # 
   389    1 # 
   386    1 # 
   378    1 # 

We can easily see the most frequent values, but this doesn’t really show the distribution in a useful way. So, let’s just sort the output on the leading numeric values (sort -n). The numbers in the first column are access/latency times in cycles, by the way.

     9  119 ##############
    17    4 # 
    60    1 # 
    61  301 ###################################
    62  517 ###########################################################
    63   11 ## 
    64    4 # 
    65    9 ## 
    67    1 # 
    70   12 ## 
    71    1 # 
   154    2 # 
   168    2 # 
   336    2 # 
   378    1 # 
   386    1 # 
   389    1 # 
   390    1 # 
   521    1 # 
   525    1 # 

Ah, now we can see the level 1 data cache hits (9 cycles including measurement bias) and reads to primary memory (61 and 62 cycles). The bi-modal nature of the distribution is revealed.
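For readability, the same pipeline can be spread over several lines with the final sort -n included. This is a sketch of an equivalent script rather than the exact file I ran; the function name histogram is made up, and the data file name is passed as an argument instead of being hard-coded.

```shell
#!/bin/bash
# histogram FILE: count the values in column 1 of FILE, keep the 20 most
# frequent values, and print "value count bar" lines ordered by value.
# (Hypothetical function name; same logic as the one-liner.)
histogram() {
    awk '{ h[$1]++ } END { for (v in h) print h[v], v }' "$1" |
    sort -rn | head -20 |
    awk '!max { max = $1 }
         { bar = ""; n = 60 * $1 / max
           while (n-- > 0) bar = bar "#"
           printf "%6s %5d %s\n", $2, $1, bar }' |
    sort -n
}

# Example: chart samples.dat if it exists in the current directory.
if [ -f samples.dat ]; then
    histogram samples.dat
fi
```

The first sort -rn puts the largest count on the first line, which is what lets the second awk stage use that line to scale every bar to at most 60 characters. The trailing sort -n then reorders the chart by latency value.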

Before leaving this topic, I’d like to give a shout-out to Dave Christie at AMD. Dave always gave me a good-natured kidding about these old school histograms. Dave, BTW, is one of the true unsung technical heroes at AMD. All the best!

Building a loadable kernel module

I needed to design, implement and build a loadable kernel module in order to access the ARM11 performance counters from user-space. I will slowly roll out the design and code for the kernel module. But, first, I’ve posted my notes for building a loadable kernel module. It’s easier to explain the process of building the module and the internal design of the module if I separate the two discussions.

There were a few problems along the way. The kernel source for Raspbian Wheezy 3.6.11+ is not available using Synaptic. Only 3.6.9 is available through Synaptic. I needed to download the source for 3.6.11+ from github.com in order to match the installed Linux image. Next, I needed the module version information for 3.6.11+. Usually this information is built along with the kernel and is stored in the file named Module.symvers. Raspbian Wheezy takes 10+ hours to build on the Raspberry Pi, according to reports on the Web, so I didn’t want to undertake a long-running kernel build just to generate the version information. Fortunately, I could download Module.symvers from github.com, too.

I hope the next Raspbian Wheezy distro has all of the essentials for a module build — headers, version information, the whole she-bang. This would really help a brother out because many people are building custom hardware for the RPi and they need to build custom device drivers, too.

I’m currently in the process of writing pages on performance monitoring on the RPi. That discussion will include the design and code for my kernel module. The kernel module is about as simple as can be and is a kind of “Hello world” example. Please stay tuned.

Raspberry Pi tips and tricks

If you’re new to the world of Raspberry Pi, you should check out the page about Raspberry Pi tips and tricks. The page has all of the things that I did to configure Raspberry Pi and get started with the operating system, Raspbian Wheezy.

One of the most important things you should do is to configure the operating system for your geographic location or “locale”. The Raspbian Wheezy image is configured for the United Kingdom (UK). Therefore, the operating system and other software format the date, time, and money for the UK. If you’re living somewhere else, you’ll want to set things to local convention. That’s where the locale comes into play.

Run the raspi-config program to change settings:

sudo raspi-config

The program implements a simple quasi-GUI where you navigate using the arrow keys. The space bar makes selections and the enter key confirms selections, etc.

Of course, you should set the timezone using the change_timezone option. Use the change_locale option to set your locale.

You would think that change_locale would set the keymap for your keyboard, but no! You need to change the keymap (keyboard layout) using the configure_keyboard option. If you type a key like “|” and you get a different character, you probably need to change the keymap to match your keyboard. Raspbian Wheezy will build the new keymap when it reboots the next time (a step that takes a little time to execute).

Finally, you should know how to properly shut down the operating system and Raspberry Pi. I use the shutdown program:

sudo shutdown -h now

The -h option tells the system to halt after shutdown, allowing you to safely pull the adapter plug from the wall. Use the -r option if you want to reboot instead. Don’t skip this step: the OS may need to write pending updates to permanent storage, and a clean shutdown avoids file system corruption.

Hey, don’t forget to have fun with your new system!