RPi2: Work in progress 1

Here’s a quick status update on working with Raspberry Pi gen 2. The installed operating system is Raspbian Wheezy (kernel 3.18.7-v7+), built on 16 February 2015.

I’m happy to report that I could profile programs using PERF software events. I’m disappointed to report that PERF does not recognize any hardware (performance counter) events. This distro has linux-tools-3.2 installed. I uninstalled 3.2 and installed 3.18, which matches the kernel:

sudo apt-get remove linux-tools-3.2
sudo apt-get install linux-tools-3.18

Still no joy when attempting to use hardware events. If you want to profile your program using PERF software events, please see my current PERF tutorial about finding execution hot-spots. I tried all of the commands and, with the exception of one typo, everything still works!
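One way to check event availability without going through the PERF tool is to ask the kernel directly via the perf_event_open system call. The following is a minimal sketch, assuming a Linux kernel built with perf events support. The helper names event_available and report_events are hypothetical and are not taken from the tutorial. Note that a failed open can also mean insufficient permissions (see /proc/sys/kernel/perf_event_paranoid), not only a missing hardware event.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the kernel lets this process open
 * the given event, 0 otherwise. The counter is never started. */
int event_available(uint32_t type, uint64_t config)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = type;
    attr.config = config;
    attr.disabled = 1;        /* open only; do not start counting */
    attr.exclude_kernel = 1;  /* user-space only, needs fewer privileges */
    attr.exclude_hv = 1;

    /* pid = 0 (this process), cpu = -1 (any CPU), no group, no flags */
    long fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0)
        return 0;
    close((int)fd);
    return 1;
}

/* Hypothetical helper: print one line per probed event. */
void report_events(void)
{
    printf("cpu-clock (software): %s\n",
           event_available(PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK)
               ? "available" : "not available");
    printf("cycles (hardware):    %s\n",
           event_available(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES)
               ? "available" : "not available");
}
```

Calling report_events() from main should show whether the software clock event opens while the hardware cycle counter does not, which is the situation described above.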

I’m in the process of troubleshooting my loadable kernel module for user-space performance counter events. I’ve encountered many of the same old stumbling blocks (e.g., finding the correct headers and Module.symvers file). At the present time, the kernel will attempt to load the module, then die. I cannot tell at this stage if there is a problem in the module itself or if there is a bug in Raspbian Wheezy. In case you want to dive into module development yourself, I’ve started a permanent page for building kernel modules on RPi2.

Once again, after two+ years, I want to make a public plea for more open information about the underlying hardware and for guidance and support for end-user device driver development. Quite frankly, Broadcom plays its cards too close to the chest, especially for a computer that’s advertised as a vehicle for learning and education. The dearth of information is stifling. People still struggle to identify and download essential information (e.g., Module.symvers) for device driver development. This is not true of other major Linux distros and the Raspbian folks really need to take note! Broadcom, in particular, runs the risk of killing off the goose laying the golden eggs.

Before signing off, here is a quick PERF command cheat sheet. I recommend reading the tutorial, but if you really must peck away at the keyboard… All the best!

perf help
perf list
perf stat -e cpu-clock ./program
perf record -e cpu-clock ./program
perf record -e cpu-clock,faults ./program
perf report
perf report --stdio --sort comm,dso --header
perf report --stdio --dsos=program,libc-2.13.so
perf annotate --stdio --dsos=program --symbol=function
perf annotate --stdio --dsos=program --symbol=function --no-source
perf record -e cpu-clock --freq=8000 ./program
perf evlist -F

Replace “program” with the name of your application program and replace “function” with the name of a function in your program.

Second generation RPi is here

The second generation Raspberry Pi (RPi2) is now shipping in large quantities! Given the excitement on the Web, this machine should be at least as popular as its first generation parents. Although the RPi2 model B has the same overall form factor as the first generation model B+, the designers made two substantial improvements which make the RPi2 a contender for your desktop:

  1. The single core Broadcom BCM2835 is replaced by the quad core BCM2836.
  2. Primary memory is increased to 1GByte of LPDDR2 RAM.

That’s just the face of it. Not only does the BCM2836 have four processor cores instead of one core, the cores are based on the ARMv7 architecture (Cortex-A7) including the NEON single instruction, multiple data (SIMD) instructions. The clock frequency is increased to 900MHz (from 700MHz). I’ve already begun to explore the ARMv7 micro-architecture and plan to write up a short, concise summary of its performance-related characteristics.

The BCM2836 has a different memory controller. Primary memory is no longer implemented using the Package on Package (PoP) approach. The Elpida (Micron) B8132C4PB-8D-F memory chip is mounted on the bottom of the RPi2 board (instead of the PoP piggyback).

The RPi2 sold out at Sparkfun almost immediately. Fortunately, Canakit, Element14 and Microcenter have received shipments, too. Amazon advertised the Canakit Raspberry Pi 2 Ultimate Starter Kit at a very attractive price and I immediately bought a kit. Microcenter in Cambridge had a mound of RPi2s and impatience got the best of me: I bought one. Yes, after getting the mail, I now have two.

I copied the latest Raspbian Wheezy release (16 February 2015) to a 16GByte microSD card using Win32DiskImager. The Canakit ships with NOOBS on an 8GByte card and I hope to try NOOBS and report on it later. There was a little drama while bringing up Raspbian Wheezy, as some relatively small but annoying problems did crop up. Once I got past the sand traps, the new RPi2 proved to be an able performer.

Today, I copied my test software over to the RPi2. Here is a quick comparison between the older RPi model B and the new RPi2.

Platform             Naïve MM     Interchange MM
RPi model B gen 1    18.67 sec    6.75 sec
RPi gen 2            3.15 sec     2.42 sec

The two test cases are the naïve matrix multiplication program and the loop nest interchange matrix multiplication program. (Get the code in the source section of the web site.) Yes, that is roughly a 6x improvement in performance for the naïve case (18.67 / 3.15 ≈ 5.9) and about 2.8x for the interchange case (6.75 / 2.42). It’ll be fun to explore and find the reasons behind the speed-up. Fast matrix multiplication depends upon memory bandwidth, so there are probably some significant improvements in the memory subsystem. Naïve matrix multiplication incurs a lot of translation lookaside buffer (TLB) misses, so improvements in TLB miss handling could also contribute to the speed-up in the naïve test case.
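For readers who have not seen the two variants side by side, here is a minimal sketch of the idea. It is not the code from the source section of the site: the function names mm_naive and mm_interchange, the float element type and the flat row-major layout are all assumptions made for illustration.

```c
#include <stddef.h>

/* Naive i-j-k order (hypothetical name). The inner loop reads b with a
 * stride of n floats, so for large n nearly every access lands on a
 * different cache line, and often on a different page. */
void mm_naive(size_t n, const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Interchanged i-k-j order (hypothetical name). The inner loop now walks
 * one row of b and one row of c with unit stride. */
void mm_interchange(size_t n, const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++)
            c[i * n + j] = 0.0f;
        for (size_t k = 0; k < n; k++) {
            float aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

Both functions compute the same product; only the memory access pattern of the inner loop differs, which is why the naïve version is the more sensitive of the two to cache and TLB behavior.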

I ditched the Epiphany Web browser as it seems to have significant bugs. The browser crashed repeatedly when loading the New York Times front page. This is unacceptable. I installed Midori, which came with the initial release of Raspbian Wheezy. The New York Times front page is a bit of a torture test. Midori loaded the page in less time than on the RPi gen 1, but still felt slow and logy. I suspect that many applications will need to be compiled for ARMv7 before we end-users get the full benefit of the BCM2836. The initial result, however, is encouraging.

Well, I’ve started to reorganize the site’s menu structure in order to get ready for new content about the RPi2. I intend to retain the older articles as they remain quite relevant. More to come!