Preparing audio waveforms for Arduino PROGMEM

The Arduino lo-fi Beat Box is kicking up some activity and comments on the littleBits site. (Follow this link to the Beat Box project page at littleBits.)

Two littleBits inventors have made considerable progress in suppressing the noisy buzz which seems to plague the el-cheapo lo-fi DAC design. I eventually gave up fighting the buzz and built a proper Serial Peripheral Interface (SPI) DAC for the littleBits Arduino. See this page and this page for more information about the SPI DAC design. The main component is the Microchip MCP4921 12-bit SPI-compatible DAC. The audio output is much quieter.

I built a littleBits song player that sings a song in Do-Re-Mi Solfege. It uses the SPI DAC for conversion. Although I completed the project at the beginning of September 2016, I’m just now getting to a write-up for the littleBits site!

If you’re still hacking the Beat Box project, you should check out the ongoing discussion in the littleBits forum. Inventor alexpikkert built a rather spiffy passive low pass filter module using littleBits bitSnaps. I’m waaay too ham-handed for that kind of work, so I’m quite impressed by his implementation.

Another inventor, Frankje, would like to contribute some new drum waveforms. He needs more information about the drum waveforms and the process that I used to make them. So, here goes.

The drum waveforms (AKA “the samples”) are stored in the Arduino’s program memory (PROGMEM). PROGMEM is the non-volatile flash memory where the uploaded sketch resides. PROGMEM is quite big by Arduino standards. The Leonardo (ATmega32u4) has 1K bytes of EEPROM, 2.5K bytes of SRAM (read/write RAM for variables) and 32K bytes of flash memory (PROGMEM). The bootloader uses 4K bytes of PROGMEM, leaving 28K bytes for user code and data.

Notice that I said “and data.” The Arduino developers wisely give a sketch direct access to data stored in PROGMEM. A sketch reads data from PROGMEM using an access function such as pgm_read_byte_near(). Thanks to PROGMEM, Arduino programmers can store a reasonably large amount of non-volatile data along with their code.
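As a minimal sketch of that access pattern (not the Beat Box code itself), the following stores a tiny table in flash and reads it back one sample at a time. The names demo_wave, DEMO_WAVE_LEN and read_sample are hypothetical, and the #else branch is only a stand-in so the example also compiles on a desktop machine.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Host fallback (an assumption for off-target builds, not part of the Arduino API). */
#define PROGMEM
#define pgm_read_byte_near(addr) (*(const uint8_t *)(addr))
#endif

/* Hypothetical 8-sample table; a real drum waveform runs to thousands of bytes. */
#define DEMO_WAVE_LEN 8
const int8_t demo_wave[DEMO_WAVE_LEN] PROGMEM = { 0, 45, 90, 127, 90, 45, 0, -45 };

/* Read one signed sample from flash; return silence (0) past the end of the table. */
int8_t read_sample(uint16_t index)
{
    if (index >= DEMO_WAVE_LEN) {
        return 0;
    }
    return (int8_t)pgm_read_byte_near(demo_wave + index);
}
```

On the Arduino, an array declared this way occupies flash only; it does not consume any of the 2.5K bytes of SRAM.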

By now, if you are using a modern day musical instrument library (i.e., 10+ GBytes of sampled instruments), you’re shrieking in horror. I wanted to keep the Beat Box design small, simple and self-contained: no SD card or bulk flash memory. That means cramming all of the percussion samples into less than 28K bytes. Please remember, our sketch needs to fit into that 28K bytes, too.

Immediately, I chose a sampling rate and size that minimized space without sacrificing too much quality. The Beat Box sample format is 22,050Hz, signed 8-bit, mono. I tried an 11,025Hz sampling rate, but too much of the top end (high frequency brightness) was lost. The Arduino PWM conversion technique provides, at best, 8 or 9 bits of resolution, so it was easy to settle on 8-bit signed. Going mono cut waveform size in half. Stereo would require a second lo-fi DAC as well as upping memory consumption by a factor of two.
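To put numbers on the trade-off, here is a small helper for the storage arithmetic: bytes = sample rate × channels × bytes per sample × duration. The function name clip_bytes is hypothetical and is only meant to illustrate the calculation.

```c
#include <stdint.h>

/* Storage needed for an uncompressed clip, with the duration given in milliseconds. */
uint32_t clip_bytes(uint32_t rate_hz, uint32_t channels,
                    uint32_t bytes_per_sample, uint32_t duration_ms)
{
    uint64_t total = (uint64_t)rate_hz * channels * bytes_per_sample * duration_ms;
    return (uint32_t)(total / 1000u);
}
```

A 100 millisecond hit costs 2,205 bytes at 22,050Hz, 8-bit, mono, versus 8,820 bytes at 44,100Hz, 16-bit, mono. Taking 28K bytes as 28,672 bytes, the whole user area holds roughly 1.3 seconds of 22,050Hz, 8-bit, mono audio, and that is before the sketch code takes its share.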

I started out sampling a TR-808 kit here and a TR-808 kit there. Nothing sounded as good as the TR-808 samples produced by Michael Fischer. Michael sampled a TR-808 back in September 1994 (!) and his sample set is excellent. He sampled each of the TR-808 voices over a range of knob (parameter) values. I went through the sample set, found the sounds which (to me) represent the 808, and chose sounds with the smallest WAV files from that representative subset.

Then, the torture began.

Michael’s samples are 44,100Hz, 16-bit, mono. So, I first down converted the chosen few waveforms to 22,050Hz, 8-bit, mono and I trimmed the samples as short as I could dare. My main audio editing tool is Sony Sound Forge Audio Studio, but any good audio editor could do the job. I’m most familiar with Sound Forge and can fly with it.
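For readers without an audio editor, the down conversion can be approximated in a few lines of C. This is a deliberately naive sketch, not what Sound Forge does: it averages each pair of 16-bit samples to halve the rate and keeps the top 8 bits, with no proper anti-alias filter or dither. The function name downconvert is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert 44,100Hz 16-bit mono to 22,050Hz signed 8-bit mono.
   `out` must have room for in_len / 2 samples. Returns the output length. */
size_t downconvert(const int16_t *in, size_t in_len, int8_t *out)
{
    size_t out_len = in_len / 2;
    for (size_t i = 0; i < out_len; i++) {
        /* Average two neighboring samples: a crude low-pass plus 2:1 decimation. */
        int32_t avg = ((int32_t)in[2 * i] + (int32_t)in[2 * i + 1]) / 2;
        /* Scale the 16-bit range down to 8 bits (-128..127). */
        out[i] = (int8_t)(avg / 256);
    }
    return out_len;
}
```

A real editor will sound better, especially on cymbals, because it filters properly before discarding samples.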

The next step is getting each waveform into a compilable, C-language source file. I converted each 22,050Hz, 8-bit mono WAV file to a RAW audio file. A RAW audio file does not have a header and contains only waveform samples. I wrote a program, raw2c.c, to convert a raw file to a C-language include file containing a formatted, C-language array that is initialized with the waveform samples. The program counts the number of samples and generates a #define for the array length.

Here is the source code for raw2c.c.
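The author's raw2c.c is behind the link and is not reproduced here. For orientation only, the following is an independent minimal sketch of the same idea: read signed 8-bit samples from a RAW stream and emit a C array plus a length #define. The function name raw_to_c, the NAME_LEN macro convention and the 12-values-per-row layout are all assumptions, not details taken from the original program.

```c
#include <stdio.h>

/* Read raw signed 8-bit samples from `in` and write a C array named `name`
   to `out`, followed by "#define <name>_LEN <count>". Returns the sample count. */
long raw_to_c(FILE *in, FILE *out, const char *name)
{
    long count = 0;
    int c;

    fprintf(out, "const int8_t %s[] PROGMEM = {\n", name);
    while ((c = fgetc(in)) != EOF) {
        /* fgetc() yields 0..255; map the upper half back to negative values. */
        int sample = (c > 127) ? c - 256 : c;
        fprintf(out, "%s%4d,", (count % 12 == 0) ? "    " : " ", sample);
        if (count % 12 == 11) {
            fputc('\n', out);
        }
        count++;
    }
    if (count % 12 != 0) {
        fputc('\n', out);
    }
    fprintf(out, "};\n#define %s_LEN %ld\n", name, count);
    return count;
}
```

When calling something like this from a command-line wrapper, open the RAW file in binary mode ("rb") so that no byte values are translated on the way in.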

I also wrote a simple command script to batch convert all sixteen RAW files and to concatenate the individual include files into a single include file, waveforms.h.

Once I had the waveforms.h file, I compiled the entire sketch to see if everything would fit into 28K bytes.

Then I repeated the trim, convert and compile process, again. And, again. And, again. You get the picture. I eventually had to mangle the waveforms. Truly a shame. The final cymbal sounds have only a brief shimmer of their true glory.

There you have it! I applied the same development process to the Do-Re-Mi waveforms although I started out with samples of my vocoded voice. Memory space requirements were even tighter (!) and I had to reduce the sampling rate to 11,025Hz.

Good luck, squeeze away and convert!

Copyright © 2017 Paul J. Drongowski

The long view

Here’s some information attributed to Martin Harris from Yamaha. Martin is one of the key sound developers at Yamaha:

  • Better Pianos
  • New Strings – 70 piece Seattle Symphony Orchestra Mega
  • New Orchestral Brass – highly dynamic
  • New Tuned Percussion – Glock, Xylo, Marimba and Vibes (with motor on)
  • New Mega guitars – Telecaster with Finger and Plectrum
  • SA2 Celtic Violin
  • New Synth Voices
  • New Classical Choir – Cathedral ambience
  • New Gospel Choir – Various articulations and Ad libs
  • New Pop Vocals – 4 session singers, 2 male and 2 female
  • Singing many dynamics and many articulations (wave cycling)

Montage? No, Tyros 4. The “SA2” should be a clue as the Montage does not provide Super Articulation 2 (SA2) voices.

My purpose here is not to be tricky, but to make the case that sample-based workstations or synthesizers draw from the sound pool that is available at development time, much the same way that hardware designers draw on the pool of available components. Products cannot be composed of imaginary circuits (“sand”), software, and sounds, after all.

To better illustrate this point, here is a rough timeline for the Tyros and Motif product lines with a few mid-range products (S9xx and MOX) thrown in:

             Tyros                        Motif/Montage
----   ------------------  ------------------------------------------
Year   Model     Physical  Model     Physical  Uncompressed waveforms
----   ------------------  ------------------------------------------
2001                       Motif      48MB     84MB 1,309 waveforms
2002   Tyros      96MB
2003                       Motif ES   96MB     175MB 1,859 waveforms
2005   Tyros 2   192MB
2007                       Motif XS  128MB     355MB 2,670 waveforms
2008   Tyros 3   256MB
2010   Tyros 4   512MB     Motif XF  256MB     741MB 3,977 waveforms
2011                       MOX       128MB     355MB 2,670 waveforms
2012   PSR-S950  256MB
2013   Tyros 5   768MB     MOXF      256MB     741MB 3,977 waveforms
2015   PSR-S970    2GB
2016                       Montage     4GB     5.67GB 6,347 waveforms

I included physical wave memory size for each product. I also included the uncompressed total sample size and number of waveforms for each member of the Motif/Montage line.

Clearly, Yamaha know how to ride the memory technology curve. Memory technology has progressed to the point where it is no longer a significant hardware design factor. Rather, the amount of wave memory in a product depends more upon the ability of the sound designers to fill it with quality content and mid- versus premium-product grading (i.e., the target market segment and price point for the model). For example, note that the mid-range S970 has more than twice the physical wave memory of the Tyros 5. Although the “expansion memory” is reserved in the S970’s physical wave memory, the S970 waveform content is substantially smaller than that of the Tyros 5.

The other characteristic to note is how the Tyros and Motif lines tend to leapfrog each other. Generally, the Tyros line leads the Motif line in physical wave memory and content. This is partly due to the higher memory requirements of SA2 voices, which require many additional articulation samples.

Both the Tyros 4 and Motif XF were released in 2010. Both machines use two SWP51L tone generators. (Newer products like the Montage use the SWP70 tone generator.) The Tyros 4 has twice the physical wave memory capacity of the Motif XF. Yet, the Tyros 4 has sample content which did not make it to a deliverable product in the Motif line until the Montage in 2016: Seattle strings, orchestral brass, Celtic violin, vocals (choir and scat), Telecaster guitar and suitcase electric piano.

Tyros 5 expanded this content in 2013. The Motif XF, on the other hand, received a significant update in January 2014. The V1.50 update added the “Real Distortion” effects implemented by the Tyros 5. (A few Real Distortion effects actually premiered in the mid-range S950.) The V1.50 update and the “White Motif” color job were life-extenders for the Motif line. I’ve conjectured before that Montage development was late and this is further evidence.

So, what can we expect in the Tyros successor, which I’m calling the “Tyros++”? (Yamaha have trademarked the name “GENOS” which may be the name of the follow-on. Only Yamaha really knows.) Personally, I’m hoping for the new orchestral woodwinds from Montage. These are superbly expressive voices. I’m also expecting improved electric pianos, again, of comparable quality to the Montage.

SA2 voices will probably remain exclusive to the Tyros line. Many folks hoped that Montage would have SA2 and it didn’t. SA2 is an important product differentiator — kind of like the premium “Natural” piano voices are to the Clavinova line. I suspect that FM voices will be a differentiator for the premium Montage line in years to come as well. Yamaha tends to think of these three product lines as distinct, so cross-over is carefully controlled and limited.

All of this talk about samples and wave memory size is overly reductionist. The three main (DMI) product lines — Tyros, Motif/Montage, Clavinova — have distinct personalities and features. Motif/Montage is a synthesizer for stage and production studio. Clavinova is primarily a home or church piano. Tyros serves double duty as a home keyboard and as a workstation for performing professionals. (Oddly, many USA customers scoff at this latter role.)

Although these are all fine instruments, the personalities have quirks. Upper-range Clavinovas are Tyros-in-disguise except that they lack multi-pads, a third RIGHT voice (i.e., only two voice layers in the right hand), and expansion memory. Tyros does not have the deep editing or modulation features of the Motif/Montage. The Motif and Montage, strangely, do not have a tonewheel organ mode. This latter omission is hard to understand since the Montage competes against other “stage” products like the Korg Kronos and Nord Stage.

Having compared voice programming between PSR-S950 (Tyros 3 without SA2 voices) and MOX (Motif XS sound set), I find that the product lines are voiced (programmed) differently. Motif/Montage effect programming has a harder edge than the Tyros, which is oriented toward oldies, pop and jazz standards. (Yes, Virginia, the Tyros does have latent EDM potential to be tapped.) If the Tyros++ includes the orchestral woodwinds, for example, they will probably be programmed rather differently than Montage. Tyros++ four-part divisi ensembles with the new orchestral woodwinds would be simply brilliant. Can’t wait to see and hear what happens!

One final editorial comment. The world is filled with product reviews. Publications like Keyboard magazine, Electronic Musician, etc. focus on individual products and rarely present a deep, long-term perspective on products. Sound On Sound reviews occasionally give historical background, usually for esoteric, retro studio pieces. As consumers, we need the long view in order to make the most informed choice.

Scat voice expansion pack

I’m pleased to release version 1 of my jazz scat voice expansion pack for Yamaha PSR-S950 and PSR-S750 arranger workstations. The expansion pack has five PSR voices which let you create “Take 6” style, a cappella arrangements and other kinds of jazz voice performances. Give the MP3 demo a try!

Four of the PSR voices are individual syllables: DOO, DOT, BOP and DOW. The DOO syllable is looped and lets you create sustained chords for backing. The DOT, BOP and DOW syllables are short and provide scat-like expression. All four syllables are combined into a velocity-switched voice where you select and play one of the syllables based on how hard you strike the keys (i.e., MIDI note velocity). You will need to adjust touch response (and practice!) to get the most playable and musical result.
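If you are building a similar voice for your own software instrument, velocity switching boils down to a range lookup on the MIDI note velocity. The sketch below is purely illustrative: the split points (40, 80, 110) and the ordering of the syllables from soft to hard are assumptions, not the values programmed into the expansion pack.

```c
#include <stdint.h>

typedef enum { SYL_DOO, SYL_DOT, SYL_BOP, SYL_DOW } syllable_t;

/* Pick a syllable from MIDI velocity (1..127) using hypothetical split points. */
syllable_t select_syllable(uint8_t velocity)
{
    if (velocity < 40) {
        return SYL_DOO;   /* softest touch: the looped, sustained syllable */
    }
    if (velocity < 80) {
        return SYL_DOT;
    }
    if (velocity < 110) {
        return SYL_BOP;
    }
    return SYL_DOW;       /* hardest strike */
}
```

Narrow velocity bands are hard to hit reliably, which is one reason touch response matters so much with this kind of voice.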

Here is a link to the expansion pack file. You need to download and UNZIP this file, then install the YEP file by following the directions in the Yamaha PSR-S950/PSR-S750 Owner’s Manual. See the section titled “Expanding Voices”.

I am also releasing the multi-samples that I used to create the expansion pack in case you would like to create a scat voice for your own synthesizer or software instrument. If you are curious about how I created the expansion pack voices and the samples, please see this blog post.

Both the scat voice expansion pack and the scat voice samples are released under a Creative Commons Attribution 4.0 International License.

Creative Commons License
ScatVoices and ScatVoice samples by Paul J. Drongowski are licensed under a Creative Commons Attribution 4.0 International License.

You are free to use the expansion pack voice or samples (even for commercial purposes) as long as you provide a link to from your own web site AND/OR explicitly credit me in your creative work, e.g., “Scat samples/voice by Paul J. Drongowski”.