Spectral (component) modeling

The subject of Yamaha’s Spectral Component Modeling (SCM) technology comes up from time to time. Yamaha have successfully employed SCM in their CP1 and CP4 stage pianos and the Reface CP. Players love SCM because it avoids the sonic discontinuities caused by velocity switching, giving a more natural, dynamic character as the player digs in or strikes gently.

Before getting into my own comments, here are two sections of background information.

Yamaha Spectral Component Modeling

Yamaha, as usual, are mum about the actual technical details. The following quote is taken from the Yamaha FAQ “What is Spectral Component Modeling?” (9/27/2010).

The CP1’s tone generator produces sounds based on performance data that is created by playing the keyboard and operating various controllers. The type of piano sound produced is defined by the currently selected Performance and the Master Equalizer. Each Performance is made up of two individual piano parts together plus a Reverb block.

Each of the Parts in the Performance is subdivided into three distinct blocks namely, the Piano block, the Modulation Effect block, and the Power-Amplifier / Compressor block. These blocks allow the characteristics of the instruments to be faithfully reproduced by simulating a broad spectrum of piano types, amplifiers, effectors, and other critical elements. Using the Piano Customize function to assemble these blocks in various combinations, not only can standard vintage settings be reproduced, but unique hardware combinations can be realized that would never be possible in the real world.

Each Performance allows the piano sounds produced by two different Parts to be sent through a common Reverb block for finishing. Performances also contain a Common Settings area that allows a name, a keyboard mode, controllers, pan settings, and several other parameters to be configured for each. These common settings can be used to make final adjustments to the individual Performances.

The Master Equalizer block is used to set EQ parameters that effect all Performances. In this way, the tone of the CP1 can be adjusted to match the room acoustics so that each of the Performance selected will have the desired sound.

Thus, the term “SCM” is a bit of scientific truth and a bit of marketing-speak. “Spectral Component Modeling” refers not only to spectral synthesis; it also encompasses the DSP effects, equalization, and compression processing. The latter elements are part of Yamaha’s Virtual Circuitry Modeling (VCM) effort, in which Yamaha model the vintage gear that lends character to a synthesized sound.
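
The block structure described in the FAQ can be summarized as a small data-model sketch. The Python below is purely illustrative; the class and field names are mine, not Yamaha identifiers, and the strings merely stand in for whatever internal representation the CP1 actually uses.

    # Illustrative data model of the CP1 Performance structure described above.
    # Names are hypothetical; this is not a Yamaha API.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Part:
        piano_block: str            # modeled piano type
        modulation_effect: str      # chorus, phaser, tremolo, ...
        amp_compressor: str         # modeled power-amplifier / compressor stage

    @dataclass
    class Performance:
        name: str
        parts: List[Part]           # the CP1 layers two Parts
        reverb: str                 # Reverb block shared by both Parts
        common: Dict[str, str] = field(default_factory=dict)   # keyboard mode, pan, controllers

    @dataclass
    class ToneGenerator:
        performances: List[Performance]
        master_eq: Dict[str, float] = field(default_factory=dict)  # global EQ across all Performances

    # Example: a two-Part layer sent through a common reverb.
    layered = Performance(
        name="Grand + EP layer",
        parts=[Part("acoustic grand", "off", "compressor"),
               Part("electric piano", "phaser", "tube amp")],
        reverb="hall",
    )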

Spectral modeling

What then is spectral modeling? For a brief answer, I quote Julius O. Smith III of Stanford’s Center for Computer Research in Music and Acoustics (CCRMA).

Spectral modeling can be viewed as “sampling synthesis done right” [154]. That is, in spectral modeling synthesis, segments of the time-domain signal are replaced by their short-time Fourier transforms, thus providing a sound representation much closer to the perception of sound by the brain [66,109,205]. This yields two immediate benefits: (1) computational cost reductions based on perceptual modeling, and (2) more perceptually fundamental data structures. Cost reductions follow naturally from the observation [168] that roughly 90% of the information contained in a typical sound is not perceived by the brain. For example, the popular MP3 audio compression format [27,28] can achieve an order of magnitude data reduction with little or no loss in perceived sound quality because it is based on the short-time Fourier transform, and because it prioritizes the information retained in each spectral frame based on psychoacoustic principles. To first order, MPEG audio coding eliminates all spectral components which are masked by nearby louder components.

The disadvantages of spectral modeling are the same as those of sampling synthesis, except that memory usage can be greatly reduced. Sampling the full playing range of a musical instrument is made more difficult, however, by the need to capture every detail in the form of spectral transformations. Sometimes this is relatively easy, such as when playing harder only affects brightness. In other cases, it can be difficult, such as when nonlinear noise effects begin to play a role.

An excellent recent example of spectral modeling synthesis is the so-called Vocaloid developed by Yamaha in collaboration with others [5]. In this method, the short-time spectrum is modeled as sinusoids plus a residual signal, together with higher level spectral features such as vocal formants. The model enables the creation of “vocal fonts” which effectively provide a “virtual singer” who can be given any material to sing at any pitch. Excellent results can be achieved with this approach (and some of the demos are very impressive), but it remains a significant amount of work to encode a particular singer into the form of a vocal font. Furthermore, while the sound quality is generally excellent, subtle “unnaturalness” cues may creep through from time to time, rendering the system most immediately effective for automatic back-up vocals, or choral synthesis, as opposed to highly exposed foreground lead-singer synthesis.

Zooming out, spectral modeling synthesis can be regarded as modeling sound inside the inner ear, enabling reductions and manipulations in terms of human perception of sound.

[5] X. Amatriain, J. Bonada, A. Loscos, and X. Serra, “Spectral processing,” in DAFX – Digital Audio Effects (U. Zölzer, ed.), pp. 373-438, West Sussex, England: John Wiley and Sons, Ltd., 2002, http://www.dafx.de/.

[27] M. Bosi and R. E. Goldberg, Introduction to Digital Audio Coding and Standards, Boston: Kluwer Academic Publishers, 2003.

[28] K. Brandenburg and M. Bosi, “Overview of MPEG audio: Current and future standards for low-bit-rate audio coding,” Journal of the Audio Engineering Society, vol. 45, pp. 4-21, Jan./Feb. 1997.

[66] B. R. Glasberg and B. C. J. Moore, “A model of loudness applicable to time-varying sounds,” Journal of the Audio Engineering Society, vol. 50, pp. 331-342, May 2002.

[109] B. C. J. Moore, An Introduction to the Psychology of Hearing, New York: Academic Press, 1997.

[154] J. O. Smith, “Viewpoints on the history of digital synthesis,” in Proceedings of the 1991 International Computer Music Conference, Montreal, pp. 1-10, Computer Music Association, 1991, http://ccrma.stanford.edu/~jos/kna/.

[205] E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models, Berlin: Springer Verlag, 1999, second updated edition, 80pp., CD-ROM/softcover.

Of course, this assumes that Yamaha have adopted this specific approach/technology for SCM!

Commentary

Thus, one may view spectral modeling as a compression technique as well as a synthesis technique. Spectral modeling encodes sound in much the same way that MP3 and other psychoacoustic compression methods do.
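
To make the idea concrete, here is a minimal analysis/resynthesis sketch in the spirit of the sinusoidal (SMS-style) spectral modeling that Smith describes. It is not Yamaha’s SCM algorithm; the function names, window choice, peak threshold, and partial count are all illustrative assumptions.

    # Minimal sketch of frame-based spectral modeling: replace a time-domain
    # frame by its strongest spectral peaks, then resynthesize from sinusoids.
    # Parameters and thresholds are illustrative, not Yamaha's.
    import numpy as np

    def analyze_frame(frame, sr, max_partials=40, floor_db=-60.0):
        """Return (freqs, amps) of the strongest spectral peaks in one frame."""
        n = len(frame)
        window = np.hanning(n)
        spectrum = np.fft.rfft(frame * window)
        mags = np.abs(spectrum) * 2.0 / np.sum(window)   # rough amplitude scaling
        freqs = np.fft.rfftfreq(n, 1.0 / sr)

        # Crude perceptual pruning: keep local maxima above a floor,
        # then retain only the loudest max_partials of them.
        floor = np.max(mags) * 10.0 ** (floor_db / 20.0)
        peaks = [i for i in range(1, len(mags) - 1)
                 if mags[i] > mags[i - 1] and mags[i] > mags[i + 1] and mags[i] > floor]
        peaks = sorted(peaks, key=lambda i: mags[i], reverse=True)[:max_partials]
        return freqs[peaks], mags[peaks]

    def synthesize_frame(freqs, amps, n, sr):
        """Rebuild the frame as a sum of sinusoids (phase and residual ignored)."""
        t = np.arange(n) / sr
        out = np.zeros(n)
        for f, a in zip(freqs, amps):
            out += a * np.cos(2.0 * np.pi * f * t)
        return out

    if __name__ == "__main__":
        sr = 44100
        t = np.arange(2048) / sr
        # Toy test tone: six decaying partials of A3 (220 Hz).
        frame = sum((0.5 ** k) * np.sin(2 * np.pi * 220 * (k + 1) * t) for k in range(6))
        freqs, amps = analyze_frame(frame, sr)
        print("partials kept:", len(freqs))
        resynth = synthesize_frame(freqs, amps, len(frame), sr)

A real implementation would also track partials across frames, model the residual noise, and interpolate spectra across key and velocity regions; presumably that is where most of the engineering effort (and Yamaha’s secrecy) lies.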

Why did Yamaha adopt spectral modeling? Consider the technology available to Yamaha in the 2010 time frame. Back then, the SWP51L “Standard Wave Processor” was Yamaha’s workhorse tone generation chip. The SWP51L has a fixed address width for waveform (sample) memory, and acoustic pianos are notorious memory hogs. It’s possible that Yamaha ran up against the physical hardware addressing limit of the SWP51L. Yamaha needed to break this barrier, and the psychoacoustic compression offered by SCM was one way out.

The CP1 stage piano employs three SWP51L tone generator ICs. Two SWP51Ls form a master/slave pair that performs tone generation (synthesis); the third SWP51L is dedicated to effects (including damper resonance). The master/slave pair share waveform memory, which consists of two MR26V51252R (512Mbit) memory devices for a total of 128MBytes of waveform P2ROM. Considering the sound set and sonic quality of the pianos, this is an insanely small waveform memory, and it is well within the addressing range of the SWP51L.
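
For what it’s worth, the memory arithmetic is easy to check. In the snippet below, the installed size follows from the part count above; the address-bus width is a purely hypothetical figure for illustration, since Yamaha do not publish the SWP51L’s actual waveform address width.

    # Back-of-the-envelope check of the CP1 waveform-memory figures.
    import math

    MIBIT = 2 ** 20                          # memory devices are rated in binary megabits
    chip_bits = 512 * MIBIT                  # one 512 Mbit P2ROM device
    installed_bytes = 2 * chip_bits // 8     # two devices shared by the master/slave pair
    print(installed_bytes // 2 ** 20, "MBytes installed")             # -> 128

    bits_needed = math.ceil(math.log2(installed_bytes))
    print(bits_needed, "bits needed for byte addressing")             # -> 27

    example_width = 32                       # hypothetical address width, illustration only
    print(2 ** example_width // 2 ** 20, "MBytes addressable at", example_width, "bits")  # -> 4096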

The Reface CP, also SCM-based, uses very modest compute (SWX08 processor and tone generator) and wave memory. The Reface CP and Reface YC have the same digital logic board. Unfortunately, I have only the Reface YC service manual, but a notation in the overall block diagram implies a 64MByte waveform memory in the Reface CP.

So, why isn’t SCM used today, Reface CP excepted? The current-generation tone generator, the SWP70, does not have the same hard addressing limit as the SWP51L. The SWP70 employs Open NAND Flash Interface (ONFI) commodity memory. Thus, the high compression offered by spectral modeling is no longer needed, and conventional sample-playback synthesis (AWM2) is “good enough.” Yamaha engineering is probably loath to carry and support two different sampling/synthesis techniques for cost reasons.

Now whether “good enough” satisfies sonically or not is a subjective question…

From the promotional angle, Yamaha are featuring the CFX grand piano. The CP1 featured the CFIII and S6. To be current, Yamaha would need a spectral modeling implementation of the CFX — an additional, perhaps unnecessary expense.

I’d like to point out that the Reface CP’s 64MBytes of wave memory is not much more than the 32MBytes in the small, budget-priced DGX-650. Thus, Yamaha could build an absolute killer DGX, a model that would totally cannibalize sales of their high-end piano offerings! Business first.

Copyright © 2019 Paul J. Drongowski (except quoted excerpts as cited)