Spectral (component) modeling

The subject of Yamaha’s Spectral Component Modeling (SCM) technology comes up from time to time. Yamaha have successfully employed SCM in their CP1/CP4 stage pianos and Reface CP instrument. Players love SCM because it avoids the sonic discontinuities caused by velocity switching, giving a more natural and dynamic character as the player digs in or strikes gently.

Before getting into my own comments, here are two sections of background information.

Yamaha Spectral Component Modeling

Yamaha, as usual, are mum about the actual technical details. The following quote is taken from the Yamaha FAQ “What is Spectral Component Modeling (9/27/2010)?”

The CP1’s tone generator produces sounds based on performance data that is created by playing the keyboard and operating various controllers. The type of piano sound produced is defined by the currently selected Performance and the Master Equalizer. Each Performance is made up of two individual piano parts together plus a Reverb block.

Each of the Parts in the Performance is subdivided into three distinct blocks namely, the Piano block, the Modulation Effect block, and the Power-Amplifier / Compressor block. These blocks allow the characteristics of the instruments to be faithfully reproduced by simulating a broad spectrum of piano types, amplifiers, effectors, and other critical elements. Using the Piano Customize function to assemble these blocks in various combinations, not only can standard vintage settings be reproduced, but unique hardware combinations can be realized that would never be possible in the real world.

Each Performance allows the piano sounds produced by two different Parts to be sent through a common Reverb block for finishing. Performances also contain a Common Settings area that allows a name, a keyboard mode, controllers, pan settings, and several other parameters to be configured for each. These common settings can be used to make final adjustments to the individual Performances.

The Master Equalizer block is used to set EQ parameters that effect all Performances. In this way, the tone of the CP1 can be adjusted to match the room acoustics so that each of the Performance selected will have the desired sound.

Thus, the term “SCM” is a bit of scientific truth and a bit of marketing-speak. “Spectral Component Modeling” refers not only to spectral synthesis, but it encompasses the DSP effects, equalization and compression processing. The latter elements are part of Yamaha’s Virtual Component Modeling effort in which Yamaha model vintage gear that lends character to a synthesized sound.

Spectral modeling

What then is spectral modeling? For a brief answer, I quote J.O. Smith III of Stanford’s Center for Computer Research in Music and Acoustics (CCRMA).

Spectral modeling can be viewed as “sampling synthesis done right” [154]. That is, in spectral modeling synthesis, segments of the time-domain signal are replaced by their short-time Fourier transforms, thus providing a sound representation much closer to the perception of sound by the brain [66,109,205]. This yields two immediate benefits: (1) computational cost reductions based on perceptual modeling, and (2) more perceptually fundamental data structures. Cost reductions follow naturally from the observation [168] that roughly 90% of the information contained in a typical sound is not perceived by the brain. For example, the popular MP3 audio compression format [27,28] can achieve an order of magnitude data reduction with little or no loss in perceived sound quality because it is based on the short-time Fourier transform, and because it prioritizes the information retained in each spectral frame based on psychoacoustic principles. To first order, MPEG audio coding eliminates all spectral components which are masked by nearby louder components.

The disadvantages of spectral modeling are the same as those of sampling synthesis, except that memory usage can be greatly reduced. Sampling the full playing range of a musical instrument is made more difficult, however, by the need to capture every detail in the form of spectral transformations. Sometimes this is relatively easy, such as when playing harder only affects brightness. In other cases, it can be difficult, such as when nonlinear noise effects begin to play a role.

An excellent recent example of spectral modeling synthesis is the so-called Vocaloid developed by Yamaha in collaboration with others [5]. In this method, the short-time spectrum is modeled as sinusoids plus a residual signal, together with higher level spectral features such as vocal formants. The model enables the creation of “vocal fonts” which effectively provide a “virtual singer” who can be given any material to sing at any pitch. Excellent results can be achieved with this approach (and some of the demos are very impressive), but it remains a significant amount of work to encode a particular singer into the form of a vocal font. Furthermore, while the sound quality is generally excellent, subtle “unnaturalness” cues may creep through from time to time, rendering the system most immediately effective for automatic back-up vocals, or choral synthesis, as opposed to highly exposed foreground lead-singer synthesis.

Zooming out, spectral modeling synthesis can be regarded as modeling sound inside the inner ear, enabling reductions and manipulations in terms of human perception of sound.

[5] X. Amatriain, J. Bonada, A. Loscos, and X. Serra, “Spectral processing,” in DAFX – Digital Audio Effects (U. Zölzer, ed.), pp. 373-438, West Sussex, England: John Wiley and Sons, LTD, 2002, http://www.dafx.de/.

[27] M. Bosi and R. E. Goldberg, Introduction to Digital Audio Coding and Standards, Boston: Kluwer Academic Publishers, 2003.

[28] K. Brandenburg and M. Bosi, “Overview of MPEG audio: Current and future standards for low-bit-rate audio coding,” Journal of the Audio Engineering Society, vol. 45, pp. 4-21, Jan./Feb. 1997.

[66] B. R. Glasberg and B. C. J. Moore, “A model of loudness applicable to time-varying sounds,” Journal of the Audio Engineering Society, vol. 50, pp. 331-342, May 2002.

[109] B. C. J. Moore, An Introduction to the Psychology of Hearing, New York: Academic Press, 1997.

[154] J. O. Smith, “Viewpoints on the history of digital synthesis,” in Proceedings of the 1991 International Computer Music Conference, Montreal, pp. 1-10, Computer Music Association, 1991, http://ccrma.stanford.edu/~jos/kna/.

[205] E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models, Berlin: Springer Verlag, 1999, second updated edition, 80pp., CD-ROM/softcover.

Of course, this assumes that Yamaha have adopted this specific approach/technology for SCM!

Commentary

Thus, one may view spectral modeling as a compression technique as well as a synthesis technique. Spectral modeling encodes sound in much the same way as MP3 and other psychoacoustic compression methods.
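
To make the compression idea concrete, here is a minimal sketch in Python. It is not Yamaha's method (which is unpublished) and it is not MP3. It simply transforms one frame of sound, keeps the strongest spectral bins and rebuilds the frame from those bins alone. The test tone, the frame size and the "keep the largest magnitudes" rule are all my own assumptions; a real coder would rank components with a psychoacoustic masking model.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform of one frame (O(N^2), fine for a demo)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    """Inverse transform; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def keep_strongest(spectrum, count):
    """Zero every bin except the `count` bins with the largest magnitude."""
    ranked = sorted(range(len(spectrum)), key=lambda k: abs(spectrum[k]), reverse=True)
    kept = set(ranked[:count])
    return [spectrum[k] if k in kept else 0j for k in range(len(spectrum))]

# Hypothetical test tone: three strong partials plus one very weak partial.
N = 64
frame = [1.0 * math.sin(2 * math.pi * 2 * t / N)
         + 0.5 * math.sin(2 * math.pi * 4 * t / N)
         + 0.25 * math.sin(2 * math.pi * 6 * t / N)
         + 0.001 * math.sin(2 * math.pi * 20 * t / N)
         for t in range(N)]

spectrum = dft(frame)
# Each real sinusoid occupies two bins (positive and negative frequency),
# so six bins are enough to hold the three strong partials.
reduced = keep_strongest(spectrum, 6)
rebuilt = inverse_dft(reduced)
max_error = max(abs(a - b) for a, b in zip(frame, rebuilt))
print(f"kept {6}/{N} bins, worst-case sample error {max_error:.4f}")
```

Only the weak partial is lost, so the rebuilt frame differs from the original by about the amplitude of that partial. Storing six bins instead of 64 samples is the kind of saving that makes a small wave memory go a long way.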

Why did Yamaha adopt spectral modeling? We need to consider the technology available to Yamaha in the 2010 time frame. In 2010, the SWP51L “Standard Wave Processor” was Yamaha’s workhorse tone generation chip. The SWP51L has a fixed address width to waveform (sample) memory. Acoustic pianos are notorious memory hogs. It’s possible that Yamaha ran up against the physical hardware addressing limit of the SWP51L. Yamaha needed to break this barrier and the psychoacoustic compression offered by SCM was one way out.

The CP1 stage piano employs three SWP51L tone generator ICs. Two SWP51Ls form a master/slave pair and perform tone generation (synthesis). The third SWP51L is dedicated to effects (including damper resonance). The master/slave pair share waveform memory, which consists of two MR26V51252R (512Mbit) memory devices for a total of 128MBytes of waveform P2ROM. Considering the sound set and sonic quality of the pianos, this is an insanely small waveform memory and well within the addressing range of the SWP51L.
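
For anyone who wants to check the arithmetic, here is a small Python sketch. The device count and size come from the paragraph above. The "seconds of audio" figure is my own illustration and assumes plain, uncompressed 16-bit mono samples at 44.1kHz spread evenly over 88 keys, which is certainly not how the CP1 actually stores its data.

```python
BITS_PER_MBIT = 1024 * 1024      # memory parts use binary megabits
BYTES_PER_MBYTE = 1024 * 1024

devices = 2                      # two MR26V51252R parts
mbits_per_device = 512

total_bits = devices * mbits_per_device * BITS_PER_MBIT
total_bytes = total_bits // 8
total_mbytes = total_bytes // BYTES_PER_MBYTE   # 128

# Assumption for illustration only: uncompressed 16-bit mono at 44.1kHz.
bytes_per_second = 44_100 * 2
seconds_uncompressed = total_bytes / bytes_per_second
seconds_per_key = seconds_uncompressed / 88     # 88-key piano, one layer

print(total_mbytes, "MBytes")
print(round(seconds_uncompressed / 60, 1), "minutes of uncompressed mono audio")
print(round(seconds_per_key, 1), "seconds per key")
```

Under those assumptions the whole ROM holds only a few seconds of plain mono audio per key, with nothing left over for velocity layers or stereo.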

The Reface CP, also SCM-based, uses very modest compute (SWX08 processor and tone generator) and wave memory. The Reface CP and Reface YC have the same digital logic board. Unfortunately, I have only the Reface YC service manual, but a notation in the overall block diagram implies a 64MByte waveform memory in the Reface CP.

So, why isn’t SCM used today, Reface CP excepted? The current generation tone generator, the SWP70, does not have the same hard addressing limit as the SWP51L. The SWP70 employs Open NAND Flash Interface (ONFI) commodity memory. Thus, the high compression offered by spectral modeling is no longer needed and conventional sample-playback synthesis (AWM2) is “good enough.” Yamaha engineering is probably loath to carry and support two different sampling/synthesis techniques for cost reasons.

Now whether “good enough” satisfies sonically or not is a subjective question…

From the promotional angle, Yamaha are featuring the CFX grand piano. The CP1 featured the CFIII and S6. To be current, Yamaha would need a spectral modeling implementation of the CFX — an additional, perhaps unnecessary expense.

I’d like to point out that 64MBytes of wave memory is not much more than the small budget DGX-650, which has 32MBytes of wave memory. Thus, Yamaha could build an absolute killer DGX — a model that would totally cannibalize sales of its high-end piano offerings! Business first.

Copyright © 2019 Paul J. Drongowski (except quoted excerpts as cited)

Inside Reface YC and CP

Like the Yamaha Reface DX and CS, the Reface YC and CP are brother and sister.

The Reface DX and CS use the Yamaha proprietary SSP2 integrated circuit (IC) for sound synthesis. A few minor hardware differences and the front panel aside, the main difference between DX and CS is software. The YC and CP designs are analogous although the tone generation method and hardware are different.

Sample playback and memory bandwidth

Many people focus on the computational aspects of tone generation and wave memory size, not realizing that memory bandwidth is just as important, if not critical, for sample playback. Waveform samples need to flow from wave memory to the tone generation apparatus whether tone generation is performed on a CPU or a proprietary tone generator IC like Yamaha’s previous generation SWP51L and the now current SWP70.

Sustainable polyphony depends on memory bandwidth. If available bandwidth is low, then polyphony is low. Raise bandwidth and you can raise polyphony, too, provided adequate computational resources (e.g., tone generation channels or CPU cycles) are available.
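
A back-of-the-envelope sketch in Python shows the relationship. The function name and all of the numbers are hypothetical; I am assuming 16-bit mono samples at 44.1kHz and ignoring interpolation, caching and bus overhead, all of which matter in a real tone generator.

```python
def max_voices(bandwidth_bytes_per_s, sample_rate_hz=44_100,
               bytes_per_sample=2, pitch_ratio=1.0):
    """Upper bound on voices that a given sample bandwidth can feed.

    pitch_ratio > 1 models a voice played above its recorded pitch,
    which reads samples from wave memory proportionally faster.
    """
    per_voice = sample_rate_hz * bytes_per_sample * pitch_ratio
    return int(bandwidth_bytes_per_s // per_voice)

# Hypothetical 20 MBytes/s of sustained sample bandwidth.
print(max_voices(20_000_000))                   # voices at recorded pitch
print(max_voices(20_000_000, pitch_ratio=2.0))  # every voice an octave up
print(max_voices(40_000_000))                   # double the bandwidth
```

The bound scales linearly with bandwidth, which is the whole point: more bandwidth, more voices, as long as there are tone generation channels to use it.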

Several factors affect memory bandwidth.

  • The most obvious factor is the raw speed of the memory technology. Fast memory means high bandwidth.
  • Next is the kind of memory communication channel: shared or dedicated. If waveform samples and CPU code reside in the same physical memory component, then bandwidth must be shared between the CPU and the tone generator, lowering tone generation bandwidth and polyphony. Bandwidth is higher when the CPU and tone generator each have their own memory channel and component. Concurrency wins!
  • Bandwidth sometimes depends on the read access mode or pattern of the memory component. Concerns here include random vs. sequential access, word vs. paged, etc. This subject is a little too deep for this short note.
  • Finally, bandwidth depends on the bus organization: serial or parallel. Parallel buses move each bit in a word on a dedicated wire. Serial buses move bits sequentially on one or a few wires. Parallel is fast; serial is slower.
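
The bus width and sharing factors in the list above can be put into rough numbers. This Python sketch uses a hypothetical transfer rate and a hypothetical 50/50 split between CPU and tone generator; it ignores access mode, paging and protocol overhead.

```python
def bus_bandwidth(width_bits, transfers_per_s, share=1.0):
    """Peak bytes/s available to the tone generator.

    width_bits      -- data wires on the bus (1 for a single-wire serial link)
    transfers_per_s -- bus transfers per second (hypothetical figure below)
    share           -- fraction of bus time the tone generator actually gets
    """
    return width_bits / 8 * transfers_per_s * share

RATE = 10_000_000  # hypothetical: 10 million transfers per second

dedicated_parallel = bus_bandwidth(16, RATE)           # own 16-bit wave bus
shared_parallel = bus_bandwidth(16, RATE, share=0.5)   # split with the CPU
serial = bus_bandwidth(1, RATE)                        # one data wire

for name, value in [("dedicated 16-bit", dedicated_parallel),
                    ("shared 16-bit", shared_parallel),
                    ("1-bit serial", serial)]:
    print(f"{name}: {value / 1e6:.2f} MBytes/s")
```

At the same transfer rate, the dedicated 16-bit bus delivers 16 times what a single serial wire does, and sharing the bus with the CPU gives back whatever fraction the CPU takes.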

Of course, there are further factors and choices like the necessity for read-write access, non-volatile data storage, and so forth.

The instrument designer faces the challenge of supplying sufficient memory bandwidth, tone generation channels and polyphony at a particular price point. Polyphony and price point are market-driven requirements. Memory bandwidth and tone generation resources are technological. The designer must work within both kinds of requirements and constraints.

Internet discussions tend to dwell on memory speed and component cost alone, neglecting system-level design costs like board complexity, wiring and testing. A simple rule of thumb is, “More IC pins and wires means higher system cost.” Serial communication decreases pins and wires, but it compromises bandwidth. Shared buses also decrease the number of pins and wires, again, penalizing bandwidth. One expects to find serial communication and/or shared buses in low price products, while higher price products can reap the benefits of dedicated, parallel communication.

I must note that commodity bulk flash memory uses a serialized memory bus, but it makes up for this with sequential paged reads and data caching. The SWP70 is compatible with commodity flash and uses a dedicated RAM cache to achieve high sample bandwidth. This scheme is cheaper than the SWP51L with its parallel dedicated wave bus.

Processor primer

Yamaha have several different processors at their disposal for main CPU, tone generation and effect processing (DSP) chores:

  • SWLxx: SWL processors, like the SWL01U, have integrated CPU, tone generation and DSP resources in the same IC. CPU instructions, data and waveform samples travel on the same shared bus. SWL processors are typically designed into value (i.e., entry-level) products. SWLs are also low power and ready for battery operation.
  • SWXxx: SWX processors have integrated CPU, tone generation and DSP resources on the same IC. CPU, tone generation and DSP each have a dedicated memory channel. SWX processors often appear in mid-range products.
  • SWPxx: SWP processors have a large number of tone generation and DSP elements, and no main CPU. The SWPs must be controlled by a separate main CPU.
  • SSP2: The SSP2 has an integrated CPU and DSP elements. The SSP2 is not used in AWM2 applications, appearing instead in computationally intensive synthesis engines (Reface CS and DX), vocal harmony processors, and digital mixers.

The SWL, SWX and SSP2 series processors are true “system on a chip (SOC)” designs with analog-to-digital conversion, bit-serial data (UART), USB, SPI and other interfaces. The CPU core is usually a variant of the Renesas SH architecture family. Architectural commonality facilitates code reuse across products. Yamaha have damned good engineers.

There are two different types of SWX processor: SWX02/SWX03 and the SWX08. The 02/03 variants appear in lower priced mid-range products. Examples include the MOX6 (SWX02), PSR-S650 (SWX02) and Piaggero NP-32 (SWX03). The SWX08 appears in the upper mid-range: PSR-S770, Reface YC and Reface CP.

Sometimes an SWX processor is used as the main computer controlling an SWP. For example, the SWX02 is the main computer in the MOX6/MOX8, controlling an SWP51L. Similarly, the SWX08 is the main computer in the PSR-S750, controlling an SWP51L. In both cases, the SWP51L handles all tone generation duties. Yamaha increases fabrication volume when it uses an SWX in this way.

At this point, semiconductor folks might ask if Yamaha fuses off TG or DSP deficient SWX08s and assigns them to main computer duty only. This strategy cuts waste as it deploys SWX08s with perfectly good CPUs and faulty, fused off TG and/or DSP circuitry. This is standard practice throughout the industry, so please don’t freak out.

Reface YC and Reface CP

The Yamaha Reface YC and the CP share the same digital logic board design. The main large-scale integrated (LSI) components are:

IC CPU (SWX08)   Yamaha R8A02042BG         SH-2A CPU core
Work SDRAM       Winbond W9812G6JH-6       8M x 16-bit word, 166MHz
DSP SDRAM        Winbond W9864G6KH-6       4M x 16-bit word, 166MHz
Program/Wave YC  Cypress S29GL256S90TFI020 16M x 16-bit word NOR flash
DAC              Asahi Kasei AK4396VF-E2   192kHz, 24-bit stereo DAC
Panel scan CPU   MB9AF141LAPMC1            ARM Cortex-M3 (32-bit core)
ADC              TI PCM1803ADBR            96kHz, 24-bit stereo ADC

The same ARM Cortex-M3 (32-bit core) processor is used in the Reface CS and Reface DX for panel and keyboard scan. Potentiometers and so forth are sensed by the ARM’s 12-bit analog to digital converter (ADC). Key scanning is performed through GPIO lines. (I don’t see any way to expand beyond 37 keys, unfortunately.)

The SWX08 is the main control computer. It handles the 5-pin MIDI interface and the USB interface. The ARM communicates with the SWX08 over a serial link (UART). Integral tone generation and DSP elements synthesize digital audio and effects.

The AK4396VF-E2 digital to analog converter (DAC) is also used in the PSR-S770 and PSR-S970 arranger workstations (among other Yamaha products). By way of comparison, the Montage employs the AK4393VM-E2 DAC. Digital audio for the internal speakers is converted by the Yamaha YDA176 digital amplifier.

The PCM1803ADBR ADC sends serial digital audio (24-bit I2S format) to the SWX08 where it is mixed with the synthesized tones.

DSP processors on the SWX08 have their own dedicated 16-bit data channel to DSP SDRAM (i.e., working memory for effects). The wave memory (NOR flash ROM) has a dedicated 16-bit parallel channel for samples. Wave memory is labelled “E:64MB / O:32MB”. Presumably, this means that the CP needs 64MBytes for electric piano waveforms and the YC needs 32MBytes for organ waveforms. I wonder if Yamaha substitute a larger, pin-compatible flash ROM in the Reface CP? I don’t have the Reface CP service manual in order to resolve this conjecture.
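
The conjecture is easy to check against the parts list. This Python sketch converts the "M x 16-bit word" organizations from the table into MBytes. The word counts are from the table; the comparison with 64MBytes is my reading of the block diagram notation, not a confirmed specification.

```python
def capacity_mbytes(mega_words, bits_per_word=16):
    """Capacity in MBytes of a memory organized as mega_words x bits_per_word."""
    return mega_words * bits_per_word // 8

# Organizations taken from the component table (in mega-words of 16 bits).
parts = {
    "W9812G6JH-6 work SDRAM": 8,
    "W9864G6KH-6 DSP SDRAM": 4,
    "S29GL256S program/wave NOR flash": 16,
}

sizes = {name: capacity_mbytes(words) for name, words in parts.items()}
for name, mbytes in sizes.items():
    print(f"{name}: {mbytes} MBytes")

flash_mbytes = sizes["S29GL256S program/wave NOR flash"]
cp_needs_mbytes = 64  # the "E:64MB" notation, as I interpret it
fits_in_yc_flash = cp_needs_mbytes <= flash_mbytes
print("CP waveforms fit in the YC flash part:", fits_in_yc_flash)
```

The YC flash works out to 32MBytes, matching the “O:32MB” notation, so 64MBytes of CP waveforms would not fit in the same part. That is consistent with a larger flash device in the Reface CP, although it does not prove it.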

Summary

So, there you have it. Yamaha wisely designed the CS and DX as a pair and designed the CP and YC as a pair. I’m sure that shared board designs reduced their manufacturing costs.

Reface sales seem to be coming to an end. Nearly all Reface models have sold through in North America. Yamaha has either decided to cancel the Reface after the first production run or they will launch Reface 2.0, perhaps with full-size keyboards. They could easily design the guts of the YC and/or CP into the Piaggero NP-12 chassis. That would make for one killer, battery-powered stage machine!

Copyright © 2017 Paul J. Drongowski

Inside Reface DX and Reface CS

With so much to do and learn, it’s been a long while since I’ve taken a peek below the hood of an electronic musical instrument.

Yamaha caught the world by surprise with its Reface series of portable keyboards. So far, there are four models in the series: Reface YC (organ), Reface CP (electric piano), Reface CS (analog modeling synthesizer), and Reface DX (4-op FM synthesizer).

Before I get to the DX and CS, here are a few thoughts about the YC and CP. According to Yamaha specifications, the Reface YC tone generation engine is “AWM (Organ Flutes)”. This suggests to me that the YC uses a standard AWM tone generation integrated circuit (IC) like the SWP70. Hammond-like “Organ Flutes” have been part of the mid- and upper-tier arranger workstations like Tyros for a very long time. Thus, I suspect that the YC implementation is an updated implementation of the arranger technology.

The Reface CP tone generation engine is specified as “SCM + AWM2”. SCM or “Spectral Component Modeling” is the modeling technique first employed in the flagship CP-1 stage piano. SCM and AWM2 are also used in the CP-4 and CP-40 models. The CP-1 uses three tried-and-true SWP51L tone generation ICs: master, slave and effects. The master and slave generate the base piano tones and the two ICs share the same WAVE ROM. Total WAVE ROM size is 1024Mbits or 128MBytes (organized as 16-bit words) which is a ridiculously small amount of memory for a top quality piano. Such is the power of SCM!

The CP-1’s samples are stored in two Lapis Semiconductor MR26V51252R devices (32M by 16-bit words each). The processor is a Yamaha SWX02 (SH-2A CPU core operating at 135.4752MHz). There’s not much to the CP-1 user interface, so a relatively light-weight, low-cost processor is enough for the job. The SWP51Ls handle all of the heavy computation.

Thus, the Reface YC and Reface CP are relatively uninteresting from a technologist’s point of view. The YC and CP use proven technology from other Yamaha products. That leaves the Reface CS and Reface DX.

Although the CS and DX implement two different tone generation techniques — analog physical modeling vs. frequency modulation (FM) — they are fraternal twins at the hardware level. They share much of the same base hardware design with a few variations to handle their unique user interface requirements.

The CS and DX both use a Fujitsu MB9AF141LAPMC1 processor to handle key and panel scanning. Here’s a quick summary of its characteristics:

    CPU                 Cortex-M3
    CPU Frequency       40MHz

    On-chip flash memory   Main area   64KBytes
    On-chip flash memory   Work area   32KBytes
    On-chip SRAM           SRAM0        8KBytes
    On-chip SRAM           SRAM1        8KBytes

    Peripheral interfaces:
        DMAC            8 channel
        Serial I/F      8 channel
        Base timer      8 channel
        Dual timer      1
        Realtime clock  1
        Watch counter   1
        12-bit A/D      12 channel

This processor is a good choice for embedded control applications where low power and low cost are important. To my knowledge, this is the first Yamaha product line using an ARM embedded microcontroller.

The Reface CS and Reface DX both use the proprietary Yamaha SSP2 (uPD800500F1-011-KN9-A) for tone generation. The SSP2 is Yamaha’s designated hitter for DSP tasks and is incorporated into many products. The SSP2 has an SH-2A CPU core operating at an internal clock speed of 135.4752MHz. The SSP2 has its own ADC, GPIO, UART, USB and serial audio interfaces. The SSP2 UART handles 5-pin MIDI communications. The SSP2 USB interface handles external USB communications.

The SSP2 has two memory interfaces:

  • DSP RAM: Connecting to 8MBytes of DSP SDRAM.
  • CPU bus: Connecting to 8MBytes of program ROM and 16MBytes of SDRAM.

Memory sizes and devices are the same in both products.

The AUX IN and audio out hardware design is also the same across the two products:

  • PCM1803ADBR ADC: AUX IN analog-to-digital conversion
  • AK4396: Digital-to-analog conversion for OUTPUT L/R and PHONES OUT
  • YDA176 D-Amp: DAC and amplification for internal speakers

This shouldn’t be any surprise. All of the Reface series products share the same external jack, power and key switch boards.

Digital audio is transferred serially between the SSP2, the ADC, the DAC and the digital amplifier. The SSP2 generates the master clock (MCLK) and bit clock (BCLK) to synchronize data transfers. MCLK and BCLK are derived from the SSP2 clock, in case you’re wondering about those odd-looking CPU clock frequencies. MCLK is 256*fs and BCLK is 64*fs, where fs is the sampling frequency, 44.1kHz. MCLK operates the AK4396’s digital interpolation filter and delta-sigma modulator. Data format is I2S and is probably 24-bit as it is in workstation products.
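
The clock relationships are simple enough to verify in a few lines of Python. The multipliers and the 135.4752MHz figure are from the text above. The divide-by-12 result is just what the arithmetic gives; I have no documentation of the actual clock tree inside the SSP2.

```python
FS_HZ = 44_100                  # sampling frequency

MCLK_HZ = 256 * FS_HZ           # master clock: 11,289,600 Hz
BCLK_HZ = 64 * FS_HZ            # bit clock:     2,822,400 Hz

SSP2_CLOCK_HZ = 135_475_200     # 135.4752MHz internal clock

# If MCLK is derived from the CPU clock, the ratio should be a whole number.
divider, remainder = divmod(SSP2_CLOCK_HZ, MCLK_HZ)

# 64 bit clocks per sample period leaves 32 bit slots per stereo channel,
# which is enough room for a 24-bit sample.
bit_slots_per_channel = BCLK_HZ // FS_HZ // 2

print(f"MCLK = {MCLK_HZ / 1e6} MHz, BCLK = {BCLK_HZ / 1e6} MHz")
print(f"SSP2 clock / MCLK = {divider} remainder {remainder}")
print(f"bit slots per channel = {bit_slots_per_channel}")
```

The SSP2 clock is exactly 12 times MCLK (or 3072 times fs), which explains the odd-looking frequency.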

Aside from the other front panel controls, the Reface DX has two major additions: Capacitive sensors for the front panel touch strips and the LCD panel display. The printed circuit board positions for the LCD interface are not populated (i.e., no mount) in the Reface CS as it has no LCD display.

There you have it — two more examples of solid and conservative Yamaha hardware design.

Now, you may find the SSP2 to be incredibly boring. It is, however, a good choice for a low-cost, compact product. The Reface CS and DX need a metal shield over the SSP2, perhaps to control RF emissions, perhaps to radiate heat, or maybe both purposes together. Low power is a vital concern throughout the Reface series due to battery power concerns.

I’m a little hesitant to draw any inferences about future products. The Yamaha Montage supports 128 note, 8 operator FM polyphony. The Reface DX provides a relatively meager 8 note, 4 operator FM polyphony. Thus, there must be considerable hardware resources at work in the Montage. Well worth the price, one hopes! And speaking of hopes, many people would like an analog modeling extension to the Montage. That would depend, of course, on the availability of spare computational horsepower.
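
To put a rough number on the gap, this Python sketch multiplies notes by operators for each instrument. Counting simultaneous operators is only a crude proxy of my own for computational load; it says nothing about envelopes, feedback, effects or how either engine is actually implemented.

```python
def simultaneous_operators(notes, operators_per_note):
    """Worst-case number of FM operators running at once."""
    return notes * operators_per_note

montage_ops = simultaneous_operators(notes=128, operators_per_note=8)
reface_dx_ops = simultaneous_operators(notes=8, operators_per_note=4)
ratio = montage_ops // reface_dx_ops

print(f"Montage: {montage_ops} operators")
print(f"Reface DX: {reface_dx_ops} operators")
print(f"Montage needs {ratio}x the operator throughput")
```

By this measure the Montage runs 1024 operators against 32 in the Reface DX, a factor of 32.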

Copyright © 2016 Paul J. Drongowski