Yamaha CSP pianos: First take

Yamaha just announced the Clavinova CSP series of digital pianos. There are two models: CSP-150 and CSP-170. The main differences between the 170 and 150 are keyboard action (NWX and GH3X, respectively) and sound system (2 x 45W and 2 x 30W, respectively). USA list prices (MSRP) are $5,399 to $5,999 for the CSP-170 and $3,999 to $4,599 for the CSP-150.

These are not stage pianos. They are “furniture” pianos which complement and fit below the existing CLP line.

Here’s my imagined notion of the product pitch meeting:

Digital piano meets arranger meets Rock Band. Let’s say that you don’t have much (any) musical training, but you want to play along with Katy Perry. Sit down at the CSP with your smart device, install the Smart Pianist app and connect via Bluetooth. Call up “Roar” in the app and get a simple musical score. Start the song, follow the LEDs above the keys and play along with the audio. The app stays in sync with the audio and highlights the notes to be played on each beat. So, if you learned a little bit about reading music, you’re good to go.

Sorry, a little bit more than an elevator pitch, but this is first draft writing! 🙂

That is CSP in a nutshell. The CSP is a first-rate piano and it has a decent collection of non-piano voices and arranger styles. The CSP even includes the Hammond-ish “organ flutes” drawbar organ voices. So, if you want to jam out with electric guitar, you’re set. If you want to play chords with your left hand and freestyle it, the CSP is ready.

If you’re looking for a full arranger workstation, though, you’re missing some features. No pitch bend wheel, no mod wheel, no multipads, no accompaniment section (MAIN, FILL, …) buttons. No voice editing; all voices are preset.

And hey, there’s no display either! The Smart Pianist app is your gateway to the CSP feature set. You can select from a few voices and styles using the FUNCTION button and the piano keyboard, but you need the app to make full use of the CSP. Eliminating the CLP’s touch panel, lights and switches takes a lot of cost out of the product, achieving a more affordable price point.

I could see the CSP appealing to churches as well as home players given the quality of the piano and acoustic voices. Flipping the ON switch and playing piano is just what a lot of liturgical music ministers want. The more tech savvy will dig in. Pastors will appreciate the lower price of the CSP line.

From the perspective of an arranger guy, the CSP represents a shift away from the standard arranger. For decades, people have wanted to play along with their favorite pop tunes. To use a conventional arranger (no matter what brand), the musician must find a suitable style and must have the musical skill to play a chord with the left hand, even if it’s just the root note of the chord. Often the accompaniment doesn’t really “sound like the record” and the player feels disappointed, unskilled and depressed. Shucks, I feel this way whenever I make another attempt at playing guitar and at least I can read music!

The CSP is a new paradigm that addresses these concerns. First, the (budding) musician plays with the actual recording. Next, the app generates a simplified musical score — no need to chase after sheet music. The score matches the actual audio and the app leads the player through the score in sync with the audio. Finally, the CSP’s guide lights make a game of playing the notes in the simplified score.

We’ve already seen apps from Yamaha with some of these features. Chord Tracker analyzes a song from your audio music library and generates a chord chart. Kittar breaks a song down into musical phrases that can be repeated, transposed and slowed down for practice. The Smart Pianist app includes Chord Tracker functionality and takes it to another level producing a two stave piano score.

Notice that I said “a score” not “the score.” Yamaha’s audio analysis only needs to be good enough to produce a simple left hand part and the melody. It does not need to generate the full score for a piece of music. Plus, there are likely to be legal copyright issues with the generation of a full score. (A derivative work?)

Still, this is an impressive technical feat and is the culmination of years of research in music analysis. Yamaha have invested heavily in music analysis and hold many patents. Here are a few examples:

  • U.S. Patent 9,378,719: Technique for analyzing rhythm structure of music audio data, June 28, 2016
  • Patent 9,117,432: Apparatus and method for detecting chords, August 25, 2015
  • U.S. Patent 9,053,696: Searching for a tone data set based on a degree of similarity to a rhythm pattern, June 9, 2015
  • U.S. Patent 9,006,551: Musical performance-related information output device, April 14, 2015
  • Patent 9,275,616: Associating musical score image data and logical musical score data, March 1, 2016
  • U.S. Patent 9,142,203: Music data generation based on text-format chord chart, September 22, 2015

The last patent is not music analysis per se. It may be one of several patents covering technology that we will see in the next Yamaha top of the line (TOTL) arranger workstation.

I think we will be seeing more features based on music analysis. Yamaha’s stated mission is to make products that delight customers and to provide features that are not easily copied by competitors. Yamaha have staked out a strong patent position in this area, and any competitor would still have to climb over the steep technological barrier posed by musical analysis of audio.

Real Acoustic Sound

As mentioned in my earlier post, the Yamaha NSX-1 integrated circuit implements three sound sources: a General MIDI engine based on the XG voice architecture, eVocaloid and Real Acoustic Sound (RAS). RAS is based on Articulation Element Modeling (AEM) and I now believe that eVocaloid is also a form of AEM. eVocaloid uses AEM to join or “blend” phonemes. The more well-known “conventional” Vocaloid uses computationally intensive mathematics for blending which is why conventional Vocaloid remains a computer-only application.

Vocaloid uses a method called Frequency-domain Singing Articulation Splicing and Shaping. It performs frequency domain smoothing. (That’s the short story.)

AEM underlies Tyros Super Articulation 2 (S.Art2) voices. Players really dig S.Art2 voices because they are so intuitively expressive and authentic. Synthesizer folk hoped that Montage would implement S.Art2 voices — a hope not yet realized.

Conceptually, S.Art2 has two major subsystems: a controller and a synthesis engine. The controller (which is really software running on an embedded microcomputer) senses the playing gesture made by the musician and translates those gestures into synthesis actions. Gestures include striking a key, releasing a key, pressing an articulation button, moving the pitch bend or modulation wheel. Vibrato is the most commonly applied modulation type. The controller takes all of this input and figures out the musician’s intent. The controller then translates that intent into commands which it sends to the synthesis engine.

AEM breaks synthesis into five phases: head, body, joint, tail and shot. The head phase is what we usually call “attack.” The body phase forms the main part of a tone. The tail phase is what we usually call “release.” The joint phase connects two bodies, replacing the head phase leading into the second body. A shot is a short waveform like a detached staccato note or a percussive hit. A flowing legato string passage sounds much different than pizzicato, so it makes sense to treat shots separately.

Heads, bodies and tails are stored in a database of waveform fragments (i.e., samples). Based on gestures — or MIDI data in the case of the NSX-1 — the controller selects fragments from the database. It then modifies and joins the fragments according to the intent to produce the final digital audio waveform. For example, the synthesis engine computes joint fragments to blend two legato notes. The synthesis engine may also apply vibrato across the entire waveform (including the computed joint) if requested.
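To make the phase vocabulary concrete, here is a minimal sketch of how a controller might turn note timing into a sequence of phases. It is not Yamaha's algorithm. The `Note` class, the `plan_fragments` function and the `SHOT_MAX` threshold are all names and values I made up for illustration, and the only rule borrowed from the text is that overlapping notes signal legato, so a joint replaces the head of the second note.

```python
from dataclasses import dataclass


@dataclass
class Note:
    pitch: int   # MIDI note number
    on: float    # note-on time in seconds
    off: float   # note-off time in seconds


# Assumed threshold: a detached note this short (in seconds) is treated as a shot.
SHOT_MAX = 0.12


def plan_fragments(notes):
    """Return a list of (phase, pitch) tuples for a monophonic line."""
    notes = sorted(notes, key=lambda n: n.on)
    plan = []
    prev_legato = False  # True when the previous note overlapped into this one
    for i, note in enumerate(notes):
        nxt = notes[i + 1] if i + 1 < len(notes) else None
        # Overlap with the next note is read as legato intent.
        legato_out = nxt is not None and nxt.on < note.off
        detached = not prev_legato and not legato_out
        if detached and (note.off - note.on) <= SHOT_MAX:
            plan.append(("shot", note.pitch))
        else:
            if not prev_legato:
                plan.append(("head", note.pitch))   # normal attack
            plan.append(("body", note.pitch))
            if legato_out:
                plan.append(("joint", note.pitch))  # replaces the next note's head
            else:
                plan.append(("tail", note.pitch))   # normal release
        prev_legato = legato_out
    return plan


if __name__ == "__main__":
    line = [Note(60, 0.0, 0.55), Note(62, 0.5, 1.0), Note(64, 1.5, 1.55)]
    for phase, pitch in plan_fragments(line):
        print(phase, pitch)
```

A real engine would then fetch head, body and tail samples from its database and compute the joint, which this sketch leaves out entirely.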

Whew! Now let’s apply these concepts to the human voice. eVocaloid is driven by a stream of phonemes. The phonemes are represented as an ASCII string of phonetic symbols. The eVocaloid controller recognizes each phoneme and breaks it down into head, body and tail fragments. It figures out when to play these fragments and when bodies must be joined. The eVocaloid controller issues internal commands to the synthesis engine to make the vocal intent happen. As in the case of musical passages, vibrato and pitch bend may be requested and are applied. The NSX-1 MIDI implementation has three Non-Registered Parameter Number (NRPN) messages to control vibrato characteristics:

  • Vibrato Type
  • Vibrato Rate
  • Vibrato Delay

I suspect that a phoneme like “ka” must be two fragments: an attack fragment “k” and a body fragment “a”. If “ka” is followed immediately by another phoneme, then the controller requests a joint. Otherwise, “ka” is regarded as the end of a detached word (or phrase) and the appropriate tail fragment is synthesized.

Whether it’s music or voice, timing is critical. MIDI note on and note off events cue the controller as to when to begin synthesis and when to end synthesis. The relationship between two notes is also critical as two overlapping notes indicate legato intent and articulation. The Yamaha AEM patents devote a lot of space to timing and to mitigation of latency effects. The NSX-1 MIDI implementation has two NRPN messages to control timing:

  • Portamento Timing
  • Phoneme Unit Connect Type

The Phoneme Unit Connect Type has three settings: fixed 50 msec mode, minimum mode and velocity mode in which the velocity value changes the phoneme’s duration.

As I mentioned earlier, eVocaloid operates on a stream of phonetic symbols. Software sends phonetic symbols to the NSX-1 using either of two methods:

  1. System Exclusive (SysEx) messages
  2. NRPN messages

A complete string of phonetic symbols can be sent in a single SysEx message. Up to 128 phonetic symbols may be sent in the message. The size of the internal buffer for symbols is not stated, but I suspect that it’s 128 symbols. The phoneme delimiter is ASCII space and the syllable delimiter is ASCII comma. A NULL character must appear at the end of the list.
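As a rough sketch of the SysEx method, the function below packs syllables into one message using the delimiters described above: space between phonemes, comma between syllables, NULL at the end. The function name `build_phonetic_sysex` is mine. The header bytes are deliberately left as a parameter because I am not quoting the real NSX-1 header here, and counting one "symbol" per phoneme for the 128 limit is my assumption. Only the SysEx framing bytes (0xF0 and 0xF7) and the Yamaha manufacturer ID (0x43) are standard MIDI facts.

```python
SYSEX_START = 0xF0
SYSEX_END = 0xF7
YAMAHA_ID = 0x43        # Yamaha's MIDI manufacturer ID
MAX_SYMBOLS = 128       # limit stated in the text


def build_phonetic_sysex(syllables, header):
    """Pack syllables (each a list of phoneme strings) into one SysEx message.

    `header` holds the bytes between 0xF0 and the symbol text. Its real
    contents are device specific and are NOT defined by this sketch.
    """
    header = bytes(header)
    if any(b > 0x7F for b in header):
        raise ValueError("SysEx data bytes must be 7-bit")
    # Assumption: the 128 limit counts phonemes, not characters.
    count = sum(len(phonemes) for phonemes in syllables)
    if count > MAX_SYMBOLS:
        raise ValueError("too many phonetic symbols for one message")
    # Space separates phonemes, comma separates syllables.
    text = ",".join(" ".join(phonemes) for phonemes in syllables)
    payload = text.encode("ascii") + b"\x00"   # NULL terminates the list
    return bytes([SYSEX_START]) + header + payload + bytes([SYSEX_END])


if __name__ == "__main__":
    # Placeholder header: manufacturer ID only, for illustration.
    msg = build_phonetic_sysex([["k", "a"], ["s", "a"]], header=[YAMAHA_ID])
    print(msg.hex(" "))
```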

The NRPN method uses three NRPN message types:

  • Start of Phonetic Symbols
  • Phonetic Symbol
  • End of Phonetic Symbols

In order to send a string of phonetic symbols, software sends a start NRPN message, one or more phonetic symbol NRPN messages and, finally, an end of phonetic symbols NRPN message.
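The start, symbol, end sequence might look like the sketch below. The controller numbers are standard MIDI (CC 99 and CC 98 select the NRPN, CC 6 carries the data entry MSB). Everything else is assumed: the three parameter numbers are placeholders I invented, not the real NSX-1 values, and sending one ASCII character per Phonetic Symbol message is a guess about the encoding.

```python
NRPN_MSB_CC = 99        # standard MIDI: NRPN parameter number MSB
NRPN_LSB_CC = 98        # standard MIDI: NRPN parameter number LSB
DATA_ENTRY_MSB_CC = 6   # standard MIDI: data entry MSB

# HYPOTHETICAL parameter numbers (MSB, LSB), placeholders for illustration only.
HYPOTHETICAL_START = (0x01, 0x00)
HYPOTHETICAL_SYMBOL = (0x01, 0x01)
HYPOTHETICAL_END = (0x01, 0x02)


def nrpn(channel, param, value):
    """Return the three control change messages that make up one NRPN write."""
    status = 0xB0 | (channel & 0x0F)
    msb, lsb = param
    return bytes([
        status, NRPN_MSB_CC, msb,
        status, NRPN_LSB_CC, lsb,
        status, DATA_ENTRY_MSB_CC, value & 0x7F,
    ])


def send_symbols(channel, text):
    """Build start + one message per ASCII character + end (assumed encoding)."""
    out = bytearray(nrpn(channel, HYPOTHETICAL_START, 0))
    for code in text.encode("ascii"):
        out += nrpn(channel, HYPOTHETICAL_SYMBOL, code)
    out += nrpn(channel, HYPOTHETICAL_END, 0)
    return bytes(out)


if __name__ == "__main__":
    print(send_symbols(0, "k a").hex(" "))
```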

Phonetic symbols are stored in a (128 byte?) buffer. The buffer lets software send a phrase before it is played (sung) by the NSX-1. Each MIDI note ON message advances a pointer through the buffer selecting the next phoneme to be sung. The SEEK NRPN message lets software jump around inside the buffer. If software wants to start at the beginning of the buffer, it sends a “SEEK 0” NRPN message. This capability is really handy, potentially letting a musician start at the beginning of a phrase again if they have lost their place in the lyrics.
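Here is a small simulation of that buffer behavior, written only to illustrate the pointer logic. The class and method names are invented, the 128 entry capacity is my guess (as noted above), and returning `None` when the pointer runs off the end is an assumption since the text does not say what the NSX-1 does there.

```python
class PhonemeBuffer:
    """Toy model of the phonetic symbol buffer and its read pointer."""

    def __init__(self, capacity=128):   # capacity is a guess, not a documented value
        self.capacity = capacity
        self.symbols = []
        self.pos = 0

    def load(self, symbols):
        """Replace the buffer contents and rewind the pointer."""
        if len(symbols) > self.capacity:
            raise ValueError("phrase does not fit in the buffer")
        self.symbols = list(symbols)
        self.pos = 0

    def note_on(self):
        """Return the phoneme for this note and advance the pointer."""
        if self.pos >= len(self.symbols):
            return None                 # assumed behavior at end of buffer
        symbol = self.symbols[self.pos]
        self.pos += 1
        return symbol

    def seek(self, index):
        """Jump to a position, like the SEEK NRPN message."""
        if not 0 <= index <= len(self.symbols):
            raise IndexError("seek position outside the buffer")
        self.pos = index


if __name__ == "__main__":
    buf = PhonemeBuffer()
    buf.load(["ka", "sa", "ta"])
    print(buf.note_on(), buf.note_on())  # two notes played
    buf.seek(0)                          # lost my place, start the phrase over
    print(buf.note_on())
```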

When I translated the Yamaha NSX-1 brochure, I encountered the statement: “eVocaloid and Real Acoustic Sound cannot be used at the same time. You need to choose which one to pre-install at the ordering stage.” This restriction is not surprising. RAS and eVocaloid must each have their own unique database; RAS has instrument samples and eVocaloid has human vocal samples. I don’t think, therefore, that Pocket Miku has any RAS (AEM) musical instrument samples. (Bummer.)

Speaking of databases, conventional Vocaloid databases are quite large: hundreds of megabytes. eVocaloid is intended for embedded applications and eVocaloid databases are much smaller. I’ll find out how big once I take apart Pocket Miku. Sorry, Miku. 🙂

I hope this article has given you more insight into Yamaha Real Acoustic Sound and eVocaloid.

New Yamaha patents

Raining like crazy today, so it’s a good chance to look for new patents and patent applications.

First, here are a few new technical patents assigned to Yamaha. US Patent 9,536,508 titled “Accompaniment data generating apparatus,” awarded on January 3, 2017, describes accompaniment generation using a combination of MIDI and audio waveforms. The accompaniment generator follows chord changes, etc. just like today’s arrangers except that it also plays back melodic (pitched) audio phrases as well as MIDI. This is very likely the nexus of the next generation of Yamaha arrangers (flagship “GENOS”).

US Patent 9,514,728 titled “Musical performance apparatus that emits musical performance tones and control tones for controlling an apparatus,” awarded December 6, 2016, describes a system for near-ultrasonic communication between a tablet and a keyboard. Software on the tablet controls tone generation on the keyboard, allowing an app to play back a musical performance (e.g., MIDI over near-ultrasonic sound). I suspect that some future Yamaha product will use this technology for wireless tablet to keyboard communication in place of Bluetooth or WiFi.

The third patent, number 9,489,938 is titled “Sound synthesis method and sound synthesis apparatus” and was awarded on November 8, 2016. The patent abstract says it best:

A sound synthesis apparatus connected to a display device, includes a processor configured to: display a lyric on a screen of the display device; input a pitch based on an operation of a user, after the lyric has been displayed on the screen; and output a piece of waveform data representing a singing sound of the displayed lyric based on the inputted pitch.

Yamaha have a stellar technology base in VOCALOID. I believe they are working toward a real-time system to sing lyrics. This would be a real breakthrough especially for pitch-challenged vocalists like me!

Finally, Yamaha was awarded several design patents covering the external industrial design of synth and arranger keyboards:

    D772,974   PSR-S670   November 29, 2016
    D776,189   Montage    January 10, 2017
    D778,347   YPT-255    February 7, 2017
    D778,346   Reface YC  February 7, 2017
    D778,345   Reface CP  February 7, 2017
    D778,344   Reface DX  February 7, 2017
    D778,343   Reface CS  February 7, 2017
    D778,342   ????       February 7, 2017

The final design patent, D778,342, is perplexing. I haven’t been able to associate it with a product in the North American market. A future product perhaps? It shows a 26-key keyboard with a four way, cursor-like pad. The keyboard design is E-to-F! I/O is on the left side panel.

Tip-toe through the tech

Last year ’bout this time, we were all holding our collective breath awaiting the new Yamaha Montage. There are two products which I expect to see from Yamaha sometime in the next one to two years:

  1. The successor to the mid-range MOXF synthesizer, and
  2. The successor to the top-of-the-line (TOTL) Tyros arranger workstation.

NAMM 2017 seems a little too soon for both products. In the case of the MOXF successor, Yamaha conducted marketing interviews during the summer of 2015. I would guess that MOXF sales are still pretty good and no new products from the usual suspects (Korg, Roland) are visible on the horizon. The Krome and FA could both use an update themselves. Not much market pressure here at the moment. (Korg’s NAMM 2017 announcements are, so far, a little underwhelming.)

Read my MOX retrospective and interview follow-up.

I suspect that the Tyros successor is somewhat closer to launch. Speculation has been heated ever since Yamaha filed for a US trademark on the word mark “GENOS”. The word mark was published for opposition on November 15, 2016. “Published for opposition” means that anyone who believes that they will be damaged by registration of the mark must file for opposition within 30 days of publication. If “GENOS” is indeed the name for the Tyros successor, then the 30 day period ending December 15, 2016 is cutting it very close to NAMM 2017. It would be even more ludicrous if Yamaha were to begin manufacturing products printed with that name for a NAMM 2017 launch. Imagine the scrap if opposition were successful!
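For anyone who wants to check the calendar math, here is a quick sketch. The publication date and the 30 day window come from the paragraph above. The January 19 start date for Winter NAMM 2017 is taken from elsewhere in these notes, and the variable names are just mine.

```python
from datetime import date, timedelta

published = date(2016, 11, 15)                 # published for opposition
opposition_deadline = published + timedelta(days=30)
namm_start = date(2017, 1, 19)                 # Winter NAMM 2017 opening day

# Days between the end of the opposition period and the show.
margin_days = (namm_start - opposition_deadline).days

if __name__ == "__main__":
    print("Opposition period ends:", opposition_deadline.isoformat())
    print("Days left before NAMM:", margin_days)
```

That leaves about five weeks between the end of the opposition period and the show, which is not much time to print a name on a panel and ship product.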

For quite some time, I have been meaning to summarize the key U.S. patents that I believe to be GENOS-related. (Assuming that “GENOS” is the name!) I’ve procrastinated because the launch date is most likely fall 2017 at the earliest, as previous Yamaha mid- and high-end arranger models were typically launched in the fall in anticipation of the holiday selling season.

A much larger barrier is the task of reading and gisting the patents. Patents are written in legalese and are much more difficult to read than the worst written scientific papers! One of the folks on the PSR Tutorial forum suggested making a list of the top five technologies for the new TOTL arranger. I generally hate the superficial nature of “list-icles,” but the suggestion is a good one. Nothing will get done as long as the barrier is big because I would much rather jam and play! I’m supposed to be retired.

The 2016 Yamaha annual report states that Yamaha want to make innovative products which are not easily copied by competitors. Patents — legally protected intellectual property — are essential to achieving this goal. Generally, a company only applies for a patent on technology in which they have a serious business interest due to the significant cost of obtaining and maintaining patent protection.

So, here are a few of Yamaha patented technologies which could appear in future products — perhaps GENOS, perhaps others.

SWP70 tone generator

This may seem like old news…

The next generation SWP70 tone generator first appeared in the mid-range Yamaha PSR-S970 arranger workstation. The SWP70 made its second appearance in the Yamaha Montage synthesizer. The S970 incorporates only one SWP70 and does not make full use of the chip. (At least three major interfaces are left unconnected.) In keeping with Yamaha’s TOTL design practice, the Montage employs two SWP70 integrated circuits: one each for AWM2 sample-playback and FM. A second sample cache interface on the AWM2 side is unconnected.

The Tyros successor likely will use two SWP70 tone generators, too. The number of available tone generation channels with two SWP70s will be massive (512 channels). Yamaha could opt for a single SWP70 and still outmatch the current generation Tyros 5. Like the Montage, there will be enough insert effect DSP processors to cover each style and user part, as many as two for every part.

It will be interesting to see (and hear) if the GENOS will make use of the second sample cache interface. A second cache would not only support more tone generation channels, but might be necessary for long, multi-measure musical phrases that are needed for full audio styles (discussed below).

The SWP70 flash memory interface follows the Open NAND Flash Interface (ONFI) standard, the same as solid state drives (SSD). ONFI memory devices can be stacked on a bi-directional tri-state bus, so potentially, the GENOS could support a large amount of internal waveform storage. This flash memory will contain the “expansion memory,” that is, physical memory reserved in flash memory for user waveforms. The flash memory expansion modules (FL512M, FL1024M) are dead, Jim.

If you’re interested in Yamaha AWM2 tone generation, here are a few patents to get you started:

  • Patent 9,040,800 Musical tone signal generating apparatus, May 26, 2015
  • Patent 8,383,924 Musical tone signal generating apparatus, February 26, 2013
  • Patent 8,389,844 Tone generation apparatus, March 5, 2013
  • Patent 8,957,295 Sound generation apparatus, February 17, 2015
  • Patent 8,035,021 Tone generation apparatus, October 2011
  • Patent 7,692,087 Compressed data structure and apparatus and method related thereto, April 6, 2010

U.S. Patent 8,957,295 is the patent issued for the SWP70 memory interface. U.S. Patent 9,040,800 describes a tone generator with 256 channels — very likely the SWP70.

Pure Analog Circuit

This may seem like old news, too, since Pure Analog Circuit (PAC) debuted in the Yamaha Montage.

Pure Analog Circuit is probably the least understood and least appreciated feature of the Montage. It’s not just better DACs, people. The high speed digital world is very noisy as far as analog audio is concerned. Yamaha separated the analog and digital worlds by putting the DACs and analog electronics on their own printed circuit board away from noisy digital circuits. Yamaha then applied old school engineering to the post-DAC analog circuitry, paying careful attention to old school concerns like board layout for noise minimization and clean power with separate voltage regulation for analog audio. Yamaha’s mid- to high-end products have always been quiet — PAC is pristine.

Since the PAC board is a separate, reusable entity, I could see Yamaha adopting the same board for GENOS.

Styles combining audio and MIDI

Yamaha are constantly in search of greater sonic realism. Existing technologies like Megavoices and Super Articulation 2 (Articulation Element Modeling) reproduce certain musical articulations. However, nothing can really match the real thing, that is, a live instrument played by an experienced professional musician. PG Music Band-in-a-Box (BIAB), for example, uses audio tracks recorded by studio musicians to produce realistic sounding backing tracks. The Digitech TRIO pedal draws on the PG Music technology for its tracks. (“Hello” to the Vancouver BC music technology syndicate.)

Yamaha have applied for and been granted several patents on generating accompaniment using synchronized audio and MIDI tracks. Here is a short list of U.S. patents:

  • Patent 9,147,388 Automatic performance technique using audio waveform data, September 29, 2015
  • Patent 9,040,802 Accompaniment data generating apparatus, May 26, 2015
  • Patent 8,791,350 Accompaniment data generating apparatus, July 29, 2014
  • Application 13/982,476 Accompaniment data generating apparatus, March 12, 2012

There are additional patents and applications. Each patent covers a different aspect of the same basic approach, making different claims (not unusual in patent-land). Yamaha have clearly invested in this area and are staking a claim.

The patents cite four main motivations, quoting:

  1. The ability to produce “actual musical instrument performance, human voices, natural sounds”
  2. To play “automatic accompaniment in which musical tones of an ethnic musical instrument or a musical instrument using a peculiar scale”
  3. To exhibit the “realism of human live performance”
  4. To advance beyond known techniques that “provide automatic performance only of accompaniment phrases of monophony”

Your average guy or gal might say, “Give me something that sounds as natural as Band-in-a-Box.” Yamaha sell into all major world markets, so the ability to play ethnic instruments with proper articulation is an important capability. Human voice, to this point, is limited to looped and one-shot syllables, e.g., jazz scat. The new approach would allow long phrases with natural intonation.


Currently, mid- and high-end Yamaha arrangers have “audio styles” where only the rhythm track is audio. The patents cover accompaniment using melodic instruments in addition to rhythm instruments. The melodic audio tracks follow chord and tempo changes just like the current MIDI-based styles. Much of the technical complexity is due to synchronization between audio and MIDI events. Synchronization is troublesome when the audio tracks contain a live performance with rubato. Without good synchronization, the resulting accompaniment doesn’t feel right and sounds sloppy.

Accompaniment from chord chart

This next feature will be very handy. U.S. Patent 9,142,203 is titled “Music data generation based on text-format chord chart,” September 22, 2015. If you use textual chord charts (lyrics plus embedded chord symbols), you will want this!


Simply put, the technique described in this patent translates a textual chord chart to an accompaniment. The accompaniment is played back by the arranger. The user can select tempo, style, sections (MAIN, FILL IN) and so forth.

The translator/generator could be embedded in an arranger or it could be implemented by a PC- or tablet-based application. Stay tuned!
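To give a feel for the first step of such a translator, here is a minimal sketch that pulls chord symbols out of a text chart and lays them on a beat grid. It is not the patented method. I am assuming a bracket notation for embedded chords (like `[C]`) and a fixed four beats per chord, and the function name `parse_chart` is made up. A real tool would also need to read timing hints, sections and style choices.

```python
import re

# Assumed notation: chords appear in square brackets inside the lyric line.
CHORD_RE = re.compile(r"\[([A-G][#b]?[^\]\s]*)\]")


def parse_chart(text, beats_per_chord=4):
    """Return a list of (start_beat, chord_symbol) events.

    Assumption: every chord lasts `beats_per_chord` beats.
    """
    events = []
    beat = 0
    for line in text.splitlines():
        for match in CHORD_RE.finditer(line):
            events.append((beat, match.group(1)))
            beat += beats_per_chord
    return events


if __name__ == "__main__":
    chart = "[C]Twinkle twinkle [F]little [C]star\n[G7]How I wonder"
    for start, chord in parse_chart(chart):
        print(start, chord)
```

The resulting chord events are what an accompaniment engine would then render using the selected style and section.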

Selectively delayed registration changes

A registration is a group of performance parameters such as the right hand voice settings, left hand voice settings, accompaniment settings, and so forth. Mid- and high-end arrangers have eight front panel buttons where each button establishes a set of parameter values (“readout”) when the button is pushed. It’s the player’s job to hit the appropriate button at the appropriate time during a live performance to make voice settings, etc. A player may need a large number of buttons, if a musical performance is complicated.

Usually only a few parameters are different from one registration to the next. Recognizing this, the technique described by U.S. Patent 9,111,514 (“Delayed registration data readout in electronic music apparatus,” August 18, 2015) delays one or more parameter changes when a button is pushed. The user specifies the parameters to be delayed and the delay (such as the passage of some number of beats or measures). Thus, a single registration can cover the work of multiple individual registrations.
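A toy model of the idea, as I read it, might look like this. The parameter names, the beat-based delays and both function names are hypothetical and chosen only for illustration; the patent itself describes the mechanism in far more general terms.

```python
def press_button(settings, registration, delays, now_beat):
    """Apply a registration; delayed parameters are scheduled, not applied.

    `delays` maps a parameter name to a delay in beats (missing means no delay).
    Returns (updated_settings, pending) where pending is a sorted list of
    (due_beat, parameter, value) tuples.
    """
    updated = dict(settings)
    pending = []
    for param, value in registration.items():
        delay = delays.get(param, 0)
        if delay == 0:
            updated[param] = value
        else:
            pending.append((now_beat + delay, param, value))
    pending.sort(key=lambda item: item[0])
    return updated, pending


def advance(settings, pending, now_beat):
    """Apply every scheduled change whose due beat has arrived."""
    updated = dict(settings)
    remaining = []
    for due, param, value in pending:
        if due <= now_beat:
            updated[param] = value
        else:
            remaining.append((due, param, value))
    return updated, remaining


if __name__ == "__main__":
    now = {"right_voice": "Piano", "style": "8Beat"}
    reg = {"right_voice": "Strings", "style": "Ballad"}
    now, todo = press_button(now, reg, {"right_voice": 8}, now_beat=0)
    print(now, todo)             # style changes at once, voice is scheduled
    now, todo = advance(now, todo, now_beat=8)
    print(now, todo)             # voice change lands two measures later
```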


I’ll have to wait to see the final product to assess the usefulness of this feature. Personally, I’d be happy with a configuration bit to keep OTS buttons from automatically turning on the accompaniment (ACCOMP). Sure would make it easier to use the OTS buttons for voice changes.

Ensembles / divisi

Tyros 5 ensemble voices assign played notes to individual instrument voices in real time, allowing a musician to perform divisi (divided) parts. Tyros 5 ensembles can be tweaked using its “Ensemble Voice Key Assign Type List.” Types include open, closed, and incremental voice assignment. U.S. Patent 9,384,717, titled “Tone generation assigning apparatus and method” and published July 5, 2016, extends Tyros 5 ensemble voice assignment.

The technique described in 9,384,717 gives the musician more control over part assignment through rules: target depressed key, priority rule, number of tones to be generated, note range, etc. The rules handle common cases like splitting a single note to two or more voices.
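Here is one very simple rule set, just to show what "assigning played notes to parts" means in practice. It is my own invention and not the rule system in the patent: the highest note goes to the first part, notes are doubled when fewer keys than parts are held, and any extra notes pile onto the lowest part. The function name and part names are hypothetical.

```python
def assign_divisi(notes, parts):
    """Assign held MIDI notes to ensemble parts, highest note to the first part."""
    assignment = {part: [] for part in parts}
    ordered = sorted(set(notes), reverse=True)
    if not ordered:
        return assignment
    if len(ordered) <= len(parts):
        # Fewer keys than parts: spread the notes so that every part sounds,
        # doubling notes where needed (a single key is split to all parts).
        for i, part in enumerate(parts):
            index = i * len(ordered) // len(parts)
            assignment[part].append(ordered[index])
    else:
        # More keys than parts: the lowest part takes the leftover notes.
        for i, note in enumerate(ordered):
            part = parts[min(i, len(parts) - 1)]
            assignment[part].append(note)
    return assignment


if __name__ == "__main__":
    strings = ["violin 1", "violin 2", "viola", "cello"]
    print(assign_divisi([60], strings))
    print(assign_divisi([60, 64], strings))
    print(assign_divisi([48, 55, 60, 64, 67], strings))
```

Rules such as priority and note range would slot in where this sketch simply sorts from top to bottom.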


These extensions could lead to some serious fun! I felt that the Tyros 5 ensemble feature was not sufficiently smart and placed too many demands on the average player, i.e., less-than-talented me. The rules offer the opportunity to shift the mental finger work to software and perhaps could lead to more intuitive ensemble play. Neat.

Voice synthesis

As I alluded to earlier, arrangers make relatively primitive use of the human voice. Waveforms are usually limited to sustained (looped) or short (one-shot) syllables.

Yamaha have invested a substantial amount of money into the VOCALOID technology. VOCALOID draws on a singer database of syllable waveforms and performs some very heavy computation to “stitch” the individual waveforms together. The stitching is like a higher quality, non-real time version of Articulation Element Modeling (AEM).

VOCALOID was developed through a joint research project (led by Kenmochi Hideki) between Yamaha and the Music Technology Group (MTG) of the Universitat Pompeu Fabra in Barcelona, Spain. VOCALOID grew from early work by J. Bonada and X. Serra. (See “Synthesis of the Singing Voice by Performance Sampling and Spectral Models.”) More recent research has stretched synthesis from the human voice to musical instruments. Yamaha hold many, many patents on the VOCALOID technology.

Patent 9,355,634, titled “Voice synthesis device, voice synthesis method,” is a recent patent concerning voice synthesis (May 31, 2016). It, too, draws from a database of prerecorded syllables. The human interface is based on the notion of a “retake,” such as a producer might ask a singer to make in a recording studio using directives like “put more emphasis on the first syllable.” The retake concept eliminates a lot of the “wonky-ness” of the VOCALOID human interface. (If you’ve tried VOCALOID, you know what I mean!) The synthesis system sings lyrics based on directions from you — the producer.

An interface like this would make voice synthesis easier to use, possibly by novices or non-technically oriented musicians. The big question in my mind is whether voice synthesis and editing can be sped up and made real time. Still, wouldn’t it be cool if you could teach your arranger workstation to sing?

Music minus one

This work was conducted jointly with the MTG at the Universitat Pompeu Fabra. A few of the investigators were also involved in VOCALOID. Quoting, “The goal of the project was to develop practical methods to produce minus-one mixes of commercially available western popular music signals. Minus-one mixes are versions of music signals where all instruments except the targeted one are present.”

This is not good old center cancellation. The goal is to remove any individual instrument from a mix regardless of placement in the stereo field. You can hear a demo at http://d-kitamura.sakura.ne.jp/en/demo_deformation_en.htm.
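For contrast, here is what good old center cancellation amounts to: subtract the right channel from the left. Anything panned dead center vanishes, and that is the only thing it can remove. The signal values and names below are toy data I made up; this shows the old trick, not the minus-one research.

```python
def center_cancel(left, right):
    """Classic center cancellation: keep only what differs between channels."""
    return [(l - r) / 2 for l, r in zip(left, right)]


if __name__ == "__main__":
    vocal = [0.5, -0.5, 0.25]      # toy signal panned dead center
    guitar = [0.25, 0.125, -0.5]   # toy signal panned hard left
    left = [v + g for v, g in zip(vocal, guitar)]
    right = list(vocal)
    # The centered vocal cancels; the guitar survives at half level.
    print(center_cancel(left, right))
```

There is no way to target the hard-left guitar with this trick, which is why removing an arbitrary instrument needs real source separation.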

I doubt if this technique will appear on an arranger; the computational requirements are too high and the method is not real time. However, “music minus-one” is very appealing to your average player (that is, me). My practice regimen includes playing with backing tracks. I would love to be able to play with any commercial tune on a whim.

There are patents:

  • US Patent 9,002,035 Graphical audio signal control
  • US Patent 9,224,406 Technique for estimating particular audio component
  • US Patent 9,070,370 Technique for suppressing particular audio component

and there are scientific papers:

  • “Audio Source Separation for Music in Low-latency and High-latency Scenarios”, Ricard Marxer Pinon, Doctoral dissertation, Universitat Pompeu Fabra, Barcelona, 2013.
  • D. Kitamura, et al., “Music signal separation by supervised nonnegative matrix factorization with basis deformation,” Proc. DSP 2013, T3P(C)-1, 2013.
  • D. Kitamura, et al., “Robust Music Signal Separation Based on Supervised Nonnegative Matrix Factorization with Prevention of Basis Sharing”, ISSPIT, December 2013.

Music analysis

Yamaha have put considerable resources into what I would call “music analysis.” These technologies may not (probably will not) make it into an arranger keyboard. They are better suited for PC- or tablet-based applications.

I think we have seen the fruits of some of this labor in the Yamaha Chord Tracker iPad/iPhone application. Chord Tracker identifies tempo, beats, musical sections and chords within an audio song from your music library. It displays the extracted info in a simple chord chart and can even send the extracted “lead sheet” to your arranger. The arranger plays back the “lead sheet” as an accompaniment using the selected style.

We’re probably both wondering if Chord Tracker will integrate with the chord chart tool described above. Stay tuned.

Yamaha Patent 9,378,719 (June 28, 2016) is a “Technique for analyzing rhythm structure of music audio data.” Patent 9,117,432 (August 25, 2015) is an “Apparatus and method for detecting chords.” I wouldn’t be surprised if Chord Tracker draws from these two patents.
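I have no idea what is inside Chord Tracker, but the textbook way to guess a chord is template matching on a 12-bin chroma vector (energy per pitch class). The sketch below shows that generic technique only, restricted to major and minor triads. It is not the method claimed in either patent, and the function names and scoring formula are my own choices.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_templates():
    """Map chord names to the pitch classes of major and minor triads."""
    templates = {}
    for root in range(12):
        name = NOTE_NAMES[root]
        templates[name] = {root, (root + 4) % 12, (root + 7) % 12}
        templates[name + "m"] = {root, (root + 3) % 12, (root + 7) % 12}
    return templates


def detect_chord(chroma):
    """Pick the triad whose tones hold the most energy.

    `chroma` is a list of 12 non-negative energies, index 0 = C.
    Score = mean energy on chord tones minus mean energy elsewhere.
    """
    total = sum(chroma)
    best_name, best_score = None, float("-inf")
    for name, tones in chord_templates().items():
        inside = sum(chroma[i] for i in tones)
        score = inside / 3 - (total - inside) / 9
        if score > best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    c_major = [0.1] * 12
    for pitch_class in (0, 4, 7):      # C, E, G
        c_major[pitch_class] = 1.0
    print(detect_chord(c_major))
```

Getting a clean chroma vector out of a dense pop mix, and knowing where the beats fall, is the hard part that the patents presumably address.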

Yamaha has also investigated similarity measures and synchronized score display:

  • Patent 9,053,696 Searching for a tone data set based on a degree of similarity to a rhythm pattern, June 9, 2015
  • Patent 9,006,551 Musical performance-related information output device, April 14, 2015
  • Patent 9,275,616 Associating musical score image data and logical musical score data, March 1, 2016

I’m not sure where Yamaha is going with similarity measures and searching. Will they use similarity measures to select accompaniment phrases? Who knows?

The work on score display synchronizes the display of the appropriate part of a musical score with its live or recorded performance. These techniques may be more appropriate to musical education and training, particularly for traditional brass, string and woodwind players. Yamaha derives considerable revenue from traditional instruments and this is perhaps a way to enhance their “ecosystem” for traditional acoustic instruments.

Score display is one possible application of Yamaha’s patented technique to transmit performance data via near-ultrasonic sound. The technique borrows one or more tone generation channels to generate the near-ultrasonic data signal. See my earlier post about U.S. Patent 8,779,267 for more details.

So long for now!

That’s it! I hope you enjoyed this brief tour through a few of Yamaha’s recent patent grants and filings.

If you want more information about a particular patent, then cruise on over to the U.S. Patent and Trademark Office (USPTO) web site. Navigate to patent search and plug in the patent number.

Won’t be long, yeah!

Winter NAMM 2017 starts in two weeks (January 19). As usual, we gear freaks can’t wait to get our annual new product fix!

Roland jumped the field and announced a few new products at the 2017 Consumer Electronics Show (CES). They appear to be rolling out a new consumer-oriented product line, “GO:”, for amateur musicians and music makers.

Roland announced two new keyboards for beginning players: the GO:KEYS (GO-61K and GO-61KL) and the GO:PIANO. Both products target the entry-level market currently dominated by Yamaha and Casio. This is a smart business move as the entry-level segment moves a lot of units and offerings in this segment have been getting stale. Here are estimated USA sales statistics for 2014 in the “portable keyboard” segments:

    Category                       Units            Retail value
    -----------------------------  ---------------  -------------
    Portable keyboards under $199    656,000 units  $ 64,000,000
    Portable keyboards over $199     350,000 units  $123,000,000
    Total portable keyboards       1,006,000 units  $187,000,000

    (Source: NAMM)

Unit volume is high, but price and margins are razor thin. Keyboards in the “under $199” category are sold mainly in big box stores, not musical instrument retailers. So, it will be interesting to see where the new Roland keyboards are sold.
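To put a number on “razor thin,” dividing retail value by units gives the average retail price per keyboard in each category. Here is a quick sketch; the figures are the ones in the NAMM table above, while the dictionary layout and names are just my own illustration:

```python
# Average retail price per unit, computed from the NAMM figures quoted above.
# The category labels and variable names are illustrative only.
segments = {
    "under $199": (656_000, 64_000_000),
    "over $199": (350_000, 123_000_000),
}

def average_price(units, retail_value):
    """Return the average retail price per keyboard in dollars."""
    return retail_value / units

for name, (units, value) in segments.items():
    print(f"{name:>11}: ${average_price(units, value):,.2f} per unit")

# Cross-check against the "Total portable keyboards" row.
total_units = sum(units for units, _ in segments.values())
total_value = sum(value for _, value in segments.values())
print(f"{'all':>11}: ${average_price(total_units, total_value):,.2f} per unit")
```

That works out to roughly $98 per keyboard in the “under $199” category and roughly $351 in the “over $199” category.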

The GO:KEYS is most similar to an entry-level arranger keyboard. Estimated street price is $299. Roland is selling two models: a model with Bluetooth support and a model without. Probably depends on their ability to get RF type acceptance in a country or region. The GO:KEYS claims General MIDI 2 (GM2) support among 500 “pro-quality” sounds. The GM2 tone set consists of 256 melodic instruments and nine drum kits. I produced quite a few decent backing tracks using the Roland GM2 sound set on its RD-300GX stage piano. If Roland adopted this set, then the GO:KEYS should sound pretty decent (at least through external monitors rather than its internal speakers). No manual yet so it’s hard to say specifically what other sounds are included. Even if they recycled some chestnuts from the old JV/XP/XV, there is hope.


The Roland GO:PIANO is, ta-da, a portable piano. This product has the Yamaha Piaggero line in its cross-hairs. The estimated street price is $329. Again, no manual, so it’s hard to assess the feature set. Pricing on both products places them at the higher end of the entry-level market. The inclusion of Bluetooth support at this price point is a significant differentiator.


Both the GO:KEYS and GO:PIANO are battery powered (six AA batteries) in addition to an AC adapter. Both products use one-off fixed field LCD text and graphics like the lower cost Yamaha and Casio models. The key beds look decent, but we will have to play them in order to assess feel and quality. At least the keys are full size — not mini-keys, thank you.

If the Roland sounds are indeed up to snuff, Roland may be able to take sales away from Yamaha and Casio. Yamaha has been coasting with its entry-level sound set for over a decade and the recent PSR-E453 refresh did little to rejuvenate the entry-level segment. It will be interesting to see if Roland can win sales and spur innovation at the low end.

The GO:MIXER is positioned as an audio mixer for your mobile phone. It is USB powered, however, with no battery option. The GO:MIXER has guitar, microphone, instrument and media player inputs with associated mixing level control. There is a stereo monitor output as well as a “center cancel” feature. The estimated street price is $99 USD.


Although Roland promote it for video production, I could see musicians using the GO:MIXER for a quick mix in the field. It certainly has enough inputs that a small group of pals could plug in and jam away.

FreePlay style deconstructed

Yamaha FreePlay styles for PSR and Tyros are terrific for music that has no rhythm instruments and uses strong rubato (variation in tempo to achieve a musical or emotional effect).

I’m customizing a few FreePlay styles with the intention of using them for liturgical music. In the first pass, I’m changing the OTS voice settings and I’m making a registration that calls up my go-to voices for traditional and contemporary church music.

Of course, my curiosity took over and I had to take a look inside of a FreePlay style or two using a DAW and Michael B’s StyleDump program. I have attached a text file with my working notes. The notes may be too much detail for most readers, so here is a quick summary of what I found. I’ve looked at only two styles so far: EtherealHymn (taken from the CVP-609) and OrganPlay1 (taken from the Church Organ expansion pack).

First off, how does it sound and feel to play a FreePlay style? The accompaniment is triggered and guided by the left hand as usual. (I haven’t tried FreePlay with AI fingering, etc. yet.) The accompaniment plays a gentle pad-like chord and a simple bass. The simplicity provides a blank canvas on which you can embellish to your heart’s content.

You might guess that the MAIN and FILL IN sections are quite simple and you would be right. The MAIN sections in the OrganPlay1 and EtherealHymn styles hold notes for 8 and 32 measures, respectively. The chord source in each case is CMaj7. The BASS track holds a single note (e.g., C2) through the entire section. The chord or pad tracks hold the rest of the notes that make up the CMaj7 chord: E, G and B. Harmony-wise, that’s it!
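As a rough sketch of how little note data that is, here is a MAIN section expressed as a list of held notes. The 4/4 meter, the 480 ticks-per-quarter resolution, the octave placement of the pad notes and the Yamaha octave naming (C3 = MIDI note 60, so C2 = 48) are my assumptions for illustration, not values read from the style files:

```python
# A sketch of a FreePlay-like MAIN section: one bass note plus the remaining
# CMaj7 chord tones, all held for the entire section.
# Assumptions (not taken from the style files): 4/4 time, 480 ticks per
# quarter note, Yamaha octave naming (C3 = MIDI note 60), pad notes in octave 3.
from dataclasses import dataclass

TICKS_PER_QUARTER = 480
BEATS_PER_MEASURE = 4

@dataclass
class HeldNote:
    track: str      # "BASS" or "PAD"
    pitch: int      # MIDI note number
    start: int      # start time in ticks
    duration: int   # length in ticks

def main_section(measures):
    """Return the held notes for a MAIN section of the given length."""
    length = measures * BEATS_PER_MEASURE * TICKS_PER_QUARTER
    bass = [HeldNote("BASS", 48, 0, length)]                     # C2
    pad = [HeldNote("PAD", p, 0, length) for p in (64, 67, 71)]  # E3, G3, B3
    return bass + pad

for name, measures in (("OrganPlay1", 8), ("EtherealHymn", 32)):
    notes = main_section(measures)
    print(name, len(notes), "notes,", notes[0].duration, "ticks each")
```

Four notes per section, each as long as the section itself. Everything else in the style is voice selection, effects and controller data.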

The FILL IN sections are similar and hold notes for just one measure because FILL IN sections are only one MIDI bar long.

Without a rhythm track, those looooooooong notes have a timeless quality. A musician would rarely, if ever, hold a chord that long. Thus, MAIN sections typically do not re-trigger.

Yamaha’s genuine contribution lies in the INTRO/ENDING sections and the fun MIDI stuff that happens during the MAIN sections. The INTRO and ENDING sections have more “orchestration” and consist of style appropriate introductory and ending phrases. For my own purposes, I will probably stick to the simple INTRO A and ENDING A sections as it’s generally hard to match up more complicated musical phrases with the main theme itself.

The “MIDI stuff” must have been fun to program. The EtherealHymn style has string and choir tracks. The string track has MIDI expression data (Control Change 11 or “CC11”) that repeatedly ramps up for two measures and down for two measures. The ramp pattern creates alternating string swells up and swells down. Other control change patterns are rather unusual and I’ll leave that for you to explore with a DAW! (All you need to do is to change the “.STY” or “.FPS” extension to “.MID” and import the renamed file into a DAW.)
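The string swell is easy to sketch. The function below produces a CC11 value for each beat of a pattern that rises for two measures and falls for two measures. The value range (40 to 127), the 4/4 meter and the one-event-per-beat spacing are assumptions chosen for illustration; the actual data in EtherealHymn may well differ:

```python
# Sketch of a repeating CC11 (expression) swell: up for two measures,
# down for two measures. LOW, HIGH and the event spacing are assumptions.
CC_EXPRESSION = 11
LOW, HIGH = 40, 127
BEATS_PER_MEASURE = 4
RAMP_BEATS = 2 * BEATS_PER_MEASURE   # two measures per ramp direction

def expression_at(beat):
    """Return the CC11 value at a given beat (0-based) of the pattern."""
    position = beat % (2 * RAMP_BEATS)           # position in 4-measure cycle
    if position <= RAMP_BEATS:
        fraction = position / RAMP_BEATS          # rising half
    else:
        fraction = (2 * RAMP_BEATS - position) / RAMP_BEATS  # falling half
    return round(LOW + fraction * (HIGH - LOW))

def swell_events(measures):
    """Return (beat, controller, value) tuples, one per beat."""
    beats = measures * BEATS_PER_MEASURE
    return [(b, CC_EXPRESSION, expression_at(b)) for b in range(beats)]

for event in swell_events(4):
    print(event)
```

A real style would use a finer event spacing than one per beat to get a smooth swell, but the shape is the same.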

One could create a basic FreePlay style from scratch. The MIDI notes in the MAIN and FILL IN sections are dirt simple. The fun part would be selecting instrument voices and effects with dynamic elements that give life to the accompaniment. Then there is the creative aspect of driving the voices and effects with MIDI controller data. For INTRO and ENDING sections, a little Bach or Mozart would do.

Hmmm, sounds like a fun wintertime project!

The long view

Here’s some information attributed to Martin Harris from Yamaha. Martin is one of the key sound developers at Yamaha:

  • Better Pianos
  • New Strings – 70 piece Seattle Symphony Orchestra Mega
  • New Orchestral Brass – highly dynamic
  • New Tuned Percussion – Glock, Xylo, Marimba and Vibes (with motor on)
  • New Mega guitars – Telecaster with Finger and Plectrum
  • SA2 Celtic Violin
  • New Synth Voices
  • New Classical Choir – Cathedral ambience
  • New Gospel Choir – Various articulations and Ad libs
  • New Pop Vocals – 4 session singers, 2 male and 2 female
  • Singing many dynamics and many articulations (wave cycling)

Montage? No, Tyros 4. The “SA2” should be a clue as the Montage does not provide Super Articulation 2 (SA2) voices.

My purpose here is not to be tricky, but to make the case that sample-based workstations or synthesizers draw from the sound pool that is available at development time, much the same way that hardware designers draw on the pool of available components. Products cannot be composed of imaginary circuits (“sand”), software, and sounds, after all.

To better illustrate this point, here is a rough timeline for the Tyros and Motif product lines with a few mid-range products (S9xx and MOX) thrown in:

              Tyros                       Motif/Montage
       ------------------  -------------------------------------------
Year   Model     Physical  Model     Physical  Uncompressed  Waveforms
----   --------  --------  --------  --------  ------------  ---------
2001                       Motif         48MB          84MB      1,309
2002   Tyros         96MB
2003                       Motif ES      96MB         175MB      1,859
2005   Tyros 2      192MB
2007                       Motif XS     128MB         355MB      2,670
2008   Tyros 3      256MB
2010   Tyros 4      512MB  Motif XF     256MB         741MB      3,977
2011                       MOX          128MB         355MB      2,670
2012   PSR-S950     256MB
2013   Tyros 5      768MB  MOXF         256MB         741MB      3,977
2015   PSR-S970       2GB
2016                       Montage        4GB        5.67GB      6,347

I included physical wave memory size for each product. I also included the uncompressed total sample size and number of waveforms for each member of the Motif/Montage line.
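Dividing the uncompressed sample size by the physical wave memory gives a rough idea of how densely each Motif-family product packs its content. The sketch below uses only the numbers in the table. Treating 1GB as 1024MB and calling the quotient a “packing ratio” are my own conventions, and the ratio by itself says nothing definitive about the compression scheme:

```python
# Ratio of uncompressed sample content to physical wave memory for the
# Motif/Montage products in the timeline. Figures come from the table;
# the 1GB = 1024MB conversion is an assumption.
MB_PER_GB = 1024

products = [
    # (model, physical MB, uncompressed MB)
    ("Motif",    48,            84),
    ("Motif ES", 96,            175),
    ("Motif XS", 128,           355),
    ("Motif XF", 256,           741),
    ("Montage",  4 * MB_PER_GB, 5.67 * MB_PER_GB),
]

def packing_ratio(physical_mb, uncompressed_mb):
    """Return uncompressed size divided by physical memory size."""
    return uncompressed_mb / physical_mb

for model, physical, uncompressed in products:
    print(f"{model:<9} {packing_ratio(physical, uncompressed):.2f}x")
```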

Clearly, Yamaha know how to ride the memory technology curve. Memory technology has progressed to the point where it is no longer a significant hardware design factor. Rather, the amount of wave memory in a product depends more upon the ability of the sound designers to fill it with quality content and mid- versus premium-product grading (i.e., the target market segment and price point for the model). For example, note that the mid-range S970 has more than twice the physical wave memory of the Tyros 5. Although the “expansion memory” is reserved in the S970’s physical wave memory, the S970 waveform content is substantially smaller than that of the Tyros 5.

The other characteristic to note is how the Tyros and Motif lines tend to leapfrog each other. Generally, the Tyros line leads the Motif line in physical wave memory and content. This is partly due to the higher memory requirements of SA2 voices, which require many additional articulation samples.

Both the Tyros 4 and Motif XF were released in 2010. Both machines use two SWP51L tone generators. (Newer products like the Montage use the SWP70 tone generator.) The Tyros 4 has twice the physical wave memory capacity of the Motif XF. Yet, the Tyros 4 has sample content which did not make it to a deliverable product in the Motif line until the Montage in 2016: Seattle strings, orchestral brass, Celtic violin, vocals (choir and scat), Telecaster guitar and suitcase electric piano.

Tyros 5 expanded this content in 2013. The Motif XF, on the other hand, received a significant update in January 2014. The V1.50 update added the “Real Distortion” effects implemented by the Tyros 5. (A few Real Distortion effects actually premiered in the mid-range S950.) The V1.50 update and the “White Motif” color job were life-extenders for the Motif line. I’ve conjectured before that Montage development was late and this is further evidence.

So, what can we expect in the Tyros successor, which I’m calling the “Tyros++”? (Yamaha have trademarked the name “GENOS” which may be the name of the follow-on. Only Yamaha really knows.) Personally, I’m hoping for the new orchestral woodwinds from Montage. These are superbly expressive voices. I’m also expecting improved electric pianos, again, of comparable quality to the Montage.

SA2 voices will probably remain exclusive to the Tyros line. Many folks hoped that Montage would have SA2 and it didn’t. SA2 is an important product differentiator — kind of like the premium “Natural” piano voices are to the Clavinova line. I suspect that FM voices will be a differentiator for the premium Montage line in years to come as well. Yamaha tends to think of these three product lines as distinct, so cross-over is carefully controlled and limited.

All of this talk about samples and wave memory size is overly reductionist. The three main (DMI) product lines — Tyros, Motif/Montage, Clavinova — have distinct personalities and features. Motif/Montage is a synthesizer for stage and production studio. Clavinova is primarily a home or church piano. Tyros serves double duty as a home keyboard and as a workstation for performing professionals. (Oddly, many USA customers scoff at this latter role.)

Although these are all fine instruments, the personalities have quirks. Upper-range Clavinovas are Tyros-in-disguise, except that they lack multi-pads, a third RIGHT voice (i.e., only two voice layers in the right hand), and expansion memory. Tyros does not have the deep editing or modulation features of the Motif/Montage. The Motif and Montage, strangely, do not have a tonewheel organ mode. This latter omission is hard to understand since the Montage competes against other “stage” products like the Korg Kronos and Nord Stage.

Having compared voice programming between PSR-S950 (Tyros 3 without SA2 voices) and MOX (Motif XS sound set), I find that the product lines are voiced (programmed) differently. Motif/Montage effect programming has a harder edge than the Tyros, which is oriented toward oldies, pop and jazz standards. (Yes, Virginia, the Tyros does have latent EDM potential to be tapped.) If the Tyros++ includes the orchestral woodwinds, for example, they will probably be programmed rather differently than Montage. Tyros++ four-part divisi ensembles with the new orchestral woodwinds would be simply brilliant. Can’t wait to see and hear what happens!

One final editorial comment. The world is filled with product reviews. Publications like Keyboard magazine, Electronic Musician, etc. focus on individual products and rarely present a deep, long-term perspective on products. Sound On Sound reviews occasionally give historical background, usually for esoteric, retro studio pieces. As consumers, we need the long view in order to make the most informed choice.

Motif styles for your arranger!

I’m pleased to announce my collection of Motif performance styles for the Yamaha PSR-S950 arranger and its close cousins: Tyros 5, PSR-S770 and PSR-S970.

Motif and MOX are great song-writing machines with thousands of built-in musical phrases. In Motif-speak, these phrases are called “arpeggios.” Motif/MOX also have built-in “Performances” which combine these musical phrases into jam-along song starters. Although Motif-series workstations are not arranger keyboards, the Performances are fun for live jams, covering many modern genres (contemporary jazz, funk and R&B) which are underserved by arranger workstations.

To fill this gap, I translated 23 Motif performances to PSR/Tyros styles. In keeping with the original source material, these styles are stripped down and lean. No orchestration to get in the way! Some styles use only bass and drum. INTROs and ENDINGs are short and basic. Depending upon the source performance, a translated style may have only three MAIN sections. However, all styles bring the groove.

Many of the styles use Megavoice bass and guitar. Plus, I’ve added appropriate OTS voices. Of course, you’re welcome to ditch the OTS voices and replace them with your own.

Here is the link to the ZIP file: perf_for_s950.zip. The file unzips into a directory named “PERF_for_S950”. The ZIP file includes a short READ ME file with more information.

If you would like to know how I translate a Motif/MOX performance to a PSR/Tyros style, please read the following articles:

Tenor to the max!

A few posts ago, I deconstructed the Yamaha MOX (Motif XS) tenor saxophone patches. The article summarizes the waveform assignment and Expanded Articulation (XA) control for each element within a preset voice. I’m not going to dive into the basics here, so I recommend reviewing the article for background information on XA and its behavior.

The blog entry covered the MOX (Motif XS) tenor sax presets, but not the newer Motif XF (MOXF) presets. The XF series workstations have two additional waveforms:

  1. Tenor Sax2 Growl
  2. Tenor Sax2 Falls

bringing the XF up to the level of Tyros/PSR Super Articulation tenor sax voices. This article deconstructs the “Tenor MAX” preset which makes use of these additional waveforms. The analysis is relevant even in the Montage era because the Montage tenor sax is based upon the XF waveforms (no update in the new model).

Pushing the main topic aside for a moment, Super Articulation 2 (SArt2) voices are a whole different technology and even to this day, the Motif and Montage do not implement SArt2 voices. SArt2 seems to be a premium feature that is reserved for Tyros. SArt2 requires realtime analysis of playing gestures and computation which is beyond basic AWM2 synthesis.

The table below gives the waveform, XA control, key range, and velocity range for each element in the “Tenor MAX” patch.

    Elem#  Waveform            XA        Notes   Velocity
    -----  ------------------  --------  ------  --------
      1    Tenor Sax2 Soft     AllAFOff  C-2 G8    1   79
      2    Tenor Sax2 Med      AllAFOff  C-2 G8   80  110
      3    Tenor Sax2 Growl    AllAFOff  C-2 G8  126  127
      4    Tenor Sax2 Hard     AllAFOff  C-2 G8  111  125
      5    Tenor Sax2 Hard     AF2 On    C-2 G8    1  127
      6    Tenor Sax2 Falls    AF1 On    C-2 G8    1  127

When the AF1 and AF2 buttons are OFF, one of the first four waveforms is triggered based upon the key velocity. The four elements cover the dynamic range from soft, through medium, through hard, all the way up to growl. The AF1 and AF2 buttons select particular waveforms depending upon the player’s intention. When AF2 is ON, all key velocities trigger the hard waveform. When AF1 is ON, all key velocities trigger sax falls.
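The selection logic described above can be sketched in a few lines. The element data comes straight from the table; the function names are mine, the model ignores key range (every element spans C-2 to G8), and the behavior when AF1 and AF2 are both ON is an assumption (here, every matching element sounds), not something I have verified on the instrument:

```python
# Sketch of Expanded Articulation (XA) element selection for "Tenor MAX".
# The element data comes from the table above. The selection logic is a
# simplified model, and the behavior when AF1 and AF2 are both ON is an
# assumption (here, every matching element sounds).
ELEMENTS = [
    # (element, waveform, XA mode, low velocity, high velocity)
    (1, "Tenor Sax2 Soft",  "AllAFOff",   1,  79),
    (2, "Tenor Sax2 Med",   "AllAFOff",  80, 110),
    (3, "Tenor Sax2 Growl", "AllAFOff", 126, 127),
    (4, "Tenor Sax2 Hard",  "AllAFOff", 111, 125),
    (5, "Tenor Sax2 Hard",  "AF2 On",     1, 127),
    (6, "Tenor Sax2 Falls", "AF1 On",     1, 127),
]

def xa_active(mode, af1, af2):
    """Return True if an element with this XA mode is enabled."""
    if mode == "AllAFOff":
        return not af1 and not af2
    if mode == "AF1 On":
        return af1
    if mode == "AF2 On":
        return af2
    return False

def triggered_waveforms(velocity, af1=False, af2=False):
    """Return the waveforms that sound for a key velocity and AF state."""
    return [wave for _, wave, mode, low, high in ELEMENTS
            if xa_active(mode, af1, af2) and low <= velocity <= high]

for velocity in (64, 100, 120, 127):
    print(velocity, triggered_waveforms(velocity))
print("AF1", triggered_waveforms(64, af1=True))
print("AF2", triggered_waveforms(64, af2=True))
```

With both buttons OFF, exactly one of the four velocity-switched elements plays for any velocity from 1 to 127, which matches the soft-to-growl progression in the table.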

So, bottom line, the “Tenor MAX” programming is just about what I expected.

I hope the analysis of tenor sax programming has helped you to understand XA and Motif/MOX voice programming. If you’re a Tyros/PSR player, then I hope that this analysis has helped you to understand a little bit of the technology beneath the Super Articulation voices.

Montage review: Yes, I’ve played one!

The Yamaha Montage synthesizer is now hitting stores in North America. One of the local retailers (GC in Natick) has a Montage set up for demo. Let’s go!

The demo unit is a Montage8 with the 88-key balanced hammer effect keyboard. I have always liked Yamaha’s upper-end “piano” actions and the Montage8 is no exception. I primarily play lighter “synth” action keyboards like the MOX and the PSR-S950. Fortunately, I spent the previous week working out on the Nord Electro 2 waterfall keyboard, which requires a slightly heavier touch. I played the Montage8 for a little bit more than an hour without my hands wilting, which is a good sign.

First off, the demo unit was plugged into two Yamaha HS7 monitors and a Yamaha HS8S subwoofer. GC usually patches keyboards through grotty keyboard amplifiers, so I suspect that Yamaha provided the monitors in order to create the best impression of the Montage. I was dismayed when I started off with a few B-3 organ patches and could not contain the low end. The front panel EQ simply didn’t do the job. Time to check the monitor settings. The HS7s were flat, but the HS8S subwoofer level was cranked. After backing off the sub, all was right with the world.

Yes, some people like to simulate small earthquakes with subsonic frequencies. This, however, is not conducive to acoustic music. It’s not conducive to peaceful co-existence with your bass player either. If you encounter a Montage in the wild, check the EQ before proceeding!

So, as you may have gathered already, this is not a review of Montage for EDM. I took along my church audition folder (covering gospel to contemporary Christian to traditional and semi-classical music) and a small binder of rock, jazz, soul and everything in between. I’d like to think that this is the first time anyone has played “Jesu, Joy of Man’s Desiring” on the Montage, however poorly.

The electric pianos are terrific. I had a fine old time playing soul jazz and what not. Great connection between keys and sound. Comparing against Nord Stage, I would say that the Montage is top notch in this department and definitely a cut above the old Nord Electro 2. Yamaha did not put the Reface CP (Spectral Component Modeling) technology into Montage; they didn’t need to.

Tonewheel organ is still Yamaha’s Achilles’ heel. There is some modest improvement, but the Montage is not in clone territory. In this area, I would say, “Advantage Nord.” If I can cover B-3 with the MOX on Sunday, I’m sure that the Montage is up for medium duty. However, the tonewheel organs lack the visceral thrill of the EPs. I will say that the 88-key action did not inhibit my playing style too much. (If I was going to buy a Montage, tho’, it would be a 6.)

The pipe organs got some tweaks, mainly by enhancing the Motif pipe organ sounds via FM. There are a few lovely patches, but I will still look to the Tyros (and the PSR expansion pack) for true realism. The Nord Electro 5d has modeled principal organ pipes where the drawbars change the registration. Ummm, here, I would give the edge to Nord. Plus, the pipe organs in the Nord sample library are more on par with the Tyros and PSR expansion pack. Hate to say it: Montage pipe organs are good “synthesizer pipe organs,” and that ain’t entirely a compliment.

The new strings are wonderfully realistic, especially for solo/melody lines. I really enjoyed bringing sections in and out dynamically. (The expression pedal was sync’ed to the SuperKnob.) With the changes in our music ministry group, I’ve been playing more melodic and exposed parts. I could really dig playing a reflective improvisation for meditation using the strings and woodwinds under Motion Control.

The classical woodwinds got a boost in Montage, too. The woodwinds are all excellent although the sonic delta above Motif XF (MOXF and MOX, too) was not as “Wow” as the strings. Most likely, my ears were getting tired at that point…

Since I was losing objectivity, I just briefly touched on brass. I need good French horns and Montage did not disappoint. I wish that I had spent time with the solo trumpets and trombones, but my ears were telling me to knock it off.

The new Telecaster (TC) is quite a treat. The “Real Distortion” effects (Motif XF update 1.50) are now standard and the programmers made good use of them. I wish that the Montage had the voice INFO screen from the PSR/Tyros series. The INFO screen displays playing tips and articulations for each voice. This makes it a lot easier to find and exploit the sonic “Easter eggs” in the patches. (“Play AF1 to get a slide. Play AF2 to get a hammer on.”)

Fortunately, it was a rainy Saturday afternoon and the store was empty — disturbed only by the occasional uncontrolled rugrat pounding on some poor defenseless keyboard. Overall, I felt like I really heard the Montage and could make a fair evaluation.

I did not dive into editing, arpeggios, motion sequencing, recording, etc., so this is surely not a comprehensive review. Anyone spending less than one month with this ax cannot claim “comprehensive.” It just ain’t possible, so I would call my initial opinion, “first impressions.” That said, I can see why the Live Sets are important. I mainly dove in through Category Search where some of the touch buttons are a wee bit too small. Punching up a sound in full combat requires BIG buttons.

Montage looks, feels and sounds like a luxury good. Montage is also priced like a luxury good. The Montage8 MAP is $4000 USD. It is quite a beast physically and I would most likely go for the Montage6 at a “mere” 33 pounds and $3000 USD. None of the Montage line would be an easy schlep, especially when I have to buzz in and out of my church gig fast.

Would I buy one? Tough call. On the same field trip, I got to sit in a Tesla Model S ($71,000 USD) — a luxury car built around a computer monitor or two. I just recently bought a Scion iM (AKA Toyota Auris, Levin, Blade, whatever) for about $20,000 USD. Both cars could get me to the gym and back. I like my iM. What does that say about me as a customer? Do you think I would buy a Montage? Enigmatic.

See the list of new waveforms in the Montage. Also, check out the latest blog posts! Update: May 10, 2016.