Casio singing synthesis in pictures

I’m slowly immersing myself in the singing synthesis technology behind the Casio CT-S1000V. Heck, ya need somethin’ to do during TV advertisements while watching sports. 🙂

There are two major approaches to speech (singing) synthesis: unit-selection and statistical parametric.

Most people are familiar with unit-selection systems like Texas Instruments’ old Speak and Spell or the much more advanced Yamaha Vocaloid™. Unit-selection relies upon a large database of short waveform units (AKA phonemes) which are concatenated during synthesis. The real trick behind natural sounding singing (and speech) is the connective “tissue” between units. Vocaloid creates waveform data that connects individual phonetic units.

If you are familiar with Yamaha’s Articulation Element Modeling (AEM), a light should have lit in your mind. The two technologies have similarities, i.e., joining note heads, bodies, and tails. The Yamaha NSX-1 chip implements a stripped down Vocaloid engine and Real Acoustic Sound (AEM).

The content and size of the unit waveform database are a significant practical problem. The developers must record, organize and store a huge number of sampled phrases (waveform units). The Vocaloid 2 Tonio database (male, operatic English singer) occupies 750MBytes on my hard drive. That’s not small, and it was a real challenge to collect, no doubt.

Statistical parametric systems effectively encode the source phonetic sounds into a model such as a hidden Markov model (HMM). During training, the source speech is subdivided into temporal frames and the individual frames are reduced to acoustic parameters. The model learns to associate specific text with the corresponding acoustic parameters. During synthesis, the model is fed text and acoustic parameters are recalled by the model. The acoustic parameters drive some form of vocoding. (“Vocoding” is used broadly here.)
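To make the train-then-recall idea concrete, here is a toy sketch in Python. It is not an HMM or a DNN: it simply averages the acoustic parameters seen for each phoneme label, which is my own crude stand-in for a real model. The function names, the two-element parameter vectors and the training values are all made up for illustration.

```python
from collections import defaultdict

def train(frames):
    """Average the acoustic parameters observed for each phoneme label.

    `frames` is a list of (label, parameter_vector) pairs, one per frame.
    The per-label mean is a deliberately crude stand-in for an HMM or DNN.
    """
    sums = {}
    counts = defaultdict(int)
    for label, params in frames:
        if label not in sums:
            sums[label] = [0.0] * len(params)
        sums[label] = [s + p for s, p in zip(sums[label], params)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def synthesize(model, labels):
    """Recall one learned parameter vector per requested label."""
    return [model[label] for label in labels]

# Hypothetical training frames: (phoneme, [F0 in Hz, energy]).
training = [
    ("a", [220.0, 0.5]),
    ("a", [224.0, 0.75]),
    ("s", [0.0, 0.25]),
]
model = train(training)
features = synthesize(model, ["s", "a", "a"])
```

The recalled feature vectors would then be handed to a vocoder. A real system also models timing and context, which this sketch ignores entirely.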

Deep neural networks (DNN) improve on HMM. Sinsy is a DNN-based singing voice synthesis (SVS) system from Nagoya Institute of Technology. It is the culmination of many years of research by sensei Professor Keiichi Tokuda, his students and colleagues. It was partially supported by the Casio Science Promotion Foundation. Thus, adoption by Casio is hardly accidental!

Sinsy Singing Voice Synthesis System

The Sinsy block diagram is taken from their paper: Sinsy: A Deep Neural Network-Based Singing Voice Synthesis System, by Yukiya Hono, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 2803-2815, 2021. The method is quite complex and consists of several models. It’s not clear (to me, yet) if the Casio approach has all elements of the Sinsy approach. I recommend reading the paper, BTW; it’s well-written and highly technical.

Casio U.S. Patent 10,789,922 vocal synthesis

The next block diagram is taken from Casio’s U.S. Patent number 10,789,922 awarded September 29, 2020. Their approach is separated into a training phase and a synthesis (playing) phase. You’ll notice that Casio employ only an acoustic model. The patent discloses a “Voice synthesis LSI” unit, so their software may have a hardware assist. We’ll need to take a screwdriver to the CT-S1000V to find out for sure!

A picture is worth a thousand words. A technical diagram, however, requires a little interpretive context. 😉 Paraphrasing the Casio patent:

The text analysis unit produces phonemes, parts of speech, words and pitches. This information is sent to the acoustic model. The acoustic model unit estimates and outputs an acoustic feature sequence. The acoustic model represents a correspondence between the input linguistic feature sequence and the output acoustic feature sequence. The acoustic feature sequence includes:

  • Spectral information modeling the vocal tract (mel-cepstrum coefficients, line spectral pairs, or similar).
  • Sound source information modeling the vocal cords (fundamental pitch frequency (F0) and power value).

The vocalization model unit receives the acoustic feature sequence. It generates singing voice inference data for a given singer. The singing voice inference data is output through a digital-to-analog converter (DAC). The vocalization model unit consists of:

  • A sound source generator:
    • Generates a pulse train for voiced phonemes.
    • Generates white noise for unvoiced phonemes.
  • A synthesis filter:
    • Uses the output signal from the sound source generator.
    • Is a digital filter that models the vocal tract based on spectral information.
    • Generates singing voice inference data (AKA “samples”).
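The vocalization model described in the list above is a classic source-filter arrangement. Here is a minimal sketch of that idea in Python. It is not Casio’s implementation: the simple all-pole filter, the single filter coefficient, the 225-sample frame (about 5.1 msec at 44.1kHz) and the function names are my own illustrative assumptions.

```python
import random

def excitation(n_samples, sample_rate, f0, voiced, rng):
    """Pulse train for voiced frames, white noise for unvoiced frames."""
    if not voiced:
        return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    period = max(1, round(sample_rate / f0))  # samples between pulses
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def synthesis_filter(source, coeffs, gain=1.0):
    """All-pole filter: y[n] = gain * x[n] - sum(a[k] * y[n - k])."""
    out = []
    for n, x in enumerate(source):
        y = gain * x
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                y -= a * out[n - k]
        out.append(y)
    return out

rng = random.Random(0)

# Voiced frame: 220 Hz pulse train shaped by a hypothetical one-pole filter.
voiced_source = excitation(225, 44100, 220.0, True, rng)
samples = synthesis_filter(voiced_source, [-0.9])

# Unvoiced frame: white noise through the same filter.
unvoiced_source = excitation(225, 44100, 0.0, False, rng)
noise_samples = synthesis_filter(unvoiced_source, [-0.9])
```

In a real synthesizer, the filter coefficients would come from the spectral part of the acoustic feature sequence and change from frame to frame.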

Casio U.S. Patent 10,789,922 vocal synthesis process detail

This rather complicated diagram from U.S. Patent 10,789,922 shows the synthesis phase in more detail. It shows the lyric string decomposed into phoneme and frame sequences. Each frame is sent to an acoustic model which generates an acoustic feature sequence, that is, the acoustic parameters that were learned during training. The acoustic parameters are synthesized (vocoded) into 225 samples per frame. At a 44.1kHz sample rate, each frame is about 5.1 msec long.
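The phoneme-to-frame step can be pictured with a small sketch. Only the 5.1 msec frame length comes from the text; the phoneme durations and the function name are hypothetical, and a real system would take its durations from the score and a duration model.

```python
FRAME_MS = 5.1  # frame length quoted in the text

def frames_for(phonemes):
    """Expand (phoneme, duration_ms) pairs into a per-frame phoneme sequence."""
    frames = []
    for name, duration_ms in phonemes:
        count = max(1, round(duration_ms / FRAME_MS))  # at least one frame each
        frames.extend([name] * count)
    return frames

# Hypothetical sung syllable "sa": a short consonant, then a longer vowel.
note = [("s", 51.0), ("a", 153.0)]
frame_seq = frames_for(note)
```

Each entry in `frame_seq` would be presented to the acoustic model, which answers with one set of acoustic parameters per frame.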

Casio CT-S1000V Vocal Synthesis (User Guide)

Well, if the second patent diagram was TMI, here is the block diagram from the Casio CT-S1000V user guide. The simplified diagram is quite concise and accurate! You should be able to relate these blocks directly back to the patent.

I hope this discussion is informative. In a later post, I’ll take a look at a few practical details related to Casio CT-S1000V Vocal Synthesis.

Copyright © 2022 Paul J. Drongowski

Casio CT-S1000V: Quick tips

One feels “all thumbs” when starting in with a new keyboard. The Casio CT-S1000V has a lot of functionality and customization below the MENU button and within the SETTINGS item in the main MENU. I made a map to help me get around:

MENU                  SETTINGS
My Set-up             Transpose
Active DSP            Touch Off Velocity
Balance               Split Point
Octave Shift          Rhythm Auto Set
Sustain               Chord Finger Mode
Portamento            Rhythm Controller Type
Pedal                 SUS/UPPER PORT Button
Pitch Bend            ARP/AH Button
Knob                  Rhythm Volume
Arpeggio              Song Volume
Auto Harmony          Tuning
Sampling              Surround
Song                  Audio In Center Cancel
Metronome             MIDI OUT Channel UPPER1, UPPER2, LOWER
System Effects        Local Control
EQ                    MIDI SYNC Mode
Scale                 Auto Power Off
MIDI Control          Battery
Wireless              LCD Contrast
Media                 Button Long Press Time
Home Customization    Speaker
Settings >>>>>>>>>>>> Phone Speaker
Demo                  Setting Initialize
Exit                  All Initialize
                      Version

The Casio CT-S500 is probably organized in the same way.

MY SETUP Power On Recall

Yesterday, I mentioned MY SETUP and how useful it is for establishing a global set-up for a given playing situation. It’s also useful for establishing an initial set-up during power-up. Simply enter MY SETUP, select one of the four set-up entries, and press the AT PW-ON soft button. The CT-S1000V will recall the selected set-up during power-ups.

Even though it’s cool to get kicked into Vocal Synthesis at power-up — a nice marketing/sales ploy — I’d rather have a B-3 at my fingertips. 🙂

Active DSP HOLD

Active DSP assigns effect parameters to the three front panel knobs: K1, K2 and K3. The “Amp Organ 1” tone, for example, assigns the knobs this way:

  • K1: M1 Speed
  • K2: M1 OD Gain
  • K3: M1 Brake

K1 is the rotary speaker speed, K2 is the overdrive and K3 is the speaker brake, which stops the simulated rotor/horn.

That’s great until you hit HOME or MENU and — what the??? The knobs are re-assigned to cut-off, resonance and modulation. That’s when Active DSP HOLD comes into play.

If you press the Active DSP HOLD soft button before leaving the Active DSP screen, the CT-S1000V will remember the DSP knob assignments when you go HOME or whatever. Save this in your set-up, too.

Slow your roll, Sparky

The first time you spin up the rotary in “Amp Organ 1,” you’ll be appalled at the short ramp-up time (acceleration) and the final rotor/horn speed. The Active DSP screen is also your way into the DSP parameters. I have the Drive Rotary effect applied to the organ. It has the following parameters:

  • Rotary speaker type
  • Overdrive gain
  • Overdrive level
  • Speed
  • Brake
  • Fall acceleration (ramp down)
  • Rise acceleration (ramp up)
  • Slow rate
  • Fast rate
  • Vibrato/Chorus
  • Wet level
  • Dry level

I like the sound of a beat-up Leslie with slow motors and slipping belts. Feel free to adjust the acceleration and slow/fast rates down.

Nice to see the chorus/vibrato simulation (V1, C1, V2, C2, V3, C3). The Hammond had a unique chorus/vibrato scanner unit which is a necessary component of gospel organ registrations. I’d love to see more details about Casio’s rotary speaker emulation including the scanner and speaker types.

Oh, yeah, please let us assign rotary speaker speed to the foot pedal. Thanks.

Boing

I wish I could see more details about the reverb, chorus and delay effects, too.

CT-S1000V has three system-wide effects: reverb, chorus and delay. Usually you get only reverb and chorus, and don’t always see a separate delay unit. Cool.

Unlike the DSP effects, you do not get to tweak system-wide effect parameters. All you get are presets with rather uninformative names like Room 1, Hall 2, etc. I listened to the room reverbs and settled on Room 2 for church registrations. Although ears should be the final judge, I wish I could see the parameter values behind the presets in order to make good choices.

I recommend publishing an effect routing diagram like the one I found in the CT-X5000 manual (below). Thanks.

Casio CT-X3000/CT-X5000 effect routing diagram [Casio]


Casio CT-S1000V: Observations

Gonna post a few notes while I take an ear break.

I’m rather pleased with the sound and play-ability of the Casio CT-S1000V. For now, I’m focused on sound design and playing, having only dipped into the auto-accompaniment rhythms and vocal synthesis.

Patches

Sounds, not the song. (Sorry, Clarence Carter.)

My first order of business is building a bunch of sound combinations that are suitable for the contemporary and traditional church music that I play. Jazz, funk and pop will have to wait a little while…

The CT-S1000V provides two different means of storing a patch: My Set-up and registrations. My Set-up is accessed through the main MENU button. Up to four set-ups can be stored. As I quickly discovered, My Set-up stores everything but the kitchen sink including settings like speaker ON/OFF. I sacrificed the fourth location (SAX) and created a LINE OUT entry with the internal speaker turned off. This seems like a good use for My Set-up, namely, saving global configurations for different playing situations, e.g., home, gig, etc.

Registrations are more appropriate for tone (voice) programming. There are more registrations than My Set-up locations: 16 banks with four registrations per bank, 64 registrations total. That may seem stingy by today’s standards, but I don’t need more than 8 to 16 locations to cover most of my gig needs. Plus, one can always save registrations to a USB flash drive and load them as playing situations arise.
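When keeping a paper map of registrations, it helps to convert a running list position into a bank and slot. Here is a small sketch of the 16 x 4 arithmetic. The linear, zero-based index and the function name are my own bookkeeping conventions, not anything the keyboard exposes.

```python
BANKS = 16
SLOTS_PER_BANK = 4

def bank_and_slot(index):
    """Map a 0-based registration index to a 1-based (bank, slot) pair."""
    if not 0 <= index < BANKS * SLOTS_PER_BANK:
        raise ValueError("registration index out of range")
    bank, slot = divmod(index, SLOTS_PER_BANK)
    return bank + 1, slot + 1

total = BANKS * SLOTS_PER_BANK
first = bank_and_slot(0)
sixth = bank_and_slot(5)
last = bank_and_slot(63)
```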

Tone programming

Registrations can save most everything related to tone programming: split, layer, effects, and much more. Yes, you can edit CT-S1000V tones — one reason why I passed on the CT-S1 and waited.

Tone editing is similar to “quick edit” that you might find on a synth. You can tweak 21 parameters including cutoff, resonance, attack time, release, vibrato, volume, pan, effect sends and 4-band EQ. You don’t get synthesizer-level deep editing. Cutoff, etc. are offsets (-64:+64) from the preset value. If you want it all in front of you like MODX or Montage, this isn’t the droid you’re looking for.

With only three front panel knobs, you need to assign a tone parameter to a knob first, then tweak. The changed value is saved along with everything else in a registration (including the knob assignment).

DSP editing, on the other hand, is deep. Using a feature called “Active DSP”, you can choose an effect type, assign a parameter to a knob, and tweak effect parameters. I’m still experimenting with Active DSP, especially for controlling rotary speaker speed. I’ll have more to say when I have a better grip on Active DSP.

Splits and layers

The CT-S1000V works logically and supports two split zones: Lower and Upper. When Split is turned off, all’s you get is Upper. Upper supports two layers: Upper 1 and Upper 2; Lower cannot be layered. The split point is configurable and you can adjust the balance (level) between tones. This is just enough to be dangerous. If you’re looking for a pile of layers, move along.
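The split and layer logic is simple enough to sketch in a few lines. This is my reading of the behavior described above, not Casio firmware. In particular, treating the split point key itself as part of Upper is an assumption, and the function and parameter names are made up.

```python
def sounding_parts(note, split_on, split_point, layer_on):
    """Return the parts that sound for a given MIDI note number."""
    if split_on and note < split_point:
        return ["Lower"]  # Lower cannot be layered
    parts = ["Upper 1"]
    if layer_on:
        parts.append("Upper 2")
    return parts

# Hypothetical split point at MIDI note 54 with layering enabled.
left_hand = sounding_parts(48, True, 54, True)
right_hand = sounding_parts(60, True, 54, True)
no_split = sounding_parts(48, False, 54, False)
```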

Starting with my Roland days (circa 1995), I’ve kept notes about the most useful tone combinations for contemporary church music. Here are my favorite combinations:

Combination    Tone 1        Tone 2        Tone 3        Tone 4
-------------  ------------  ------------  ------------  ------------
High School    Tuba A        Celesta       Flute 1A      Clarinet A
Warm Tp Sect   Tbs mp C      Tbs mp B      Tbs mf C      Brs LipNzl
CTp + Tb Sect  C Tps mp A    Tbs mp A      C Tps f A     Tbs f A
Horn+Wood      Flute 1A      Clarinet C    Oboe mf A     Horns mf A
NobleHornPop   Horns f A     Flugel C      Tb Sect B     Trumpet 1C
NobleHornPop   French 1C     Flugel C      Tb Sect B     Trumpet 1C

Orch Reeds     Oboe mf A     E.Horn C      Oboe f A
Wind&Str1      Oboe mf A     Flute 1B      DolceStr.A    JV Strings A
Wood Sect      Oboe mf A     Flute 1A      Clarinet A    E.Horn A
Flute/Clari    Flute 1C      Clarinet C

ChamberWinds   Oboe mf B     Oboe mf A     Sop.Sax mf A  Flute 1A
ChamberWoods   Clarinet A    Flute 1C      Flute 1A

Warm Strings   Soft Pad A    F.Str mp A    JP Strings2C  JP Strings1A
ChmbrQuartet   Violin C      Violin 2 A    Cello A       Cello 2 A
ViolinCello    Vc mp B       Bassoon A     Va mp A       Oboe mf A

These combinations date back to my old JV-90, XP-60 and XV-5050! You’ll find equivalents on my MODX and Genos. My task now is to build similar combinations (registrations) on CT-S1000V.

BTW, I’m also dialing back reverb where necessary. I try for a happy balance — not too dry for practice at home, but not so much as to murk up the sound in a reverberant church hall.

If I had one wish, I would like to give each registration a name. I have a running map of “which registration does what,” but wish the names appeared on the CT-S1000V screen, not “Bank 1-1”.

Tones for old bones

By and large, the CT-S1000V orchestral tones are decent; most of them are musically useful and sound good through both the built-in speaker and an external monitor. The tone parameters are enough to cure overly bright tones or sharp attacks. Although I haven’t worked with them yet, the chromatic percussion (celeste, glockenspiel, etc.) don’t have any obvious tuning issues and are musical.

Two layers aren’t much. Fortunately, CT-S1000V has a few preset combi tones — Brass & String, Violin Section, Chamber (orchestra), Flute & Oboe, Pipe Section — which provide another “layer” or two on the cheap.

Don’t forget about the ethnic voices. CT-S1000V has accordions, fiddle, and harmoniums. Harmoniums! Jazzers who want to get their Jon Batiste on should look to these for melodica.

Organs

The CT-S1000V pipe organs are decent. Yes, there is the usual over-done reedy sound, but there are three tone presets that are suitable for hymn-playing and congregational singing. Even though the keybed is squared-off and similar to piano keys, it has a nice resistance and allows the legato-like gestures one uses when playing a pipe organ.

The drawbar organ tones are serviceable. No, we’re not in clone territory here and you can’t change the drawbar settings. There are rotary speaker effect algorithms, but again, we’re not in Vent or clone territory quality-wise. The only way to change rotary speaker speed is via Active DSP and turning the knob to which rotary speed has been assigned. I wish there was a way to assign rotary speed to the foot pedal. (OK, I need two wishes from the Genie.)

On the up-side, the rotary effect has a brake setting. One could brake the rotary and put the LINE OUT into a Lester K, Vent, or whatever. I will be giving this a try and will post notes. With Lester on the floor, I could stomp on a foot switch and change rotary speed, too.

Summary

Well, I hope these observations are helpful! The Casio CT-S1000V has a lot of sound-making value for very little money. So many tones to try…


Casio CT-S1000V: First impressions

After test driving the Casio CT-S1 and CT-S410, I took the plunge and bought a Casio CT-S1000V (AiX Sound Source with Vocal Synthesis, $450USD street). The price was irresistible after making a trade-in. (The Yamaha SHS-500 Sonogenic retired.)

In terms of build quality, the Casio CT-S1000V is robust enough for light to moderate gigging. It feels solid. I miss the fabric speaker covering (Casio CT-S1) as it is a touch of class. I suspect that fabric would get dirty on gigs, however. I wouldn’t park any drinks on this keyboard (or any keyboard) with everything exposed! Yep, it weighs ten pounds, not bad for a keyboard with in-built speakers.

Casio CT-S1000V

The power supply is a small lump-in-the-middle brick. The mains lead is rather short with one of those “figure 8” IEC 60320 C7 plugs. Other accessories include a music stand and a Casio WU-BT10 Bluetooth dongle — don’t lose that tiny little bugger! The music stand isn’t super-robust and I’m not sure that I want to park a heavy binder o’tunes on it. It’s also too low for my reading glasses and I will probably stick to my usual tripod music stand.

The CT-S1000V keybed is rather nice for a keyboard in this price range. The keys are squared off and piano-like although there’s no hammer simulation, of course. The keys are evenly spaced, are level, and don’t wobble too much. The keys have a textured surface similar to the Roland GO:KEYS. The throw is a little bit light and soft, not unpleasant. (BTW, I couldn’t stand the Roland GO:KEYS and returned it due to keybed issues.)

I can hand-swipe without cutting my hands. I don’t know how the keys will stand up to this kind of abuse in the long run. Plus, this board is so light, I’m afraid of throwing it off the keyboard stand when swiping!

The speaker sound is OK. I regard the speakers as “courtesy speakers.” Sometimes it’s convenient to push only one switch and start playing. They’re loud enough for my studio room, maybe loud enough for the church gig where we don’t generate a lot of stage volume. They don’t get buzzy at loud volume. Since I don’t play at very loud volume at home, I’m good with that. Casio wisely blessed the CT-S1000V with 1/4″ stereo output jacks so I can send the CT to the church PA.

I read just enough of the manual to enable Active DSP, which assigns DSP parameters to the knobs. With an organ tone selected, turning knob 1 (K1) switched between slow and fast rotary speaker speed. Wish there was a way to assign rotary speed to a button or the foot switch… I need to experiment more with Active DSP. Gotta experiment with splits and layers, too. I guess everything is saved to a registration, but we’ll find out!

I played with Vocal Synthesis enough to know there are multiple Vocalists. Some Vocalists are more natural than others. One of the Vocalists is “Death Voice” and I would like to uncork that one in church. 🙂

Quite a playable instrument. I haven’t listened to any of the rhythms yet because I’m mostly interested in flat-out playing. Switching sections (intro, main, fill, etc.) with the buttons below the display reminds me of switching arpeggios on the Yamaha MOX/MOXF.

Hope these impressions help!


Ye olde Yamaha Dance Kit

Ya learn somethin’ every day. Thanks to Mark, my neighbor to the north in Vancouver, who looped me in.

As one might expect, Yamaha have updated their drum kit samples over the years. Who knew — the DanceKit circa 2000 is more heavy, punchy and analog than present-day DanceKit. According to Mark (and Musicnik), the Standard Kit had more punch back in the day.

The table below summarizes the instruments in the Yamaha Standard Kit and Dance Kit:

Keyboard  MIDI      Standard Kit      Dance Kit
                    127/000/001       127/000/28
--------  --------  ----------------  ------------------------
40 E 1    28 E 0    Brush Tap Swirl   Reverse Cymbal *
41 F 1    29 F 0    Snare Roll        Snare Roll
42 F# 1   30 F# 0   Castanet          Hi Q 2 *
43 G 1    31 G 0    Snare Soft        Snare Techno *
44 G# 1   32 G# 0   Sticks            Sticks
45 A 1    33 A 0    Bass Drum Soft    Kick Techno Q *
46 A# 1   34 A# 0   Open Rim Shot     Rim Gate *
47 B 1    35 B 0    Bass Drum Hard    Kick Techno L *
48 C 2    36 C 1    Bass Drum         Kick Techno 2 *
49 C# 2   37 C# 1   Side Stick        Side Stick Analog *
50 D 2    38 D 1    Snare             Snare Clap *
51 D# 2   39 D# 1   Hand Clap         Hand Clap
52 E 2    40 E 1    Snare Tight       Snare Dry *
53 F 2    41 F 1    Floor Tom L       Tom Analog 1 *
54 F# 2   42 F# 1   Hi-Hat Closed     Hi-Hat Close Analog 1 *
55 G 2    43 G 1    Floor Tom H       Tom Analog 2 *
56 G# 2   44 G# 1   Hi-Hat Pedal      Hi-Hat Close Analog 2 *
57 A 2    45 A 1    Low Tom           Tom Analog 3 *
58 A# 2   46 A# 1   Hi-Hat Open       Hi-Hat Open Analog *
59 B 2    47 B 1    Mid Tom L         Tom Analog 4 *
60 C 3    48 C 2    Mid Tom H         Tom Analog 5 *
61 C# 3   49 C# 2   Crash Cymbal 1    Cymbal Analog *
62 D 3    50 D 2    High Tom          Tom Analog 6 *
63 D# 3   51 D# 2   Ride Cymbal 1     Ride Cymbal 1
64 E 3    52 E 2    Chinese Cymbal    Chinese Cymbal
65 F 3    53 F 2    Ride Cymbal Cup   Ride Cymbal Cup
66 F# 3   54 F# 2   Tambourine        Tambourine
67 G 3    55 G 2    Splash Cymbal     Splash Cymbal
68 G# 3   56 G# 2   Cowbell           Cowbell Analog *
69 A 3    57 A 2    Crash Cymbal 2    Crash Cymbal 2
70 A# 3   58 A# 2   Vibraslap         Vibraslap
71 B 3    59 B 2    Ride Cymbal 2     Ride Cymbal 2
72 C 4    60 C 3    Bongo H           Bongo H
73 C# 4   61 C# 3   Bongo L           Bongo L
74 D 4    62 D 3    Conga H Mute      Conga Analog H *
75 D# 4   63 D# 3   Conga H Open      Conga Analog M *
76 E 4    64 E 3    Conga L           Conga Analog L *
77 F 4    65 F 3    Timbale H         Timbale H
78 F# 4   66 F# 3   Timbale L         Timbale L
79 G 4    67 G 3    Agogo H           Agogo H
80 G# 4   68 G# 3   Agogo L           Agogo L
81 A 4    69 A 3    Cabasa            Cabasa
82 A# 4   70 A# 3   Maracas           Maracas 2 *
83 B 4    71 B 3    Samba Whistle H   Samba Whistle H
84 C 5    72 C 4    Samba Whistle L   Samba Whistle L
85 C# 5   73 C# 4   Guiro Short       Guiro Short
86 D 5    74 D 4    Guiro Long        Guiro Long
87 D# 5   75 D# 4   Claves            Claves 2 *
88 E 5    76 E 4    Wood Block H      Wood Block H
89 F 5    77 F 4    Wood Block L      Wood Block L
90 F# 5   78 F# 4   Cuica Mute        Scratch H *
91 G 5    79 G 4    Cuica Open        Scratch L *

The starred (“*”) entries denote analog drum machine samples.
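If you want to regenerate the note columns or look up a key quickly, the relationships can be read off the table itself: note names use octave numbering where 60 is C 3, and each Keyboard number is the MIDI number plus 12. Here is a small sketch under those two observations. The function names are mine.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_name(number):
    """Name a note number using the table's octave numbering (60 = C 3)."""
    octave = number // 12 - 2
    return f"{NOTE_NAMES[number % 12]} {octave}"

def keyboard_number(midi_number):
    """The table's Keyboard column is the MIDI column shifted up by 12."""
    return midi_number + 12

# Kick Techno 2 sits at MIDI 36 in the table.
kick_midi = note_name(36)
kick_keyboard = note_name(keyboard_number(36))
```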

I decided to do a side-by-side comparison. I first recorded the DanceKit samples as dry as possible on the Yamaha PSS-A50 and the Yamaha QY-70 (circa 1997). Then I matched everything up, ignoring the toms and a few extraneous instruments.

You’ll hear all the PSS-A50 examples first followed by all of the QY-70 examples. I’ll let you decide as to your personal preference. Although I tried to get the A50 dry, there seems to be a hint of reverb remaining.

Without further ado, here is a ZIP file containing the WAV files for all of the Yamaha QY-70 Dance Kit instruments starting from the bottom of the keyboard to the top. Have fun! Slice and dice everything into audio mirepoix.

If a drum machine plays in the forest and no one is around, does it still make a sound? 🙂


Casio speech synthesis technology

The voice synthesis in Casio’s new CT-S1000V keyboard raised quite a bit of interest on the Web, including my own curiosity.

I installed the Casio Lyric Creator app on my iPad just to see what I can see. Lo and behold, there is a long list of open source licensing statements which identify some of the voice synthesis technology in the app and the keyboard itself. Let’s take a look starting with the top of the list.

HMM-based speech synthesis engine, HTS_engine, developed by the HTS Working Group. That’s a lot of acronyms and shoulders to stand on:

  • HMM: Hidden Markov model
  • HTS: An HMM-based speech synthesis system
  • SPTK: Speech Signal Processing Toolkit

The HTS Working Group is a voluntary group developing the HMM-based speech synthesis system HTS. The software bears a joint copyright from two institutions:

  • Nagoya Institute of Technology, Department of Computer Science, and
  • Tokyo Institute of Technology, Interdisciplinary Graduate School of Science and Engineering

The HTS_engine API is released under the Modified BSD license. I won’t quote such chapter and verse everywhere, but it gives you a sense of the distribution terms and conditions. Read about HTS version 2 in “The HMM-based Speech Synthesis System (HTS) Version 2.0”, by Heiga Zen, et al., Sixth ISCA Workshop on Speech Synthesis, 2007.

HMM-based singing voice synthesis system, Sinsy, developed by the Sinsy Working Group. This software bears the copyright of Nagoya Institute of Technology, Department of Computer Science.

Speech Signal Processing Toolkit, SPTK, developed by the SPTK Working Group. Again, the toolkit has a joint copyright:

  • Nagoya Institute of Technology, Department of Computer Science, and
  • Tokyo Institute of Technology, Interdisciplinary Graduate School of Science and Engineering

CRF++ by Taku Kudo. “CRF” is an acronym for “conditional random fields”. CRFs are a class of statistical modeling methods that are used in pattern recognition and machine learning.

The developers also acknowledge other work which was used during speech analysis:

  • WORLD: A high-quality speech analysis and synthesis system based on vocoding.
  • CMUdict: The CMU Pronouncing Dictionary from Carnegie Mellon University, Pittsburgh, PA (my old school)
  • Festival Speech Synthesis System, Centre for Speech Technology Research, University of Edinburgh, UK.

For (more than) an introduction to HMM-based speech synthesis, try: “An Introduction to HMM-Based Speech Synthesis” by Junichi Yamagishi, October 2006. That should be enough math for you. 🙂 This presentation is super helpful, too.

Casio’s voice synthesis technology is not Yamaha Vocaloid™. Vocaloid™, by the way, is a registered trademark belonging to Yamaha. I have seen punters on the Web attribute the technology to Vocaloid or Yamaha. “Oh, they must have licensed it.” Wrong. Please do not refer to Casio’s tech as “Vocaloid” as this is technically incorrect and a misuse of Yamaha’s trademark.

Plus, we want to give credit where credit is due. Casio have staked out their IP territory in a series of patents filed on their behalf.

Want more information? See Casio singing synthesis in pictures.


Yamaha PSR-E473 and PSR-EW425

The PSR-E473 and PSR-EW425 continue the evolution of the Yamaha E-series arranger keyboards.

Yamaha PSR-E473 and PSR-EW425 arranger keyboards

Main features are:

  • PSR-E473: 61 keys, PSR-EW425: 76 keys
  • Super Articulation Lite voices and articulation button
  • 820 voices (including 43 Super Articulation Lite)
  • Category access buttons to select voices
  • 290 auto-accompaniment styles
  • Two DSP effect channels (DSP1 and DSP2)
    • DSP1: 41 types of DSP insertion effects
    • DSP2: 12 effect types
  • New quick sampling user interface (44.1kHz, 16-bit, stereo, 9.6 sec)
  • Motion effects (57 types) and motion effect button
  • Mega Boost (adds +6dB to the apparent volume)
  • Two live control knobs
  • 1/4″ main audio out (R, L/L+R)
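For a feel of what the quick sampling spec means in memory terms, here is a back-of-the-envelope calculation. It assumes plain uncompressed PCM, which is my assumption; Yamaha does not say how samples are stored internally. The function name is made up.

```python
def pcm_bytes(sample_rate_hz, bits, channels, seconds):
    """Size of uncompressed PCM audio in bytes."""
    return round(sample_rate_hz * (bits // 8) * channels * seconds)

# 44.1kHz, 16-bit, stereo, 9.6 seconds (from the feature list).
size = pcm_bytes(44100, 16, 2, 9.6)
size_mib = size / (1024 * 1024)
```

That works out to 1,693,440 bytes, or roughly 1.6 MiB for the longest sample.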

Pricing has not been announced as of this writing.

The PSR-EW425 has an exclusive organ sound from the YC stage keyboards. Although the E473 and EW425 share ten new drawbar organ voices, the EW425 has some extra tricks. Quoting Yamaha’s documentation, “On the PSR-EW425, a percussive click sound at key-on/key-off and a leakage sound are added, providing more realistic vintage organ sounds.”

DSP1 is automatically assigned to the main voice. DSP2 can be assigned to any part. DSP2 is assigned to all parts (including the keyboard and backing) by default. There is a dedicated DSP2 button on the front panel which provides direct access to DSP2 and turns it ON and OFF. You can choose the effect type for each DSP unit. Effect parameter editing is limited to that available through the Live Control knobs.

PSR-E473 and PSR-EW425 effect routing [Yamaha]

With reverb, chorus and two DSP effect units, effect routing (above) is more sophisticated than earlier E-series models. The routing adheres to the XG architecture. The MIDI implementation does not provide SysEx for effect selection and routing. (Well, at least it’s not documented…)

Motion effects are implemented via MIDI pitch bend and continuous control messages. (The approach is similar to the Yamaha PSS-A50.) Message-heavy effects will cut into song size when recording into MIDI.
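To see why message-heavy effects eat into song memory, here is a sketch that builds raw MIDI pitch bend and control change messages and counts the bytes. The message layouts follow the MIDI 1.0 specification. The choice of controller 74, the 64-step sweep and the function names are hypothetical; I don’t know which messages Yamaha’s motion effects actually send.

```python
def pitch_bend(channel, value):
    """Build a 3-byte pitch bend message; value is 0..16383, 8192 = center."""
    if not 0 <= channel <= 15 or not 0 <= value <= 16383:
        raise ValueError("channel or value out of range")
    return bytes([0xE0 | channel, value & 0x7F, (value >> 7) & 0x7F])

def control_change(channel, controller, value):
    """Build a 3-byte control change message; controller and value are 0..127."""
    if not 0 <= channel <= 15 or not 0 <= controller <= 127 or not 0 <= value <= 127:
        raise ValueError("channel, controller or value out of range")
    return bytes([0xB0 | channel, controller, value])

center = pitch_bend(0, 8192)
filter_move = control_change(0, 74, 64)  # controller 74 is a hypothetical choice

# A hypothetical upward bend sweep made of 64 separate messages.
sweep = [pitch_bend(0, 8192 + step * 64) for step in range(64)]
sweep_bytes = sum(len(message) for message in sweep)
```

A single sweep like this costs 192 bytes before counting any timing information a recorder stores alongside each message.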

The PSR-EW425 has two 12cm speakers and its amplifiers produce 12W per channel. The PSR-EW425 requires six D size batteries, which will affect final weight. The PSR-EW425 weighs 8.3kg (18 pounds, 5 ounces) without batteries.

The PSR-E473 requires six AA size batteries. The PSR-E473 weighs 7.0kg (15 pounds, 7 ounces) without batteries.

Live control knobs can be assigned to:

  • Keyboard:
    • Filter cutoff and resonance
    • Reverb and chorus level
    • DSP1 parameters A and B
  • Backing:
    • Filter cutoff and resonance
    • Reverb and chorus level
    • Volume balance and retrigger rate
  • System:
    • DSP2 parameter A and B

Check out my pre-announcement post. See how well I did. 🙂


New Casio portable keyboards

Casio CT-S1000V

Casio have announced the new CT-S1000V keyboard with vocal synthesis:

  • 61 full-size touch response keys plus pitch bend wheel
  • 64 voice polyphony
  • 3 assignable knobs for controlling modulation, effects, filters, and more
  • 800 AiX-powered Tones and 243 full accompaniment Rhythms
  • Advanced Tones (including vintage keyboards)
  • Editable DSP effects (100 effects)
  • Split and layer (Upper 1/2, Lower 1)
  • Powerful bass-reflex stereo speaker system with surround effect
  • Two 13cm by 6cm speakers, 2.5W per channel
  • Audio sampler and 6-track MIDI recorder (sequencer)
  • Audio sample format: WAV, 44kHz, 16-bit, stereo
  • Vocal synthesis with personalized lyrics via the free Lyric Creator app
  • Vocal format: 44kHz, 16-bit, mono
  • Bright backlit LCD display with easy, intuitive interface
  • Strap pins for playing anywhere
  • 1/4″ line outputs to connect to mixers, PA systems, etc.
  • Class-compliant USB-MIDI connects to the free Casio Music Space iOS/Android app
  • Includes WU-BT10 Bluetooth MIDI/Audio adapter
  • Optional 6xAA battery power (AC adapter and music rest included)
  • Weight: 10 pounds
  • $449.99 USD (street)

Casio CT-S1000V portable keyboard with vocal synthesis

Quoting the Casio web site:

The CT-S1000V does what no other keyboard can do: Speak or type your lyrics into the free Lyric Creator app for iOS/Android, transfer them to the CT-S1000V, and play the keys to hear your words come alive. Choose from multiple vocalist models, and adjust age, vibrato, portamento and other parameters in real time. It can produce choirs, robotic sounds, vocoder-like textures, and more. You can even create a custom vocalist based on an audio recording.

Availability — “Coming soon”.

Please see my pre-announcement post for more pictures and information. I also have posted a list of recent Casio patents related to sound synthesis and vocal synthesis.

Vocal Synthesis

According to Casio, you can create lyrics using their tablet-/phone-based Lyric Creator app, transfer them to the CT-S1000V, and play them using the keys. You can dynamically change characteristics like age, gender, portamento and vibrato in real time. Of course, you can mangle the sound with DSP effects, too. The front panel knobs are assignable for real time control.

NOTE mode chooses how the lyrics play back when keys are pressed. You can play a word or syllable with each key press or you can play choral harmonies. PHRASE mode follows your timing. Legato gestures change note (pitch) while the phrase is playing. You can also select a syllable with your left hand and use your right hand to play it.

Casio Lyric Creator [Casio]

Lyric Creator lets you edit, save and share lyrics. Lyrics can be imported from MusicXML files.

Casio CT-S500

Casio have announced the new CT-S500:

  • 61 full-size touch response keys plus pitch bend wheel
  • 64 voice polyphony
  • 3 assignable knobs for controlling modulation, effects, filters, and more
  • Editable DSP effects (100 effects)
  • Audio sampler and 6-track MIDI recorder (sequencer)
  • 1/4″ line outputs to connect to mixers, PA systems, etc.
  • Includes WU-BT01 Bluetooth MIDI/Audio adapter
  • Bright backlit LCD display with easy, intuitive interface
  • 800 AiX-powered Tones and 243 full accompaniment Rhythms
  • Advanced Tones (including vintage keyboards)
  • Splits and layers (Upper 1/2, Lower 1)
  • Powerful bass-reflex stereo speaker system with surround effect
  • Two 13cm by 6cm speakers, 2.5W per channel
  • Strap pins for playing anywhere
  • Class-compliant USB-MIDI connects to the free Casio Music Space iOS/Android app
  • Optional 6xAA battery power (AC adapter and music rest included)
  • 10 pounds (4.7kg)
  • $379.99 USD (street)

Available now!

Casio CT-S500 portable keyboard [Casio]

Quick reaction

I watched the Casio release video stream and the artist demonstrations are exciting. At these price points, Casio are going to sell a ton of these. I am so glad they included the “Advanced Tones”, that is, all the pianos, vintage keys and other instruments which created so much interest in the CT-S1. Hope they slash street prices on the CT-X series because the new S-series models blow them away.

Copyright © 2022 Paul J. Drongowski

Akai MPK mini play mk3

Even though Winter NAMM 2022 is postponed (or just outright moved) to June, a few manufacturers are sticking to their release schedule.

Include Akai on the list of schedule keepers.

Akai have revamped the MPK mini play giving it a new front panel layout and a better speaker. The mini-keyboard has been upgraded to Gen-2, too. The new MPK mini play mk3 is slightly larger: 317 x 178 x 58mm versus 312 x 172 x 46mm. No strain, there.

Akai MPK mini play mk3

Other specs are largely the same. I would think the mk3 is based on the same mk1 sound engine (probably a Dream Synthesis SAM2635). The speaker is larger and is a welcome change.

I rather like the new layout. The control knobs are larger (?) with a modern appearance. Possibly, the knob placement may interfere with finger drumming? Internal (initial) control assignments are the same. Styling overall is more “noir.” No visible changes to the arpeggiator.

Thomann list the price at 129 EUR. Thomann USA list a $124 USD price for USA customers.

If you fancied one before, now you’re spoilt with choice.

  • More than 100 internal drum and instrument sounds
  • Gen-2 keyboard with 25 velocity-sensitive mini keys
  • 8 backlit MPC pads with Note Repeat and Full Level function (x2 banks)
  • 4 controls for editing internal sounds or MIDI parameters (x2 banks)
  • built-in speaker
  • OLED Display
  • Pitch / Modulation Joystick
  • Arpeggiator
  • Connection for sustain pedal: 1/4inch jack
  • USB-B Port
  • Headphone output: 3.5mm jack
  • Dimensions (W x D x H): 317 x 178 x 58 mm
  • Weight: 860 g (1.9 pounds)
  • Software package: Akai Pro MPC Beats, AIR Music Tech Hybrid 3, Mini Grand, Velvet and Melodics learning software with 60 lessons

Copyright © 2022 Paul J. Drongowski

Recent Casio patents (AiR/AiX)

Update: Here is the latest information about the CT-S500 and CT-S1000V from Casio.

After visiting the U.S. Patent and Trademark site, I normally do a deep dive into a few patents. Given the upcoming announcement of Casio’s new keyboards (Casio CT-S1000V and CT-S500), time is of the essence. Thus, here is a short list of the most relevant recent Casio patents:

11,094,307 8/17/2021 Sound generation
10,937,404 3/ 2/2021 Sound generation, three contact switches
10,909,958 2/ 2/2021 Chord accompaniment
10,825,438 11/ 3/2020 Vocoder
10,825,434* 11/ 3/2020 Voice synthesis
10,825,433* 11/ 3/2020 Voice synthesis (MIDI)
10,810,981* 10/20/2020 Voice synthesis
10,803,844 10/13/2020 Picture/image generation in sync with music
10,789,922* 9/29/2020 Voice synthesis
10,629,179 4/21/2020 Voice synthesis
10,616,688 4/ 7/2020 Speaker box
10,559,290 2/11/2020 Waveform transfer
10,515,618* 12/24/2019 Waveform compression
10,490,172* 11/26/2019 Enlivement data (expression)
10,475,425* 11/12/2019 Waveform play-back and sound generation
10,474,387* 11/12/2019 Waveform transfer (ring buffering)

If you’re short on time, you might want to start with the starred (“*”) patents. Clearly, Casio have invested heavily in voice and instrument synthesis!

Of course, the usual caveats about patents and patent applications hold. Just because a company stakes out its intellectual property (IP) turf with a patent does not mean they will build the technology into a finished product. Beware when drawing inferences!

A little background and context are needed. Please keep these points in mind when reading the Casio patents.

Casio have moved to a new generation of synthesis:

  • Acoustic and intelligent Resonator (AiR) sound source.
    • Multi-dimensional Morphing (changing volume and tone characteristics with variations in touch and passage of time).
    • Lossless audio compression.
    • Resonance system (string resonance, open string resonance, damper resonance, Aliquot resonance).
  • Acoustic and intelligent Xpression (AiX) sound source.
    • High performance DSP effects and EQ.
    • Higher waveform memory capacity (more waveforms, longer waveforms).
    • Amplifier and vintage effect modeling.

AiX evolved from AiR. AiR first appeared in Casio digital pianos. AiX first appeared in the CT-X series of portable keyboards. Both technologies are based on a new, proprietary, large scale integrated (LSI) processor. The Casio CT-S series, including the new models, exploits AiX technology.

Casio CT-S1 printed circuit board [Casio]

Casio have always relied upon a very high degree of functional integration. Previously, a wide range of products were based on the custom uPD800468P-012 processor. This processor was a sizable beast with 180 pins integrating major functions like digital-to-analog (DAC) and analog-to-digital conversion (ADC). The main clock ran at 48MHz. This may seem slow to people accustomed to personal computer technology, but the low clock speed is power efficient — small power supplies and/or battery operation and no fans or heat sinks. You don’t really want to lug around a PC power supply and all of its cooling elements, do you?

The uPD800468P-012 has been replaced by a new LSI processor with even more pins! The CT-S1 main board is positively sparse. Where did all of the other components go?

Slight digression. Casio and Yamaha are manufacturing juggernauts who slug it out in portable keyboards and digital pianos. Entry-level products are notoriously price-sensitive and both companies work hard to wring cost out of entry-level (and mid-range) products.

Casio CT-S1 speaker box [Casio]

Casio are pushing in other dimensions, not just LSI. With more waveform memory capacity, they have added “Advanced Tones” such as the new piano, clav, and organ samples in the popular CT-S1. They have also developed a horizontal bass-reflex speaker system for slim-line instruments. The system incorporates a new speaker box and elliptical speakers. Finally, Casio have developed their own low-latency Bluetooth audio and MIDI protocol.

For further information, I highly recommend the behind-the-scenes interview with the CT-S1 development team. Not only will you learn about the CT-S1, you’ll get a feeling for the “contact sport” of product engineering as the team try to balance conflicting design concerns.

[I hope that Casio don’t mind that I have reproduced a few of their pictures here.]

Copyright © 2022 Paul J. Drongowski