Yamaha introduced audio styles in the PSR-S950 arranger workstation. Audio styles are both loved and hated. Loved when they sound good, but hated when people try to change or repurpose them in new styles.
The term “audio style” is a bit of an overstatement. Only the percussion track is audio. At least, that’s how audio styles have been developed and used to this day. Yamaha just released the Audio Phraser application for creating and editing the basic skeleton of an audio style, so this situation may change now that people can more freely create, edit and share their own audio styles.
Audio style file internal format
Ever since Yamaha distributed the audio styles for Genos, I’ve been meaning to take a look inside an audio style file. Here’s a little preliminary information.
An audio style file is an IFF-like container, just like a Standard MIDI File (SMF). In fact, an audio style file has the same internal organization as a regular style file, which we know to be a Type 0 SMF with extra chunks.
An audio style file has the following chunks (in order):
Type  Purpose
----  ------------------------------------
MThd  SMF header chunk
MTrk  SMF track chunk
CASM  Yamaha CASM chunk
AASM  Audio assembly (descriptor) chunk
AFil  Audio file (waveform) chunk
OTSc  Yamaha OTS chunk
The AASM and AFil chunks are new, additional chunks beyond the known MIDI, CASM and OTS chunks. All chunks have a four byte chunk identifier and a four byte chunk size. The chunk size does not include the identifier or chunk size bytes, as usual.
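Because every top-level chunk follows the same ID-plus-size pattern, listing them takes only a short loop. Here is a minimal sketch, assuming the four byte size is big-endian as in an SMF; `ChunkWalker` and its methods are hypothetical names, not taken from my dump program.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: list the top-level chunks of a style file held in memory.
// Assumes a 4-byte ASCII ID followed by a 4-byte big-endian size that counts
// only the data bytes after the 8-byte header (as in an SMF).
class ChunkWalker {

    /** One chunk: its ID, the file offset of its ID, and its data size. */
    record Chunk(String id, int offset, int size) {}

    /** Reads a 4-byte big-endian integer starting at pos. */
    static int readInt32BE(byte[] b, int pos) {
        return ((b[pos] & 0xFF) << 24) | ((b[pos + 1] & 0xFF) << 16)
             | ((b[pos + 2] & 0xFF) << 8) | (b[pos + 3] & 0xFF);
    }

    static List<Chunk> readChunks(byte[] file) {
        List<Chunk> chunks = new ArrayList<>();
        int pos = 0;
        while (pos + 8 <= file.length) {
            String id = new String(file, pos, 4, StandardCharsets.US_ASCII);
            int size = readInt32BE(file, pos + 4);
            if (size < 0 || size > file.length - pos - 8) {
                break; // truncated or corrupt: stop rather than read past the end
            }
            chunks.add(new Chunk(id, pos, size));
            pos += 8 + size; // skip the header and the data to reach the next chunk
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java ChunkWalker <style file>");
            return;
        }
        for (Chunk c : readChunks(Files.readAllBytes(Path.of(args[0])))) {
            System.out.printf("%s  offset %d  size %d%n", c.id(), c.offset(), c.size());
        }
    }
}
```

If the layout above holds, running this on an audio style file should list MThd, MTrk, CASM, AASM, AFil and OTSc, in that order.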
The AASM chunk is relatively small, about 2,500 bytes. It consists of 15 variable length ASEG subchunks. The ASEG subchunk has a four byte subchunk size. Each ASEG corresponds to a style section; that’s why there are fifteen of them.
An ASEG subchunk has three parts:
Type  Purpose
----  ------------------------------------
Adec  Identifies the style section
Atab  Identifies the audio file; other functions unknown
AMix  Function unknown
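The Adec part is variable length, with an explicit four byte size. The Atab and AMix parts appear to be fixed length (101 and 28 bytes, respectively) and have no separate size field after their type tag. However, the first four bytes of each part (00 00 00 61 = 97 and 00 00 00 18 = 24 in the dump below) equal the number of bytes that follow, so they may be size fields after all.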
The Adec part is ASCII text and is a style section name like “Main A” or “Fill In DD”. That is the only information in Adec.
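Here is a minimal sketch of pulling one ASEG apart, assuming the layout described above and treating Atab and AMix as fixed length (101 and 28 bytes after their tags). As a sanity check, for “Main D” the pieces add up to (8 + 6) + (4 + 101) + (4 + 28) = 151 bytes, which matches the subchunk size in the dump below. `AsegParser` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of decoding one ASEG subchunk, assuming this layout:
//   'ASEG' + 4-byte size
//   'Adec' + 4-byte size + ASCII section name
//   'Atab' + 101 bytes   (treated as fixed length)
//   'AMix' + 28 bytes    (treated as fixed length)
// The lengths come from a single test file and may not hold everywhere.
class AsegParser {
    static final int ATAB_LEN = 101;
    static final int AMIX_LEN = 28;

    record Aseg(String sectionName, byte[] atab, byte[] amix, int nextOffset) {}

    static int readInt32BE(byte[] b, int p) {
        return ((b[p] & 0xFF) << 24) | ((b[p + 1] & 0xFF) << 16)
             | ((b[p + 2] & 0xFF) << 8) | (b[p + 3] & 0xFF);
    }

    /** Throws if the 4-byte tag at p is not the expected one. */
    static void expectTag(byte[] b, int p, String tag) {
        String found = new String(b, p, 4, StandardCharsets.US_ASCII);
        if (!found.equals(tag)) {
            throw new IllegalArgumentException("expected " + tag + " at " + p + ", found " + found);
        }
    }

    /** Parses the ASEG whose tag starts at offset; nextOffset points just past it. */
    static Aseg parse(byte[] b, int offset) {
        expectTag(b, offset, "ASEG");
        int asegSize = readInt32BE(b, offset + 4);
        int p = offset + 8;

        expectTag(b, p, "Adec");                  // section name, explicitly sized
        int nameLen = readInt32BE(b, p + 4);
        String name = new String(b, p + 8, nameLen, StandardCharsets.US_ASCII);
        p += 8 + nameLen;

        expectTag(b, p, "Atab");                  // copied raw, tag excluded
        byte[] atab = Arrays.copyOfRange(b, p + 4, p + 4 + ATAB_LEN);
        p += 4 + ATAB_LEN;

        expectTag(b, p, "AMix");                  // copied raw, tag excluded
        byte[] amix = Arrays.copyOfRange(b, p + 4, p + 4 + AMIX_LEN);

        return new Aseg(name, atab, amix, offset + 8 + asegSize);
    }
}
```

Calling `parse` repeatedly with `nextOffset` should step through all fifteen ASEGs inside the AASM data.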
I don’t know exactly what the Atab does. It contains an ASCII string which identifies the audio file associated with the style section; this string is clearly visible in a dump (example below). In the test audio style file, all of the Atab and AMix parts have the same values except for the audio file names in the Atab parts.
File Offset: 36965
Subchunk type: 'ASEG'
Subchunk size: 151
Section name: Main D
Atab type: 'Atab'
    0 0 0 97 0 32 32 32 | 00 00 00 61 00 20 20 20 | ...a.
    32 32 32 32 32 41 56 48 | 20 20 20 20 20 29 38 30 |      )80
    115 67 97 110 97 100 105 97 | 73 43 61 6E 61 64 69 61 | sCanadia
    110 82 111 99 107 95 77 97 | 6E 52 6F 63 6B 5F 4D 61 | nRock_Ma
    105 110 32 68 0 0 0 0 | 69 6E 20 44 00 00 00 00 | in D....
    0 0 0 0 0 0 0 0 | 00 00 00 00 00 00 00 00 | ........
    0 0 0 0 0 0 0 0 | 00 00 00 00 00 00 00 00 | ........
    0 0 0 0 0 0 0 0 | 00 00 00 00 00 00 00 00 | ........
    1 15 -1 7 -1 -1 -1 -1 | 01 0F FF 07 FF FF FF FF | ........
    0 0 0 127 0 0 0 0 | 00 00 00 7F 00 00 00 00 | ........
    127 0 0 0 0 0 127 0 | 7F 00 00 00 00 00 7F 00 | ........
    0 0 0 0 127 0 0 0 | 00 00 00 00 7F 00 00 00 | ........
    0 0 0 0 0 0 0 0 | 00 00 00 00 00 00 00 00 | ........
AMix type: 'AMix'
    0 0 0 24 7 -128 0 -1 | 00 00 00 18 07 80 00 FF | ........
    88 4 4 2 24 8 0 -80 | 58 04 04 02 18 08 00 B0 | X.......
    7 71 0 10 64 0 91 0 | 07 47 00 0A 40 00 5B 00 | .G..@.[.
    0 -1 47 0 0 0 0 0 | 00 FF 2F 00 00 00 00 00 | ../.....
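Since the file name is the only obviously readable thing in an Atab, a crude way to extract it is to take the longest run of printable ASCII bytes. This is a heuristic, not a decoded field layout, and `AtabName.guessName` is a hypothetical name. It expects the Atab bytes after the 'Atab' tag.

```java
import java.nio.charset.StandardCharsets;

// Heuristic sketch: the Atab layout is unknown apart from the visible file
// name, so return the longest run of printable ASCII bytes (0x20-0x7E),
// with surrounding spaces trimmed.
class AtabName {
    static String guessName(byte[] atab) {
        int bestStart = 0, bestLen = 0, runStart = 0;
        for (int i = 0; i <= atab.length; i++) {
            boolean printable = i < atab.length && atab[i] >= 0x20 && atab[i] <= 0x7E;
            if (!printable) {
                if (i - runStart > bestLen) {   // a run just ended; keep it if it is the longest so far
                    bestLen = i - runStart;
                    bestStart = runStart;
                }
                runStart = i + 1;
            }
        }
        return new String(atab, bestStart, bestLen, StandardCharsets.US_ASCII).trim();
    }
}
```

On the Main D dump above, this yields “)80sCanadianRock_Main D”. Comparing the result against the ANdc name in the matching ADSg is a quick sanity check.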
Etienne from the PSR Tutorial Forum points out that the AMix subchunk contains MIDI event codes:
AMix : header
    00 00 00 18          : length of data (24 bytes)
    07 80                : 0780 hex = 1920 decimal (PPQN ?)
    00                   : delta time
    FF 58 04 04 02 18 08 : meta event Time signature 4/4
    00                   : delta time
    B0 07 47             : controller Volume (value 71)
    00                   : delta time
    0A 40                : controller Panpot (running status)
    00                   : delta time
    5B 00                : controller Reverb send level
    00                   : delta time
    FF 2F 00             : end of track, as at the end of an MTrk chunk
Nice catch, Etienne! The AMix content makes sense because something needs to set up the channel volume, pan and reverb level for the audio phrase. Yamaha loves to use MIDI events for other purposes (voice files, OTS, etc.), so why not here?
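Etienne’s reading can be checked mechanically. This sketch assumes the AMix bytes after the tag are a four byte length, a two byte resolution (the possible PPQN), and then ordinary SMF events with variable-length delta times and running status. It handles only meta events and control changes; `AmixDecoder` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: decode an AMix part as plain SMF events. Handles meta events and
// control changes only; any other status byte throws.
class AmixDecoder {

    /** Reads a MIDI variable-length quantity at p[0] and advances p[0]. */
    static int readVlq(byte[] b, int[] p) {
        int value = 0, x;
        do {
            x = b[p[0]++] & 0xFF;
            value = (value << 7) | (x & 0x7F);
        } while ((x & 0x80) != 0);
        return value;
    }

    static String controllerName(int cc) {
        return switch (cc) {
            case 7 -> "volume";
            case 10 -> "pan";
            case 91 -> "reverb send";
            default -> "CC" + cc;
        };
    }

    /** amix = the bytes after the 'AMix' tag. Returns one line per decoded item. */
    static List<String> decode(byte[] amix) {
        List<String> out = new ArrayList<>();
        int length = ((amix[0] & 0xFF) << 24) | ((amix[1] & 0xFF) << 16)
                   | ((amix[2] & 0xFF) << 8) | (amix[3] & 0xFF);
        int end = 4 + length;
        out.add("resolution " + (((amix[4] & 0xFF) << 8) | (amix[5] & 0xFF)));
        int[] p = {6};
        int running = 0;                            // last channel status, for running status
        while (p[0] < end) {
            readVlq(amix, p);                       // delta time (ignored here)
            int status = amix[p[0]] & 0xFF;
            if (status >= 0x80) p[0]++; else status = running;
            if (status == 0xFF) {                   // meta event: type, length, data
                running = 0;                        // meta events cancel running status
                int type = amix[p[0]++] & 0xFF;
                int len = readVlq(amix, p);
                if (type == 0x58) {                 // time signature: numerator, log2(denominator)
                    out.add("time signature " + amix[p[0]] + "/" + (1 << amix[p[0] + 1]));
                } else if (type == 0x2F) {
                    out.add("end of track");
                    break;
                }
                p[0] += len;                        // other meta types are skipped silently
            } else if ((status & 0xF0) == 0xB0) {   // control change: controller, value
                running = status;
                int cc = amix[p[0]++] & 0xFF;
                int value = amix[p[0]++] & 0xFF;
                out.add("ch" + ((status & 0x0F) + 1) + " " + controllerName(cc) + " = " + value);
            } else {
                throw new IllegalArgumentException(String.format("unhandled status %02X", status));
            }
        }
        return out;
    }
}
```

Fed the 28 AMix bytes from the dump, it reports a resolution of 1920, 4/4 time, volume 71, pan 64 (center) and reverb send 0 on channel 1, then end of track.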
The AFil chunk has substructure, too: it consists of ADSg subchunks. As you might guess, the AFil chunk is pretty big because it contains the waveform data.
The following table shows the offset and length information for the first ADSg in the example’s AFil:
Chunk  Offset    Size      Description
-----  --------  --------  ------------------------------------
AFil   37287     15261858
ADSg   37295     1219275   Container for an audio file
ANdc   37303     50        File name
AWav   37361     1219209   Container for audio waveform
WAVE   37369     n/a       Marker (no subchunk size)
Afmt   37373     16        Audio format information
Sfmt   37397     217       Container for section information
Sdec   37608     6         Section name, e.g., Main A
Adat   37622     1218300   Waveform data
AInf   1255930   640       Container for audio information
BPnt   1255938   136
OPnt   1256082   240
APnt   1256330   232
ATmp   1256570   0         Empty, subchunk size is 0
ADSg   1256578             Container for the next audio file
...
The container relationships are important because the containers and subchunks are nested:
AFil contains ADSg
ADSg contains ANdc, AWav
AWav contains WAVE, Afmt, Sfmt, Adat, AInf
Sfmt contains Sdec (going by the offsets above, Sdec occupies the last 14 bytes of Sfmt)
AInf contains BPnt, OPnt, APnt, ATmp
The nesting is a bit of a pain in the patootie when writing code to parse a style file.
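Recursion takes most of the pain out of the nesting. This sketch assumes AFil, ADSg, AWav and AInf are containers holding a sequence of subchunks, that “WAVE” is a bare four byte marker with no size, and that everything else (including Sfmt, whose interior is only partly understood) is an opaque leaf. `AFilWalker` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch: recursive walker for the nested AFil structure.
// Output lines look like the table above: indented tag, offset, data size.
class AFilWalker {
    static final Set<String> CONTAINERS = Set.of("AFil", "ADSg", "AWav", "AInf");

    static int readInt32BE(byte[] b, int p) {
        return ((b[p] & 0xFF) << 24) | ((b[p + 1] & 0xFF) << 16)
             | ((b[p + 2] & 0xFF) << 8) | (b[p + 3] & 0xFF);
    }

    static List<String> walk(byte[] b) {
        List<String> out = new ArrayList<>();
        walk(b, 0, b.length, 0, out);
        return out;
    }

    private static void walk(byte[] b, int start, int end, int depth, List<String> out) {
        int pos = start;
        String indent = "  ".repeat(depth);
        while (pos + 4 <= end) {
            String tag = new String(b, pos, 4, StandardCharsets.US_ASCII);
            if (tag.equals("WAVE")) {               // marker only: no size field follows
                out.add(indent + tag + " " + pos + " n/a");
                pos += 4;
                continue;
            }
            if (pos + 8 > end) break;               // no room for a size field
            int size = readInt32BE(b, pos + 4);
            if (size < 0) break;                    // corrupt size: give up on this level
            out.add(indent + tag + " " + pos + " " + size);
            if (CONTAINERS.contains(tag)) {         // descend into the container's data
                walk(b, pos + 8, Math.min(end, pos + 8 + size), depth + 1, out);
            }
            pos += 8 + size;
        }
    }
}
```

Passing the whole file should work too, since the other top-level chunks (MThd, MTrk, CASM, AASM, OTSc) are simply listed as leaves; the offsets are then absolute file offsets and can be compared directly with the table above.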
ADSg is the container chunk holding audio waveform (meta-)information. Like ASEG, there are fifteen ADSg chunks, one for each audio file. The ANdc subchunk inside contains the audio file name, which matches the name in the corresponding ASEG. AWav is the container holding the audio waveform data itself.
The audio “file” format is WAV-like, but it is not exactly WAV (Microsoft RIFF). I was able to play back the audio by importing the audio style file as a raw (untyped) audio file. The audio format seems to be 44,100Hz, 16-bit stereo, big endian, with no compression or encryption. It shouldn’t be too hard to dump the audio.
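One way to dump a section’s audio is to wrap the raw Adat bytes in a standard 44-byte RIFF/WAVE header. This sketch assumes the format observed above (44,100Hz, 16-bit, stereo, big-endian, uncompressed); that is an inference from listening, not a published spec. Since WAV stores 16-bit samples little-endian, each sample is byte-swapped. `AdatToWav` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Sketch: convert raw Adat bytes (assumed 44.1 kHz, 16-bit, stereo,
// big-endian PCM) into a complete little-endian RIFF/WAVE file image.
class AdatToWav {
    static final int SAMPLE_RATE = 44_100;
    static final short CHANNELS = 2;
    static final short BITS_PER_SAMPLE = 16;

    static byte[] toWav(byte[] adat) {
        int blockAlign = CHANNELS * BITS_PER_SAMPLE / 8;        // 4 bytes per stereo frame
        int dataLen = adat.length - adat.length % blockAlign;   // keep whole frames only
        ByteBuffer wav = ByteBuffer.allocate(44 + dataLen).order(ByteOrder.LITTLE_ENDIAN);
        wav.put("RIFF".getBytes(StandardCharsets.US_ASCII)).putInt(36 + dataLen);
        wav.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        wav.put("fmt ".getBytes(StandardCharsets.US_ASCII)).putInt(16) // PCM fmt chunk size
           .putShort((short) 1)                                        // format tag 1 = PCM
           .putShort(CHANNELS)
           .putInt(SAMPLE_RATE)
           .putInt(SAMPLE_RATE * blockAlign)                           // byte rate
           .putShort((short) blockAlign)
           .putShort(BITS_PER_SAMPLE);
        wav.put("data".getBytes(StandardCharsets.US_ASCII)).putInt(dataLen);
        for (int i = 0; i < dataLen; i += 2) {
            wav.put(adat[i + 1]).put(adat[i]);                         // big-endian to little-endian
        }
        return wav.array();
    }
}
```

Under the same assumption, the first Adat in the table above (1,218,300 bytes) holds 304,575 stereo frames, or roughly 6.9 seconds of audio. The result can be saved with `Files.write(Path.of("section.wav"), AdatToWav.toWav(adat))`.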
Yamaha Audio Phraser
Now that you know a little bit about what’s inside of an audio style file, here is a brief overview of what the Audio Phraser program generates.
Audio Phraser generates an MThd MIDI file header chunk, a single MTrk chunk (Type 0), an ASEG chunk for each audio waveform, an AFil chunk (containing an ADSg subchunk for each audio file) and a CASM chunk.
The MIDI tempo and time signature are taken from the settings in Audio Phraser. The MIDI song title is set to “Audio Phraser”.
The MIDI track contains the usual markers at the beginning: SFF2 and SInt. A single SysEx message is generated after SInt: General MIDI System ON (F0 7E 7F 09 01 F7). The key signature is set to C/Am, followed by:
- SMPTE Offset
- Sequencer specific metadata: ff 7f 04 43 00 01 00 00
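To inspect the SFF2 and SInt markers and the section markers yourself, it helps to list the marker meta events (FF 06) in the MTrk along with their absolute tick positions. This sketch follows standard SMF event syntax (delta times, running status, SysEx and meta lengths); `MarkerLister` is a hypothetical name, and it expects only the MTrk data bytes, without the tag and size.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: collect marker meta events (FF 06) and their absolute tick times.
class MarkerLister {

    record Marker(long tick, String text) {}

    /** Reads a MIDI variable-length quantity at p[0] and advances p[0]. */
    static int readVlq(byte[] b, int[] p) {
        int value = 0, x;
        do {
            x = b[p[0]++] & 0xFF;
            value = (value << 7) | (x & 0x7F);
        } while ((x & 0x80) != 0);
        return value;
    }

    static List<Marker> markers(byte[] trk) {
        List<Marker> out = new ArrayList<>();
        int[] p = {0};
        long tick = 0;
        int running = 0;                                // last channel status byte
        while (p[0] < trk.length) {
            tick += readVlq(trk, p);                    // accumulate delta times
            int status = trk[p[0]] & 0xFF;
            if (status >= 0x80) p[0]++;                 // explicit status byte
            else status = running;                      // running status
            if (status == 0xFF) {                       // meta event: type, length, data
                running = 0;
                int type = trk[p[0]++] & 0xFF;
                int len = readVlq(trk, p);
                if (type == 0x06) {
                    out.add(new Marker(tick, new String(trk, p[0], len, StandardCharsets.US_ASCII)));
                }
                p[0] += len;
                if (type == 0x2F) break;                // end of track
            } else if (status == 0xF0 || status == 0xF7) { // SysEx: length, data
                running = 0;
                int len = readVlq(trk, p);
                p[0] += len;
            } else if (status >= 0x80 && status < 0xF0) {  // channel message
                running = status;
                int kind = status & 0xF0;
                p[0] += (kind == 0xC0 || kind == 0xD0) ? 1 : 2;
            } else {
                throw new IllegalStateException("no running status at byte " + p[0]);
            }
        }
        return out;
    }
}
```

The difference between the tick positions of consecutive section markers gives each section’s length in ticks.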
Oddly, MIDI channel 4 has four whack-looking Note Off events:
NOTE OFF G#9
NOTE OFF G5
NOTE OFF C0
NOTE OFF C0
A bug? The remaining markers indicate the start of the style sections. The section length corresponds to the length of the audio waveform for the section. Thus, if the audio waveform for “Main A” is 2 bars, then the MIDI section for “Main A” is 2 bars long.
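The bars-to-audio correspondence is simple arithmetic. This back-of-the-envelope sketch assumes a constant tempo in quarter notes per minute and the 44,100Hz, 16-bit stereo format observed earlier (four bytes per frame). The 1920 figure comes from the AMix header; whether the MTrk uses the same resolution is an assumption. `SectionLength` is a hypothetical name.

```java
// Sketch: relate a section's length in bars to MIDI ticks and to Adat size.
class SectionLength {
    /** Ticks in `bars` bars of num/den time; one bar = num * (4/den) quarter notes. */
    static long ticks(int bars, int num, int den, int ppqn) {
        return (long) bars * num * ppqn * 4 / den;
    }

    /** Seconds for those bars at `bpm` quarter notes per minute. */
    static double seconds(int bars, int num, int den, double bpm) {
        return bars * num * (4.0 / den) * 60.0 / bpm;
    }

    /** Adat bytes for that duration: whole 44.1 kHz stereo frames, 4 bytes each. */
    static long adatBytes(double seconds) {
        return Math.round(seconds * 44_100) * 4;
    }
}
```

For example, a hypothetical 2-bar Main A in 4/4 at 120 BPM would span 15,360 ticks at 1920 PPQN and 4 seconds of audio, or 705,600 bytes of Adat.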
The CASM chunk is minimal and sets NTR/NTT for MIDI channel 9 (Subrhythm). NTR is “Root Fixed” and NTT is “Bypass/Bass Off”. No NTR/NTT is given for channel 10 (rhythm/drums).
Audio Phraser does not generate an OTSc (One Touch Settings) chunk.
Audio Phraser creates an AWI file for each waveform that it imports into an audio style file. The AWI file most likely holds the results of Audio Phraser’s analysis (i.e., beat detection and so forth). It would be interesting and informative to compare the contents of an AWI file against the ASEG and AInf chunks in the resulting audio style file. I’m guessing that the AWI file is the “prototype” for the ASEG and AInf chunks.
Java source code
If you would like to explore audio style files, then download the source code for a simple audio style dump program. The code is relatively brittle and expects to encounter chunks in a certain order and/or quantity. Thus, be prepared to modify the code. This is an experimenter’s kit, after all. 😉
Copyright © 2018 Paul J. Drongowski