SSML - Speech Synthesis Markup Language



SSML, the Speech Synthesis Markup Language, is a W3C standard; you can read all about it here:
https://www.w3.org/TR/speech-synthesis/

Think of it as HTML for speech. Just as you would put a <b> tag around a word to show it in bold when it is rendered in a web browser, SSML has an <emphasis> tag to emphasize a word when it is synthesized.

Like HTML, SSML has a variety of tags. The essential role of the markup language is to provide a standard way to control aspects of speech such as pronunciation, volume, pitch, and rate across different synthesis-capable platforms.
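To make that list a little more concrete, here is a minimal SSML sketch using elements defined in SSML 1.0: <prosody> for rate, volume and pitch, <break> for a pause, and <say-as> and <sub> for pronunciation. The sentences are purely illustrative, and whether a given voice honors every element and value (the accepted interpret-as values for <say-as>, for example, vary between engines) is an assumption you should check against your voice vendor's documentation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- Rate and volume: speak this sentence slowly and loudly -->
  <prosody rate="slow" volume="loud">Please listen carefully.</prosody>
  <!-- Insert a pause of half a second -->
  <break time="500ms"/>
  <!-- Pitch: raise the pitch of this sentence -->
  <prosody pitch="high">This sentence sounds higher.</prosody>
  <!-- Pronunciation: spell out "SSML" letter by letter and expand "W3C" -->
  <say-as interpret-as="characters">SSML</say-as> is a standard of the
  <sub alias="World Wide Web Consortium">W3C</sub>.
</speak>
```

The named values (slow, loud, high) are used here instead of numeric ones because they are part of the SSML 1.0 value set and leave the exact amount of change to the synthesizer.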

Now here is the problem: Apple never implemented SSML as a standard, neither in OS X nor in the newer macOS. Instead, Apple introduced its own proprietary “Embedded Speech Commands” to provide some control over how content is synthesized.


Using Mac2Speech to synthesize SSML-encoded content


Since Apple does not support SSML, you need to install additional voices and synthesizers that do support this W3C standard. We have identified Cepstral as a provider of such components.
Once you have chosen the voice(s) you like and downloaded and installed them on your Mac, you will find a new icon in your Mac’s System Preferences. Clicking this icon brings up a dialog that should look something like this:

[Screenshot: Cepstral dialog in System Preferences]

You need to acquire a license from Cepstral to use those voices, and this is where it gets complicated. Straightforward licensing only allows you to synthesize text into sound, without ever storing the result. However, this is not how Mac2Speech works: Mac2Speech not only needs to synthesize the speech, but also to encode the synthesized audio into MP3 at the highest possible quality.

To allow Mac2Speech to use Cepstral’s voices to synthesize SSML content, the voices need to be licensed with the “Save to File for Mac OS X” license.
This is not an effortless process and will require some determination on your part. However, the end result might be worth it. To give you a taste, here is an SSML document:

SSML


<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd">
  <!-- <emphasis> without a level attribute defaults to "moderate" -->
  That is a <emphasis>big</emphasis> car!
  This is going to make a <emphasis level="strong">huge</emphasis> impression.
</speak>

SSML - Synthesis


And here is what the synthesis sounds like:

[Audio: synthesized speech of the SSML document]


Since version 3, Mac2Speech supports SSML. I think this is a huge deal, given that Apple has stayed as far away from this standard as possible, and we truly have voice providers like Cepstral to thank for making it possible.