Embedded Speech Commands
Embedded commands are used to fine-tune the pronunciation of individual words in the text you pass to a synthesizer.
An embedded command is enclosed in “[[” and “]]”, the so-called delimiters.
All embedded commands consist of a 4-character command code and a parameter.
More than one command may occur within a single pair of delimiters, separated by semicolons.
For example: Do [[emph 100; rate 100]] not [[rate 175]] over tighten the screw.
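The delimiter-and-semicolon syntax can also be assembled programmatically. The following is a minimal Python sketch; the helper name embed is hypothetical and not part of any speech API, it only builds the string that would be passed to a synthesizer.

```python
def embed(*commands):
    """Join (code, parameter) pairs into one delimited command group."""
    parts = []
    for code, param in commands:
        # The text above states that every command code has four characters.
        if len(code) != 4:
            raise ValueError(f"command code must be 4 characters: {code!r}")
        parts.append(f"{code} {param}")
    # Several commands inside one pair of delimiters are separated by semicolons.
    return "[[" + "; ".join(parts) + "]]"


text = f"Do {embed(('emph', 100), ('rate', 100))} not {embed(('rate', 175))} over tighten the screw."
print(text)
```

The printed string matches the example sentence above, so the helper can be used wherever such text is composed by hand.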
A parameter may be a string or a number, and a numeric parameter may be preceded by the + or - character.
Some commands allow you to use the parameter to specify either an absolute value or a relative value.
For example, the volm command allows you to specify a particular volume or an amount by which to increase or decrease the current volume, as shown below:
[[volm 0.3]] This command sets the volume with which the following words are spoken to 0.3.
[[volm +0.1]] This command increases the volume with which the following words are spoken by 0.1.
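The difference between an absolute and a relative parameter comes down to whether the number carries an explicit sign. A small Python sketch of that rule, assuming a hypothetical helper named volm that only formats the command text:

```python
def volm(value, relative=False):
    """Format a volm command with an absolute or a relative parameter."""
    if relative:
        # An explicit sign marks the value as an adjustment to the current volume.
        return f"[[volm {value:+}]]"
    if value < 0:
        # A leading minus sign would be read as a relative decrease.
        raise ValueError("an absolute volume cannot be negative")
    return f"[[volm {value}]]"


print(volm(0.3))                 # absolute: set the volume to 0.3
print(volm(0.1, relative=True))  # relative: raise the volume by 0.1
```

The same sign convention would apply to any other command that accepts both forms, such as pbas.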
Please note that not all commands are supported by all synthesizers or by all voices, and implementations vary considerably. Some apply an effect until it is switched off again, while others apply it only to the word immediately following the embedded command. Also remember to URL-encode the parameters you submit, e.g. “+” becomes “%2B”. The “Alex” voice seems to be the most compliant with the Embedded Speech Commands.
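The URL-encoding step can be illustrated with Python's standard library. This is only a sketch: how the text is actually submitted is not specified here, so the assumption is that the whole string travels as a single URL parameter value.

```python
from urllib.parse import quote

text = "Hello [[volm +0.1]] World"

# safe="" percent-encodes every reserved character, including "+", "[" and "]".
encoded = quote(text, safe="")
print(encoded)
```

Without encoding, a literal “+” in a query string is commonly decoded as a space, which would silently turn a relative parameter into an absolute one.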
- Character Mode: NORM or LTRL
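The character mode command char sets the character-speaking mode of the synthesizer. The LTRL parameter causes the synthesizer to speak the individual characters of every word, whereas the NORM parameter restores normal pronunciation.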
For example: “This is how you spell the word cat, [[char LTRL]] cat [[char NORM]], cat.”
- Number Mode: NORM or LTRL
The number mode command nmbr sets the number-speaking mode of the synthesizer. The NORM parameter causes the synthesizer to speak the number 46 as “forty-six,” whereas the LTRL parameter causes the synthesizer to speak the same number as “four six.”
For example, to make it clear that the following 7-digit number is a phone number, you can use the nmbr command to tell the synthesizer to say each digit separately, as follows:
“Please call me at [[nmbr LTRL]] 5551990 [[nmbr NORM]].”
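When the text is generated automatically, long digit runs can be wrapped in the nmbr commands before synthesis. A minimal Python sketch, assuming a hypothetical helper and an arbitrary threshold of seven digits for what counts as a phone-like number:

```python
import re


def spell_out_long_numbers(text, min_digits=7):
    """Wrap every run of at least min_digits digits in nmbr LTRL ... nmbr NORM."""
    pattern = re.compile(r"\d{%d,}" % min_digits)
    # Switch to literal mode for the digits, then back to normal mode.
    return pattern.sub(
        lambda m: f"[[nmbr LTRL]] {m.group(0)} [[nmbr NORM]]", text
    )


print(spell_out_long_numbers("Please call me at 5551990."))
```

Shorter numbers such as 46 are left untouched and would still be spoken as “forty-six”.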
- Input Mode: TEXT or PHON or TUNE
The input mode command inpt switches the input-processing mode to textual mode, phoneme mode, or TUNE format mode.
The default input-processing mode is textual, and you should always use the [[inpt TEXT]] command to revert to textual mode after you’re finished providing content in one of the other modes. In phoneme mode, the synthesizer interprets characters as representing phonemes (listed in Phonemes). In the TUNE format mode, the synthesizer recognizes the same set of phonemes but also interprets additional information that specifies a precise spoken contour, or tune, for the words. For more information about the TUNE format, consult Apple’s Speech Synthesis Programming Guide.
For example, to supply the phonemic representation of a name that synthesizers frequently mispronounce, you can use the inpt command as follows:
“My name is [[inpt PHON]] AY1yIY2SAX [[inpt TEXT]].”
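Because the synthesizer should always be returned to textual mode, it can help to emit the switch and the revert together. A short Python sketch with a hypothetical helper name, using the phoneme string from the example above:

```python
def phonemes(phoneme_string):
    """Wrap a phoneme string so that the input mode always reverts to TEXT."""
    return f"[[inpt PHON]] {phoneme_string} [[inpt TEXT]]"


sentence = f"My name is {phonemes('AY1yIY2SAX')}."
print(sentence)
```

Pairing the two commands in one place makes it harder to leave the synthesizer in phoneme mode by accident.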
- Emphasis Value
The emphasis command emph causes the synthesizer to speak the next word with greater or less emphasis than it is currently using. A positive parameter increases emphasis and a negative parameter decreases it. Try values between -127 and +127.
For example, to emphasize the word “not” in the following phrase, use the emph command as follows:
"Do [[emph 100]] not go. Do not [[emph 100]] go."
- Speech Pitch Value or +/- Value
The baseline pitch command pbas changes the current speech pitch to the specified value. If the pitch value is preceded by the + or - character, the speech pitch is adjusted relative to its current value. Baseline pitch values are always positive numbers in the range of 1 to 127.
For example: “Please [[pbas +12]] call me [[pbas -12]] at home.”
- Speech Pitch Modulation +/- Value
The pitch modulation command pmod changes the modulation range for the speech channel, based on the specified modulation-depth value.
For example: "[[pmod +100]] call me [[pmod -100]] call me"
- Speech Rate
The speech rate command rate sets the speech rate, in words per minute (default 175).
For example: “Please, [[rate 300]] call me [[rate 175]] later at home.”
- Silence Value
The silence command slnc causes the synthesizer to generate a pause for the specified number of milliseconds. You might want to insert extra silence between two sentences to allow listeners to fully absorb the meaning of the first one. Note that the precise timing of the silence will vary among synthesizers.
For example: "Hello [[slnc 800]] are you still here?"
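Inserting the same pause between several sentences can be automated. A minimal Python sketch, assuming a hypothetical helper and a default of 800 milliseconds taken from the example above:

```python
def join_with_pauses(sentences, pause_ms=800):
    """Join sentences with a slnc command of pause_ms milliseconds between them."""
    separator = f" [[slnc {int(pause_ms)}]] "
    return separator.join(sentences)


print(join_with_pauses(["Hello", "are you still here?"]))
```

Since the actual pause length varies among synthesizers, the value is best tuned by ear for the voice in use.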
- Volume Value or +/- Value
The speech volume command volm sets the speech volume to the specified value. (0 = faint)
For example: "Hello World [[volm 0.1]]Hello World [[volm 0.5]]Hello World [[volm 1]]Hello World [[volm 2]]Hello World"