Speech Synthesis Markup Language (SSML) in the Voxey WordPress plugin

The Voxey plugin supports Speech Synthesis Markup Language (SSML). Using SSML-enhanced text gives you additional control over how Amazon Polly generates speech from the text you provide.

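You can fine-tune the following aspects of speech: voices, pauses, muted sections, the interpretation of characters and numbers (say-as), substitutions, emphasis, and prosody.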

SSML features are available through shortcodes placed inside a WordPress page. A shortcode is a WordPress-specific code that lets you do nifty things with very little effort. Shortcodes can embed files or create objects in a single line that would otherwise require lots of complicated, ugly code.

Multiple voices

You can use multiple voices on one page, and you can voice different fragments of the page in different languages. Use the [voxey-voice] shortcode to speak the enclosed text with the specified voice. Each listed voice has its own individual character. The [voxey-voice] shortcode can be combined with all other SSML shortcodes.

NOTES

Make sure that there are no spaces between characters in the voice name.

You can find the name of the voice that you need in the plugin settings.

Figure: Voice name in the plugin settings

The following example is in English and French.

[voxey-voice name="en-US-Gregory-Neural"] Hello! [/voxey-voice]
[voxey-voice name="fr-FR-Celine-Standard"] Bonjour! [/voxey-voice]
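The note about spaces can be checked before a name is pasted into a shortcode. The following Python sketch is a hypothetical helper, not part of the plugin. It assumes that voice names follow a locale-voice-engine pattern, which is inferred only from the two examples above.

```python
def split_voice_name(name: str) -> tuple[str, str, str]:
    """Split a name such as 'en-US-Gregory-Neural' into (locale, voice, engine).

    Assumption: the last two dash-separated parts are the voice and the
    engine, and everything before them is the locale.
    """
    if any(ch.isspace() for ch in name):
        raise ValueError("voice name must not contain spaces")
    parts = name.split("-")
    if len(parts) < 3:
        raise ValueError(f"unexpected voice name: {name!r}")
    locale = "-".join(parts[:-2])
    return locale, parts[-2], parts[-1]


print(split_voice_name("en-US-Gregory-Neural"))
```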

You can find the list of all available voices on the Amazon site.

HTML tag attribute

The voxey-voice can also be used as an HTML tag attribute, with the same voice names as above.

<span voxey-voice="en-US-Gregory-Neural"> Hello! </span>

<span voxey-voice="fr-FR-Celine-Standard"> Bonjour!</span>

Pause

To add a pause to your text, use the [voxey-break] shortcode. You can set a pause by strength (equivalent to the pause after a comma, a sentence, or a paragraph) or by a specific length of time in seconds or milliseconds. If you don’t specify an attribute, the plugin uses the default, [voxey-break strength="medium"], which adds a pause the length of the pause after a comma.

[voxey-break time="2s" strength="medium"]

strength attribute values:

  • none: No pause. Use none to remove a normally occurring pause, such as after a period.
  • x-weak: Has the same strength as none, no pause.
  • weak: Sets a pause of the same duration as the pause after a comma.
  • medium: Has the same strength as weak.
  • strong: Sets a pause of the same duration as the pause after a sentence.
  • x-strong: Sets a pause of the same duration as the pause after a paragraph.

time attribute values:

  • [number]s: The duration of the pause, in seconds. The maximum duration is 10s.
  • [number]ms: The duration of the pause, in milliseconds. The maximum duration is 10000ms.
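The two time formats and the 10-second limit can be checked with a small helper before a value is used in a shortcode. This is a minimal Python sketch; the function name is hypothetical, and accepting decimal values such as 1.5s is an assumption that the plugin may not share.

```python
import re

MAX_BREAK_MS = 10_000  # 10s or 10000ms, the maximum stated above


def break_time_ms(value: str) -> int:
    """Convert a time value such as '2s' or '500ms' to milliseconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s)", value.strip())
    if match is None:
        raise ValueError(f"expected '<number>s' or '<number>ms', got {value!r}")
    number, unit = match.groups()
    millis = float(number) * (1000 if unit == "s" else 1)
    if millis > MAX_BREAK_MS:
        raise ValueError(f"{value!r} exceeds the 10s maximum")
    return round(millis)


print(break_time_ms("2s"))
```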

Mute

You can exclude part of a page from the audio. Wrap that part in the [voxey-mute] shortcode to prevent it from being spoken.

[voxey-mute] Text to be removed from the audio [/voxey-mute]

There are two other ways to mute a piece of text in a post:

  • Add class="voxey-mute" to the element you want to mute
  • Add the attribute voxey-mute="true" to the element you want to mute
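To preview which text would remain audible when elements are muted by class or attribute, the rule can be approximated with Python's standard html.parser module. This is only a sketch of the idea, not the plugin's implementation, and it assumes well-formed markup in which every non-void element is closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SpokenText(HTMLParser):
    """Collect text, skipping everything inside muted elements."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.muted_depth = 0  # greater than 0 while inside a muted element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        is_muted = "voxey-mute" in classes or attrs.get("voxey-mute") == "true"
        if self.muted_depth or is_muted:
            self.muted_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.muted_depth:
            self.muted_depth -= 1

    def handle_data(self, data):
        if not self.muted_depth:
            self.chunks.append(data)


def spoken_text(html: str) -> str:
    parser = SpokenText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


print(spoken_text('<p>Hello <span class="voxey-mute">hidden</span>world</p>'))
```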

Say as

Use the voxey-say-as shortcode with the interpret-as attribute to tell Amazon Polly how to say certain characters, words, and numbers. This enables you to provide additional context to eliminate any ambiguity on how it should render the text.

Except for the characters option, the voxey-say-as shortcode is supported by the long-form, neural, and standard TTS formats. If the plugin is using a neural voice and encounters voxey-say-as with the characters option at runtime, the affected sentence is synthesized using the related standard voice. However, that sentence is still billed as if it used a neural or long-form voice.

The voxey-say-as shortcode takes one attribute, interpret-as, which accepts a number of values. Each uses the same syntax:

[voxey-say-as interpret-as="cardinal"] 12345 [/voxey-say-as]

The following values are available with interpret-as:

  • characters or spell-out: Spells out each letter of the text, as in a-b-c. This option is not currently supported for neural voices. If you are using a neural voice and Amazon Polly encounters this option at runtime, the affected sentence is synthesized using the related standard voice, but it is still billed as if it used a neural voice.
  • cardinal or number: Interprets the numerical text as a cardinal number, as in 1,234.
  • ordinal: Interprets the numerical text as an ordinal number, as in 1,234th.
  • digits: Spells out each digit individually, as in 1-2-3-4.
  • fraction: Interprets the numerical text as a fraction. This works for both common fractions, such as 3/20, and mixed fractions, such as 2 ½.
  • unit: Interprets a numerical text as a measurement. The value should be either a number or a fraction followed by a unit with no space in between as in 1/2inch, or by just a unit, as in 1meter.
  • date: Interprets the text as a date. The format of the date must be specified with the format attribute.
  • time: Interprets the numerical text as duration, in minutes and seconds, as in 1'21".
  • address: Interprets the text as part of a street address.
  • expletive: “Beeps out” the content included within the tag.
  • telephone: Interprets the numerical text as a 7-digit or 10-digit telephone number, as in 2025551212. You can also use this value to handle telephone extensions, as in 2025551212x345. Note: Currently the telephone option is not available for all languages. It is available for voices speaking English language variants (en-AU, en-GB, en-IN, en-US, and en-GB-WLS), Spanish language variants (es-ES, es-MX, and es-US), French language variants (fr-FR and fr-CA), and Portuguese variants (pt-BR and pt-PT), as well as German (de-DE), Italian (it-IT), Japanese (ja-JP), and Russian (ru-RU). In some cases, languages such as Arabic (arb) automatically handle the number set as a telephone number and so do not actually implement the telephone SSML tag.
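For orientation, the shortcode corresponds to the SSML say-as element that Amazon Polly accepts. The Python sketch below shows what such a translation could look like. It assumes the plugin performs a similar mapping internally, which this article does not confirm, and the function name is hypothetical.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Matches [voxey-say-as interpret-as="..."] ... [/voxey-say-as]
SAY_AS_RE = re.compile(
    r'\[voxey-say-as interpret-as="([^"]+)"\](.*?)\[/voxey-say-as\]',
    re.DOTALL,
)


def say_as_to_ssml(text: str) -> str:
    """Replace each voxey-say-as shortcode with an SSML <say-as> element."""

    def replace(match: re.Match) -> str:
        kind = quoteattr(match.group(1))         # quoted, XML-safe attribute value
        content = escape(match.group(2).strip())  # XML-safe element text
        return f"<say-as interpret-as={kind}>{content}</say-as>"

    return SAY_AS_RE.sub(replace, text)


print(say_as_to_ssml('[voxey-say-as interpret-as="cardinal"] 12345 [/voxey-say-as]'))
```

Text outside the shortcode is left untouched in this sketch; a full converter would also need to escape it and wrap the result in a speak element.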

Substitution

Use the [voxey-sub] shortcode with the alias attribute to substitute a different word (or pronunciation) for selected text such as an acronym or abbreviation.

The following example is spoken as “World Wide Web Consortium” instead of W3C:

[voxey-sub alias="World Wide Web Consortium"] W3C [/voxey-sub]

HTML tag attribute

Substitution can be used as a tag attribute with the parameters listed above.

Example of substitution as a tag attribute:

<span voxey-sub="World Wide Web Consortium"> W3C </span>

Emphasis

Emphasizing words changes the speaking rate and volume. More emphasis makes Amazon Polly speak the text louder and slower. Less emphasis makes it speak quieter and faster. To specify the degree of emphasis, use the level attribute. To emphasize words, use the [voxey-emphasis] shortcode.

This shortcode is supported only by the standard TTS format.

The following example emphasizes an announcement:

[voxey-emphasis level="strong"] This is an important announcement [/voxey-emphasis]

level attribute values:

  • strong: Increases the volume and slows the speaking rate so that the speech is louder and slower.
  • moderate: Increases the volume and slows the speaking rate, but less than strong. moderate is the default.
  • reduced: Decreases the volume and speeds up the speaking rate. Speech is softer and faster.

HTML tag attribute

Emphasis can be used as a tag attribute with the parameters listed above.

Example of emphasis as a tag attribute:

<span voxey-emphasis="strong"> This is an important announcement </span>

Prosody

To control the volume, rate, or pitch of your selected voice, use the [voxey-prosody] shortcode.

Prosody tag attributes are fully supported by the standard TTS voices. Neural and long-form voices support the volume and rate attributes, but don’t support the pitch attribute.

Volume, speech rate, and pitch are dependent on the specific voice selected. In addition to differences between voices for different languages, there are differences between individual voices speaking the same language. Because of this, while attributes are similar across all languages, there are clear variations from language to language and no absolute value is available.

The [voxey-prosody] shortcode has three attributes, each of which has several available values to set the attribute.

For example, you could set the volume for a passage as follows:

[voxey-prosody volume="loud"] increase the volume for a specific speech. [/voxey-prosody]

volume attribute values:

  • default: Resets volume to the default level for the current voice.
  • silent, x-soft, soft, medium, loud, x-loud: Sets the volume to a predefined value for the current voice.
  • +ndB or -ndB: Changes volume relative to the current level. A value of +0dB means no change, +6dB means approximately twice the current volume, and -6dB means approximately half the current volume.
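The "twice" and "half" figures follow from the usual decibel formula for amplitude, ratio = 10^(dB/20). The short Python calculation below assumes that amplitude convention, which matches the +6dB and -6dB values quoted above; the function name is illustrative.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a relative change in decibels to an amplitude ratio."""
    return 10 ** (db / 20)


for change in (6, 0, -6):
    # +6 dB comes out close to 2, 0 dB is exactly 1, and -6 dB is close to 0.5.
    print(f"{change:+d}dB -> x{db_to_amplitude_ratio(change):.3f}")
```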

rate attribute values:

  • x-slow, slow, medium, fast, x-fast: Sets the speaking rate to a predefined value for the selected voice.
  • n%: A non-negative percentage change in the speaking rate. For example, a value of 100% means no change in speaking rate, a value of 200% means a speaking rate twice the default rate, and a value of 50% means a speaking rate of half the default rate. This value has a range of 20-200%.

For example, you could set the speech rate for a passage as follows:

[voxey-prosody rate="slow"] slow up the speaking rate of your text. [/voxey-prosody]
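Percentage rates are easier to reason about as multipliers of the default rate. The following Python sketch converts a percentage and enforces the 20-200% range described above; the helper name is hypothetical and the check is not part of the plugin.

```python
def rate_multiplier(value: str) -> float:
    """Convert a rate such as '150%' to a multiplier of the default rate."""
    if not value.endswith("%"):
        raise ValueError(f"expected a percentage, got {value!r}")
    percent = float(value[:-1])  # raises ValueError for non-numeric input
    if not 20 <= percent <= 200:
        raise ValueError(f"{value!r} is outside the 20-200% range")
    return percent / 100


print(rate_multiplier("150%"))
```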

pitch attribute values:

  • default: Resets pitch to the default level for the current voice.
  • x-low, low, medium, high, x-high: Sets the pitch to a predefined value for the current voice.
  • +n% or -n%: Adjusts pitch by a relative percentage. For example, a value of +0% means no baseline pitch change, +5% gives a little higher baseline pitch, and -5% results in a little lower baseline pitch.

For example, you could set the pitch for a passage as follows:

[voxey-prosody pitch="high"] with a pitch that is higher than normal? [/voxey-prosody]

The [voxey-prosody] shortcode must contain at least one attribute, but can include more:

[voxey-prosody rate="slow" pitch="-2st"] Can you hear me now? [/voxey-prosody]
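If shortcodes are generated by a script rather than typed by hand, the "at least one attribute" rule can be enforced when the text is built. The Python sketch below is a hypothetical helper, not part of the plugin. It only assembles the shortcode text and does not validate the attribute values themselves.

```python
ALLOWED = ("volume", "rate", "pitch")  # the three attributes described above


def prosody_shortcode(text: str, **attrs: str) -> str:
    """Build a [voxey-prosody] shortcode with at least one attribute."""
    if not attrs:
        raise ValueError("at least one of volume, rate, or pitch is required")
    unknown = sorted(set(attrs) - set(ALLOWED))
    if unknown:
        raise ValueError(f"unknown attribute(s): {', '.join(unknown)}")
    # Emit attributes in a fixed order so the output is predictable.
    pairs = " ".join(f'{name}="{attrs[name]}"' for name in ALLOWED if name in attrs)
    return f"[voxey-prosody {pairs}] {text} [/voxey-prosody]"


print(prosody_shortcode("Can you hear me now?", rate="slow", pitch="-2st"))
```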

About compatibility

You can combine several Voxey shortcodes in one place on the page, and you can use multiple shortcodes on the same page.
