The Talker WordPress plugin supports Speech Synthesis Markup Language (SSML). SSML is part of a larger set of markup specifications for voice browsers developed through the open processes of the W3C.
You can use it to fine-tune the following aspects of the generated speech:
- Multiple voices
- Pause
- Mute
- Say as
- Substitution
- Emphasis
- Sentence
- Audio file URL
- Prosody
- Say hidden text
SSML is applied through shortcodes placed inside the WordPress page. A shortcode is a WordPress-specific code that lets you do nifty things with very little effort. In just one line, a shortcode can embed files or create objects that would normally require lots of complicated, ugly code.
Multiple voices
You can use multiple voices on one page, and you can voice different fragments of the page in different languages. Use the [talker-voice] shortcode to speak the text with the specified voice. Each listed voice has its own individual character. The [talker-voice] shortcode can be used in combination with all other SSML shortcodes.
NOTES
Make sure that there are no spaces between characters in the voice name.
[talker-voice name="en-US-AmberNeural"]Hello![/talker-voice]
[talker-voice name="uk-UA-PolinaNeural"]Привіт![/talker-voice]
You can find the list of all available voices on the Microsoft website.
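If pages are generated by a script, the no-spaces rule for voice names can be checked before the shortcode is written. The Python sketch below is illustrative only: the helper name voice_shortcode is hypothetical and not part of the plugin, and the check only rejects whitespace; it does not verify that the voice actually exists.

```python
def voice_shortcode(name: str, text: str) -> str:
    """Wrap text in a [talker-voice] shortcode (hypothetical helper)."""
    # The voice name must not contain spaces, e.g. "en-US-AmberNeural".
    if not name or any(ch.isspace() for ch in name):
        raise ValueError(f"invalid voice name: {name!r}")
    return f'[talker-voice name="{name}"]{text}[/talker-voice]'


print(voice_shortcode("en-US-AmberNeural", "Hello!"))
print(voice_shortcode("uk-UA-PolinaNeural", "Привіт!"))
```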
HTML tag attribute
talker-voice can also be used as an HTML tag attribute, with the voice name as its value:
<span talker-voice="en-US-AmberNeural">Hello!</span>
Pause
To create a pause in speech synthesis, use the following shortcode:
[talker-break time="2s"]
The time attribute sets the length of the break in seconds or milliseconds (e.g. “3s” or “250ms”).
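The accepted shape of the time value can be sketched as a small check. This is an assumption-laden illustration, not the plugin's own validation: the helper break_shortcode is hypothetical, and it assumes whole numbers followed by "s" or "ms", as in the examples above.

```python
import re

# Assumed format: one or more digits followed by "s" or "ms" (e.g. "3s", "250ms").
_BREAK_TIME = re.compile(r"\d+(ms|s)")


def break_shortcode(time: str) -> str:
    """Build a [talker-break] shortcode (hypothetical helper)."""
    if not _BREAK_TIME.fullmatch(time):
        raise ValueError(f"invalid break time: {time!r}")
    return f'[talker-break time="{time}"]'


print(break_shortcode("2s"))
print(break_shortcode("250ms"))
```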
Mute
You can exclude part of the page from the audio. Wrap it in the [talker-mute] shortcode to prevent that section of the page from being spoken.
[talker-mute]Text to be removed from the audio[/talker-mute]
There are also several other ways to mute a piece of text in a post:
- Add class="talker-mute" to the muted element
- Add the attribute talker-mute="true" to the muted element
Say as
This group of shortcodes lets you indicate information about the type of text construct that is contained within the element. It also helps specify the level of detail for rendering the contained text.
Format matching is important for shortcodes in the talker-say-as group. For example, a shortcode for numbers (such as cardinal) will not work if there is at least one non-numeric character between the opening and closing shortcode.
The [talker-say-as] shortcode has one required attribute, interpret-as, which determines how the value is spoken.
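The format-matching rule for numeric shortcodes can be illustrated with a short check. This sketch assumes a strict reading of the rule (ASCII digits only, no spaces or separators); the helper say_as_cardinal is hypothetical, and the plugin's actual behavior may be more lenient.

```python
def say_as_cardinal(value: str) -> str:
    """Build a cardinal [talker-say-as] shortcode (hypothetical helper)."""
    # Assumption: any character other than 0-9 breaks the shortcode,
    # so "12,345" or " 12345" is rejected here.
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"not a plain number: {value!r}")
    return f'[talker-say-as interpret-as="cardinal"]{value}[/talker-say-as]'


print(say_as_cardinal("12345"))
```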
Say as cardinal
The following example is spoken as “Twelve thousand three hundred forty-five” (for US English) or “Twelve thousand three hundred and forty-five” (for UK English):
[talker-say-as interpret-as="cardinal"]12345[/talker-say-as]
interpret-as | format | Interpretation |
---|---|---|
characters, spell-out | None | The text is spoken as individual letters (spelled out). The speech synthesis engine pronounces [talker-say-as interpret-as="characters"]test[/talker-say-as] as “T E S T.” |
cardinal, number | None | The text is spoken as a cardinal number. Example of the spoken result: “There are ten options.” |
ordinal | None | The text is spoken as an ordinal number. Example of the spoken result: “Select the third option.” |
number_digit | None | The text is spoken as a sequence of individual digits. Example of the spoken result: “1 2 3 4 5 6 7 8 9.” |
fraction | None | The text is spoken as a fractional number. Example of the spoken result: “three eighths of an inch.” |
date | dmy, mdy, ymd, ydm, ym, my, md, dm, d, m, y | The text is spoken as a date. The format attribute specifies the date’s format (d=day, m=month, and y=year). Example of the spoken result: “Today is October nineteenth two thousand sixteen.” |
time | hms12, hms24 | The text is spoken as a time. The format attribute specifies whether the time is given on a 12-hour clock (hms12) or a 24-hour clock (hms24). Use a colon to separate the numbers representing hours, minutes, and seconds. Valid time examples: 12:35, 1:14:32, 08:15, and 02:50:45. Example of the spoken result: “The train departs at four A M.” |
duration | hms, hm, ms | The text is spoken as a duration. The format attribute specifies the duration’s format (h=hour, m=minute, and s=second). Example of the spoken result: “one hour eighteen minutes and thirty seconds.” With format="ms", [talker-say-as interpret-as="duration" format="ms"]01:18[/talker-say-as] is spoken as “one minute and eighteen seconds.” This interpretation is only supported for English and Spanish. |
telephone | None | The text is spoken as a telephone number. The format attribute can contain digits that represent a country code. Examples are “1” for the United States or “39” for Italy. The speech synthesis engine can use this information to guide its pronunciation of a phone number. The phone number might also include the country code, and if so, it takes precedence over the country code in the format attribute. Example of the spoken result: “My number is area code eight eight eight five five five one two one two.” |
currency | None | The text is spoken as a currency. Example of the spoken result: “ninety-nine US dollars and ninety cents.” |
address | None | The text is spoken as an address. Example of the spoken result: “I’m at 150th Court Northeast Redmond Washington.” |
name | None | The text is spoken as a person’s name. Example of the spoken result: [æd]. In Chinese names, some characters are pronounced differently when they appear in a family name. For example, the speech synthesis engine says 仇 as [qiú] instead of [chóu] when it is a family name. |
Substitution
Use the [talker-sub] shortcode to indicate that the text in the alias attribute value replaces the contained text for pronunciation.
The following example is spoken as “World Wide Web Consortium” instead of “W3C”:
[talker-sub alias="World Wide Web Consortium"]W3C[/talker-sub]
You can also use the [talker-sub] shortcode to provide a simplified pronunciation of a difficult-to-read word.
Emphasis
Use the [talker-emphasis] shortcode to add or remove emphasis from the text contained by the element.
NOTES
The [talker-emphasis] shortcode should only be used around a full sentence. Enclosing words within a sentence may cause unwanted pauses in speech.
The following example uses the [talker-emphasis] shortcode to make an announcement:
[talker-emphasis level="moderate"]This is an important announcement[/talker-emphasis]
This shortcode supports an optional “level” attribute with the following valid values:
- reduced
- none
- moderate
- strong
When the level attribute isn’t specified, the default level is moderate.
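For scripted content, the valid levels and the default can be captured in a small helper. This is a minimal sketch: emphasis_shortcode is a hypothetical name, and the function only mirrors the values listed above rather than any plugin API.

```python
# Levels listed in the documentation; "moderate" is the documented default.
EMPHASIS_LEVELS = ("reduced", "none", "moderate", "strong")


def emphasis_shortcode(sentence: str, level: str = "moderate") -> str:
    """Wrap a full sentence in [talker-emphasis] (hypothetical helper)."""
    if level not in EMPHASIS_LEVELS:
        raise ValueError(f"unknown emphasis level: {level!r}")
    return f'[talker-emphasis level="{level}"]{sentence}[/talker-emphasis]'


print(emphasis_shortcode("This is an important announcement"))
```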
Audio file URL
Use the [talker-file] shortcode to display the URL of the audio file recorded for the current post or page.
Prosody
Use the [talker-prosody] shortcode to customize the pitch, speaking rate, and volume of text contained by the element.
[talker-prosody rate="slow" pitch="-2st"]Can you hear me now?[/talker-prosody]
Attribute | Description | Required or optional |
---|---|---|
pitch | Indicates the baseline pitch for the text. Pitch changes can be applied at the sentence level and should be within 0.5 to 1.5 times the original audio. You can express the pitch as: (1) an absolute value: a number followed by “Hz” (Hertz), for example [talker-prosody pitch="600Hz"]some text[/talker-prosody]; (2) a relative number: a number preceded by “+” or “-” and followed by “Hz” or “st” that specifies an amount to change the pitch, where “st” means the change unit is a semitone, which is half of a tone (a half step) on the standard diatonic scale; (3) a relative percentage: a number preceded by “+” (optionally) or “-” and followed by “%”, indicating the relative change; (4) a constant value: x-low, low, medium, high, x-high, or default. | Optional |
rate | Indicates the speaking rate of the text. Speaking rate can be applied at the word or sentence level, and rate changes should be within 0.5 to 2 times the original audio. You can express the rate as: (1) a relative number: a number that acts as a multiplier of the default, so a value of 1 results in no change in the original rate, 0.5 halves the original rate, and 2 doubles it; (2) a relative percentage: a number preceded by “+” (optionally) or “-” and followed by “%”, indicating the relative change; (3) a constant value: x-slow, slow, medium, fast, x-fast, or default. | Optional |
volume | Indicates the volume level of the speaking voice. Volume changes can be applied at the sentence level. You can express the volume as: (1) an absolute value: a number in the range of 0.0 to 100.0, from quietest to loudest, for example 75 (the default is 100.0); (2) a relative number: a number preceded by “+” or “-” that specifies an amount to change the volume, for example +10 or -5.5; (3) a relative percentage: a number preceded by “+” (optionally) or “-” and followed by “%”, indicating the relative change, for example [talker-prosody volume="+3%"]some text[/talker-prosody]; (4) a constant value: silent, x-soft, soft, medium, loud, x-loud, or default. | Optional |
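The rules for the rate attribute can be sketched as a check that maps each accepted form to a multiplier and tests it against the 0.5 to 2 range. This is an illustration under assumptions: is_valid_rate is a hypothetical helper, and it treats a percentage as a change relative to the default (so "+10%" is read as 1.1 times the original rate).

```python
import re

RATE_KEYWORDS = {"x-slow", "slow", "medium", "fast", "x-fast", "default"}
_NUMBER = re.compile(r"\d+(\.\d+)?")             # plain multiplier, e.g. "0.5"
_PERCENT = re.compile(r"([+-]?\d+(\.\d+)?)%")    # relative change, e.g. "+10%"


def is_valid_rate(rate: str) -> bool:
    """Return True if rate is a keyword or falls within 0.5x to 2x."""
    if rate in RATE_KEYWORDS:
        return True
    percent = _PERCENT.fullmatch(rate)
    if percent:
        # Assumption: "+10%" means 10% faster than the default rate.
        multiplier = 1 + float(percent.group(1)) / 100
    elif _NUMBER.fullmatch(rate):
        multiplier = float(rate)
    else:
        return False
    return 0.5 <= multiplier <= 2.0


for value in ("slow", "1", "+10%", "3", "-60%"):
    print(value, is_valid_rate(value))
```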
Say hidden text
Use the [talker-say] shortcode to voice text without displaying it on the page front end. The text is shown in the page editor and voiced by the plugin, but hidden from visitors.
[talker-say]This text is converted to audio but not displayed to users[/talker-say]