GoldWave Manual

The Speech Converter tool converts written text to spoken audio (text-to-speech) or spoken audio to text (speech recognition or dictation).

GoldWave uses the speech software in Windows to perform all conversion, so the quality of the voice or the accuracy of the recognition depends entirely on that software. The Speech Converter tool is not supported and will not work in versions of Windows that do not include the speech software. For more information see Microsoft's Speech website. Different voices and recognition engines are available from other vendors.

Use the Speech settings in the Windows Control Panel to configure text-to-speech and train speech recognition to your voice.

The Speech Converter tool consists of a text area with buttons above and below. The buttons along the top open a text file, save the text to a file, and perform basic editing functions on text. Use the Context Menu key (or Shift+F10) to display all the button commands as a menu for easier accessibility.

The buttons below the text area speak the text aloud, save the speech to an audio file, or take dictation from the microphone or an opened audio file.

Text To Speech (Reading)

Copy the contents of a website, document, report, or even chapters from a digital book (Sherlock Holmes, for example) and Paste them into the text area to have GoldWave read them to you.

You can save the audio directly to a file to copy it to your iPod or other portable player to listen to while jogging, working out, or doing other activities where reading isn't possible.

Use the Speak button to read the text. Playback starts at the edit cursor, or at the beginning of the selection if there is one. If the edit cursor is at the end of the text and there is no selection, then the entire contents of the text area are read.
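The rule for where reading starts can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not GoldWave's code: the function name text_to_read and its parameters are hypothetical, and it assumes that a selection limits reading to the selected text.

```python
def text_to_read(text, cursor, selection=None):
    """Return the part of `text` that Speak would read (illustrative only).

    cursor    -- index of the edit cursor, from 0 to len(text)
    selection -- optional (start, end) pair of indices, or None
    """
    if selection is not None:
        start, end = sorted(selection)
        if start != end:
            # Assumption: a non-empty selection is read on its own.
            return text[start:end]
    if cursor >= len(text):
        # Cursor at the end and no selection: read everything.
        return text
    # Otherwise read from the cursor to the end of the text.
    return text[cursor:]
```

For example, with the text "Hello world" and the cursor after the space (index 6), the sketch returns "world"; with the cursor at the very end it returns the whole text.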

Use the Voice Settings button to change the voice, volume, speed, and pitch. Windows usually includes just one voice, but others can be installed.

XML modifiers (such as <pitch middle='5'/>) are supported within the text to change the voice. Search online for "SAPI XML tags" for more information about modifying the voice using tags.
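If you prepare text outside GoldWave, a small script can add the tags for you. The Python sketch below wraps a passage in pitch and rate tags of the form used by SAPI 5. The helper name with_voice_tags is hypothetical, the -10 to 10 range is an assumption taken from the SAPI 5 tag documentation, and whether a given voice honours each tag depends on the installed speech engine.

```python
from xml.sax.saxutils import escape


def with_voice_tags(text, pitch=0, speed=0):
    """Wrap `text` in SAPI-style <pitch> and <rate> tags (illustrative helper).

    pitch, speed -- relative adjustments, assumed to range from -10 to 10
    """
    for name, value in (("pitch", pitch), ("speed", speed)):
        if not -10 <= value <= 10:
            raise ValueError(f"{name} must be between -10 and 10, got {value}")
    # Escape &, < and > so the passage itself is not mistaken for a tag.
    body = escape(text)
    return f"<pitch middle='{pitch}'><rate speed='{speed}'>{body}</rate></pitch>"
```

Calling with_voice_tags("Elementary", pitch=5, speed=-2) produces <pitch middle='5'><rate speed='-2'>Elementary</rate></pitch>, which can be pasted into the text area.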

Use the Speak To File button to read the text directly to an audio file. Text is processed much faster than speaking through the sound hardware. Be sure to select all the text, otherwise only the selection or the text after the edit cursor is read to the file.

Audio files will be significantly larger than the original text files, so select a file type and attributes that minimize the audio file size while preserving good quality. Fortunately, voice files do not require a high sampling rate or bitrate. The bitrate number, given in kilobits per second (kbps), controls the amount of space required per second of audio. Examples are given in the table below, with the amount of storage per minute. Note that CD quality audio requires more than 10 MB per minute, which should be avoided if you intend to copy the audio to a portable player. The MP3 format may be the only one that will play on all portable players.

File Type, Attributes, and Size
File Type	Attributes	MB/minute
MPEG Audio	Layer-3, 22050 Hz, 32 kbps, mono	0.240
Windows Media Audio	WMA Voice 9, 20 kbps, 22050 Hz, mono	0.150
Ogg	Vorbis, 22050 Hz, 30 kbps (0.1q), mono	0.225
Wave	PCM signed 16 bit, stereo (44100 Hz, CD Quality)	10.58

When the attributes do not include a sampling rate, the rate specified under the Voice Settings button is used.
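The sizes in the table follow directly from the bitrate. As a rough check, the Python sketch below converts a bitrate, or the attributes of an uncompressed PCM file, into megabytes per minute. The function names are illustrative, 1 MB is taken as 1,000,000 bytes (which matches the table), and the small overhead of file headers is ignored.

```python
def mb_per_minute(kbps):
    """Storage per minute of audio, in MB, for a compressed format at `kbps`."""
    bytes_per_second = kbps * 1000 / 8   # kilobits per second -> bytes per second
    return bytes_per_second * 60 / 1_000_000


def pcm_mb_per_minute(rate_hz, bits, channels):
    """Storage per minute of audio, in MB, for uncompressed PCM."""
    bytes_per_second = rate_hz * (bits // 8) * channels
    return bytes_per_second * 60 / 1_000_000
```

mb_per_minute(32), mb_per_minute(20), and mb_per_minute(30) give 0.24, 0.15, and 0.225, matching the MP3, WMA, and Ogg rows. pcm_mb_per_minute(44100, 16, 2) gives 10.584, the CD quality figure shown rounded in the table.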

Voice Settings Window

Voice Settings button in Speech Converter.

Use Voice Settings to change the voice, volume, speed, and pitch. Not all voice engines support all settings or allow them to be finely adjusted.

Voice Settings
Setting	Description
Voice	Sets the voice engine to use. Windows may include just one voice, but others can be installed, such as IVONA.
Volume	Sets the loudness of the voice. Normally full volume is used.
Speed	Sets how slow or fast the voice speaks. Choose a setting you find most intelligible.
Pitch	Sets the tone of the voice. Some voices can be changed up or down by a full octave.
Rate	Sets the default sampling rate used when saving the speech to a file. This is used only if the selected file attributes do not specify a rate.

Speech To Text (Speech Recognition)

Use the Dictate button to record or convert audio to text. The source audio can be taken directly from the computer's Microphone or from the currently opened file (if present). Use the Dictation Settings to select the source and configure the microphone or train the recognition engine.

When processing from a file, a progress bar appears at the bottom of the window. Note that for long stretches of speech without any pauses, there will be a significant delay before any text appears and the progress bar advances.

Speech recognition is an evolving technology and is still far from perfect, so don't expect highly accurate dictation. Any background noise or music at all adversely affects accuracy. Using GoldWave's Noise Reduction effect or other filters may reduce accuracy as well. Recordings must be as clean as possible, without any effects processing.

Using the Microphone source may change your system's audio settings and GoldWave's recording input. Use Record to reselect the input before starting any new recordings.

Dictation Settings Window

Dictate button in Speech Converter.

Before dictation can begin, the speech recognition software and the audio source must be specified.

Use the Recognizer drop-down list to select a speech recognition engine. Some versions of Windows may not include any speech recognition software. You can install the Microsoft Speech SDK (available from Microsoft's website) to enable speech recognition on some versions of Windows.

The audio Source can be the microphone or the current selection of an opened file in GoldWave. Use the Configure and Train buttons to configure the microphone and train the recognition software to understand your voice.

If you are unable to get speech recognition to work in GoldWave, use the Speech settings in Windows Control Panel to configure it.