Using Multiple TTS Streams On The Emacspeak Audio Desktop

1 Executive Summary

Emacspeak now uses multiple text-to-speech streams — as an example,
this enables spoken notifications that do not interrupt ongoing spoken
output. To make such notifications more perceivable, Emacspeak places
notifications to the right of the user by leveraging Linux-ALSA
features that allow one to scale the amplitude of the left and right
audio channels.

2 Background

Until now, Emacspeak has used a single instance of a Text-To-Speech
(TTS) engine to produce all spoken feedback. An unfortunate
consequence is that any spoken announcement necessarily interrupts
ongoing speech; as an example, an incoming instant-message (e.g.,
Jabber notification) can interrupt what you're currently
reading.

Emacs itself produces a large number of asynchronous messages,
depending on the number of processes running within Emacs. At present,
all Emacs-generated messages are treated equally, though there are
ongoing plans to improve this situation, e.g., using package alert.
With Emacspeak now able to use multiple TTS streams, the arrival of
package alert within Emacs should facilitate smarter handling of
different categories of messages over time.

Playing multiple TTS streams simultaneously can make it hard to
understand the resulting output; Emacspeak leverages underlying ALSA
functionality to send notifications to a virtual ALSA device that
places the auditory output mostly on the right channel. See the
Emacspeak Setup section below for configuration details. I'm presently
using this on Linux with the linux-outloud voice; you need to have a
copy of this TTS engine installed and working (see Voxin for details
on obtaining that engine). Note: the Emacspeak espeak server does not
use raw ALSA for its output; consequently, notifications produced by
espeak play on both left and right channels, making the overlapping
speech impossible to understand. The mac server may be able to
support this functionality using something Mac-specific; patches
welcome.

3 Emacspeak Setup

Emacspeak now adds user-option
emacspeak-tts-use-notify-stream. If this is set to t in the
user's initialization file before Emacspeak is loaded, Emacspeak
checks to see if the user's selected TTS engine supports multiple
instances, and if so launches a second instance of the TTS engine
for use as a Notification TTS Stream. See my
tvr/emacs-startup.el in the Emacspeak Git Repository for an
example setup.
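
As a minimal sketch of such a setup (not copied from
tvr/emacs-startup.el), an init file might contain something like the
following. The load path is a placeholder, and starting Emacspeak by
loading emacspeak-setup.el is an assumption about your installation.

```elisp
;; Sketch only: ask Emacspeak to launch a second TTS instance
;; for notifications. This must be evaluated BEFORE Emacspeak loads.
(setq emacspeak-tts-use-notify-stream t)

;; Placeholder path (assumption): adjust to where Emacspeak lives
;; on your system.
(load-file "/path/to/emacspeak/lisp/emacspeak-setup.el")
```

If the selected TTS engine does not support multiple instances, this
setting should have no visible effect and Emacspeak keeps using a
single stream.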

The Notification TTS Stream can be restarted via command
dtk-notify-initialize, bound to C-e d C-n. You should
ordinarily not need to invoke this command.

The Notification TTS Stream can be shut down using command
dtk-notify-shutdown, bound to C-e d C-s. When the Notification
TTS Stream is not available, Emacspeak defaults to using a single
TTS stream for all spoken output, i.e., no change from earlier
behavior.

At present, Emacspeak tries to use a separate Notification TTS
Stream when the selected TTS engine is a software TTS
running locally.

File servers/linux-outloud/notify-asoundrc contains the
.asoundrc that I am using on my ThinkPad. To have Emacspeak
place the Notification TTS Stream mostly on the right, the
contents of that file (suitably modified for your sound card)
need to be placed in file $HOME/.asoundrc. Warning: Handle with
care; a broken .asoundrc can kill all audio output.

The .asoundrc scales left and right amplitude to place the
output mostly on the right. To change this behavior, you can
edit the Transformation Table for virtual device tts_mono in
the .asoundrc file.

This setup has not been tested with PulseAudio.
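
To give a feel for what editing the Transformation Table for tts_mono
involves, here is a hypothetical sketch using ALSA's route plugin. It
is not the contents of notify-asoundrc: the slave device name and the
gain values are assumptions chosen for illustration, and your sound
card may need different settings.

```
# Hypothetical sketch: route a mono TTS stream mostly to the right.
pcm.tts_mono {
    type route
    slave.pcm "default"     # assumption: send to the default device
    slave.channels 2        # stereo output
    # ttable.<input channel>.<output channel> <gain between 0 and 1>
    ttable.0.0 0.1          # input -> left channel, heavily attenuated
    ttable.0.1 1.0          # input -> right channel, full amplitude
}
```

Swapping the two gain values would place the stream mostly on the
left instead. One way to check such a device before pointing a TTS
engine at it is to play a short WAV file with aplay -D tts_mono.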

4 Summary

Share and enjoy —

EMACSPEAK The Complete Audio Desktop

Tuesday, November 10, 2015