Monday, February 20, 2006

Emacspeak And Voice Locking Using Aural CSS

This is slightly reformatted from what was posted to the Emacspeak mailing list as separate message.

  1. Emacspeak defines a number of voice overlays such as voice-bolden, and voice-lighten that can be applied to a given voice to change what it sounds like.
  2. Voice overlays are defined in terms of Aural CSS (ACSS) to keep them independent of a specific TTS engine.
  3. For each such overlay there is a corresponding <overlay-name>-settings variable that can be customized via custom.
  4. The numbers in voice-bolden-settings as an example:
Setting Value
family nil
average-pitch 1
pitch-range 6
stress 6
richness nil
punctuation nil
Unset values (nil) show up as "unspecified" in the customize interface.
  1. Do not directly customize voice-bolden and friends, instead customize the corresponding voice-bolden-settings, since that ensures that all voices that are defined in terms of voice-bolden get correctly updated.
  2. Discovering what to customize:

Command emacspeak-show-personality-at-point (bound by default to C-e M-v) will show you the value of properties personality and face at point. A recent update I implemented last weekend makes this more useful, so make sure you do a CVS update; earlier this command used to display the ACSS setting --- now it displays the abstract name. Describe-variable on these names should tell you what to customize; so as an example:

Put point on a comment line, and hit C-e M-v: you will hear

Personality emacspeak-voice-lock-comment-personality
Face font-lock-comment-delimiter-face

Describe-variable of emacspeak-voice-lock-comment-personality gives:

emacspeak-voice-lock-comment-personality's value is acss-p0-s0-all

Personality used for font-lock-comment-face
This personality uses  voice-monotone whose  effect can be changed globally by customizing voice-monotone-settings.

How It All Works

Here is a brief explanation of the connection between voice-bolden and its associated voice-bolden-settings.

  1. Voice settings are initially in voice-bolden-settings which is a list of numbers.
  2. That list of numbers needs to be translated to appropriate device-specific codes to send to the TTS engine.
  3. You do not want to do this translation each time you speak something.
  4. So when voice-bolden is defined, the definition happens in two steps:
  • The list of settings is stored away in voice-bolden-settings,
  • A corresponding voice-name is generated --- acss-a<n>-p<n>-r<n>-s<n> and the corresponding control codes to send to the device are stored away in a hash-table keyed by the above symbol.
  • Finally, voice-bolden is assigned the above symbol.

What this gives is:

  1. The ability to customize the voice via custom by editting the list of numbers in voice-bolden-settings
  2. When that list is editted, voice-bolden is arranged to be updated automatically.

Other Useful Commands

In addition, commands emacspeak-wizards-generate-voice-sampler can be useful in generating a buffer that shows what the various ACSS settings sound like. Command emacspeak-wizards-voice-sampler can be used to apply a specific voice to a region of text while experimenting with the various settings.