EMACSPEAK The Complete Audio Desktop

Monday, May 22, 2006

ASoundrc Parameters For Reliably Using ALSA Powered Software TTS

Advanced Linux Sound Architecture (ALSA) is a boon for software TTS users --- you can now use your soundcard to produce spoken output without losing audio output from other applications such as music players and streaming radio stations.

Emacspeak implements an ALSA-enabled TTS server for the IBM ViaVoice engine --- using this server effectively requires appropriately tuning the parameters in the user's asoundrc file to:

Enable the DMix plugin, which provides software mixing of multiple channels of audio.
Configure the various parameters ALSA itself uses.

Depending on how well your sound-card is supported by ALSA, the above can be either trivially simple or a tedious process of trial and error. I'm writing this up to:

Collect a list of sound cards on which the asoundrc provided with Emacspeak works as expected.
Invite the wider ALSA community to discover and help flesh out this material; my hope is that the ALSA community has more insight into how these settings work.

For the above, "works as expected" means the following:

The TTS engine speaks without perceptible stuttering or other audio artifacts.
The engine is responsive with respect to starting and stopping speech; especially when typing fast at high speech rates.
The TTS engine does not interfere with other alsa-enabled applications, e.g. mplayer.

At the end of this entry, you can find the relevant section from the asoundrc file from the Emacspeak distribution, with comments indicating which sound cards perform well. An example of a card that does not work well with these settings is the Audigy-LS from Creative; the TTS engine works on that card, but performance degrades:

mplayer cannot use the audio device (aplay and mpg321 are able to share the card with the TTS engine).
Speech does not stop immediately as on the soundcards enumerated in the asoundrc file.

# $Id: asoundrc,v 1.3 2006/05/23 00:22:16 raman Exp $
#these numbers work on the following:
# aplay -l | head -1
# I82801DBICH4 [Intel 82801DB-ICH4] (IBM Thinkpads)
# ICH6 [Intel ICH6],

#  default device is a mixer

pcm.!default {
    type plug
    slave.pcm "dmixer"
}

pcm.dmixer  {
    type dmix
    ipc_key 1024
    slave {
        pcm "hw:0,0"
        format s16_LE
        period_time 0
        period_size 1024
        buffer_size 4096
        rate 44100
    }
    bindings {
        0 0
        1 1
    }
}
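
For a rough sense of what these numbers mean for responsiveness: period_size and buffer_size are counted in frames, so dividing by the sample rate gives a duration. The sketch below does that arithmetic in Emacs Lisp. The function name is made up for illustration, and the latency you actually hear also depends on the driver and the hardware.

```elisp
;; Hypothetical helper: convert an ALSA size in frames to milliseconds.
(defun my-alsa-frames-to-ms (frames rate)
  "Return the duration in milliseconds of FRAMES audio frames at RATE Hz."
  (/ (* 1000.0 frames) rate))

;; Values taken from the dmixer definition above.
(my-alsa-frames-to-ms 1024 44100) ; one period, roughly 23 ms
(my-alsa-frames-to-ms 4096 44100) ; whole buffer, roughly 93 ms
```

Smaller periods should let speech stop sooner, at the cost of more frequent wake-ups; the right trade-off is something to find by experiment on each card.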

Wednesday, May 03, 2006

Listening To The Web Through A Mobile Lens

The similarities between Web access issues faced by mobile users and those confronting eyes-free Web browsing are striking, and these similarities have often been used to advocate the creation of well-structured, accessible Web content. As an example of mobile-friendly content being a blessing for eyes-free spoken access to WebFormation, Emacspeak provides a mobile lens via the Google Mobile transcoder.

Here are a few convenient means of using the above within the Emacspeak Audio Desktop:

While browsing the Web using w3, press t on a link (command: emacspeak-w3-transcode-via-google) to view that link through the mobile transcoder.
Note that all links in the resulting mobile view automatically go through the transcoder.
To undo the effect of automatically viewing links in the mobile view through the transcoder, use t with an interactive prefix argument, i.e., press C-u t to follow a link and view it in its original form.
Additionally, I bind command emacspeak-wizards-google-transcode to a convenient key so that I can launch Web sites using the mobile view.

I use this tool on a regular basis while commuting to work to browse mainstream news sites; it provides speech-friendly content that has the added benefit of downloading fast over a wireless link --- after all, this is Mobile content.
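
Binding emacspeak-wizards-google-transcode to a key, as mentioned in the list above, might look like the following in an Emacs init file. The key C-c g is only an assumption for the sake of the example; any free key will do.

```elisp
;; The key chosen here (C-c g) is only an example; pick any free key.
(global-set-key (kbd "C-c g") 'emacspeak-wizards-google-transcode)
```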

Tuesday, May 02, 2006

Announcing Emacspeak 24.0 (LiveDog)

For Immediate Release

San Jose, CA, (May 3, 2006)
Emacspeak-Alive: --- Bringing Live Access For Enlightened Users
--Zero cost of ownership makes priceless software affordable!

Major Enhancements

emacspeak-muse: Speech-enabled Muse Mode
emacspeak-ruby: Speech-enabled Ruby Mode
emacspeak-m-player: Updated for new MPlayer
emacspeak-sudoku.el: Speech-enabled SuDoKu
New Option: tts-strip-octals
emacspeak-keymap.el: Updated keybindings
lisp/atom-blogger.el Light-weight blogging tool
emacspeak-atom-blogger: Speech-enables above
voice-setup.el: Custom support
Multispeech related patches
User contributed patches

Friday, March 10, 2006

W3: Minor Patch To Handle Content-Type application/xhtml+xml

Here is a minor patch to w3.el to allow it to handle content-type application/xhtml+xml. For all practical purposes (at least as far as W3 is concerned), this can be handled by the html parser/renderer; however since that content-type did not exist at the time W3 was written, it offers to download/save documents of that type. The attached patch fixes this, and also adds a fix to a minor irritant with decoding of multimedia attachments.

Index: w3.el
===================================================================
RCS file: /cvsroot/w3/w3/lisp/w3.el,v
retrieving revision 1.32
diff -b -c -r1.32 w3.el
*** w3.el	12 Jan 2003 22:10:25 -0000	1.32
--- w3.el	11 Mar 2006 02:24:52 -0000
***************
*** 34,39 ****
--- 34,40 ----
  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
  
  (require 'w3-sysdp)
+ (eval-when-compile (require 'mm-decode))
  (require 'w3-cfg)
  
  (or (featurep 'efs)
***************
*** 325,331 ****
  				  (mm-handle-media-type handle)))))
        ;; Fixme: can handle be null?
        (cond
!        ((equal (mm-handle-media-type handle) "text/html")
  	;; Special case text/html if it comes through w3-fetch
  	(set-buffer (generate-new-buffer " *w3-html*"))
  	(mm-disable-multibyte)
--- 326,333 ----
  				  (mm-handle-media-type handle)))))
        ;; Fixme: can handle be null?
        (cond
!        ((or (equal (mm-handle-media-type handle) "application/xhtml+xml")
!          (equal (mm-handle-media-type handle) "text/html"))
  	;; Special case text/html if it comes through w3-fetch
  	(set-buffer (generate-new-buffer " *w3-html*"))
  	(mm-disable-multibyte)

Wednesday, March 08, 2006

Blogging From Emacs: Additional Atom-Blogger Documentation

Thanks to Jason Dunsmore for writing up some additional step-by-step documentation on using atom-blogger.

Thursday, February 23, 2006

Emacspeak: Connecting Lynx And W3

Emacs/W3 is still the best Web page rendering option inside Emacspeak given the ability to apply XSL transforms, as well as obtaining aural styling via ACSS. However W3's url handling layer often breaks when faced with multiple redirects, especially when some of these happen through the Host: HTTP header. Additionally, HTTPS authentication sometimes fails mysteriously in the presence of redirects.

In many of these cases, lynx happily fetches the pages correctly; however you're then stuck using a fairly weak auditory interface in that Emacspeak degrades to being a terminal-level screen reader.

An effective solution to this problem is to use lynx within an Emacs terminal, and after finding the content that is worth reading, handing off that content to Emacs/W3. The next few paragraphs show how.

The `lynx-site.cfg` File

This is where you add site-specific configurations. Here are the lines I have in my lynx-site.cfg to integrate lynx and Emacs. Before you use any of this, make sure you have executed M-x server-start in your running Emacs, and make sure that all is well by experimenting with emacsclient to ensure that external programs can hand off editing tasks to the currently running Emacs.

#site defaults
#for bookshare:
DOWNLOADER:BKS Unpack:bks.pl  %s %s:TRUE 
PRINTER:Edit:emacsclient %s:TRUE
KEYMAP:???:EDITTEXTAREA	# use external editor to edit a form textarea
PRETTYSRC:TRUE
SOURCE_CACHE:MEMORY
SAVE_SPACE:~/.wget/
BOLD_HEADERS:TRUE
PRINTER:W3:emacsclient -e '(w3-open-local "%s")':TRUE

Below, I'll describe what each of the above lines does:

DOWNLOADER:BKS Unpack:bks.pl %s %s:TRUE
The above line creates an additional item in the download menu that invokes the BookShare unpacker. Script bks-unpack.pl invokes the BookShare unpack tool with the appropriate options.
PRINTER:Edit:emacsclient %s:TRUE
This creates an Edit item in the print menu. Invoking this menu item causes the current page to be handed off to Emacs for editing. If you want to edit the source, first switch to source view by hitting \ before invoking print.
KEYMAP:???:EDITTEXTAREA # use external editor to edit a form textarea
This sets lynx up so that when editing a multiline textarea, you can hand off the editing job to Emacs. This is particularly useful for editing Wiki pages. Replace the ??? with the desired key sequence.
PRETTYSRC:TRUE SOURCE_CACHE:MEMORY
The above two settings make the edit source functionality more pleasant to use.
PRINTER:W3:emacsclient -e '(w3-open-local "%s")':TRUE
The above creates a W3 menu item in the print menu. Invoking this causes Emacs/W3 to display the current page --- again switch to source view before invoking this so that Emacs/W3 gets handed the HTML markup.
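
All of the emacsclient entries above depend on the Emacs server being up. Rather than remembering M-x server-start in every session, one option is to start it from your init file. This is a minimal sketch; the function name is hypothetical, while server-running-p and server-start come from Emacs' own server.el.

```elisp
(require 'server)

;; Hypothetical helper name; server-running-p and server-start are
;; provided by server.el.
(defun my-ensure-emacs-server ()
  "Start the Emacs server unless one is already running."
  (unless (server-running-p)
    (server-start)))

;; Run once Emacs has finished loading the init file.
(add-hook 'after-init-hook #'my-ensure-emacs-server)
```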

Script `bks-unpack.pl`

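#!/usr/bin/perl -w
#$Id: bks.pl,v 1.1 2003/07/04 15:41:55 tvraman Exp tvraman $
#Description: Bookshare downloader for Lynx
use strict;
# Directory under which each book gets its own subdirectory.
my $location="$ENV{HOME}/books/book-share";
my $password = 'xxxxxxx';
# Lynx passes the downloaded temporary file, then the chosen file name.
my $grabbed = shift;
my $target = shift;
# Name the book directory after the download, minus its .bks extension.
my $dir =qx(basename $target .bks);
chomp $dir;
my $where = "$location/$dir";
qx(mkdir -p $where);
qx(mv $grabbed  $where/$target);
chdir $where;
# Run the unpacker in the background, feeding it the password on stdin.
qx(echo $password | bks-unpack -q $target 1>&- 2>&- &);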

Tuesday, February 21, 2006

Emacspeak, SuDoKu And History

Here is a small enhancement to playing SuDoKu in Emacspeak. The feature is probably generally useful i.e., it's not specific to eyes-free interaction, but its presence encourages one to try different solution strategies.

Commands emacspeak-sudoku-history-push bound to m and emacspeak-sudoku-history-pop bound to M allow one to mark interesting states in the game and return to these prior states with a single keystroke. This means that when one is confronted with one of two choices, with no apparent additional information on which route to take, it becomes possible to push that state on to the history stack, try one of the alternatives and backtrack if necessary.
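
The push and pop idea can be sketched in a few lines of Emacs Lisp. This is an illustration of the technique only, not the code in emacspeak-sudoku: the names are hypothetical, and the board is assumed to be a simple vector of 81 cell values.

```elisp
;; Illustrative only: a board is modelled as a vector of 81 cell values.
(defvar my-sudoku-history nil
  "Stack of saved board states, most recent first.")

(defun my-sudoku-history-push (board)
  "Save a copy of BOARD on the history stack."
  (push (copy-sequence board) my-sudoku-history))

(defun my-sudoku-history-pop ()
  "Remove and return the most recently saved board, or nil if none."
  (pop my-sudoku-history))

(let ((board (make-vector 81 0)))
  (my-sudoku-history-push board)          ; mark the state before guessing
  (aset board 0 5)                        ; try one alternative
  (setq board (my-sudoku-history-pop))    ; backtrack to the marked state
  (aref board 0))                         ; the guess is gone again
```

Saving a copy rather than the board itself matters here: later moves modify the board in place, and the saved state must not change with it.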

Monday, February 20, 2006

Emacspeak And Voice Locking Using Aural CSS

This is slightly reformatted from what was posted to the Emacspeak mailing list as a separate message.

Emacspeak defines a number of voice overlays such as voice-bolden, and voice-lighten that can be applied to a given voice to change what it sounds like.
Voice overlays are defined in terms of Aural CSS (ACSS) to keep them independent of a specific TTS engine.
For each such overlay there is a corresponding <overlay-name>-settings variable that can be customized via custom.
As an example, here are the numbers in voice-bolden-settings:

Setting	Value
family	nil
average-pitch	1
pitch-range	6
stress	6
richness	nil
punctuation	nil

Unset values (nil) show up as "unspecified" in the customize interface.

Do not directly customize voice-bolden and friends; instead customize the corresponding voice-bolden-settings, since that ensures that all voices that are defined in terms of voice-bolden get correctly updated.
Discovering what to customize:

Command emacspeak-show-personality-at-point (bound by default to C-e M-v) will show you the value of properties personality and face at point. A recent update I implemented last weekend makes this more useful, so make sure you do a CVS update; earlier this command used to display the ACSS setting --- now it displays the abstract name. Describe-variable on these names should tell you what to customize; so as an example:

Put point on a comment line, and hit C-e M-v: you will hear

Personality emacspeak-voice-lock-comment-personality
Face font-lock-comment-delimiter-face

Describe-variable of emacspeak-voice-lock-comment-personality gives:

emacspeak-voice-lock-comment-personality's value is acss-p0-s0-all

Documentation:
Personality used for font-lock-comment-face
This personality uses  voice-monotone whose  effect can be changed globally by customizing voice-monotone-settings.

How It All Works

Here is a brief explanation of the connection between voice-bolden and its associated voice-bolden-settings.

Voice settings are initially in voice-bolden-settings which is a list of numbers.
That list of numbers needs to be translated to appropriate device-specific codes to send to the TTS engine.
You do not want to do this translation each time you speak something.
So when voice-bolden is defined, the definition happens in three steps:

The list of settings is stored away in voice-bolden-settings,
A corresponding voice-name is generated --- acss-a<n>-p<n>-r<n>-s<n> and the corresponding control codes to send to the device are stored away in a hash-table keyed by the above symbol.
Finally, voice-bolden is assigned the above symbol.

What this gives is:

The ability to customize the voice via custom by editing the list of numbers in voice-bolden-settings.
When that list is edited, voice-bolden is updated automatically.
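
The steps described above can be sketched as follows. Every name here is hypothetical and this is not Emacspeak's actual code. In particular, which letter in the generated name stands for which setting, and the idea that unset values are simply left out of the name, are assumptions made for the example.

```elisp
;; Hypothetical names throughout; this is not Emacspeak's actual code.
(defvar my-voice-table (make-hash-table :test 'eq)
  "Map generated voice names to device control strings.")

(defun my-voice-name (settings)
  "Build a symbol such as acss-a1-p6-s6 from SETTINGS.
SETTINGS is a list (AVERAGE-PITCH PITCH-RANGE RICHNESS STRESS);
nil entries are left out of the name."
  (let ((prefixes '("a" "p" "r" "s"))
        (name "acss"))
    (while prefixes
      (when (car settings)
        (setq name (format "%s-%s%d" name (car prefixes) (car settings))))
      (setq prefixes (cdr prefixes)
            settings (cdr settings)))
    (intern name)))

(defun my-define-voice (settings make-codes)
  "Cache device codes for SETTINGS and return the generated voice name.
MAKE-CODES is a function that turns SETTINGS into a control string."
  (let ((name (my-voice-name settings)))
    (unless (gethash name my-voice-table)
      (puthash name (funcall make-codes settings) my-voice-table))
    name))

;; Stand-in translation; a real server would emit engine-specific codes.
(defvar my-voice-bolden-settings '(1 6 nil 6))
(defvar my-voice-bolden
  (my-define-voice my-voice-bolden-settings
                   (lambda (s) (format "[codes for %S]" s))))
```

Speaking text in this voice then needs only a hash-table lookup on the symbol held in my-voice-bolden; the translation from numbers to device codes is done once, when the voice is defined or its settings change.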

Other Useful Commands

In addition, command emacspeak-wizards-generate-voice-sampler can be useful in generating a buffer that shows what the various ACSS settings sound like. Command emacspeak-wizards-voice-sampler can be used to apply a specific voice to a region of text while experimenting with the various settings.

Saturday, February 11, 2006

Playing SuDoKu Using Auditory Feedback

Emacspeak speech-enables SuDoKu implemented by sudoku.el. Speech-enabling games is an effective means of discovering what additions one needs to make to an auditory interface for working effectively in an eyes-free environment --- this was aptly demonstrated a few years ago by identifying interesting conversational gestures by speech-enabling the game of Tetris --- see Conversational Gestures For The Audio Desktop from Assets 1998.

Advising Interactive Commands

As with speech-enabling any Emacs module, emacspeak-sudoku advises all interactive commands to produce spoken feedback. In addition to speaking the cell moved to, all navigation commands produce an auditory icon that is a function of whether the cell value is mutable --- original values cannot be changed and this is indicated with a distinctive icon.

Additional Interactive Commands

Playing SuDoKu effectively requires one to build a good mental image of the state of the board as well as the ability to effectively query the game for currently active constraints. The eye's ability to quickly move around the board and perceive row, column and sub-square constraints needs to be compensated for in an eyes-free environment. As an example, it is too difficult to build the necessary mental model by just listening to the board spoken aloud, or by listening to individual cells by navigating to them.

Here is the set of additional interactive commands that needed to be added in order to be able to play the game effectively.

r

Speak current row.

c

Speak current column.

s

Speak current sub-square.

R

Speak number of remaining cells in current row.

C

Speak number of remaining cells in current column.

S

Speak number of remaining cells in current sub-square.

d

Move to the sub-square below the current sub-square.

u

Move to the sub-square above the current sub-square.

n

Move to the next sub-square.

p

Move to the previous sub-square.

a

Move to the beginning of current row.

e

Move to the end of the current row.

t

Move to the top of the current column.

b

Move to the bottom of the current column.

,

Speaks information about the overall distribution of numbers on the board.

d --- Conveys how many instances of each digit have been filled in.
s --- Conveys number of remaining cells in each sub-square.
r --- Conveys number of remaining cells in each row.
c --- Conveys number of remaining cells in each column.

/

Speaks number of remaining cells in the current board.

.

Speaks value in current cell.

Notes on how information is spoken:

Numbers are spoken in groups of 3 to achieve effective intonation.
When navigating by sub-squares, point always moves to the top left corner of the sub-square.
Additional commands bound to M-r, M-c and M-s erase the current row, column or sub-square respectively. These commands would probably be convenient to have independent of whether one is using visual output.
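
Two of the notes above come down to simple arithmetic: finding the top left corner of a sub-square, and splitting a row into groups of three for speaking. Here is a minimal sketch with hypothetical function names, assuming rows and columns are numbered from 0 to 8; it does not reflect how emacspeak-sudoku stores the board.

```elisp
;; Hypothetical helpers; rows and columns are assumed to run from 0 to 8.
(defun my-sub-square-origin (row col)
  "Return (ROW . COL) of the top left cell of the sub-square holding ROW, COL."
  ;; Integer division rounds down, so this snaps to a multiple of 3.
  (cons (* 3 (/ row 3)) (* 3 (/ col 3))))

(defun my-group-by-three (cells)
  "Split the list CELLS into sublists of at most three items."
  (let (groups)
    (while cells
      (let (group)
        (dotimes (_ 3)
          (when cells
            (push (pop cells) group)))
        (push (nreverse group) groups)))
    (nreverse groups)))

(my-sub-square-origin 4 7)                    ; => (3 . 6)
(my-group-by-three '(5 3 0 0 7 0 0 0 0))      ; => ((5 3 0) (0 7 0) (0 0 0))
```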

Effectiveness Of The Resulting Interface

With the above interface in place, the simpler levels of the game are a breeze; levels difficult and evil are sufficiently challenging to be fun.

Friday, January 27, 2006

Browsing Sourceforge Download Servers

Sourceforge is a nice service, but it can also be painful to use because of the heavy-weight Web page design, and the need to repeatedly click before you get the download you want.

The most irksome of these is the download mechanism provided by Sourceforge --- where you first need to browse a list of download servers, pick a mirror, and then download what you want. Emacspeak implements a Smart URL that enables one to download from Sourceforge in a single step.

By default, this uses a North American mirror; the behavior can be customized if you are outside the US. Use smart URL Sourceforge Browse Mirror and specify the name of a SF hosted project when prompted. This brings up the index page for the project's download area, sorted by date. Move to the bottom of the page and hit b to move to the latest available download.

The smart URL sets up the W3 buffer with a context-sensitive download function; when on a download link, hit C-d to start downloading. This command will prompt for the URL; rather than hitting return (which would bring you to the browse mirrors page), hit M-p to get the download URL for your SF mirror. Note that this wizard uses GNU wget to perform the download via Emacs module w3-wget.

BBC Channels On Emacspeak

Since the BBC's various channels are what I listen to the most, launching BBC channels has always been a couple of keystrokes in Emacspeak. As a first step, directory realaudio/radio contains shortcut files for launching live streams from the various BBC channels.

In addition, module emacspeak-url-template defines a number of Smart URLs for single-click access to BBC programs. The ones I use the most are:

Smart URL BBC Channels On Demand, and
Smart URL BBC Genres On Demand

These smart URLs prompt for the channel or genre respectively and bring up a Web page that lists the various shows that are available --- note that the BBC archives shows for a whole week. The resulting Web page is easy to browse in W3; the most effective way to skim the buffer is to repeatedly hit i which moves through the various items on the page. Hitting e e (that's the letter e twice) while on a hyperlink will launch the corresponding media stream by calling a context-aware command that knows about transforming the URL to one that accesses the program stream. Note that simply following the hyperlink will get you first to a page about the program, rather than to the program stream itself.

To find out what channels and genres are available, browse the BBC Web site --- channel and genre names are not hard-wired into Emacspeak since these can change over time with channels and genres being added or renamed.

Thursday, January 26, 2006

Emacspeak World Clock For Timezone Travel

Command emacspeak-speak-time bound to C-e t speaks the current time. An additional convenience offered by this keystroke is to get the time at a specified time zone using Emacs' completion facility.

To use this feature, simply precede the keystroke with an interactive prefix arg, i.e., use C-u C-e t. This will prompt for the timezone in the minibuffer. Using two prefix args, i.e., C-u C-u C-e t, will set the default timezone after speaking the time --- a useful way of avoiding jet-lag as you travel.
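
If you are curious how the time in another zone can be computed from Lisp, here is a minimal sketch. It is not how emacspeak-speak-time is implemented; the function name is hypothetical, and it assumes an Emacs recent enough for format-time-string to accept a TZ-style string as its third argument.

```elisp
;; Hypothetical helper; relies on format-time-string accepting a zone string.
(defun my-time-in-zone (zone &optional time)
  "Return TIME (default: now) formatted as HH:MM in time zone ZONE.
ZONE is a TZ-style string such as \"America/Los_Angeles\"."
  (format-time-string "%H:%M" time zone))

(my-time-in-zone "America/Los_Angeles")
```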

Sunday, January 22, 2006

Emacspeak Web Wizards: Obtaining Context From The Calendar

Emacspeak implements a number of smart URLs in module emacspeak-url-template.el --- see earlier post on Web Command Line. Many of these smart URLs prompt the user for the date, e.g. you can use smart URL NPR On Demand to play archived NPR shows.

The most intuitive means of specifying a date is of course using a calendar that functions as a date-picker, and Emacs has a very powerful built-in calendar. Emacspeak ties these two together by arranging for commands that prompt for a date to use the current date in the Emacs Calendar as the default. So the easiest way to play NPR Morning Edition for Monday, January 2, 2006 is to do the following:

Switch to the Emacs Calendar and move to the desired date Monday January 2, 2006 by pressing gd.
Invoke the NPR On Demand smart URL by pressing C-e u RET NPR RET
Specify the program code for Morning Edition by pressing me RET
Hit enter to pick the default date that is offered in the minibuffer.
Sit back and listen ...

Tuesday, January 17, 2006

Viewing Atom Feeds Within Emacspeak

The most effective way of viewing Atom Feeds in Emacspeak is to use command emacspeak-atom-display and specify the URL of the feed when prompted. Thus, M-x emacspeak-atom-display RET http://emacspeak.blogspot.com/atom.xml displays a Web page generated from the Emacspeak Blog.

Notice the following in the generated Web page:

It starts with a navigable table of contents.
Each Blog entry has a link labeled edit next to it.
Each Blog entry ends with a link labeled Bookmark.
There is a link labeled Post at the top of the page.

The above links help you easily create and edit posts to the Blog if you have write access using commands provided by module atom-blogger. Eventually, I may add commands to these hyperlinks to automatically invoke the appropriate command from atom-blogger; for now, I find it sufficiently convenient to copy the URL under point to the kill-ring and later yank it back into the minibuffer when prompted by atom-blogger.

Finally, note that this and subsequent posts to this Blog will show up automatically on the Emacspeak Mailing List at Vassar.

Viewing Formatted Source Code In Emacs/W3

While reading online texts on programming in Python and Ruby, I noticed that Emacspeak was not announcing indented lines in preformatted source-code examples, even with audio indentation turned on. The reason is that many of these texts use an HTML non-breaking space for indentation, and though W3 was rendering these correctly, the default syntax table in W3 had not defined the resulting octal 240 to be of class white-space. Consequently, Emacspeak's audio indentation code was not treating the non-breaking space as white space.

I've checked in a patch to emacspeak-w3.el that modifies the syntax table in w3-mode by adding the appropriate lines to w3-mode-hook.
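
For readers who want to see the shape of such a fix, the following is a sketch of the general technique rather than the exact lines checked into emacspeak-w3.el. The function name is hypothetical; modify-syntax-entry and add-hook are standard Emacs functions.

```elisp
;; Hypothetical function name; the real patch may differ in detail.
(defun my-w3-nbsp-as-whitespace ()
  "Give the non-breaking space (octal 240) whitespace syntax in this buffer."
  (modify-syntax-entry #o240 " "))

;; Apply the change whenever a W3 buffer is set up.
(add-hook 'w3-mode-hook #'my-w3-nbsp-as-whitespace)
```

With the character classed as whitespace, code that measures indentation by skipping whitespace syntax will count the non-breaking spaces as well.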

Saturday, January 14, 2006

Speech-Enabled ATOM-Blogger

Module atom-blogger is a light-weight Emacs client for creating or editing blogger posts using ATOM. Emacspeak bundles atom-blogger and speech-enables it via module emacspeak-atom-blogger.

Module emacspeak-setup.el has been updated to set up Emacs' load-path to locate package atom-blogger, so if correctly installed, Emacspeak users should be able to launch and use atom-blogger with no further configuration.

Thursday, January 12, 2006

Emacspeak And Ruby

Emacspeak now speech-enables ruby-mode to support developing Web applications using Ruby On Rails. I presently use nxml-mode for editing the .rhtml files, but am looking for an alternative to using multi-mode or its variants when editing the embedded Ruby code. Sadly, one has to turn off nxml-mode's validity checking while editing .rhtml files --- otherwise it complains about the <% directives.
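
To have .rhtml files open in nxml-mode automatically, an entry along these lines in your init file should do. It assumes nxml-mode is installed and on your load-path, and it does nothing about validity checking, which still has to be turned off separately.

```elisp
;; Open .rhtml templates in nxml-mode (assumes nxml-mode is installed).
(add-to-list 'auto-mode-alist '("\\.rhtml\\'" . nxml-mode))
```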

Monday, January 02, 2006

Emacspeak Wizard: Recording Audio Streams For Later Playback

Emacspeak includes a large collection of wizards implemented in module emacspeak-wizards.el. One of these, emacspeak-wizrds-rivo, works hand-in-hand with script etc/rivo.pl to provide a simple record-for-later-playback facility that can be used to record live realaudio streams. This is useful for listening to live broadcasts at a more convenient time.

Wizard emacspeak-wizrds-rivo prompts for the time at which to record, the length of the recording, the stream to record, and the location in which the recording is to be stored. It then uses command trplayer (text-mode RealPlayer) with command vsound to capture the audio stream, and converts the result to MP3 using command lame. ToDo: With mplayer now able to play RealAudio streams, the etc/rivo.pl script should be updated to use mplayer since this will:

Remove the vsound dependency.
Enable us to record more than just RealAudio streams.

Saturday, December 31, 2005

Emacs Tip: Viewing Commands Available On A Prefix Key

Emacs, and consequently Emacspeak, uses a number of multi-key sequences. The initial key of such a multi-key sequence is known as the prefix-key; the prefix-key that leads to all Emacspeak commands is C-e (control-e).

Pressing any prefix-key followed by the help-key (C-h) results in Emacs displaying a *Help* buffer that lists all key-sequences beginning with that prefix.

This can be a very useful way of quickly seeing what keyboard commands are available --- as the resulting listing is shorter than what is produced by C-h b (command describe-bindings) which lists all key-bindings.

To use this feature to review Emacspeak's key-bindings, I have now moved command emacspeak-learn-emacs-mode to C-h C-l and C-e F1 from its original key-binding of C-e C-h --- the change is checked into CVS.

Friday, December 30, 2005

Using The New MPlayer With ALSA Support

New versions of the Linux mplayer can now play RealMedia files in addition to the various DVD and Windows audio and video streams. As an added bonus, mplayer can be built to use ALSA (note that default RPMs available on the Web still default to OSS as of December 2005). This, with the additional support for newer RealMedia formats, has finally caused me to switch away from trplayer --- a command-line RealPlayer that was built against RealPlayer 8.0.

Note that building mplayer from source to include ALSA support, and locating all the codecs you need will require a few Google searches. Specifically, to play NPR streams, you will need to grab avisynth.dll from the SourceForge project of the same name. But once this is done, mplayer will happily play multiple streams of audio without requiring multichannel support from the sound card. For Emacspeak users this means that you get auditory icons, and more importantly software TTS while listening to audio streams.

Using software TTS while mplayer is playing audio streams requires a TTS server that uses ALSA; the default Emacspeak servers still use OSS. To build a TTS server for IBM ViaVoice TTS, obtain Emacspeak sources from CVS, and follow the instructions in file servers/linux-outloud/ALSA.

As of a few weeks ago, the Emacspeak CVS repository has been updated so that all access to streaming media goes through mplayer by default. Make sure to configure your mplayer with a smaller cache size if you use it exclusively for streaming audio; by default it uses a rather large cache, and streaming realaudio often takes a long time to start until the cache is configured to be something smaller. I use a cache size of 64kb on my laptop.