Saturday, April 25, 2015

Emacspeak 3.0: Released 20 Years Ago Today!

twenty-years-after

1 Emacspeak Was Released Twenty Years Ago Today

The more things change, the more they remain the same.

Emacspeak was released 20 years ago on April 25, 1995 with this announcement. The Emacspeak mailing list itself did not exist in its present form — note that the original announcement talks about a mailing list at DEC CRL. When Greg started the mailing list at Vassar, we seeded the list from some/all of the messages from the archive for the mailing list at DEC.

Wednesday, April 15, 2015

HowTo: Log Speech Server Output To Aid In Developing TTS Servers

HowTo: Log TTS Server Output To Aid In TTS Server Development

1 HowTo: Log TTS Server Output To Aid In TTS Server Development

This is mostly of interest to developers of Emacspeak speech servers. This article outlines how one can log TTS server output to a file. The logs record all commands sent by Emacspeak to the TTS server. It is best to generate the logs in an Emacs session that is separate from the Emacs session where you are developing your code. This keeps the logs short, and makes isolating problems much easier.

1.1 How It Works

The emacspeak/servers directory now contains log_<tts-name> servers for the various supported speech servers. When selected, these log-speech servers produce no speech output; instead, they output the speech server commands received from Emacspeak to a file in /tmp named tts-log-$$. Once you're done logging, you can examine this file from the primary Emacs session.
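
To make the mechanism concrete, here is a minimal sketch of what such a logging server could look like. This is an illustration only, not the actual log_<tts-name> scripts shipped in emacspeak/servers: it simply reads the commands Emacspeak writes to its standard input and appends them to the log file.

#!/bin/sh
# Hypothetical logging "speech server": speak nothing, log everything.
# Emacspeak writes TTS commands to this process' standard input.
LOG="/tmp/tts-log-$$"
while IFS= read -r line; do
  printf '%s\n' "$line" >> "$LOG"
done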

1.2 Typical Workflow

Assume you want to see the speech-server commands sent by Emacs when you perform a specific action, in this instance, pressing C-e m to execute command emacspeak-speak-mode-line.

  1. In a separate Linux console or X-Window, launch Emacs with Emacspeak loaded — this is separate from your primary Emacs session.
  2. In this Emacs session, use C-e d d (command dtk-select-server) and select log-<tts-name> as the speech server, where tts-name corresponds to the speech engine you're testing.
  3. Emacspeak will now start the logging server, and fall silent; all commands sent by Emacspeak to the speech-server will be logged to a file in /tmp.
  4. Press C-e m – to produce the log output you want to see.
  5. Use command emacspeak-emergency-tts-restart to get speech back.
  6. Open a dired buffer on /tmp, press s to sort files by date, and find your generated log output at the top of the list.
  7. Note: It is useful to configure your default speech engine via Custom – see user option emacspeak-emergency-tts-server. It provides a quick-fire means to get speech back if you ever switch to a speech-server that fails for some reason.
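
If you prefer your init file to Custom, a one-line fallback configuration might look like the sketch below; this assumes the option holds the name of the speech server to fall back on, and "outloud" is used purely as an example value. Using M-x customize-option emacspeak-emergency-tts-server works just as well.

;; Fallback speech engine used by emacspeak-emergency-tts-restart.
;; "outloud" is only an example; pick whichever server works on your machine.
(setq emacspeak-emergency-tts-server "outloud")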

Share And Enjoy

Date: <2015-04-15 Wed>

Author: raman

Created: 2015-04-15 Wed 17:33

Emacs 25.0.50.1 (Org mode 8.2.10)


Monday, March 09, 2015

Emacspeak Development Is Moving To GitHub

Emacspeak Development Is Moving To GitHub

1 Summary:

Emacspeak development is moving from Google Code Hosting to GitHub. If you have been running from the SVN repository, I recommend you switch to the GitHub version by executing:

 
git clone https://github.com/tvraman/emacspeak.git 
make config 
make -j 
  • If using Outloud TTS:
 
cd servers/linux-outloud  &&  make 
  • If using Espeak TTS:
 
cd servers/linux-espeak && make 
  • After this, all you should need to stay up to date is a periodic
 
git pull; make config; make  

2 A Brief History

  • The first five years of Emacspeak development used a local RCS repository on my home machine (1994 –1999).
  • The first few releases of Emacspeak were distributed through the Web site and FTP server at DEC's Cambridge Research Lab (DEC CRL); they were also mirrored at Cornell.
  • After moving to Adobe Systems in the fall of 1995, Emacspeak was distributed exclusively through my Web page on the Cornell CS Department Web server, which also hosted my personal Web site.
  • In 2000, I created Emacspeak On SourceForge and used that site for both hosting the Emacspeak source code as well as the Web site — coincidentally, I lost the ability to update my Web site at Cornell CS around the same time.

  • Over time it became harder and harder to publish new Emacspeak releases through the SourceForge interface. Luckily, Google Code Hosting came along a few months after I joined Google, and moving the source code repository to Google Code SVN was a no-brainer.
  • My friend and colleague Fitz helped me migrate the 5+ years of CVS history to SVN; this meant that the source code repository on Google Code also recorded all of the development history that had been built up on Sourceforge.
  • Now, it's time to move to GitHub. I've been using Git for most of my work over the last few years, but was simply too lazy to move Emacspeak development from SVN to Git on GoogleCode.
  • But over time, the advantages of Git as a source control system and of GitHub as a hosting service have increased — primary among these is a rich set of Emacs tools written to leverage the GitHub API.
  • For Git integration in Emacs, my personal preference is package Magit available through Elpa —
 
M-x package-install magit in Emacs. 
  • The GitHub Web site itself is fairly heavy-weight in terms of its use of scripting, i.e., performing all operations through the github.com Web site from within Emacs is fairly unpleasant. But the aforementioned GitHub API makes this a non-issue at this point with respect to the type of workflow I prefer.
  • So this week, I did the work to migrate Emacspeak development to Emacspeak On GitHub.

3 Status Of Migration

  • With help from some of the kind folk at Google Code Hosting, I've successfully migrated the source code repository including all release tags to GitHub.
  • I am now checking in changes into GitHub; the SVN repository on Google Code Hosting is now frozen, and I do not plan to make any commits there.
  • I presently have no immediate plans to start using features of GitHub like the Issue Tracker; for now we will continue to use the Emacspeak mailing list which has served us well for 20 years.
  • I have also taken this opportunity to prune out legacy portions of the Emacspeak codebase by moving modules to obsolete at each level of the directory tree.
  • Since starting the Emacspeak Blog in late 2005, I have published a sequence of articles describing Emacspeak features and usage patterns; I felt that having these articles for local reference made a useful supplement to the emacspeak online documentation. Toward this end, I have downloaded all articles published so far and checked in both XML and HTML versions into sub-directory blogs.
  • Note that newer articles are also available as .org files under sub-directory announcements.

4 Next Steps

  • I still need to learn how to do software releases on GitHub.

Share And Enjoy!

Date: <2015-03-06 Fri>

Author: raman

Created: 2015-03-07 Sat 08:12

Emacs 25.0.50.1 (Org mode 8.2.10)


Tuesday, February 17, 2015

Enhanced Audio On The Emacspeak Desktop

Enhanced Audio On The Emacspeak Desktop

1 Enhanced Audio On The Emacspeak Desktop

I recently added a set of modules to Emacspeak that leverage some of the high-end audio functionality available on Linux on modern hardware. As an example, applying effects such as 3D spatialization and high-end reverb once consumed CPU cycles to the extent that it was not possible to play with these in real time. All of these effects now take between 1 and 5% of the CPU — and that is when my laptop is running in power-save mode!

1.1 An Audio Workbench Using SoX

Sound Exchange (SoX), described as the Swiss army knife of sound processing, has been around since the time I was a graduate student. Today it provides a versatile set of tools for editing and manipulating both wave and mp3 files. Module sox.el implements a simple Audio Workbench for the Emacspeak desktop.

1.1.1 Pre-Requisites

  1. SoX, with all available auxiliary packages that add support for various file types such as mp3.
  2. The various Ladspa related packages that install additional audio effect filters.

1.1.2 Usage Instructions For SoX.el

  • Launch the Audio Workbench by executing M-x sox.
  • Use f to open a wave or mp3 file you wish to manipulate.
  • Add any of the supported effects using e.
  • Use upper-case E to add more than one effect.
  • Hit p to play the result, s to save the result to a new file.

At present, sox.el supports a few effects such as trim for clipping files and reverb for adding reverb, with more to come as I use it.
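
For reference, the effect chain you build interactively corresponds roughly to a SoX command line like the one below; the file names and effect parameters are purely illustrative and are not anything sox.el produces verbatim.

sox input.mp3 output.wav trim 0 30 reverb 50

This clips the first 30 seconds of input.mp3 and applies a 50% reverberance reverb before writing output.wav.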

1.2 Adding High-End Reverb When Playing Media Streams

I use mplayer to play both local and network media streams. MPlayer can apply a wide range of filters to the audio stream; more interestingly it can also apply effects implemented as Ladspa plugins. Package tap-plugins implements a large number of high-quality Ladspa filters, including a versatile Reverb filter.

Once you have package tap-plugins and its dependencies installed, and a relatively new version of MPlayer (with support for Ladspa plugins), you can apply various Reverb Presets to your media streams via the Emacspeak MPlayer command emacspeak-m-player-apply-reverb-preset bound to P in Emacspeak MPlayer. Package tap-plugins defines a total of 42 Reverb Presets; experiment with these when wearing headsets. Once you have applied a Reverb Preset, you can edit its current settings via command emacspeak-m-player-edit-reverb bound to R in Emacspeak MPlayer. Alternatively, you can pick a default effect to use via Emacs' Custom interface; see option emacspeak-m-player-reverb-filter.
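
Under the covers this amounts to handing MPlayer a Ladspa audio filter. For the curious, the shell equivalent looks roughly like the line below; it assumes tap_reverb.so is reachable via your LADSPA_PATH, the control values (decay, dry and wet levels, filter toggles, preset number) are purely illustrative, and within Emacspeak you would normally just press P and pick a preset instead.

mplayer -af ladspa=tap_reverb.so:tap_reverb:2500:0:-8:1:1:1:1:6 stream.mp3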

1.3 Defining Convenient MPlayer Shortcuts

Finally, you can now bind shortcut keys for launching Emacspeak MPlayer from specific locations where you store media, e.g., you can have separate shortcuts for Music vs Audio Books – see command emacspeak-m-player-accelerator. This is best used by customizing Emacspeak option emacspeak-media-location-bindings — just use the Customize interface to specify pairs of shortcut keys and media locations.
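
A rough sketch of that customization in Lisp follows; the exact shape of the option's value is an assumption here (pairs of a key and a directory), so prefer M-x customize-option emacspeak-media-location-bindings, which also takes care of creating the key bindings.

;; Assumed shape: a list of (KEY DIRECTORY) pairs; verify against the
;; option's Custom declaration before using.
(customize-set-variable 'emacspeak-media-location-bindings
                        '(("m" "~/Music/")
                          ("b" "~/AudioBooks/")))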

Date: <2015-02-17 Tue>

Author: raman

Created: 2015-02-17 Tue 17:37

Emacs 25.0.50.1 (Org mode 8.2.10)


Internet Radio: Tune-In For Emacspeak

Internet Radio: Tune-In For Emacspeak

1 Internet Radio: Tune-In For Emacspeak

I just checked in the ability to browse, search and play radio-stations from TuneIn on the Emacspeak Audio Desktop.

1.1 Pre-Requisites

  1. xsltproc for xsl stylesheet processing.
  2. Linux mplayer for playing streams, preferably the latest version.

2 Simple Usage Summary

  • M-x emacspeak-wizards-tune-in-radio-browse brings up the browse interface.
  • M-x emacspeak-wizards-tune-in-radio-search prompts for a query and brings up search results.
  • Both browse and search get the results back as an OPML feed, which gets displayed as a simple Web page within the Emacs Web Browser (EWW if using 24.4).
  • Items identified as (link) are themselves OPML feeds and can be opened via command emacspeak-feeds-opml-display.
  • The initial browse buffer is set up to use opml-display when you click on link items.
  • You can play (audio) links by invoking command emacspeak-webutils-play-media-at-point — this command is bound to ; in EWW.
  • You need to provide an interactive prefix argument to the above command to indicate that it is a playlist — so you actually press C-u ; on audio links.
  • Many of the audio links do not return a playlist – they instead return a link that is a pointer to a playlist. Newer versions of mplayer will throw a security error — you can tell mplayer to follow them by invoking the earlier command with two prefix args, i.e., C-u C-u ;.
  • All this works about 90% of the time. In some cases – depending on whether the server failed to generate the right mimetype for the play URL – you may need to run

curl --silent <url>

where <url> is the URL of the audio link, and then pass the resulting URL to command emacspeak-m-player-url.
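
From a shell, that manual step might look like the sketch below. It assumes the audio link returns a .pls playlist whose File1= entry carries the actual stream URL (the URL shown is a placeholder); the extracted URL is what you would then hand to emacspeak-m-player-url.

# Print the stream URL hidden inside a .pls playlist:
url='http://example.com/station.pls'
curl --silent "$url" | grep -m1 '^File1=' | cut -d= -f2-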

Share And Enjoy!

Date: <2015-02-17 Tue>

Author: T.V Raman

Created: 2015-02-17 Tue 16:41

Emacs 25.0.50.1 (Org mode 8.2.10)


Tuesday, January 06, 2015

3D: A Spatial Auditory Icon Theme Generated Using CSound

Using CSound To Generate Auditory Icons

1 Auditory Icon Theme: 3d

CSound is a sophisticated music sound synthesis system with a strong community of developers. I've played off and on with CSound ever since the early 90's and have always been intrigued by the possibility of algorithmically creating new auditory icon themes for Emacspeak using CSound.

Over the holidays last December, I finally got around to creating my first complete CSound-generated auditory icon theme – it is now checked into Emacspeak SVN as theme 3d. This theme is best experienced with headphones — many of the generated icons use spatial audio to good effect.

Wednesday, November 26, 2014

Announcing Emacspeak 41.0: NiceDog

Emacspeak 41.0—NiceDog—Unleashed!

1 Emacspeak-41.0 (NiceDog) Unleashed!

For Immediate Release:

San Jose, Calif., (Nov 26, 2014) Emacspeak: Redefining Accessibility In The Era Of Web Computing –Zero cost of upgrades/downgrades makes priceless software affordable!

Emacspeak Inc (NASDOG: ESPK) --http://emacspeak.sf.net– announces the immediate world-wide availability of Emacspeak 41.0 (NiceDog) –a powerful audio desktop for leveraging today's evolving data, social and service-oriented Web cloud.

1.1 Investors Note:

With several prominent tweeters expanding coverage of #emacspeak, NASDOG: ESPK has now been consistently trading over the social net at levels close to that once attained by DogCom high-fliers—and as of November 2014 is trading at levels close to that achieved by once better known stocks in the tech sector.

1.2 What Is It?

Emacspeak is a fully functional audio desktop that provides complete eyes-free access to all major 32 and 64 bit operating environments. By seamlessly blending live access to all aspects of the Internet such as Web-surfing, blogging, social computing and electronic messaging into the audio desktop, Emacspeak enables speech access to local and remote information with a consistent and well-integrated user interface. A rich suite of task-oriented tools provides efficient speech-enabled access to the evolving service-oriented social Web cloud.

1.3 Major Enhancements:

  • Emacs EWW: Consume Web content efficiently.
  • emacspeak-url-templates: Smart Web access.
  • emacspeak-websearch.el: Find things fast.
  • emacspeak-google.el: Improved Google integration.
  • Calibre integration for searching and viewing epub.
  • Complete anything via company integration.
  • Speech-enabled 2048.
  • Emacspeak At Twenty: Historical Overview.
  • gmaps.el: Find places, read reviews, get there.
  • Feed Browser: Consume feeds post Google-Reader.
  • Freebase Search: Explore the Freebase knowledge base.
  • Emacs 24.4: Supports all new features in Emacs 24.4.
  • And a lot more than will fit this margin. …

1.4 Establishing Liberty, Equality And Freedom:

Never a toy system, Emacspeak is voluntarily bundled with all major Linux distributions. Though designed to be modular, distributors have freely chosen to bundle the fully integrated system without any undue pressure—a documented success for the integrated innovation embodied by Emacspeak. As the system evolves, both upgrades and downgrades continue to be available at the same zero-cost to all users. The integrity of the Emacspeak codebase is ensured by the reliable and secure Linux platform used to develop and distribute the software.

Extensive studies have shown that thanks to these features, users consider Emacspeak to be absolutely priceless. Thanks to this wide-spread user demand, the present version remains priceless as ever—it is being made available at the same zero-cost as previous releases.

At the same time, Emacspeak continues to innovate in the area of eyes-free social interaction and carries forward the well-established Open Source tradition of introducing user interface features that eventually show up in luser environments.

On this theme, when once challenged by a proponent of a crash-prone but well-marketed mousetrap with the assertion "Emacs is a system from the 70's", the creator of Emacspeak evinced surprise at the unusual candor manifest in the assertion that it would take popular idiot-proven interfaces until the year 2070 to catch up to where the Emacspeak audio desktop is today. Industry experts welcomed this refreshing breath of Courage Certainty and Clarity (CCC) at a time when users are reeling from the Fear Uncertainty and Doubt (FUD) unleashed by complex software systems backed by even more convoluted press releases.

1.5 Independent Test Results:

Independent test results have proven that unlike some modern (and not so modern) software, Emacspeak can be safely uninstalled without adversely affecting the continued performance of the computer. These same tests also revealed that once uninstalled, the user stopped functioning altogether. Speaking with Aster Labrador, the creator of Emacspeak once pointed out that these results re-emphasize the user-centric design of Emacspeak; "It is the user –and not the computer– that stops functioning when Emacspeak is uninstalled!".

1.5.1 Note from Aster, Bubbles and Tilden:

UnDoctored Videos Inc. is looking for volunteers to star in a video demonstrating such complete user failure.

1.6 Obtaining Emacspeak:

Emacspeak can be downloaded from Google Code –see http://code.google.com/p/emacspeak/. You can visit Emacspeak on the WWW at http://emacspeak.sf.net. You can subscribe to the emacspeak mailing list emacspeak@cs.vassar.edu by sending mail to the list request address emacspeak-request@cs.vassar.edu. The Emacspeak Blog is a good source for news about recent enhancements and how to use them. The NiceDog release is at http://emacspeak.googlecode.com/svn/wiki/downloads/emacspeak-41.0.tar.bz2. The latest development snapshot of Emacspeak is always available via Subversion from Google Code at http://emacspeak.googlecode.com/svn/trunk/.

1.7 History:

Emacspeak 41.0 continues to improve upon the desire to provide not just equal, but superior access — technology when correctly implemented can significantly enhance the human ability. Emacspeak 40.0 goes back to Web basics by enabling efficient access to large amounts of readable Web content. Emacspeak 39.0 continues the Emacspeak tradition of increasing the breadth of user tasks that are covered without introducing unnecessary bloatware. Emacspeak 38.0 is the latest in a series of award-winning releases from Emacspeak Inc. Emacspeak 37.0 continues the tradition of delivering robust software as reflected by its code-name. Emacspeak 36.0 enhances the audio desktop with many new tools including full EPub support — hence the name EPubDog. Emacspeak 35.0 is all about teaching a new dog old tricks — and is aptly code-named HeadDog in honor of our new Press/Analyst contact. emacspeak-34.0 (AKA Bubbles) established a new beach-head with respect to rapid task completion in an eyes-free environment. Emacspeak-33.0 AKA StarDog brings unparalleled cloud access to the audio desktop. Emacspeak 32.0 AKA LuckyDog continues to innovate via open technologies for better access. Emacspeak 31.0 AKA TweetDog — adds tweeting to the Emacspeak desktop. Emacspeak 30.0 AKA SocialDog brings the Social Web to the audio desktop—you can't but be social if you speak! Emacspeak 29.0—AKA AbleDog—is a testament to the resilience and innovation embodied by Open Source software—it would not exist without the thriving Emacs community that continues to ensure that Emacs remains one of the premier user environments despite perhaps also being one of the oldest. Emacspeak 28.0—AKA PuppyDog—exemplifies the rapid pace of development evinced by Open Source software. Emacspeak 27.0—AKA FastDog—is the latest in a sequence of upgrades that make previous releases obsolete and downgrades unnecessary. Emacspeak 26—AKA LeadDog—continues the tradition of introducing innovative access solutions that are unfettered by the constraints inherent in traditional adaptive technologies. Emacspeak 25 —AKA ActiveDog —re-activates open, unfettered access to online information. Emacspeak-Alive —AKA LiveDog —enlivens open, unfettered information access with a series of live updates that once again demonstrate the power and agility of open source software development. Emacspeak 23.0 – AKA Retriever—went the extra mile in fetching full access. Emacspeak 22.0 —AKA GuideDog —helps users navigate the Web more effectively than ever before. Emacspeak 21.0 —AKA PlayDog —continued the Emacspeak tradition of relying on enhanced productivity to liberate users. Emacspeak-20.0 —AKA LeapDog —continues the long established GNU/Emacs tradition of integrated innovation to create a pleasurable computing environment for eyes-free interaction. emacspeak-19.0 –AKA WorkDog– is designed to enhance user productivity at work and leisure. Emacspeak-18.0 –code named GoodDog– continued the Emacspeak tradition of enhancing user productivity and thereby reducing total cost of ownership. Emacspeak-17.0 –code named HappyDog– enhances user productivity by exploiting today's evolving WWW standards. Emacspeak-16.0 –code named CleverDog– the follow-up to SmartDog– continued the tradition of working better, faster, smarter. Emacspeak-15.0 –code named SmartDog–followed up on TopDog as the next in a continuing series of award-winning audio desktop releases from Emacspeak Inc. Emacspeak-14.0 –code named TopDog–was the first release of this millennium.
Emacspeak-13.0 –codenamed YellowLab– was the closing release of the 20th century. Emacspeak-12.0 –code named GoldenDog– began leveraging the evolving semantic WWW to provide task-oriented speech access to Webformation. Emacspeak-11.0 –code named Aster– went the final step in making Linux a zero-cost Internet access solution for blind and visually impaired users. Emacspeak-10.0 –(AKA Emacspeak-2000) code named WonderDog– continued the tradition of award-winning software releases designed to make eyes-free computing a productive and pleasurable experience. Emacspeak-9.0 –(AKA Emacspeak 99) code named BlackLab– continued to innovate in the areas of speech interaction and interactive accessibility. Emacspeak-8.0 –(AKA Emacspeak-98++) code named BlackDog– was a major upgrade to the speech output extension to Emacs.

Emacspeak-95 (code named Illinois) was released as OpenSource on the Internet in May 1995 as the first complete speech interface to UNIX workstations. The subsequent release, Emacspeak-96 (code named Egypt) made available in May 1996 provided significant enhancements to the interface. Emacspeak-97 (Tennessee) went further in providing a true audio desktop. Emacspeak-98 integrated Internetworking into all aspects of the audio desktop to provide the first fully interactive speech-enabled WebTop.

About Emacspeak:


Originally based at Cornell (NY) http://www.cs.cornell.edu/home/raman –home to Auditory User Interfaces (AUI) on the WWW– Emacspeak is now maintained on GoogleCode --http://code.google.com/p/emacspeak —and Sourceforge —http://emacspeak.sf.net. The system is mirrored world-wide by an international network of software archives and bundled voluntarily with all major Linux distributions. On Monday, April 12, 1999, Emacspeak became part of the Smithsonian's Permanent Research Collection on Information Technology at the Smithsonian's National Museum of American History.

The Emacspeak mailing list is archived at Vassar –the home of the Emacspeak mailing list– thanks to Greg Priest-Dorman, and provides a valuable knowledge base for new users.

1.8 Press/Analyst Contact: Tilden Labrador

Going forward, Tilden acknowledges his exclusive monopoly on setting the direction of the Emacspeak Audio Desktop, and promises to exercise this freedom to innovate and his resulting power responsibly (as before) in the interest of all dogs.

About This Release:


Windows-Free (WF) is a favorite battle-cry of The League Against Forced Fenestration (LAFF). –see http://www.usdoj.gov/atr/cases/f3800/msjudgex.htm for details on the ill-effects of Forced Fenestration.

CopyWrite )C( Aster and Hubbell Labrador. All Writes Reserved. HeadDog (DM), LiveDog (DM), GoldenDog (DM), BlackDog (DM) etc., are Registered Dogmarks of Aster, Hubbell and Tilden Labrador. All other dogs belong to their respective owners.

Monday, September 15, 2014

Emacspeak At Twenty: Looking Back, Looking Forward

Emacspeak At Twenty: Looking Back, Looking Forward

1 Introduction

One afternoon in the third week of September 1994, I started writing myself a small Emacs extension using Lisp Advice to make Emacs speak to me so I could use a Linux laptop. As Emacspeak turns twenty, this article is both a quick look back over the twenty years of lessons learned, as well as a glimpse into what might be possible as we evolve to a world of connected, ubiquitous computing. This article draws on Learning To Program In 10 Years by Peter Norvig for some of its inspiration.

2 Using UNIX With Speech Output — 1994

As a graduate student at Cornell, I accessed my Unix workstation (SunOS) from an Intel 486 PC running IBM Screen-Reader. There was no means of directly using a UNIX box at the time; after graduating, I continued doing the same for about six months at DEC's Cambridge Research Lab (DEC CRL) — the only difference being that my desktop workstation was now a DEC-Alpha. Throughout this time, Emacs was my environment of choice for everything from software development and Internet access to writing documents.

In fall of 1994, I wanted to start using a laptop running Linux; a colleague (Dave Wecker) was retiring his 386 laptop that already had Linux on it and I decided to inherit it. But there was only one problem — until then I had always accessed a UNIX machine from a secondary PC running a screen-reader — something that would clearly make no sense with a laptop!

Another colleague, Win Treese, had pointed out the interesting possibilities presented by package advice in Emacs 19.23 — a few weeks earlier, he had sent around a small snippet of code that magically modified Emacs' version-control primitive to first create an RCS directory if none existed before adding a file to version control. When I speculated about using the Linux laptop, Dave remarked — you live in Emacs anyway — why don't you just make it talk!

Connecting the dots, I decided to write myself a tool that augmented Emacs' default behavior to speak — within about 4 hours, version 0.01 of Emacspeak was up and running.

3 Key Enabler — Emacs And Lisp Advice

It took me a couple of weeks to fully recognize the potential of what I had built with Emacs Lisp Advice. Until then, I had used screen-readers to listen to the contents of the visual display — but Lisp Advice let me do a lot more — it enabled Emacspeak to generate highly context-specific spoken feedback, augmented by a set of auditory icons. I later formalized this design under the name speech-enabled applications. For a detailed overview of the architecture of Emacspeak, see the chapter on Emacspeak in the book Beautiful Code from O'Reilly.
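
The underlying idea can be sketched in a few lines of Emacs Lisp. The snippet below is an illustration only, not Emacspeak's actual implementation: it assumes the espeak command-line tool is installed and simply shells out on each call, whereas Emacspeak talks to a dedicated speech server and generates far richer, context-specific feedback.

(defun my-speak (text)
  "Send TEXT to a TTS engine (here, the espeak command-line tool)."
  (call-process "espeak" nil 0 nil text))

(defun my-speak-line (&rest _)
  "Speak the line at point, if any."
  (let ((line (thing-at-point 'line t)))
    (when line (my-speak line))))

;; Augment Emacs' own commands with spoken feedback:
(advice-add 'next-line :after #'my-speak-line)
(advice-add 'previous-line :after #'my-speak-line)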

4 Key Component — Text To Speech (TTS)

Emacspeak is a speech-subsystem for Emacs; it depends on an external Text-To-Speech (TTS) engine to produce speech. In 1994, Digital Equipment released what would turn out to be the last in the line of hardware DECTalk synthesizers, the DECTalk Express. This was essentially an Intel 386 with 1MB of flash memory that ran a version of the DECTalk TTS software — to date, it still remains my favorite Text-To-Speech engine. At the time, I also had a software version of the same engine running on my DEC-Alpha workstation; the desire to use either a software or hardware solution to produce speech output defined the Emacspeak speech-server architecture.

I went to IBM Research in 1999; this coincided with IBM releasing a version of the Eloquence TTS engine on Linux under the name ViaVoice Outloud. My colleague Jeffrey Sorenson implemented an early version of the Emacspeak speech-server for this engine using the OSS API; I later updated it to use the ALSA library while on a flight back to SFO from Boston in 2001. That is still the TTS engine that is speaking as I type this article on my laptop.

20 years on, TTS continues to be the weakest link on Linux; the best available solution in terms of quality continues to be the Linux port of Eloquence TTS available from Voxin in Europe for a small price. Looking back across 20 years, the state of TTS on Linux in particular and across all platforms in general continues to be a disappointment; most of today's newer TTS engines are geared toward mainstream use-cases where naturalness of the voice tends to supersede intelligibility at higher speech-rates. Ironically, modern TTS engines also give applications far less control over the generated output — as a case in point, I implemented Audio System For Technical Readings (AsTeR) in 1994 using the DECTalk; 20 years later, we implemented MathML support in ChromeVox using Google TTS. In 2013, it turned out to be difficult or impossible to implement the type of audio renderings that were possible with the admittedly less-natural sounding DECTalk!

5 Emacspeak And Software Development

Version 0.01 of Emacspeak was written using IBM Screen-Reader on a PC with a terminal emulator accessing a UNIX workstation. But in about 2 weeks, Emacspeak was already a better environment for developing Emacspeak in particular and for software development in general. Here are a few highlights from 1994 that made Emacspeak a good software development environment; present-day users of Emacspeak will see that this was just scratching the surface.

  • Audio formatting using voice-lock to provide aural syntax highlighting.
  • Succinct auditory icons to provide efficient feedback.
  • Emacs' ability to navigate code structurally — as opposed to moving around by plain-text units such as characters, lines and words. S-Expressions are a major win!

  • Emacs' ability to specialize behavior based on major and minor modes.
  • Ability to browse program code using tags, and getting fluent spoken feedback.
  • Completion everywhere.
  • Everything is searchable — this is a huge win when you cannot see the screen.
  • Interactive spell-checking using ISpell with continuous spoken feedback augmented by aural highlights.
  • Running code compilation and being able to jump to errors with spoken feedback.
  • Ability to move through diff chunks when working with source code and source control systems; the refined diffs provided by the ediff package, when speech-enabled, are a major productivity win.
  • Ability to easily move between email, document authoring and programming — though this may appear trivial, it continues to be one of Emacs' biggest wins.

Long-term Emacs users will recognize all of the above as being among the reasons why they do most things inside Emacs — there is little that is Emacspeak specific in the above list — except that Emacspeak was able to provide fluent, well-integrated contextual feedback for all of these tasks. And that was a game-changer given what I had had before Emacspeak. As a case in point, I did not dare program in Python before I speech-enabled Emacs' Python-Mode; the fact that white space is significant in Python made it difficult to program using a plain screen-reader that was unaware of the semantics of the underlying content being accessed.

5.1 Programming Defensively

As an aside, note that all of Emacspeak has been developed over the last 20 years with Emacspeak being the only adaptive technology on my system. This has led to some interesting design consequences, primary among them being a strong education in programming defensively. Here are some other key features of the Emacspeak code-base:

  1. The code-base is extremely bushy rather than deeply hierarchical — this means that when a module breaks, it does not affect the rest of the system.
  2. Separation of concerns with respect to the various layers: a tightly knit core speech library interfaces with any one of many speech servers running as an external process.
  3. Audio formatting is abstracted by using the formalism defined in Aural CSS.
  4. Emacspeak integrates with Emacs' user interface conventions by taking over a single prefix key C-e, with all Emacspeak commands accessed through that single keymap. This makes it possible to embed Emacspeak functionality into a large variety of third-party modules without any loss of functionality.
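
The single-prefix-key convention itself is plain Emacs machinery. The sketch below only illustrates the pattern and is not Emacspeak's own setup (which is considerably more involved): it claims C-e as a prefix and keeps the original C-e binding reachable, much as Emacspeak does with C-e e.

;; Illustrative only: claim C-e as a prefix key for speech commands.
(define-prefix-command 'my-audio-prefix)
(global-set-key (kbd "C-e") 'my-audio-prefix)
;; Keep the original end-of-line command reachable as C-e e.
(global-set-key (kbd "C-e e") #'move-end-of-line)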

6 Emacspeak And Authoring Documents

In 1994, my preferred environment for authoring all documents was LaTeX using the Auctex package. Later I started writing either LaTeX or HTML using the appropriate support modes; today I use org-mode to do most of my content authoring. Personally, I have never been a fan of What You See Is What You Get (WYSIWYG) authoring tools — in my experience they place an undue burden on the author by drawing attention away from the content to focus on the final appearance. An added benefit of creating content in Emacs in the form of light-weight markup is that the content is long-lived — I can still usefully process and re-use things I wrote 25 years ago.

Emacs, with Emacspeak providing audio formatting and context-specific feedback, remains my environment of choice for writing all forms of content ranging from simple email messages to polished documents for print publishing. And it is worth repeating that I never need to focus on what the content is going to look like — that job is best left to the computer.

As an example of producing high-fidelity visual content, see this write-up on Polyhedral Geometry that I published in 2000; all of the content, including the drawings were created by me using Emacs.

7 Emacspeak And The Early Days Of The Web

Right around the time that I was writing version 0.01 of Emacspeak, a far more significant software movement was under way — the World Wide Web was moving from the realms of academia to the mainstream world with the launch of NCSA Mosaic, followed in late 1994 by the first commercial Web browser, Netscape Navigator. Emacs had always enabled integrated access to FTP archives via package ange-ftp; in late 1993, William Perry released Emacs-W3, a Web browser for Emacs written entirely in Emacs Lisp. W3 was one of the first large packages to be speech-enabled by Emacspeak — later it was the browser on which I implemented the first draft of the Aural CSS specification. Emacs-W3 enabled many early innovations in the context of providing non-visual access to Web content, including audio formatting and structured content navigation; in summer of 1995, Dave Raggett and I outlined a few extensions to HTML Forms, including the label element as a means of associating metadata with interactive form controls in HTML, and many of these ideas were prototyped in Emacs-W3 at the time. Over the years, Emacs-W3 fell behind the times — especially as the Web moved away from cleanly structured HTML to a massive soup of unmatched tags. This made parsing and error-correcting badly-formed HTML markup expensive to do in Emacs-Lisp — and performance suffered. To add to this, mainstream users moved away because Emacs' rendering engine at the time was not rich enough to provide the type of visual renderings that users had come to expect. The advent of DHTML and JavaScript-based Web applications finally killed off Emacs-W3 as far as most Emacs users were concerned.

But Emacs-W3 went through a revival on the Emacspeak audio desktop in late 1999 with the arrival of XSLT, and Daniel Veillard's excellent implementation via the libxml2 and libxslt packages. With these in hand, Emacspeak was able to hand off the bulk of HTML error correction to the xsltproc tool. The lack of visual fidelity didn't matter much for an eyes-free environment; so Emacs-W3 continued to be a useful tool for consuming large amounts of Web content that did not require JavaScript support.

During the last 24 months, libxml2 has been built into Emacs; this means that you can now parse arbitrary HTML as found in the wild without incurring a performance hit. This functionality was leveraged first by package shr (Simple HTML Renderer) within the gnus package for rendering HTML email. Later, the author of gnus and shr created a new light-weight HTML viewer called eww that is now part of Emacs 24. With improved support for variable pitch fonts and image embedding, Emacs is once again able to provide visual renderings for a large proportion of text-heavy Web content where it becomes useful for mainstream Emacs users to view at least some Web content within Emacs; during the last year, I have added support within emacspeak to extend package eww with support for DOM filtering and quick content navigation.

8 Audio Formatting — Generalizing Aural CSS

A key idea in Audio System For Technical Readings (AsTeR) was the use of various voice properties in combination with non-speech auditory icons to create rich aural renderings. When I implemented Emacspeak, I brought over the notion of audio formatting to all buffers in Emacs by creating a voice-lock module that paralleled Emacs' font-lock module. The visual medium is far richer in terms of available fonts and colors as compared to voice parameters available on TTS engines — consequently, it did not make sense to directly map Emacs' face properties to voice parameters. To aid in projecting visual formatting onto auditory space, I created property personality analogous to Emacs' face property that could be applied to content displayed in Emacs; module voice-lock applied that property appropriately, and the Emacspeak core handled the details of mapping personality values to the underlying TTS engine.

The values used in property personality were abstract, i.e., they were independent of any given speech engine. Later in the fall of 1995, I re-expressed this set of abstract voice properties in terms of Aural CSS; the work was published as a first draft toward the end of 1995, and implemented in Emacs-W3 in early 1996. Aural CSS was an appendix in the CSS-1.0 specification; later, it graduated to being its own module within CSS-2.0.

Later in 1996, all of Emacspeak's voice-lock functionality was re-implemented in terms of Aural CSS; the implementation has stood the test of time in that as I added support for more TTS engines, I was able to implement engine-specific mappings of Aural-CSS values. This meant that the rest of Emacspeak could define various types of voices for use in specific contexts without having to worry about individual TTS engines. Conceptually, property personality can be thought of as holding an aural display list — various parts of the system can annotate pieces of text with relevant properties that finally get rendered in the aggregate. This model also works well with the notion of Emacs overlays, where a moving overlay is used to temporarily highlight text that has other context-specific properties applied to it.
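
In Emacs Lisp terms, personality is just a text property. The fragment below is a minimal sketch of the idea; voice-animate is used here merely as an example symbol, and nothing is actually spoken unless Emacspeak is loaded and maps that symbol to engine-specific settings.

;; Annotate part of a string with an abstract personality.
(with-temp-buffer
  (insert "A speech-enabled example")
  (put-text-property 1 9 'personality 'voice-animate)
  ;; Returns the text with the 'personality property attached;
  ;; Emacspeak would audio-format this span when speaking it.
  (buffer-string))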

Audio formatting as implemented in Emacspeak is extremely effective when working with all types of content ranging from richly structured mark-up documents (LaTeX, org-mode) and formatted Web pages to program source code. Perceptually, switching to audio formatted output feels like switching from a black-and-white monitor to a rich color display. Today, Emacspeak's audio formatted output is the only way I can correctly write else if vs elsif in various programming languages!

9 Conversational Gestures For The Audio Desktop

By 1996, Emacspeak was the only piece of adaptive technology I used; in fall of 1995, I had moved to Adobe Systems from DEC Research to focus on enhancing the Portable Document Format (PDF) to make PDF content repurposable. Between 1996 and 1998, I was primarily focused on electronic document formats — I took this opportunity to step back and evaluate what I had built as an auditory interface within Emacspeak. This retrospect proved extremely useful in gaining a sense of perspective and led to formalizing the high-level concept of Conversational Gestures and structured browsing/searching as a means of thinking about user interfaces.

By now, Emacspeak was a complete environment — I formalized what it provided under the moniker Complete Audio Desktop. The fully integrated user experience allowed me to move forward with respect to defining interaction models that were highly optimized to eyes-free interaction — as an example, see how Emacspeak interfaces with modes like dired (Directory Editor) for browsing and manipulating the filesystem, or proced (Process Editor) for browsing and manipulating running processes. Emacs' integration with ispell for spell checking, as well as its various completion facilities ranging from minibuffer completion to other forms of dynamic completion while typing text provided more opportunities for creating innovative forms of eyes-free interaction. With respect to what had gone before (and is still par for the course as far as traditional screen-readers are concerned), these types of highly dynamic interfaces present a challenge. For example, consider handling a completion interface using a screen-reader that is speaking the visual display. There is a significant challenge in deciding what to speak e.g., when presented with a list of completions, the currently typed text, and the default completion, which of these should you speak, and in what order? The problem gets harder when you consider that the underlying semantics of these items is generally not available from examining the visual presentation in a consistent manner. By having direct access to the underlying information being presented, Emacspeak had a leg up with respect to addressing the higher-level question — when you do have access to this information, how do you present it effectively in an eyes-free environment? For this and many other cases of dynamic interaction, a combination of audio formatting, auditory icons, and the ability to synthesize succinct messages from a combination of information items — rather than having to forcibly speak each item as it is rendered visually provided for highly efficient eyes-free interaction.

This was also when I stepped back to build out Emacspeak's table browsing facilities — see the online Emacspeak documentation for details on Emacspeak's table browsing functionality which continues to remain one of the richest collection of end-user affordances for working with two-dimensional data.

9.1 Speech-Enabling Interactive Games

So in 1997, I went the next step in asking — given access to the underlying information, is it possible to build effective eyes-free interaction for highly interactive tasks? I picked Tetris as a means of exploring this space; the result was an Emacspeak extension to speech-enable module tetris.el. The details of what was learned were published as a paper in Assets 98, and expanded as a chapter on Conversational Gestures in my book on Auditory Interfaces; that book was in a sense a culmination of stepping back and gaining a sense of perspective on what I had built during this period. The work on Conversational Gestures also helped in formalizing the abstract user interface layer that formed part of the XForms work at the W3C.

Speech-enabling games for effective eyes-free interaction has proven highly educational. Interactive games are typically built to challenge the user, and if the eyes-free interface is inefficient, you just won't play the game — contrast this with a task that you must perform, where you're likely to make do with a sub-optimal interface. Over the years, Emacspeak has come to include eyes-free interfaces to several games including Tetris, Sudoku, and of late the popular 2048 game. Each of these has in turn contributed to enhancing the interaction model in Emacspeak, and those innovations typically make their way to the rest of the environment.

10 Accessing Media Streams

Streaming real-time audio on the Internet became a reality with the advent of RealAudio in 1995; soon there were a large number of media streams available on the Internet ranging from music streams to live radio stations. But there was an interesting twist — for the most part, all of these media streams expected one to look at the screen, even though the primary content was purely audio (streaming video hadn't arrived yet!). Starting in 1996, Emacspeak started including a variety of eyes-free front-ends for accessing media streams. Initially, this was achieved by building a wrapper around trplayer — a headless version of RealPlayer; later I built Emacspeak module emacspeak-m-player for interfacing with package mplayer. A key aspect of streaming media integration in emacspeak is that one can launch and control streams without ever switching away from one's primary task; thus, you can continue to type email or edit code while seamlessly launching and controlling media streams. Over the years, Emacspeak has come to integrate with Emacs packages like emms as well as providing wrappers for mplayer and alsaplayer — collectively, these let you efficiently launch all types of media streams, including streaming video, without having to explicitly switch context.

In the mid-90's, Emacspeak started including a directory of media links to some of the more popular radio stations — primarily as a means of helping users get started — Emacs' ability to rapidly complete directory and file-names turned out to be the most effective means of quickly launching everything from streaming radio stations to audio books. And even better — as the Emacs community develops better and smarter ways of navigating the filesystem using completions, e.g., package ido, these types of actions become even more efficient!

11 EBooks — Ubiquitous Access To Books

AsTeR was motivated by the increasing availability of technical material as online electronic documents. While AsTeR processed the TeX family of markup languages, more general ebooks came in a wide range of formats, ranging from plain text generated from various underlying file formats to structured EBooks, with Project Gutenberg leading the way. During the mid-90's, I had access to a wide range of electronic materials from sources such as O'Reilly Publishing and various electronic journals — The Perl Journal (TPJ) is one that I still remember fondly.

Emacspeak provided fairly light-weight but efficient access to all of the electronic books I had on my local disk — Emacs' strengths with respect to browsing textual documents meant that I needed to build little that was specific to Emacspeak. The late 90's saw the arrival of Daisy as an XML-based format for accessible electronic books. The last decade has seen the rapid convergence to epub as a distribution format of choice for electronic books. Emacspeak provides interaction modes that make organizing, searching and reading these materials on the Emacspeak Audio Desktop a pleasant experience. Emacspeak also provides an OCR-Mode — this enables one to call out to an external OCR program and read the content efficiently.

The somewhat informal process used by publishers like O'Reilly to make technical material available to users with print impairments was later formalized by BookShare — today, qualified users can obtain a large number of books and periodicals initially as Daisy-3 and increasingly as EPub. BookShare provides a RESTful API for searching and downloading books; Emacspeak module emacspeak-bookshare implements this API to create a client for browsing the BookShare library, downloading and organizing books locally, and an integrated ebook reading mode to round off the experience.

A useful complement to this suite of tools is the Calibre package for organizing one's ebook collection; Emacspeak now implements an EPub Interaction mode that leverages Calibre (actually sqlite3) to search and browse books, along with an integrated EPub mode for reading books.

12 Leveraging Computational Tools — From SQL And R To IPython Notebooks

The ability to invoke external processes and interface with them via a simple read-eval-print loop (REPL) is perhaps one of Emacs' strongest extension points. This means that a wide variety of computational tools become immediately available for embedding within the Emacs environment — a facility that has been widely exploited by the Emacs community. Over the years, Emacspeak has leveraged many of these facilities to provide a well-integrated auditory interface.

Starting from the tight code-eval-test style of iterative programming encouraged by Lisp, and extending to languages like Python and Ruby as well as exploratory computational tools such as R for data analysis and SQL for database interaction, the Emacspeak Audio Desktop has come to encompass a collection of rich computational tools that provide an efficient eyes-free experience.

In this context, module ein — Emacs IPython Notebooks — provides another excellent example of an Emacs tool that helps interface seamlessly with others in the technical domain. IPython Notebooks provide an easy means of reaching a large audience when publishing technical material with interactive computational content; module ein brings the power and convenience of Emacs' editing facilities when developing the content. Speech-enabling package ein is a major win since editing program source code in an eyes-free environment is far smoother in Emacs than in a browser-based editor.

13 Social Web — EMail, Instant Messaging, Blogging And Tweeting Using Open Protocols

The ability to process large amounts of email and electronic news has always been a feature of Emacs. I started using package vm for email in 1990, along with gnus for Usenet access many years before developing Emacspeak. So these were the first major packages that Emacspeak speech-enabled. Being able to access the underlying data structures used to visually render email messages and Usenet articles enabled Emacspeak to produce rich, succinct auditory output — this vastly increased my ability to consume and organize large amounts of information. Toward the turn of the century, instant messaging arrived in the mainstream — package tnt provided an Emacs implementation of a chat client that could communicate with users on the then popular AOL Instant Messenger platform. At the time, I worked at IBM Research, and inspired by package tnt, I created an Emacs client called ChatterBox using the Lotus Sametime API — this enabled me to communicate with colleagues at work from the comfort of Emacs. Packages like vm, gnus, tnt and ChatterBox provide an interesting example of how availability of a clean underlying API to a specific service or content stream can encourage the creation of efficient (and different) user interfaces. The touchstone of such successful implementations is a simple test — can the user of a specific interface tell if the person whom he is communicating with is also using the same interface? In each of the examples enumerated above, a user at one end of the communication chain cannot tell, and in fact shouldn't be able to tell what client the user at the other end is using. Contrast this with closed services that have an inherent lock-in model e.g., proprietary word processors that use undocumented serialization formats — for a fun read, see this write-up on Universe Of Fancy Colored Paper.

Today, my personal choice for instant messaging is the open Jabber platform. I connect to Jabber via Emacs package emacs-jabber and with Emacspeak providing a light-weight wrapper for generating the eyes-free interface, I can communicate seamlessly with colleagues and friends around the world.

As the Web evolved to encompass ever-increasing swathes of communication functionality that had already been available on the Internet, we saw the world move from Usenet groups to Blogs — I remember initially dismissing the blogging phenomenon as just a re-invention of Usenet in the early days. However, mainstream users flocked to Blogging, and I later realized that blogging as a publishing platform brought along interesting features that made communicating and publishing information much easier. In 2005, I joined Google; during the winter holidays that year, I implemented a light-weight client for Blogger that became the start of Emacs package g-client — this package provides Emacs wrappers for Google services that provide a RESTful API.

14 The RESTful Web — Web Wizards And URL Templates For Faster Access

Today, the Web, based on URLs and HTTP-style protocols, is widely recognized as a platform in its own right. This platform emerged over time — to me, Web APIs arrived in the late 90's when I observed the following with respect to my own behavior on many popular sites:

  1. I opened a Web page that took a while to load (remember, I was still using Emacs-W3),
  2. I then searched through the page to find a form-field that I filled out, e.g., start and end destinations on Yahoo Maps,
  3. I hit submit, and once again waited for a heavy-weight HTML page to load,
  4. And finally, I hunted through the rendered content to find what I was looking for.

This pattern repeated across a wide-range of interactive Web sites ranging from AltaVista for search (this was pre-Google), Yahoo Maps for directions, and Amazon for product searches to name but a few. So I decided to automate away the pain by creating Emacspeak module emacspeak-websearch that did the following:

  1. Prompted via the minibuffer for the requisite fields,
  2. Consed up an HTTP GET URL,
  3. Retrieved this URL,
  4. And filtered out the specific portion of the HTML DOM that held the generated response.
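
As a rough, hypothetical illustration of that pattern (not the actual emacspeak-websearch code), the command below performs the first three steps against an example search endpoint; the function name and URL are made up for this sketch, and the final DOM-filtering step is left out.

(require 'url-util)
(require 'eww)

(defun my-web-search (query)
  "Prompt for QUERY, cons up a GET URL, and render the result in EWW."
  (interactive "sSearch for: ")
  ;; The parameter name "q" is hard-wired, just as described below.
  (eww-browse-url
   (concat "https://duckduckgo.com/html/?q=" (url-hexify-string query))))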

Notice that the above implementation hard-wires the CGI parameter names used by a given Web application into the code implemented in module emacspeak-websearch. REST as a design pattern had not yet been recognized, let alone formalized, and module emacspeak-websearch was initially decried as being fragile.

However, over time, the CGI parameter names remained fixed — the only things that have required updating in the Emacspeak code-base are the content filtering rules that extract the response — for popular services, this has averaged about one to two times a year.

I later codified these filtering rules in terms of XPath, and also integrated XSLT-based pre-processing of incoming HTML content before it got handed off to Emacs-W3 — and yes, Emacs/Advice once again came in handy with respect to injecting XSLT pre-processing into Emacs-W3!

Later, in early 2000, I created companion module emacspeak-url-templates — partially inspired by Emacs' webjump module. URL templates in Emacspeak leveraged the recognized REST interaction pattern to provide a large collection of Web widgets that could be quickly invoked to provide rapid access to the right pieces of information on the Web.

The final icing on the cake was the arrival of RSS and Atom feeds and the consequent deep-linking into content-rich sites — this meant that Emacspeak could provide audio renderings of useful content without having to deal with complex visual navigation! While Google Reader existed, Emacspeak provided a light-weight greader client for managing one's feed subscriptions; with the demise of Google Reader, I implemented module emacspeak-feeds for organizing feeds on the Emacspeak desktop. A companion package emacspeak-webspace implements additional goodies including a continuously updating ticker of headlines taken from the user's collection of subscribed feeds.

15 Mashing It Up — Leveraging Evolving Web APIs

The next step in this evolution came with the arrival of richer Web APIs — especially ones that defined a clean client/server separation. In this respect, the world of Web APIs is a somewhat mixed bag in that many Web sites equate a Web API with a JS-based API that can be exclusively invoked from within a Web-Browser run-time. The issue with that type of API binding is that the only runtime that is supported is a full-blown Web browser; but the arrival of native mobile apps has actually proven a net positive in encouraging sites to create a cleaner separation. Emacspeak has leveraged these APIs to create Emacspeak front-ends to many useful services, here are a few:

  1. Minibuffer completion for Google Search using Google Suggest to provide completions.
  2. Librivox for browsing and playing free audio books.
  3. NPR for browsing and playing NPR archived programs.
  4. BBC for playing a wide variety of streaming content available from the BBC.
  5. A Google Maps front-end that provides instantaneous access to directions and Places search.
  6. Access to Twitter via package twittering-mode.

And a lot more than will fit this margin! This is an example of generalizing the concept of a mashup as seen on the Web with respect to creating hybrid applications by bringing together a collection of different Web APIs. Another way to think of such separation is to view an application as a head and a body — where the head is a specific user interface, with the body implementing the application logic. A cleanly defined separation between the head and body allows one to attach different user interfaces i.e., heads to the given body without any loss of functionality, or the need to re-implement the entire application. Modern platforms like Android enable such separation via an Intent mechanism. The Web platform as originally defined around URLs is actually well-suited to this type of separation — though the full potential of this design pattern remains to be fully realized given today's tight association of the Web to the Web Browser.

16 Conclusion

In 1996, I wrote an article entitled User Interface — A Means To An End pointing out that the size and shape of computers were determined by the keyboard and display. This is even more true in today's world of tablets, phablets and large-sized phones — with the only difference being that the keyboard has been replaced by a touch screen. The next generation in the evolution of personal devices is that they will become truly personal by being wearables — this once again forces a separation of the user interface peripherals from the underlying compute engine. Imagine a variety of wearables that collectively connect to one's cell phone, which itself connects to the cloud for all its computational and information needs. Such an environment is rich in possibilities for creating a wide variety of user experiences to a single underlying body of information; eyes-free interfaces as pioneered by systems like Emacspeak will come to play an increasingly vital role alongside visual interaction when this comes to pass.

–T.V. Raman, San Jose, CA, September 12, 2014

17 References