Monday, September 15, 2014

Emacspeak At Twenty: Looking Back, Looking Forward

1 Introduction

One afternoon in the third week of September 1994, I started writing myself a small Emacs extension using Lisp Advice to make Emacs speak to me so I could use a Linux laptop. As Emacspeak turns twenty, this article is both a quick look back over twenty years of lessons learned and a glimpse into what might be possible as we evolve to a world of connected, ubiquitous computing. This article draws on Teach Yourself Programming in Ten Years by Peter Norvig for some of its inspiration.

2 Using UNIX With Speech Output — 1994

As a graduate student at Cornell, I accessed my Unix workstation (SunOS) from an Intel 486 PC running IBM Screen-Reader. There was no means of directly using a UNIX box at the time; after graduating, I continued doing the same for about six months at Digital Research in Cambridge — the only difference being that my desktop workstation was now a DEC-Alpha. Throughout this time, Emacs was my environment of choice for everything from software development and Internet access to writing documents.

In fall of 1994, I wanted to start using a laptop running Linux; a colleague (Dave Wecker) was retiring his 386mhz laptop that already had Linux on it and I decided to inherit it. But there was only one problem — until then I had always accessed a UNIX machine from a secondary PC running a screen-reader — something that would clearly make no sense with a laptop!

Another colleague, Win Treese, had pointed out the interesting possibilities presented by package advice in Emacs 19.23 — a few weeks earlier, he had sent around a small snippet of code that magically modified Emacs' version-control primitive to first create an RCS directory if none existed before adding a file to version control. When I speculated about using the Linux laptop, Dave remarked — you live in Emacs anyway — why don't you just make it talk!

Connecting the dots, I decided to write myself a tool that augmented Emacs' default behavior to speak — within about 4 hours, version 0.01 of Emacspeak was up and running.

3 Key Enabler — Emacs And Lisp Advice

It took me a couple of weeks to fully recognize the potential of what I had built with Emacs Lisp Advice. Until then, I had used screen-readers to listen to the contents of the visual display — but Lisp Advice let me do a lot more — it enabled Emacspeak to generate highly context-specific spoken feedback, augmented by a set of auditory icons. I later formalized this design under the name speech-enabled applications. For a detailed overview of the architecture of Emacspeak, see the chapter on Emacspeak in the book Beautiful Code from O'Reilly.
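The idea behind advice can be illustrated outside Emacs. The following is a minimal Python sketch, not Emacspeak code: it wraps an existing command so that context-specific feedback is produced after the command runs, without modifying the command itself. The names `speak`, `advise_after`, `Buffer` and `next_line` are hypothetical, and the `spoken` list merely stands in for a TTS queue.

```python
import functools

spoken = []  # stand-in for a TTS queue; a real system would talk to a speech server


def speak(text):
    """Record the text that would be spoken (hypothetical helper)."""
    spoken.append(text)


def advise_after(message_fn):
    """Return a decorator that speaks message_fn(result) after the command runs."""
    def decorator(command):
        @functools.wraps(command)
        def advised(*args, **kwargs):
            result = command(*args, **kwargs)  # original behavior is untouched
            speak(message_fn(result))          # spoken feedback is layered on top
            return result
        return advised
    return decorator


class Buffer:
    """Toy text buffer with a cursor row."""
    def __init__(self, lines):
        self.lines = lines
        self.row = 0


@advise_after(lambda buf: buf.lines[buf.row])
def next_line(buf):
    """Move down one line, stopping at the last line."""
    buf.row = min(buf.row + 1, len(buf.lines) - 1)
    return buf


buf = Buffer(["(defun hello ()", '  (message "hi"))'])
next_line(buf)  # moves the cursor, then speaks the line it landed on
```

Because the feedback function receives the command's result rather than a picture of the screen, it can say exactly what is relevant in that context, which is the distinction the section draws between speech-enabling and screen-reading.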

4 Key Component — Text To Speech (TTS)

Emacspeak is a speech-subsystem for Emacs; it depends on an external Text-To-Speech (TTS) engine to produce speech. In 1994, Digital Equipment released what would turn out to be the last in the line of hardware DECTalk synthesizers, the DECTalk Express. This was essentially an Intel 386 with 1MB of flash memory that ran a version of the DECTalk TTS software — to date, it still remains my favorite Text-To-Speech engine. At the time, I also had a software version of the same engine running on my DEC-Alpha workstation; the desire to use either a software or hardware solution to produce speech output defined the Emacspeak speech-server architecture.

I went to IBM Research in 1999; this coincided with IBM releasing a version of the Eloquence TTS engine on Linux under the name ViaVoice Outloud. My colleague Jeffrey Sorenson implemented an early version of the Emacspeak speech-server for this engine using the OSS API; I later updated it to use the ALSA library while on a flight back to SFO from Boston in 2001. That is still the TTS engine that is speaking as I type this article on my laptop.

20 years on, TTS continues to be the weakest link on Linux; the best available solution in terms of quality continues to be the Linux port of Eloquence TTS available from Voxin in Europe for a small price. Looking back across 20 years, the state of TTS on Linux in particular and across all platforms in general continues to be a disappointment; most of today's newer TTS engines are geared toward mainstream use-cases where naturalness of the voice tends to supersede intelligibility at higher speech-rates. Ironically, modern TTS engines also give applications far less control over the generated output — as a case in point, I implemented Audio System For Technical Readings (AsTeR) in 1994 using the DECTalk; 20 years later, we implemented MathML support in ChromeVox using Google TTS. In 2013, it turned out to be difficult or impossible to implement the type of audio renderings that were possible with the admittedly less-natural sounding DECTalk!

5 Emacspeak And Software Development

Version 0.01 of Emacspeak was written using IBM Screen-Reader on a PC with a terminal emulator accessing a UNIX workstation. But in about 2 weeks, Emacspeak was already a better environment for developing Emacspeak in particular and software development in general. Here are a few highlights from 1994 that made Emacspeak a good software development environment; present-day users of Emacspeak will see that this was just scratching the surface.

  • Audio formatting using voice-lock to provide aural syntax highlighting.
  • Succinct auditory icons to provide efficient feedback.
  • Emacs' ability to navigate code structurally — as opposed to moving around by plain-text units such as characters, lines and words. S-Expressions are a major win!
  • Emacs' ability to specialize behavior based on major and minor modes.
  • Ability to browse program code using tags, and getting fluent spoken feedback.
  • Completion everywhere.
  • Everything is searchable — this is a huge win when you cannot see the screen.
  • Interactive spell-checking using ISpell with continuous spoken feedback augmented by aural highlights.
  • Running code compilation and being able to jump to errors with spoken feedback.
  • Ability to move through diff chunks when working with source code and source control systems; refined diffs as provided by the ediff package when speech-enabled is a major productivity win.
  • Ability to easily move between email, document authoring and programming — though this may appear trivial, it continues to be one of Emacs' biggest wins.

Long-term Emacs users will recognize all of the above as being among the reasons why they do most things inside Emacs — there is little that is Emacspeak specific in the above list — except that Emacspeak was able to provide fluent, well-integrated contextual feedback for all of these tasks. And that was a game-changer given what I had had before Emacspeak. As a case in point, I did not dare program in Python before I speech-enabled Emacs' Python-Mode; the fact that white space is significant in Python made it difficult to program using a plain screen-reader that was unaware of the semantics of the underlying content being accessed.

5.1 Programming Defensively

As an aside, note that all of Emacspeak has been developed over the last 20 years with Emacspeak being the only adaptive technology on my system. This has led to some interesting design consequences, primary among them being a strong education in programming defensively. Here are some other key features of the Emacspeak code-base:

  1. The code-base is extremely bushy rather than deeply hierarchical — this means that when a module breaks, it does not affect the rest of the system.
  2. Separation of concerns with respect to the various layers: a tightly knit core speech library interfaces with any one of many speech servers, each running as an external process.
  3. Audio formatting is abstracted by using the formalism defined in Aural CSS.
  4. Emacspeak integrates with Emacs' user interface conventions by taking over a single prefix key C-e with all Emacspeak commands accessed through that single keymap. This helps embedding Emacspeak functionality into a large variety of third party modules without any loss of functionality.
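The layering described in item 2 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Emacspeak's implementation: the command names `say` and `stop` are invented for the sketch and are not Emacspeak's actual server protocol, and an in-memory `io.StringIO` stands in for the pipe to an external server process.

```python
import io


class SpeechServer:
    """Hypothetical line-oriented server; `pipe` stands in for a process's stdin."""
    def __init__(self, name, pipe):
        self.name = name
        self.pipe = pipe

    def send(self, command, argument=""):
        # One command per line; strip the trailing space when there is no argument.
        self.pipe.write(f"{command} {argument}".rstrip() + "\n")


class SpeechCore:
    """Core speech library: the only layer that knows a server exists."""
    def __init__(self, server):
        self.server = server

    def say(self, text):
        self.server.send("say", text)

    def stop(self):
        self.server.send("stop")


# Two interchangeable back ends, e.g. a hardware and a software synthesizer.
hardware = SpeechServer("hardware-synth", io.StringIO())
software = SpeechServer("software-synth", io.StringIO())

for server in (hardware, software):
    core = SpeechCore(server)
    core.say("hello world")
    core.stop()
```

Everything above `SpeechCore` calls `say` and `stop` and never sees which engine is attached, so adding an engine means writing one more server rather than touching the rest of the system.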

6 Emacspeak And Authoring Documents

In 1994, my preferred environment for authoring all documents was LaTeX using the Auctex package. Later I started writing either LaTeX or HTML using the appropriate support modes; today I use org-mode to do most of my content authoring. Personally, I have never been a fan of What You See Is What You Get (WYSIWYG) authoring tools — in my experience they place an undue burden on the author by drawing attention away from the content to focus on the final appearance. An added benefit of creating content in Emacs in the form of light-weight markup is that the content is long-lived — I can still usefully process and re-use things I wrote 25 years ago.

Emacs, with Emacspeak providing audio formatting and context-specific feedback, remains my environment of choice for writing all forms of content ranging from simple email messages to polished documents for print publishing. And it is worth repeating that I never need to focus on what the content is going to look like — that job is best left to the computer.

As an example of producing high-fidelity visual content, see this write-up on Polyhedral Geometry that I published in 2000; all of the content, including the drawings, was created by me using Emacs.

7 Emacspeak And The Early Days Of The Web

Right around the time that I was writing version 0.01 of Emacspeak, a far more significant software movement was under way — the World Wide Web was moving from the realms of academia to the mainstream world with the launch of NCSA Mosaic, followed in late 1994 by the first commercial Web browser, Netscape Navigator. Emacs had always enabled integrated access to FTP archives via package ange-ftp; in late 1993, William Perry released Emacs-W3, a Web browser for Emacs written entirely in Emacs Lisp. W3 was one of the first large packages to be speech-enabled by Emacspeak — later it was the browser on which I implemented the first draft of the Aural CSS specification. Emacs-W3 enabled many early innovations in the context of providing non-visual access to Web content, including audio formatting and structured content navigation; in summer of 1995, Dave Raggett and I outlined a few extensions to HTML Forms, including the label element as a means of associating metadata with interactive form controls in HTML, and many of these ideas were prototyped in Emacs-W3 at the time.

Over the years, Emacs-W3 fell behind the times — especially as the Web moved away from cleanly structured HTML to a massive soup of unmatched tags. This made parsing and error-correcting badly-formed HTML markup expensive to do in Emacs-Lisp — and performance suffered. To add to this, mainstream users moved away because Emacs' rendering engine at the time was not rich enough to provide the type of visual renderings that users had come to expect. The advent of DHTML and JavaScript-based Web applications finally killed off Emacs-W3 as far as most Emacs users were concerned.

But Emacs-W3 went through a revival on the Emacspeak audio desktop in late 1999 with the arrival of XSLT, and Daniel Veillard's excellent implementation via the libxml2 and libxslt packages. With these in hand, Emacspeak was able to hand off the bulk of HTML error correction to the xsltproc tool. The lack of visual fidelity didn't matter much for an eyes-free environment; so Emacs-W3 continued to be a useful tool for consuming large amounts of Web content that did not require JavaScript support.

During the last 24 months, libxml2 has been built into Emacs; this means that you can now parse arbitrary HTML as found in the wild without incurring a performance hit. This functionality was leveraged first by package shr (Simple HTML Renderer) within the gnus package for rendering HTML email. Later, the author of gnus and shr created a new light-weight HTML viewer called eww that is now part of Emacs 24. With improved support for variable pitch fonts and image embedding, Emacs is once again able to provide visual renderings for a large proportion of text-heavy Web content, making it useful for mainstream Emacs users to view at least some Web content within Emacs. During the last year, I have added support within Emacspeak to extend package eww with DOM filtering and quick content navigation.

8 Audio Formatting — Generalizing Aural CSS

A key idea in Audio System For Technical Readings (AsTeR) was the use of various voice properties in combination with non-speech auditory icons to create rich aural renderings. When I implemented Emacspeak, I brought over the notion of audio formatting to all buffers in Emacs by creating a voice-lock module that paralleled Emacs' font-lock module. The visual medium is far richer in terms of available fonts and colors as compared to voice parameters available on TTS engines — consequently, it did not make sense to directly map Emacs' face properties to voice parameters. To aid in projecting visual formatting onto auditory space, I created property personality analogous to Emacs' face property that could be applied to content displayed in Emacs; module voice-lock applied that property appropriately, and the Emacspeak core handled the details of mapping personality values to the underlying TTS engine.

The values used in property personality were abstract, i.e., they were independent of any given speech engine. Later in the fall of 1995, I re-expressed this set of abstract voice properties in terms of Aural CSS; the work was published as a first draft toward the end of 1995, and implemented in Emacs-W3 in early 1996. Aural CSS was an appendix in the CSS-1.0 specification; later, it graduated to being its own module within CSS-2.0.

Later in 1996, all of Emacs' voice-lock functionality was re-implemented in terms of Aural CSS; the implementation has stood the test of time in that as I added support for more TTS engines, I was able to implement engine-specific mappings of Aural-CSS values. This meant that the rest of Emacspeak could define various types of voices for use in specific contexts without having to worry about individual TTS engines. Conceptually, property personality can be thought of as holding an aural display list — various parts of the system can annotate pieces of text with relevant properties that finally get rendered in the aggregate. This model also works well with the notion of Emacs overlays where a moving overlay is used to temporarily highlight text that has other context-specific properties applied to it.
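The separation between abstract personalities and engine-specific mappings can be sketched as follows. This is a hedged Python illustration, not the Emacspeak data model: the `Voice` fields, the 0 to 9 scales, the personality names, and both engine output formats (`engine_a`, `engine_b`) are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    """Engine-independent voice description; the 0-9 scales are an assumption."""
    pitch: int = 5
    stress: int = 5
    richness: int = 5


# Abstract personalities that annotate text, independent of any TTS engine.
PERSONALITIES = {
    "keyword": Voice(pitch=7, stress=6),
    "comment": Voice(pitch=3, richness=7),
    "default": Voice(),
}


def engine_a(voice):
    """Hypothetical engine driven by bracketed inline commands."""
    return f"[pitch={100 + voice.pitch * 20} stress={voice.stress}]"


def engine_b(voice):
    """Hypothetical engine driven by percentage settings."""
    return {"pitch_pct": voice.pitch * 10, "richness_pct": voice.richness * 10}


def render(spans, engine):
    """Map (text, personality-name) spans to (engine-settings, text) pairs."""
    default = PERSONALITIES["default"]
    return [(engine(PERSONALITIES.get(name, default)), text) for text, name in spans]


# An "aural display list": text annotated with abstract personalities.
spans = [("if", "keyword"), ("x > 0", "default"), ("# check sign", "comment")]
for_a = render(spans, engine_a)
for_b = render(spans, engine_b)
```

The annotated spans never change when an engine is swapped; only the mapping function does, which is why new engines could be added without revisiting every speech-enabled mode.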

Audio formatting as implemented in Emacspeak is extremely effective when working with all types of content ranging from richly structured mark-up documents (LaTeX, org-mode) and formatted Web pages to program source code. Perceptually, switching to audio formatted output feels like switching from a black-and-white monitor to a rich color display. Today, Emacspeak's audio formatted output is the only way I can correctly write else if vs elsif in various programming languages!

9 Conversational Gestures For The Audio Desktop

By 1996, Emacspeak was the only piece of adaptive technology I used; in fall of 1995, I had moved to Adobe Systems from DEC Research to focus on enhancing the Portable Document Format (PDF) to make PDF content repurposable. Between 1996 and 1998, I was primarily focused on electronic document formats — I took this opportunity to step back and evaluate what I had built as an auditory interface within Emacspeak. This retrospection proved extremely useful in gaining a sense of perspective and led to formalizing the high-level concept of Conversational Gestures and structured browsing/searching as a means of thinking about user interfaces.

By now, Emacspeak was a complete environment — I formalized what it provided under the moniker Complete Audio Desktop. The fully integrated user experience allowed me to move forward with respect to defining interaction models that were highly optimized to eyes-free interaction — as an example, see how Emacspeak interfaces with modes like dired (Directory Editor) for browsing and manipulating the filesystem, or proced (Process Editor) for browsing and manipulating running processes. Emacs' integration with ispell for spell checking, as well as its various completion facilities ranging from minibuffer completion to other forms of dynamic completion while typing text provided more opportunities for creating innovative forms of eyes-free interaction. With respect to what had gone before (and is still par for the course as far as traditional screen-readers are concerned), these types of highly dynamic interfaces present a challenge. For example, consider handling a completion interface using a screen-reader that is speaking the visual display. There is a significant challenge in deciding what to speak e.g., when presented with a list of completions, the currently typed text, and the default completion, which of these should you speak, and in what order? The problem gets harder when you consider that the underlying semantics of these items is generally not available from examining the visual presentation in a consistent manner. By having direct access to the underlying information being presented, Emacspeak had a leg up with respect to addressing the higher-level question — when you do have access to this information, how do you present it effectively in an eyes-free environment? 
For this and many other cases of dynamic interaction, a combination of audio formatting, auditory icons, and the ability to synthesize succinct messages from a combination of information items (rather than having to forcibly speak each item as it is rendered visually) provided for highly efficient eyes-free interaction.
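To make the completion example concrete, here is a small Python sketch of synthesizing one succinct utterance from the underlying items instead of reading the visual display item by item. The function name `completion_message`, the wording of the messages, and the `limit` parameter are all assumptions for illustration; Emacspeak's actual spoken output differs.

```python
def completion_message(typed, candidates, default=None, limit=3):
    """Compose a single spoken summary of the current completion state."""
    matches = [c for c in candidates if c.startswith(typed)]
    if not matches:
        return f"no completions for {typed}"
    if len(matches) == 1:
        return f"sole completion {matches[0]}"

    parts = [f"{len(matches)} completions"]
    has_default = default in matches
    if has_default:
        parts.append(f"default {default}")  # most useful item is spoken early

    # Speak only a few of the remaining candidates, then summarize the rest.
    shown = [m for m in matches if m != default][:limit]
    parts.append(", ".join(shown))
    remaining = len(matches) - len(shown) - (1 if has_default else 0)
    if remaining > 0:
        parts.append(f"and {remaining} more")
    return "; ".join(parts)


message = completion_message(
    "em", ["emacs", "emacspeak", "emms", "email", "eww"], default="emacspeak"
)
```

The ordering decision (count first, default next, a short sample last) is only possible because the code sees the typed text, the candidates, and the default as separate items rather than as characters on a screen.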

This was also when I stepped back to build out Emacspeak's table browsing facilities — see the online Emacspeak documentation for details on Emacspeak's table browsing functionality, which continues to remain one of the richest collections of end-user affordances for working with two-dimensional data.

9.1 Speech-Enabling Interactive Games

So in 1997, I went the next step in asking — given access to the underlying information, is it possible to build effective eyes-free interaction to highly interactive tasks? I picked Tetris as a means of exploring this space; the result was an Emacspeak extension to speech-enable module tetris.el. The details of what was learned were published as a paper in Assets 98, and expanded as a chapter on Conversational Gestures in my book on Auditory Interfaces; that book was in a sense a culmination of stepping back and gaining a sense of perspective of what I had built during this period. The work on Conversational Gestures also helped in formalizing the abstract user interface layer that formed part of the XForms work at the W3C.

Speech-enabling games for effective eyes-free interaction has proven highly educational. Interactive games are typically built to challenge the user, and if the eyes-free interface is inefficient, you just won't play the game — contrast this with a task that you must perform, where you're likely to make do with a sub-optimal interface. Over the years, Emacspeak has come to include eyes-free interfaces to several games including Tetris, Sudoku, and of late the popular 2048 game. Each of these has in turn contributed to enhancing the interaction model in Emacspeak, and those innovations typically make their way to the rest of the environment.

10 Accessing Media Streams

Streaming real-time audio on the Internet became a reality with the advent of RealAudio in 1995; soon there were a large number of media streams available on the Internet ranging from music streams to live radio stations. But there was an interesting twist — for the most part, all of these media streams expected one to look at the screen, even though the primary content was purely audio (streaming video hadn't arrived yet!). Starting in 1996, Emacspeak started including a variety of eyes-free front-ends for accessing media streams. Initially, this was achieved by building a wrapper around trplayer — a headless version of RealPlayer; later I built Emacspeak module emacspeak-m-player for interfacing with package mplayer. A key aspect of streaming media integration in Emacspeak is that one can launch and control streams without ever switching away from one's primary task; thus, you can continue to type email or edit code while seamlessly launching and controlling media streams. Over the years, Emacspeak has come to integrate with Emacs packages like emms as well as providing wrappers for mplayer and alsaplayer — collectively, these let you efficiently launch all types of media streams, including streaming video, without having to explicitly switch context.

In the mid-90's, Emacspeak started including a directory of media links to some of the more popular radio stations — primarily as a means of helping users get started — Emacs' ability to rapidly complete directory and file-names turned out to be the most effective means of quickly launching everything from streaming radio stations to audio books. And even better — as the Emacs community develops better and smarter ways of navigating the filesystem using completions, e.g., package ido, these types of actions become even more efficient!

11 EBooks — Ubiquitous Access To Books

AsTeR was motivated by the increasing availability of technical material as online electronic documents. While AsTeR processed the TeX family of markup languages, more general ebooks came in a wide range of formats, ranging from plain text generated from various underlying file formats to structured EBooks, with Project Gutenberg leading the way. During the mid-90's, I had access to a wide range of electronic materials from sources such as O'Reilly Publishing and various electronic journals — The Perl Journal (TPJ) is one that I still remember fondly.

Emacspeak provided fairly light-weight but efficient access to all of the electronic books I had on my local disk — Emacs' strengths with respect to browsing textual documents meant that I needed to build little that was specific to Emacspeak. The late 90's saw the arrival of Daisy as an XML-based format for accessible electronic books. The last decade has seen the rapid convergence to epub as a distribution format of choice for electronic books. Emacspeak provides interaction modes that make organizing, searching and reading these materials on the Emacspeak Audio Desktop a pleasant experience. Emacspeak also provides an OCR-Mode — this enables one to call out to an external OCR program and read the content efficiently.

The somewhat informal process used by publishers like O'Reilly to make technical material available to users with print impairments was later formalized by BookShare — today, qualified users can obtain a large number of books and periodicals initially as Daisy-3 and increasingly as EPub. BookShare provides a RESTful API for searching and downloading books; Emacspeak module emacspeak-bookshare implements this API to create a client for browsing the BookShare library, downloading and organizing books locally, and an integrated ebook reading mode to round off the experience.

A useful complement to this suite of tools is the Calibre package for organizing one's ebook collection; Emacspeak now implements an EPub Interaction mode that leverages Calibre (actually sqlite3) to search and browse books, along with an integrated EPub mode for reading books.
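Searching a book catalog through sqlite3 might look roughly like the Python sketch below. The table layout here (a `books` table with `title`, `author_sort` and `path` columns) is a simplified assumption created in memory for the example and should not be read as Calibre's full schema; the sample rows are placeholders.

```python
import sqlite3


def search_books(conn, term):
    """Return (title, author_sort, path) rows whose title contains `term`."""
    cursor = conn.execute(
        "SELECT title, author_sort, path FROM books "
        "WHERE title LIKE ? ORDER BY title",
        (f"%{term}%",),  # parameterized to avoid SQL injection
    )
    return cursor.fetchall()


# In-memory stand-in for a catalog database (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, "
    "author_sort TEXT, path TEXT)"
)
conn.executemany(
    "INSERT INTO books (title, author_sort, path) VALUES (?, ?, ?)",
    [
        ("Sample Book On Geometry", "Author, A.", "Author A/Sample Geometry (1)"),
        ("Sample Book On Speech", "Writer, B.", "Writer B/Sample Speech (2)"),
    ],
)

hits = search_books(conn, "geometry")
```

Querying the catalog directly returns structured rows (title, author, location) that an eyes-free front-end can speak and navigate without going through a visual library browser.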

12 Leveraging Computational Tools — From SQL And R To IPython Notebooks

The ability to invoke external processes and interface with them via a simple read-eval-print loop (REPL) is perhaps one of Emacs' strongest extension points. This means that a wide variety of computational tools become immediately available for embedding within the Emacs environment — a facility that has been widely exploited by the Emacs community. Over the years, Emacspeak has leveraged many of these facilities to provide a well-integrated auditory interface.

Starting from the tight code, eval, test form of iterative programming encouraged by Lisp and later applied to languages like Python and Ruby, and extending to exploratory computational tools such as R for data analysis and SQL for database interaction, the Emacspeak Audio Desktop has come to encompass a collection of rich computational tools that provide an efficient eyes-free experience.

In this context, module ein — Emacs IPython Notebooks — provides another excellent example of an Emacs tool that helps interface seamlessly with others in the technical domain. IPython Notebooks provide an easy means of reaching a large audience when publishing technical material with interactive computational content; module ein brings the power and convenience of Emacs' editing facilities when developing the content. Speech-enabling package ein is a major win since editing program source code in an eyes-free environment is far smoother in Emacs than in a browser-based editor.

13 Social Web — EMail, Instant Messaging, Blogging And Tweeting Using Open Protocols

The ability to process large amounts of email and electronic news has always been a feature of Emacs. I started using package vm for email in 1990, along with gnus for Usenet access many years before developing Emacspeak. So these were the first major packages that Emacspeak speech-enabled. Being able to access the underlying data structures used to visually render email messages and Usenet articles enabled Emacspeak to produce rich, succinct auditory output — this vastly increased my ability to consume and organize large amounts of information. Toward the turn of the century, instant messaging arrived in the mainstream — package tnt provided an Emacs implementation of a chat client that could communicate with users on the then popular AOL Instant Messenger platform. At the time, I worked at IBM Research, and inspired by package tnt, I created an Emacs client called ChatterBox using the Lotus Sametime API — this enabled me to communicate with colleagues at work from the comfort of Emacs. Packages like vm, gnus, tnt and ChatterBox provide an interesting example of how availability of a clean underlying API to a specific service or content stream can encourage the creation of efficient (and different) user interfaces. The touchstone of such successful implementations is a simple test — can the user of a specific interface tell if the person whom he is communicating with is also using the same interface? In each of the examples enumerated above, a user at one end of the communication chain cannot tell, and in fact shouldn't be able to tell what client the user at the other end is using. Contrast this with closed services that have an inherent lock-in model e.g., proprietary word processors that use undocumented serialization formats — for a fun read, see this write-up on Universe Of Fancy Colored Paper.

Today, my personal choice for instant messaging is the open Jabber platform. I connect to Jabber via Emacs package emacs-jabber and with Emacspeak providing a light-weight wrapper for generating the eyes-free interface, I can communicate seamlessly with colleagues and friends around the world.

As the Web evolved to encompass ever-increasing swathes of communication functionality that had already been available on the Internet, we saw the world move from Usenet groups to Blogs — I remember initially dismissing the blogging phenomenon as just a re-invention of Usenet in the early days. However, mainstream users flocked to Blogging, and I later realized that blogging as a publishing platform brought along interesting features that made communicating and publishing information much easier. In 2005, I joined Google; during the winter holidays that year, I implemented a light-weight client for Blogger that became the start of Emacs package g-client — this package provides Emacs wrappers for Google services that provide a RESTful API.

14 The RESTful Web — Web Wizards And URL Templates For Faster Access

Today, the Web, based on URLs and HTTP-style protocols, is widely recognized as a platform in its own right. This platform emerged over time — to me, Web APIs arrived in the late 90's when I observed the following with respect to my own behavior on many popular sites:

  1. I opened a Web page that took a while to load (remember, I was still using Emacs-W3),
  2. I then searched through the page to find a form-field that I filled out, e.g., start and end destinations on Yahoo Maps,
  3. I hit submit, and once again waited for a heavy-weight HTML page to load,
  4. And finally, I hunted through the rendered content to find what I was looking for.

This pattern repeated across a wide range of interactive Web sites, from AltaVista for search (this was pre-Google) and Yahoo Maps for directions to Amazon for product searches, to name but a few. So I decided to automate away the pain by creating Emacspeak module emacspeak-websearch that did the following:

  1. Prompted via the minibuffer for the requisite fields,
  2. Consed up an HTTP GET URL,
  3. Retrieved this URL,
  4. And filtered out the specific portion of the HTML DOM that held the generated response.
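The four steps above can be sketched in Python as follows. This is an illustration under assumptions, not the emacspeak-websearch code: the host `maps.example.com`, the parameter names `start` and `end`, and the `result` element id are hypothetical, the network fetch is replaced by a stub, and the sample page is well-formed XHTML so that the standard-library XML parser can read it (real-world HTML needs a more forgiving parser).

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_search_url(base, **fields):
    """Step 2: construct an HTTP GET URL from the prompted fields."""
    return f"{base}?{urlencode(fields)}"


def filter_response(markup, element_id):
    """Step 4: keep only the text of the element holding the response."""
    root = ET.fromstring(markup)
    for node in root.iter():
        if node.get("id") == element_id:
            # Collapse whitespace so the result is ready to be spoken.
            return " ".join("".join(node.itertext()).split())
    return None


def web_search(fetch, base, element_id, **fields):
    """Steps 2-4 combined; `fetch` maps a URL to markup (step 3)."""
    return filter_response(fetch(build_search_url(base, **fields)), element_id)


SAMPLE_PAGE = (
    '<html><body><div id="nav">Home | Help</div>'
    '<div id="result"><p>Step one.</p> <p>Step two.</p></div></body></html>'
)
requested = []


def fake_fetch(url):
    """Stand-in for an HTTP GET; records the URL and returns canned markup."""
    requested.append(url)
    return SAMPLE_PAGE


answer = web_search(
    fake_fetch, "https://maps.example.com/directions", "result",
    start="Ithaca NY", end="Boston MA",
)
```

Note that the parameter names are fixed in the caller, which mirrors the hard-wiring discussed next, while the filtering rule is the part most likely to need updating when a site changes its layout.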

Notice that the above implementation hard-wires the CGI parameter names used by a given Web application into the code implemented in module emacspeak-websearch. REST as a design pattern had not yet been recognized, let alone formalized, and module emacspeak-websearch was initially decried as being fragile.

However, over time, the CGI parameter names remained fixed — the only things that have required updating in the Emacspeak code-base are the content filtering rules that extract the response — for popular services, this has averaged about one to two times a year.

I later codified these filtering rules in terms of XPath, and also integrated XSLT-based pre-processing of incoming HTML content before it got handed off to Emacs-W3 — and yes, Emacs/Advice once again came in handy with respect to injecting XSLT pre-processing into Emacs-W3!

Later, in early 2000, I created companion module emacspeak-url-templates — partially inspired by Emacs' webjump module. URL templates in Emacspeak leveraged the recognized REST interaction pattern to provide a large collection of Web widgets that could be quickly invoked to provide rapid access to the right pieces of information on the Web.
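A URL template can be thought of as a URL pattern plus the list of prompts needed to fill it. The Python sketch below shows one way to model that; the names `UrlTemplate`, `define_template` and `open_template`, the `news.example.com` host, and the prompt strings are all assumptions for illustration and do not reflect the emacspeak-url-templates API.

```python
from urllib.parse import quote_plus


class UrlTemplate:
    """A URL pattern with positional slots and one prompt per slot."""
    def __init__(self, name, pattern, prompts):
        self.name = name
        self.pattern = pattern
        self.prompts = prompts

    def expand(self, answers):
        if len(answers) != len(self.prompts):
            raise ValueError("one answer is required per prompt")
        # URL-encode each answer before substituting it into the pattern.
        return self.pattern.format(*(quote_plus(a) for a in answers))


TEMPLATES = {}


def define_template(name, pattern, prompts):
    """Register a template under a name that can be completed on."""
    TEMPLATES[name] = UrlTemplate(name, pattern, prompts)


def open_template(name, ask):
    """Prompt for each slot via `ask` and return the expanded URL."""
    template = TEMPLATES[name]
    return template.expand([ask(prompt) for prompt in template.prompts])


define_template(
    "news-search",
    "https://news.example.com/search?q={}&days={}",
    ["Query: ", "Days: "],
)

canned = {"Query: ": "emacs speech", "Days: ": "7"}
url = open_template("news-search", canned.__getitem__)
```

Because a template is data rather than code, adding a new Web widget is a one-line definition, and the set of template names itself becomes something the user can complete on.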

The final icing on the cake was the arrival of RSS and Atom feeds and the consequent deep-linking into content-rich sites — this meant that Emacspeak could provide audio renderings of useful content without having to deal with complex visual navigation! While Google Reader existed, Emacspeak provided a light-weight greader client for managing one's feed subscriptions; with the demise of Google Reader, I implemented module emacspeak-feeds for organizing feeds on the Emacspeak desktop. A companion package emacspeak-webspace implements additional goodies including a continuously updating ticker of headlines taken from the user's collection of subscribed feeds.
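Feeds are attractive for audio rendering because the headlines and links are already structured data. A minimal Python sketch of pulling headlines out of an RSS 2.0 document and joining them into a ticker string follows; the sample feed is invented, and the `headlines` and `ticker` helpers are hypothetical names, not part of emacspeak-feeds or emacspeak-webspace.

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 sample; a real client would fetch this over HTTP.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example Feed</title>
<item><title>First headline</title><link>https://example.com/1</link></item>
<item><title>Second headline</title><link>https://example.com/2</link></item>
</channel></rss>"""


def headlines(feed_xml):
    """Return (title, link) pairs for every item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]


def ticker(feeds):
    """Join the headlines of several feeds into one line for display or speech."""
    titles = [title for feed in feeds for title, _link in headlines(feed)]
    return " ... ".join(titles)


items = headlines(SAMPLE_RSS)
line = ticker([SAMPLE_RSS])
```

No page layout has to be navigated or filtered here, which is the contrast with the form-filling workflow described earlier in this section.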

15 Mashing It Up — Leveraging Evolving Web APIs

The next step in this evolution came with the arrival of richer Web APIs — especially ones that defined a clean client/server separation. In this respect, the world of Web APIs is a somewhat mixed bag in that many Web sites equate a Web API with a JS-based API that can be exclusively invoked from within a Web-Browser run-time. The issue with that type of API binding is that the only runtime that is supported is a full-blown Web browser; but the arrival of native mobile apps has actually proven a net positive in encouraging sites to create a cleaner separation. Emacspeak has leveraged these APIs to create Emacspeak front-ends to many useful services; here are a few:

  1. Minibuffer completion for Google Search using Google Suggest to provide completions.
  2. Librivox for browsing and playing free audio books.
  3. NPR for browsing and playing NPR archived programs.
  4. BBC for playing a wide variety of streaming content available from the BBC.
  5. A Google Maps front-end that provides instantaneous access to directions and Places search.
  6. Access to Twitter via package twittering-mode.

And a lot more than will fit this margin! This is an example of generalizing the concept of a mashup as seen on the Web with respect to creating hybrid applications by bringing together a collection of different Web APIs. Another way to think of such separation is to view an application as a head and a body — where the head is a specific user interface, with the body implementing the application logic. A cleanly defined separation between the head and body allows one to attach different user interfaces i.e., heads to the given body without any loss of functionality, or the need to re-implement the entire application. Modern platforms like Android enable such separation via an Intent mechanism. The Web platform as originally defined around URLs is actually well-suited to this type of separation — though the full potential of this design pattern remains to be fully realized given today's tight association of the Web to the Web Browser.
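The head and body separation described above can be made concrete with a small sketch. This is a Python illustration under stated assumptions: the Playlist class and the two head functions are hypothetical names invented here, not part of Emacspeak or of any Web API.

```python
from typing import List


class Playlist:
    """The 'body': application logic with no knowledge of how it is presented."""

    def __init__(self, tracks: List[str]) -> None:
        self.tracks = list(tracks)
        self.position = 0

    def current(self) -> str:
        return self.tracks[self.position]

    def advance(self) -> str:
        """Move to the next track, wrapping around at the end."""
        self.position = (self.position + 1) % len(self.tracks)
        return self.current()


def visual_head(body: Playlist) -> str:
    """One 'head': the whole list, with a marker on the current track."""
    return "\n".join(
        ("> " if i == body.position else "  ") + track
        for i, track in enumerate(body.tracks)
    )


def spoken_head(body: Playlist) -> str:
    """Another 'head': a single utterance suitable for a speech synthesizer."""
    return f"Track {body.position + 1} of {len(body.tracks)}: {body.current()}"


playlist = Playlist(["Morning Edition", "Fresh Air"])
playlist.advance()
print(visual_head(playlist))
print(spoken_head(playlist))
```

Both heads read the same body and neither needs to know the other exists; adding a third head, say for a wearable, requires no change to the application logic.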

16 Conclusion

In 1996, I wrote an article entitled User Interface: A Means To An End, pointing out that the size and shape of computers were determined by the keyboard and display. This is even more true in today's world of tablets, phablets and large-sized phones, with the only difference that the keyboard has been replaced by a touch screen. The next generation in the evolution of personal devices is that they will become truly personal by being wearables; this once again forces a separation of the user interface peripherals from the underlying compute engine. Imagine a variety of wearables that collectively connect to one's cell phone, which itself connects to the cloud for all its computational and information needs. Such an environment is rich in possibilities for creating a wide variety of user experiences to a single underlying body of information; Eyes-Free interfaces as pioneered by systems like Emacspeak will come to play an increasingly vital role alongside visual interaction when this comes to pass.

–T.V. Raman, San Jose, CA, September 12, 2014

Tuesday, May 27, 2014

Emacspeak And Company: Complete Anything Front-End For Emacspeak

1 Emacspeak And Company: Complete Anything Front-End For Emacspeak

Module emacspeak-company speech-enables package Company — a flexible complete-anything extension for Emacs. Package company gains much of its flexibility by providing an extensible framework for both back-ends and front-ends; back-ends are responsible for language-specific support e.g., C++ vs Emacs Lisp; front-ends can provide different visualizations of the available completions.

I started using package company as I taught myself to program in Go over the last couple of weeks, and package emacspeak-company was one of the by-products.

1.1 Using Company With Emacspeak

You can turn on company-mode in individual buffers; you can also turn it on globally. Company comes pre-packaged with backend support for many programming languages; for programming in Go, I use module company-go in conjunction with the GoCode tool.

See customization group company to customize package company; Emacspeak loads package emacspeak-company when package company is loaded, and that automatically sets up the Emacspeak front-end.

Once activated, package company shows the available completions as soon as you type a prescribed number of characters. Available candidates are displayed visually via an overlay and can be traversed using either the up/down arrows or keys M-n and M-p. You can also search and filter the available completions; see documentation for command company-mode. The available visual front-ends also display relevant metadata for the current candidate in the echo area.

Front-end emacspeak-company performs the following additional actions:

  • Speaks current candidate along with the relevant metadata.
  • The metadata is spoken using voice-annotate.
  • Auditory icon help indicates that completion has started.
  • Pressing F1 during completion displays documentation for the current candidate.
  • You can choose the current candidate by pressing RET; this speaks the selected candidate.

  • Auditory icon close-object indicates that completion has finished.

1.2 Insights From Speech-Enabling Company

Company uses a fluid visual interface to display completions without the user having to switch contexts — it achieves this by using overlays that appear briefly in the form of a conceptual tooltip. These pseudo tooltips are created and destroyed via a timer; keyboard interaction causes these to be updated — including hiding the tooltip where appropriate.

Module emacspeak-company speech-enables this interface by examining the underlying information used to create the visualization to produce an effective audio-formatted representation. The net effect is that you can write code with completion helping you along the way; you do not need to switch tasks to look up details as to what completions are available.

1.3 Acknowledgements

Thanks again to the authors of package company for a really nice tool — it's a real productivity winner — especially when learning a new language and its built-in packages.

I found these articles really helpful while learning to write package emacspeak-company.

Learning Go was a pleasure (it's still a pleasure — I'm still learning:-)) and the documentation on GoLang is excellent. As an added bonus, that entire site uses clean, well-formed HTML without any unnecessary artifacts that make so much of today's Web a giant mess; I have been able to use Emacs/EWW exclusively while working with — a real bonus for someone programming heavily in Emacs.

Date: <2014-05-27 Tue>

Author: T.V Raman

Created: 2014-05-27 Tue 08:51

Emacs (Org mode 8.2.6)


Monday, May 12, 2014

Announcing Emacspeak 40.0 AKA WowDog!

Emacspeak 40.0—WowDog—Unleashed!

1 Emacspeak-40.0 (WowDog) Unleashed!

For Immediate Release:

San Jose, Calif., (May 13, 2014) Emacspeak: Redefining Accessibility In The Era Of Web Computing – Zero cost of upgrades/downgrades makes priceless software affordable!

Emacspeak Inc (NASDOG: ESPK) announces the immediate world-wide availability of Emacspeak 40.0 (WowDog), a powerful audio desktop for leveraging today's evolving data, social and service-oriented Web cloud.

1.1 Investors Note:

With several prominent tweeters expanding coverage of #emacspeak, NASDOG: ESPK has now been consistently trading over the social net at levels close to that once attained by DogCom high-fliers—and as of November 2013 is trading at levels close to that achieved by once better known stocks in the tech sector.

1.2 What Is It?

Emacspeak is a fully functional audio desktop that provides complete eyes-free access to all major 32 and 64 bit operating environments. By seamlessly blending live access to all aspects of the Internet such as Web-surfing, blogging, social computing and electronic messaging into the audio desktop, Emacspeak enables speech access to local and remote information with a consistent and well-integrated user interface. A rich suite of task-oriented tools provides efficient speech-enabled access to the evolving service-oriented social Web cloud.

1.3 Major Enhancements:

  • Emacs EWW: Consume Web content efficiently.
  • emacspeak-url-templates: Smart Web access. ♅
  • emacspeak-websearch.el: Find things fast. ♁
  • gmaps.el: Find places, read reviews, get there.
  • Feed Browser: Consume feeds post Google-Reader. ␌
  • Freebase Search: Explore the Freebase knowledge base.
  • Emacs 24.4: Supports all new features in Emacs 24.4.
  • And a lot more than will fit this margin. …

1.4 Establishing Liberty, Equality And Freedom:

Never a toy system, Emacspeak is voluntarily bundled with all major Linux distributions. Though designed to be modular, distributors have freely chosen to bundle the fully integrated system without any undue pressure—a documented success for the integrated innovation embodied by Emacspeak. As the system evolves, both upgrades and downgrades continue to be available at the same zero-cost to all users. The integrity of the Emacspeak codebase is ensured by the reliable and secure Linux platform used to develop and distribute the software.

Extensive studies have shown that thanks to these features, users consider Emacspeak to be absolutely priceless. Thanks to this wide-spread user demand, the present version remains priceless as ever—it is being made available at the same zero-cost as previous releases.

At the same time, Emacspeak continues to innovate in the area of eyes-free social interaction and carries forward the well-established Open Source tradition of introducing user interface features that eventually show up in luser environments.

On this theme, when once challenged by a proponent of a crash-prone but well-marketed mousetrap with the assertion "Emacs is a system from the 70's", the creator of Emacspeak evinced surprise at the unusual candor manifest in the assertion that it would take popular idiot-proven interfaces until the year 2070 to catch up to where the Emacspeak audio desktop is today. Industry experts welcomed this refreshing breath of Courage Certainty and Clarity (CCC) at a time when users are reeling from the Fear Uncertainty and Doubt (FUD) unleashed by complex software systems backed by even more convoluted press releases.

1.5 Independent Test Results:

Independent test results have proven that unlike some modern (and not so modern) software, Emacspeak can be safely uninstalled without adversely affecting the continued performance of the computer. These same tests also revealed that once uninstalled, the user stopped functioning altogether. Speaking with Aster Labrador, the creator of Emacspeak once pointed out that these results re-emphasize the user-centric design of Emacspeak; "It is the user –and not the computer– that stops functioning when Emacspeak is uninstalled!".

1.5.1 Note from Aster, Bubbles and Tilden:

UnDoctored Videos Inc. is looking for volunteers to star in a video demonstrating such complete user failure.

1.6 Obtaining Emacspeak:

Emacspeak can be downloaded from Google Code –see You can visit Emacspeak on the WWW at You can subscribe to the emacspeak mailing list by sending mail to the list request address The Emacspeak Blog is a good source for news about recent enhancements and how to use them. The WowDog release is at The latest development snapshot of Emacspeak is always available via Subversion from Google Code at

1.7 History:

  • Emacspeak 40.0 goes back to Web basics by enabling efficient access to large amounts of readable Web content.
  • Emacspeak 39.0 continues the Emacspeak tradition of increasing the breadth of user tasks that are covered without introducing unnecessary bloatware.
  • Emacspeak 38.0 is the latest in a series of award-winning releases from Emacspeak Inc.
  • Emacspeak 37.0 continues the tradition of delivering robust software as reflected by its code-name.
  • Emacspeak 36.0 enhances the audio desktop with many new tools including full EPub support, hence the name EPubDog.
  • Emacspeak 35.0 is all about teaching a new dog old tricks, and is aptly code-named HeadDog in honor of our new Press/Analyst contact.
  • emacspeak-34.0 (AKA Bubbles) established a new beach-head with respect to rapid task completion in an eyes-free environment.
  • Emacspeak-33.0, AKA StarDog, brings unparalleled cloud access to the audio desktop.
  • Emacspeak 32.0, AKA LuckyDog, continues to innovate via open technologies for better access.
  • Emacspeak 31.0, AKA TweetDog, adds tweeting to the Emacspeak desktop.
  • Emacspeak 30.0, AKA SocialDog, brings the Social Web to the audio desktop; you can't but be social if you speak!
  • Emacspeak 29.0, AKA AbleDog, is a testament to the resilience and innovation embodied by Open Source software; it would not exist without the thriving Emacs community that continues to ensure that Emacs remains one of the premier user environments despite perhaps also being one of the oldest.
  • Emacspeak 28.0, AKA PuppyDog, exemplifies the rapid pace of development evinced by Open Source software.
  • Emacspeak 27.0, AKA FastDog, is the latest in a sequence of upgrades that make previous releases obsolete and downgrades unnecessary.
  • Emacspeak 26, AKA LeadDog, continues the tradition of introducing innovative access solutions that are unfettered by the constraints inherent in traditional adaptive technologies.
  • Emacspeak 25, AKA ActiveDog, re-activates open, unfettered access to online information.
  • Emacspeak-Alive, AKA LiveDog, enlivens open, unfettered information access with a series of live updates that once again demonstrate the power and agility of open source software development.
  • Emacspeak 23.0, AKA Retriever, went the extra mile in fetching full access.
  • Emacspeak 22.0, AKA GuideDog, helps users navigate the Web more effectively than ever before.
  • Emacspeak 21.0, AKA PlayDog, continued the Emacspeak tradition of relying on enhanced productivity to liberate users.
  • Emacspeak-20.0, AKA LeapDog, continues the long established GNU/Emacs tradition of integrated innovation to create a pleasurable computing environment for eyes-free interaction.
  • emacspeak-19.0, AKA WorkDog, is designed to enhance user productivity at work and leisure.
  • Emacspeak-18.0, code named GoodDog, continued the Emacspeak tradition of enhancing user productivity and thereby reducing total cost of ownership.
  • Emacspeak-17.0, code named HappyDog, enhances user productivity by exploiting today's evolving WWW standards.
  • Emacspeak-16.0, code named CleverDog, the follow-up to SmartDog, continued the tradition of working better, faster, smarter.
  • Emacspeak-15.0, code named SmartDog, followed up on TopDog as the next in a continuing series of award-winning audio desktop releases from Emacspeak Inc.
  • Emacspeak-14.0, code named TopDog, was the first release of this millennium.
  • Emacspeak-13.0, code named YellowLab, was the closing release of the 20th century.
  • Emacspeak-12.0, code named GoldenDog, began leveraging the evolving semantic WWW to provide task-oriented speech access to Webformation.
  • Emacspeak-11.0, code named Aster, went the final step in making Linux a zero-cost Internet access solution for blind and visually impaired users.
  • Emacspeak-10.0 (AKA Emacspeak-2000), code named WonderDog, continued the tradition of award-winning software releases designed to make eyes-free computing a productive and pleasurable experience.
  • Emacspeak-9.0 (AKA Emacspeak 99), code named BlackLab, continued to innovate in the areas of speech interaction and interactive accessibility.
  • Emacspeak-8.0 (AKA Emacspeak-98++), code named BlackDog, was a major upgrade to the speech output extension to Emacs.

Emacspeak-95 (code named Illinois) was released as OpenSource on the Internet in May 1995 as the first complete speech interface to UNIX workstations. The subsequent release, Emacspeak-96 (code named Egypt) made available in May 1996 provided significant enhancements to the interface. Emacspeak-97 (Tennessee) went further in providing a true audio desktop. Emacspeak-98 integrated Internetworking into all aspects of the audio desktop to provide the first fully interactive speech-enabled WebTop.

About Emacspeak:

Originally based at Cornell (NY), home to Auditory User Interfaces (AUI) on the WWW, Emacspeak is now maintained on GoogleCode and Sourceforge. The system is mirrored world-wide by an international network of software archives and bundled voluntarily with all major Linux distributions. On Monday, April 12, 1999, Emacspeak became part of the Smithsonian's Permanent Research Collection on Information Technology at the Smithsonian's National Museum of American History.

The Emacspeak mailing list is archived at Vassar –the home of the Emacspeak mailing list– thanks to Greg Priest-Dorman, and provides a valuable knowledge base for new users.

1.8 Press/Analyst Contact: Tilden Labrador

Going forward, Tilden acknowledges his exclusive monopoly on setting the direction of the Emacspeak Audio Desktop, and promises to exercise this freedom to innovate and his resulting power responsibly (as before) in the interest of all dogs.

About This Release:

Windows-Free (WF) is a favorite battle-cry of The League Against Forced Fenestration (LAFF). –see for details on the ill-effects of Forced Fenestration.

CopyWrite )C( Aster and Hubbell Labrador. All Writes Reserved. HeadDog (DM), LiveDog (DM), GoldenDog (DM), BlackDog (DM) etc., are Registered Dogmarks of Aster, Hubbell and Tilden Labrador. All other dogs belong to their respective owners.

Author: T.V Raman

Created: 2014-05-09 Fri 08:44


Thursday, May 01, 2014

Emacspeak: EWW Updates For The Complete Audio Desktop

1 Emacspeak EWW Updates

Within a few weeks, EWW has become my preferred way of consuming large amounts of Web content; except for simple fill-out forms, it has entirely replaced Emacs/W3 for me. It goes without saying that I still use ChromeVox for JS-heavy Web sites.

This article summarizes some of the major enhancements to EWW implemented in module emacspeak-eww; see the online documentation and key-binding help for complete details.

1.1 EWW And Masquerade Mode

You can now have EWW masquerade as modern browsers; note that some sites might serve you more feature-rich content in this mode.

1.2 Smart Google Searches

All of the features from module emacspeak-google have been integrated to work with EWW. In addition, if running in masquerade-mode, you can quickly access knowledge cards if available on the current results page.

1.3 Rich DOM Filtering

The suite of DOM filtered views has been enhanced to support filtering by class, id, role, or element-list. In addition, you can also invert these filters.
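For readers who want a feel for what such filters do, here is a minimal sketch in Python using the standard library's ElementTree on a small well-formed page. It is not how emacspeak-eww is implemented (that module is Emacs Lisp operating on EWW's DOM); the function names and the sample page are assumptions made for illustration, and the inverted filter is modelled here simply as pruning the matching elements from a copy of the tree.

```python
import copy
import xml.etree.ElementTree as ET

# A small well-formed page standing in for real Web content.
PAGE = """<html><body>
<div id="nav" role="navigation"><a href="/">Home</a></div>
<div id="main" class="article body"><h1>Title</h1><p>First paragraph.</p></div>
<div class="ad">Buy now</div>
</body></html>"""


def matches(el, *, cls=None, id_=None, role=None, tags=None):
    """True if the element satisfies every criterion that was supplied."""
    if cls is not None and cls not in el.get("class", "").split():
        return False
    if id_ is not None and el.get("id") != id_:
        return False
    if role is not None and el.get("role") != role:
        return False
    if tags is not None and el.tag not in tags:
        return False
    return True


def select(root, **criteria):
    """Filtered view: the matching elements, in document order."""
    return [el for el in root.iter() if matches(el, **criteria)]


def prune(root, **criteria):
    """Inverted view: a deep copy of the tree with matching descendants removed."""
    clone = copy.deepcopy(root)
    for parent in list(clone.iter()):
        for child in list(parent):
            if matches(child, **criteria):
                parent.remove(child)
    return clone


def text_of(el):
    """All text under an element, whitespace-normalized."""
    return " ".join(t.strip() for t in el.itertext() if t.strip())


root = ET.fromstring(PAGE)
print([text_of(el) for el in select(root, tags={"h1", "p"})])
print(text_of(prune(root, cls="ad")))
```

Selecting by element list keeps only the headings and paragraphs, while the inverted class filter keeps everything except the advertisement; the original tree is left untouched in both cases.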

1.4 Structure Navigation

Emacspeak now supports structured navigation in pages rendered by EWW; see the key-bindings for details.

1.5 Integration With URL-Templates And Feeds

EWW is now fully integrated with Emacspeak WebSearch, URL-Templates and Feeds. This means that hitting g in an EWW buffer does the right thing with respect to updating the rendered buffer:

  • If viewing a feed, the feed is reloaded before it is rendered as HTML.
  • If viewing a url-template, the template is re-opened, prompting for user-input if needed.

1.6 XSLT Integration

Most of the functionality provided by module emacspeak-xslt for filtering the DOM in the world of Emacs/W3 is achieved more effectively via the DOM filtering commands in emacspeak-eww. That said, XSLT pre-processing is fully integrated with EWW via supporting modules emacspeak-we and emacspeak-webutils.

1.7 Other Fun Things To Do

Here are some more fun things that might be worth doing:

  • Integrate PhantomJS with EWW to load content that is rendered via JS document.write.
  • Integrate with CasperJS to enable interaction with light-weight WebApps.
  • Integrate with Chrome over the debugger API to access the live DOM within Chrome.

Share And Enjoy

Monday, March 24, 2014

Emacspeak Webspace: Glancing At Information On The Audio Desktop

A Web News Ticker For Emacs

1 WebSpace: A Web News Ticker For Emacs

Module Emacspeak-Webspace provides a rolling ticker of information that is automatically retrieved, cached and maintained by Emacspeak. Using this functionality, you can set up specific buffers to have interesting tidbits of information displayed automatically in the header-line; Emacspeak speaks these items of information as you switch contexts. This article explains the usage model and underlying design of Emacspeak Webspaces.

1.1 Background

The Emacspeak Webspace module was originally created in early 2008 (see Interaction Free Information Access) because I wanted the audio equivalent of being able to quickly glance at information. Here are some aspects of visual interaction that I wanted to emulate:

  • You can quickly glance at something while switching contexts, and ignore it if it is not important.
  • The object that you glance at while switching contexts does not become an object of attention, i.e., the casual task remains casual, as opposed to becoming the primary task. Email is the antithesis to this model: if you start glancing at email, it's a sufficiently strong distraction that you'll start doing email, as opposed to what you were supposed to be doing.
  • If the item you glanced at deserves further attention, you can come back to it later, and the system gives you sufficient confidence in your ability to come back to it later; note that this is essential to ensure the previous requirement.
  • Items are cached but get pushed out by newer items; this makes sure you don't feel pressured to read everything or have to explicitly catch up. In prior systems including email and Google Reader, I always found the task of hitting catch-up without reading everything a fairly stressful experience.
  • Applied to information updates, think hallway conversations outside your office: you mostly ignore them, but sometimes get drawn in because you hear some specific keywords and/or concepts that draw your attention.

1.2 Early Implementation In 2009

I used the WebSpace functionality in Emacspeak for news and weather updates starting in 2009; at some time in late 2009, I cut it over to get updates from my Google Reader stream. It was extremely effective for my usage pattern: I typically activated the functionality in all shell buffers. In my work style, where I switch among the primary tasks of engineering (writing/reviewing code), writing/reviewing design documents, and doing email to facilitate the previous two tasks, the shell buffer is where I switch to while context-switching, e.g., to launch a build after writing code. Having the Webspace functionality say something interesting at those times was optimal.

1.3 Initial Implementation And Design

The information shown in the rolling header line is pulled from a cache; in 2009, this cache was populated from my Google Reader stream. The cache was maintained in a ring with older items falling off the end. You could optionally switch to a buffer displaying all of the currently cached items; this functionality assured me that I could always later find an item that had caught my attention while I was in the process of context switching amongst tasks. Notice that if I didn't go back and check for that item within a day, it would fall off the ring-buffer cache, and this usually meant that it wasn't that important after all.
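The ring-shaped cache is simple enough to sketch. The Python below is an illustration of the design rather than the Emacs Lisp in emacspeak-webspace; the class and method names are invented for this example, and the capacity is an arbitrary choice.

```python
from collections import deque


class HeadlineCache:
    """A fixed-size ring of headlines: new items push the oldest ones out."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)  # deque discards the oldest entry when full
        self._cursor = 0

    def add(self, headline):
        """Cache a newly fetched headline."""
        self._items.append(headline)

    def next_headline(self):
        """Return the next headline for the ticker, wrapping around; None if empty."""
        if not self._items:
            return None
        self._cursor %= len(self._items)
        headline = self._items[self._cursor]
        self._cursor += 1
        return headline

    def browse(self):
        """Everything currently cached, oldest first, for tracking down an item later."""
        return list(self._items)


cache = HeadlineCache(capacity=3)
for headline in ["one", "two", "three", "four"]:
    cache.add(headline)

print(cache.browse())         # "one" has already fallen off the ring
print(cache.next_headline())
```

Nothing ever needs to be marked as read: an item that is not revisited simply ages out, which is what removes the pressure to catch up.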

1.4 Life After Google Reader

With the passing of Google Reader last year, I started implementing the feed-reading functionality I needed in Emacspeak independent of Google Reader; see the earlier article in this blog titled Managing And Accessing Feeds On The Emacspeak Audio Desktop. Next, I updated the Emacspeak WebSpace functionality to build its cache from the set of feeds in emacspeak-feeds.

1.5 Usage Pattern

This section details my own usage pattern and set-up — this is by no means the only way to use this functionality.

  1. Emacspeak binds Webspace functionality to Hyper Space as a prefix key.
  2. Hyper Space h invokes command emacspeak-webspace-headlines — this command initializes the feed-store cache, and sets up the header-line in the current buffer to display a rolling ticker. Note that you can invoke this command in multiple buffers; those buffers will share a common headlines cache.
  3. The feed-store is updated during Emacs idle-time; I often invoke the elisp form (emacspeak-webspace-headlines-populate) to populate the cache initially. Note that depending on your network, and the number of feeds you have in emacspeak-feeds, this can block Emacs for a couple of minutes.
  4. Command emacspeak-webspace-headlines-browse displays an interactive buffer containing the current set of cached headlines — this is where you go to track down a headline you heard in passing. I bind this to Super h by customizing emacspeak-super-keys.
  5. You can set up other types of information in your rolling header; something I initially used it for was weather, see command emacspeak-webspace-weather. Personally, I've not found this as useful in CA given how consistently good the weather is here.
  6. For related work in Emacs, see Emacs package newsticker. That package works well with Emacspeak, but in using it earlier, I found that I could not prevent myself from starting to read content i.e., it failed to meet the glance and continue requirement.

Date: <2014-03-24 Mon>

Author: T.V Raman

Created: 2014-03-24 Mon 18:00

Emacs (Org mode 8.2.5c)


Saturday, February 08, 2014

Searching GMail Using IMap And GNUS

1 Searching GMail Using IMap and GNUS

Emacs package GNUS provides a very efficient interface for consuming large amounts of email. You can access GMail using GNUS' IMap interface; for my own configuration for doing this, see file tvr/gnus-prepare.el in the Emacspeak SVN repository. Module gm-nnir.el in package g-client implements some convenience hooks to enable efficient searching of GMail. Module emacspeak-gnus has been updated to bind commands from module gm-nnir.el to ? and / in the Group buffer.

1.1 Basic Usage

Assuming you already have GNUS configured to read GMail via IMap, you can:

  • Press / in the Group buffer to search your mail. This command accepts all GMail queries; for example, the query

    after: 2014/02/01 to: me

    finds all messages received after February 1, 2014 and addressed to you, while

    label: foo after: 2014/01/01

    finds messages with label foo received after January 1, 2014.

  • Press ? in the Group buffer to execute a more extensive search command; this accepts both IMap query specifications (per RFC 3501) as well as GMail query specifications. The command provides smart completion; follow the prompts to build up complex queries. In general, there is almost nothing you cannot do with the GMail query language, so this command is mostly there as a backup.

1.2 The Technical Details

The GMail query language is exposed to IMap via custom search key X-GM-RAW; commands gm-nnir-group-make-gmail-group and gm-nnir-group-make-nnir-group use this functionality to construct ephemeral groups that hold the search results.
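To make the X-GM-RAW mechanism concrete, here is a small Python sketch that builds the IMAP SEARCH criteria for a GMail query. The function name is invented for this illustration and the code is unrelated to gm-nnir.el; the query is written without a space after each colon, on the assumption that this is the form GMail's own search box accepts.

```python
def gmail_search_criteria(query: str) -> str:
    """Wrap a GMail query as IMAP SEARCH criteria using the X-GM-RAW search key.

    The query travels as an IMAP quoted string, so backslashes and
    double quotes inside it are escaped.
    """
    escaped = query.replace("\\", "\\\\").replace('"', '\\"')
    return f'X-GM-RAW "{escaped}"'


criteria = gmail_search_criteria("after:2014/02/01 to:me")
print(criteria)
# prints X-GM-RAW "after:2014/02/01 to:me"

# With a logged-in imaplib.IMAP4_SSL connection `conn` and a selected mailbox
# (both assumed here, not shown), the search itself would look roughly like:
#     status, data = conn.search(None, criteria)
```

The server does all the work of interpreting the query; the client only has to pass the string through and collect the matching message numbers, which is what makes ephemeral search groups cheap to construct.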

Wednesday, January 01, 2014

Exploring And Accessing BBC Podcasts and Program Archives

1 Exploring BBC Podcasts And Program Archives

1.1 Summary

A short overview of tools on the Emacspeak desktop for easily exploring and accessing BBC program content.

1.2 Background

The BBC offers a wealth of audio content from both domestic BBC Radio as well as BBC World Service. Much of this content is available as Podcasts for a week after it has been broadcast; in some instances, content is archived and available for more than a week.

The primary gateway to this content is BBC IPlayer. In addition, one can subscribe to RSS feeds for BBC Podcasts.

1.3 Accessing BBC Content From Emacspeak

Here are some of the tools I use on the Emacspeak desktop to quickly find and access content from the BBC:

  • The BBC publishes a continuously updated directory of RSS feeds; Emacspeak url template BBC Podcast Directory can be used to open this directory of feeds.
  • With the above directory of feeds at hand, it is easy to subscribe to oft-accessed feeds via emacspeak-feeds — see Managing And Accessing Feeds.
  • In addition to the directory of feeds covered above, the BBC publishes a detailed program guide as XML; Emacspeak url template BBC Program Guide accesses the program guide.
  • The program guide described above gives access to RSS feeds for both current programs as well as past archives. The program guide is a wealth of information that makes all the information available in one location, unlike the BBC IPlayer site.

  • A note for UK users; the program guide above is presently set up to only show content that is available world-wide; if you're in the UK, you may want to remove the test for

in the XSL stylesheet emacspeak/xsl/bbc-ppg.xsl.

  • You can find the XML feed for the BBC Program Guide, as well as the associated XML Schema definition on the BBC's Web site.
  • Finally, you can access the BBC IPlayer page for any given BBC channel via Emacspeak url template BBC IPlayer.

Share And Enjoy! And Hear's Wishing Everyone A Very Happy 2014!