Saturday, October 10, 2020

Extracted A Light-Weight Key-Reader By Progressive Simplification

Extracting A Light-Weight Key-Sequence Reader By Progressive Simplification

1 Background

In the previous article entitled On Defining Keys In Emacs I covered the issue of declaring key-sequences when defining keyboard short-cuts. During the last week, I took the underlying Emacs Lisp function edmacro-parse-key and via a process of progressive simplification derived a new new-kbd function that is much simpler and consequently easier to understand. You can see the step-by-step simplification via the Git history for file new-kbd.el (note: used to be called ems-kbd.el). That file contains the final version of the simplified function, along with a test-suite that verifies that it's behavior is convenient with the solution built into Emacs. The updated function is now part of Emacspeak and is named ems-kbd in that package.

The next section gives a high-level overview of the steps that led to the final version.

2 Steps Toward Simplification

2.1 Separate Tokenization From Processing

Function edmacro-parse-keys interweaves the process of tokenizing its input string and how various parts of that string are processed in a single while loop.

The first step in simplification was to separate these steps, by using function split-string to split the input string on whitespace to generate a list of words.

A simple cl-loop is then used to turn each word into a key that is accumulated into a result vector.

2.2 Refactoring Case Analysis

Once tokenization is factored out, the remainder of function edmacro-parse-keys converts each key-specification into either the corresponding string or vector representation.

The original requirement of parsing the serialization of keyboard-macros brought along additional logic that I first eliminated, since my goal was to create a function to be used in defining keyboard-shortcuts.

  • I eliminated code that handled invocation of M-x execute-extended-command during a keyboard-macro.
  • I eliminated processing of comments within the keyboard-macro serialization.

2.3 Rearranging Conditionals

Next, I rearranged conditionals and in that process eliminated cond clauses that were now effectively dead-code.

In the process, I also eliminated test-predicates that had side-effects to hopefully result in less fragile code.

2.4 Lexically Bind Regex Patterns

To improve readability, I created let-bindings to some of the regex patterns used to identify key-sequence patterns. In the process, I also made these more readable by using [:space:] for white-space tests.

2.5 Always Return A Vector

Finally, I setup the new function to always return a vector; function edmacro-parse-keys returns either a string or a vector based on how it is called. Since Emacs now takes a vector in every context where a key-sequence is expected, this simplification does not break when using our simplified function for defining keys.

   (defun new-kbd (string )
  "Simplified and hopefully more robust kbd function.
Always returns a vector i.e. like passing need-vector to edmacro-parse-keys. "
  (let ((res [])
        (special-char-reg "^\\(NUL\\|RET\\|LFD\\|ESC\\|SPC\\|DEL\\)$")
        (modifier+angle-reg "^\\(\\([ACHMsS]-\\)*\\)<\\(.+\\)>$"))
    (cl-loop
     for word in (split-string string)
     do
     (let* ((key nil))
       (cond 
        ((and ;;; modifier+-<key> without DEL etc
          (not (string-match special-char-reg word))
          (string-match modifier+angle-reg word))
         (setq key
               (list
                (intern 
                 (concat ;;; strip < and >
                  (substring word (match-beginning 1) (match-end 1))
                  (substring word (match-beginning 3) (match-end 3)))))))
        (t
         (let ((prefix 0)
               (bits 0))
           (while ;;; calculate modifier bits
               (string-match "^[ACHMsS]-." word)
             (cl-incf bits
                      (cdr
                       (assq (aref word 0)
                             '((?A . ?\A-\^@)
                               (?C . ?\C-\^@)
                               (?H . ?\H-\^@)
                               (?M . ?\M-\^@)
                               (?s . ?\s-\^@)
                               (?S . ?\S-\^@)))))
             (cl-incf prefix 2)
             (cl-callf substring word 2))
           (when (string-match "^\\^.$" word)
             (cl-incf bits ?\C-\^@)
             (cl-incf prefix)
             (cl-callf substring word 1))
           (when-let
               (found
                (assoc word
                       '(("NUL" . "\0")
                         ("RET" . "\r")
                         ("LFD" . "\n")
                         ("TAB" . "\t")
                         ("ESC" . "\e")
                         ("SPC" . " ")
                         ("DEL" . "\177"))))
             (setq word (cdr found)))
           (cond ;;; apply modifiers 
            ((= bits 0) (setq key word))
            ((/= (length word) 1)
             (error "%s: Prefix  must precede a single character, not %s"
                    string word))
            ((and
              (/= (logand bits ?\C-\^@) 0)
              (string-match "[@-_a-z]" word))
             (setq key
                   (list (+ bits (- ?\C-\^@)
                            (logand (aref word 0) 31)))))
            (t (setq key (list (+ bits (aref word 0)))))))))
;;; push key on to the result vector 
       (when key (cl-callf vconcat res key))))
    res))

You can verify the code above by running the tests found at the end of file new-kbd.el — the tests were extracted from the various patterns described in the Elisp Reference, as well as by reading the code in edmacro-parse-keys.

2.6 Closing Thoughts

The above simplification exercise was done by:

  1. Starting with the original edmacro-parse-keys copied over to a new file and renamed to function new-kbd.
  2. Adding a set of tests at the end of file, essentially this is a let that binds a set of tests, then compares the result of calling our new function on each value with that returned by the original.
  3. Modifying and simplifying our new function and running eval-buffer after each step.
  4. It was a fun exercise to see order emerge from chaos at each step!

Monday, October 05, 2020

On Defining Keys In Emacs

On Defining Keys In Emacs

1 Overview

Emacs is extremely flexible with respect to letting you bind keys to commands. Interestingly, and perhaps because of its history tracing back to keyboard macros, naming the key you wish to bind is not so straight-forward as it appears on the surface. Here are things I've used and learnt over the last 30 years of Emacs hackery:

  1. The most basic way of naming a key is via a string of the form \C-a or \M-\C-a where \C denotes the Control Modifier.
  2. This notation gets unwieldy fast when using multiple modifiers as in the \M-\C-a example.
  3. It gets worse, under X, you can bind \C- / for /Control Space — notice that the space character here which is easy to miss as white-space is significant.
  4. The \<modifier>-char notation mostly appears to date back to the age of the terminal emulator (tty), so for example, it cannot denote Control+Right for example.
  5. The Emacs and Elisp info manuals recommend using function kbd for this reason, this function on the surface provides a unified interface for naming keys.

If you look no further, you remain happy, but …

2 Function kbd

But if you do peel the onion, here is what you find:

  1. Function kbd does nothing more than calling function read-kbd-macro.
  2. Function read-kbd-macro documents itself as reading the region as a keyboard macro.
  3. The above overloading is achieved by the call to function edmacro-parse-keys.
  4. And finally we reach the scary core of the onion that is sufficient to make your eyes water!
  5. Function edmacro-parse-keys is an inscrutable piece of code that both looks extremely clever and magical and like all clever code, feels fragile.

3 Understanding And (Hopefully) Simplifying Function kbd

A simplified ems-kbd function that is free from the complexity resulting from reading in a saved keyboard macro feels like a worthwhile goal. I started down this road both to understand function edmacro-parse-keys — sadly, I haven't yet found a specification for naming all available keys outside of the implementation of that function. So I started ems-kbd.el along with a simple test-suite to see how far I could get toward the goal of creating an equivalent kbd function.

3.1 Steps So Far

  1. Documented the implementation.
  2. Added a simple set of tests in the file.
  3. Changed if to cond or when as appropriate.
  4. Eliminated the logic that handled the keyboard macro serialization for repeated keys.
  5. Removed the logic for handling invocations of execute-extended-command — otherwise known as M-x.
  6. A lot more remains, e.g.,
    • simplifying how modifier bits are handled,
    • Simplifying how special patterns like SPC or TAB are handled.
    • And likely a lot more.

An alternative approach might be to rewrite it from scratch, but that feels intractable unless I can track down a specification for naming keys that is exhaustive with respect to the various conventions that have been used over the years.