Composiphrase: Composable editing language like Vim, but moreso

2025-02-11 :: text-editing, vim, emacs, composiphrase

One of the best ideas in Vi and Vim, which I rightly see praised online frequently, is the idea of a “composable text editing language”. Vi has a bunch of movements, like w for “forward word”, and some operations, like d for “delete”, and you can put them together. dw deletes forward one word. You can also add a number: 4dw deletes 4 words. Vim adds “text objects”, which are selections rather than movements, eg. select a word, a sentence, etc. You can also compose these with commands, so diw deletes a word, di) deletes inside parentheses, etc. But what if we took the ideas of composability and language more seriously?

(Note, this is part 4 in a series about my “new composable text editor”, Composiphrase. See part 3 here. This is the main explanation of what Composiphrase is. I should have posted this one earlier in the series, but I wanted to have the demo ready before publishing it.)

Consider Vim

Many motions in vi are focused on semantic units of text, like character, word, sentence, paragraph, and line. Vim gives us a nice name to call these, “text objects”, while also giving bindings to select them rather than just move to them. But let’s consider the word text object in vim.

The w key moves forward to the beginning of the next word. If we want to select it, we use a selection prefix i (though we have to be in visual mode or after an operator), then w to select word. If we want to go backward, we use b. Hmm, that is less satisfying: so far all of the bindings have included w for word. If we want to go to the end of a word, of course, we use e. Ok, moving on, if we want to go backward to the end of a word, we naturally use ge. What?

Ok, so almost every different movement to do with a word needs a different, mostly unrelated binding. Let’s look at sentences.

We can select a sentence (in visual mode or after an operator like d) with is. Ok, select word uses w, select sentence uses s, that makes sense. Now forward sentence: ). Hmm.

There is some effort to make the keys for different motions be mnemonic, eg. backward sentence is (, which seems like the reverse of the binding for forward sentence, and initials of the names of objects are used sometimes. But there is no composition here to build bigger things out of smaller things. Vim only adds composition of 3 concepts (number, movement, and one of a very limited set of operations), and all of the different things we are discussing are under the umbrella of the movement.

Vim’s text object selection adds a nice bit of composition. The i key is for “inside”, so if we use i) we select inside the parentheses. The a key is for “around”, so if we use aw we select a word plus whitespace around it. Except the meanings of “inside” and “around” aren’t very consistent: iw selects the whole word, while a) selects the whole parenthesized expression with no surrounding white space.

More words, more composition

Back to word objects. Let’s pretend we have one key for each key word we would use to describe these movements in English. But let’s just highlight the key words and assume there is some key for expressing it. w is move forward to the beginning of a word. e becomes move forward to the end of a word. ) becomes move forward to the beginning of a sentence. Vim doesn’t have a binding to go forward to the end of our sentence, but if we compose things, it just falls out: move forward end sentence.

In written English, a sentence also has a period, a marker that tells us when it is over. Let’s be explicit, and add an execute instruction. So now w has become move forward beginning word execute. Let us bask in the glory of having turned a single key operation into 5 keys.

No, keep basking, this is nice.

Ok, so obviously turning 1 key into 5 keys is a bit of an inconvenience. There are a few things we can do. For one, we can have defaults and omit pieces. If we don’t mention whether we are going to the beginning or the end explicitly, let’s assume it is the beginning. If we don’t mention forward or backward explicitly, let’s assume forward. If we don’t mention an operation explicitly, let’s assume move.
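The defaulting rule can be sketched in a few lines of Python. This is only an illustration of the idea: the dictionary layout and the names DEFAULTS and complete are made up for this sketch, and Composiphrase itself is an Emacs library with its own configuration format.

```python
# Hypothetical sketch of the defaulting idea; not Composiphrase's real data model.
DEFAULTS = {"verb": "move", "direction": "forward", "position": "beginning"}


def complete(sentence):
    """Return the sentence with every omitted word replaced by its default."""
    return {**DEFAULTS, **sentence}


# Vim's w: only the object is spelled out.
print(complete({"object": "word"}))
# "move forward end sentence": only the non-default position is added.
print(complete({"object": "sentence", "position": "end"}))
```

Anything the user does say simply overrides the default, so the explicit five-word form and the one-word form produce the same sentence.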

Next, let’s decouple the compositional language from the keys. We can define keys that add an arbitrary number of words to the sentence and/or execute the sentence. We can bind w to add each of those words (or just the ones that aren’t implicit) and execute the sentence, and once again have a single key. But now that we’ve decomposed w into orthogonal pieces, we can not only recompose them back in Vim’s way, but also in new ways.

I’ve implemented a little library called Composiphrase for doing all of this, as well as various other libraries that can be put together to form a whole system of composable, modal editing. You configure composiphrase with a list of verbs and objects, what defaults they have for modifiers, and a big list of match specifications to match sentences to commands. Modifiers can be used in matching and/or passed as arguments to functions that you match to. It is easy to add new verbs or objects as you find new operations that you want to perform. It includes a demo matching configuration that I’ve made and have been using lately, as well as a key binding configuration that takes some inspiration from Vim, but leaning into this extra composability. (The matching configuration is probably more universal than the key bindings, so maybe I’ll upgrade the Composiphrase verb, object, and matching configuration to be “default” instead of “demo”. The matching configuration can be used with many different key binding layouts, providing the semantics for composing pieces, while the keys provide the implicit groupings.)
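To make the idea of match specifications a bit more concrete, here is a rough Python sketch. The spec format, the function names, and the command strings are all invented for illustration; the real library is configured in Emacs Lisp and its matching rules may well differ.

```python
# Hypothetical sketch of sentence matching; names and spec format are invented.
def matches(pattern, sentence):
    """A pattern matches when every word it names has the same value in the sentence."""
    return all(sentence.get(key) == value for key, value in pattern.items())


def execute(sentence, specs):
    """Run the first command whose pattern matches, passing the sentence as arguments."""
    for pattern, command in specs:
        if matches(pattern, sentence):
            return command(sentence)
    return None  # a real system would report that no match was found


SPECS = [
    # Direction and position are not matched on; they are passed through as arguments.
    ({"verb": "move", "object": "word"},
     lambda s: f"move-word {s['direction']} {s['position']}"),
    # Matches on a modifier and ignores direction entirely.
    ({"verb": "transpose", "tree": "up"},
     lambda s: f"transpose-up {s['object']}"),
]

sentence = {"verb": "move", "direction": "backward", "position": "end", "object": "word"}
print(execute(sentence, SPECS))
```

The point of the sketch is the split between words used for matching and words passed along as arguments: one spec can cover every direction and position of a word movement.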

The demo key binding is organized thusly: a key for each of the operation, the object, direction, and any (non-default) modifiers. It is vim-like in order: (1) choose a verb first (or not, to take my default move command). (2) hit a key to set modifier backward, forward, or selection (aka expand-region), h, l, and a, respectively, each of which also serves as a prefix into the object selection keymap. (3) optionally select any extra modifiers that share the object keymap. The object selection keymap stays active when modifiers are chosen, and only exits once an object is chosen. (4) finally, select an object. The object keys also trigger execution, and thus are the final key in the sequence. Since I have so many more commands and objects than in vim, I do use a prefix key for commands [4]. Eg. sc for change, sj for join, sd for delete, ss for slurp, etc🥨. I thought about having a prefix map for modifiers, but I decided to mostly mix them in with the command and object maps. I can stand some longer bindings, but it does start to get ridiculous.

Some quick examples: forward word is lw, backward word is hw. Forward character is lc. Backward to the end of a word is hnw, just sticking the n-for-end modifier right before the relevant object.
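Here is a toy Python reader for that key order, to show how short key sequences expand into full sentences. It is a sketch under assumptions: the tables contain only keys mentioned in this post, the sentence layout and the name read_keys are invented, and the real key maps are Emacs keymaps rather than a parser like this.

```python
# Hypothetical sketch: a toy reader for the demo-style key order described above.
DEFAULTS = {"verb": "move", "direction": "forward", "position": "beginning"}
BASE_VERBS = {"d": "delete", "c": "change", "y": "yank"}
PREFIXED_VERBS = {"c": "change", "d": "delete", "j": "join",
                  "s": "slurp", "t": "transpose", "o": "open"}
DIRECTIONS = {"h": "backward", "l": "forward", "a": "expand-region"}
MODIFIERS = {"n": ("position", "end"), "u": ("tree", "up"), "d": ("tree", "down")}
OBJECTS = {"w": "word", "c": "character", "l": "line", "s": "sexp",
           "t": "treesitter", "o": "outline", "e": "indent-tree", "x": "xml"}


def read_keys(keys):
    """Turn a key sequence such as "hnw" into a complete sentence (a dict)."""
    sentence = dict(DEFAULTS)
    keys = list(keys)
    count = ""
    while keys and keys[0].isdigit():          # optional leading count
        count += keys.pop(0)
    if keys[0] == "s":                         # (1) a verb behind the s prefix...
        keys.pop(0)
        sentence["verb"] = PREFIXED_VERBS[keys.pop(0)]
    elif keys[0] in BASE_VERBS:                # ...or a verb left on the base layer
        sentence["verb"] = BASE_VERBS[keys.pop(0)]
    sentence["direction"] = DIRECTIONS[keys.pop(0)]   # (2) enters the object map
    while keys:
        key = keys.pop(0)
        if key.isdigit():                      # a count may also appear in the object map
            count += key
        elif key in MODIFIERS:                 # (3) modifiers keep the map active
            name, value = MODIFIERS[key]
            sentence[name] = value
        else:                                  # (4) the object ends the sentence
            sentence["object"] = OBJECTS[key]
            break
    if count:
        sentence["count"] = int(count)
    return sentence


print(read_keys("hnw"))
print(read_keys("3sdlw"))
```

Note that the same letter can mean different things in different maps (d is delete as a verb and down as a modifier), which is how a small number of keys covers a large space of sentences.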

4: So... turning w into 5 keys is a bit of an exaggeration, but for operations that exist in vim, my demo config does tend to increase keys by, say, 30-300%. Quadruple key length is an outlier, though! w only turns into lw, and 3dw becomes 3sdlw if you use the command map, though d is also left on the base layer so you can also just do 3dlw, so it can be 4 or 5 instead of 3 keys. Open line forward can be, instead of o, soll, 4x the number of keys. But with the limited number of keys on keyboards, you just can’t fit many things on single keys. You can fit a lot within 2-3 key presses, but without composability, you can’t learn and remember very many of them. Keep reading to the end; I promise there are benefits to this key explosion madness.

🥨: Why s? Well, actually, on my keymap I went with c, for command, since I didn’t keep c on the base layout. But since I wanted to leave c and d on the base layout for the demo, to not mess with people’s vim-based expectations TOO much, I looked for some other convenient key. By happenstance I have s bound in my personal config to a prefix map of some accreted stuff that I should probably rethink and organize better somehow, but for the demo it was free! The s key is pretty useless in vim, anyway. Not everything in the demo config was judiciously thought through. Much like the default vi and vim keymap! (Ba dum tsss!)

Waxing verbose

The main criticism I foresee for this system is that it has long bindings for commands. However, you can still take a few of your most used commands or objects outside of the prefix maps to get a few short bindings. However however, longer bindings are usually not a big issue. Operations that you use frequently feel a lot like typing a common short word, which is fast, and when you need to repeat operations you can reach for single-key motion repetition, single-key command repetition, or keyboard macros. I think many people are more used to multi-stage commands than they realize. In default Emacs, people use modifier keys, control+<key>, meta+<key>, etc, and in vim many keys are capitalized, requiring the shift key. If you use a modifier key, that is already two keys, and arguably longer than two individual keys in series, because you need to hold the modifier down until after you release the other key. But more importantly, aside from extremely frequent commands, I think that command fluency is much more important than command length, at least within reason.

The big thing is that with this system, it is easy to remember bindings and reach for them without looking anything up. “I use open line a lot, with soll to open line forward, and it puts my cursor at the proper indentation. I want to do the same thing going to the right place to insert a sibling expression. That must be sols because I want to open a new s-expression instead of a line. Or solt for treesitter. Or solo when I’m writing an org-mode document. Or solx for xml...” “I frequently go up to the parent s-expression with hus, or hut with treesitter, but I’m using a file without treesitter support. But the indentation shows some of the structure, and I know there is an indentation tree object bound to e. So I should use hue.” Frequent operations still feel effortless even if they take 1-2 more keys (they are still short words), but many more operations can achieve this effortless status, and many more operations are available at your fingertips if you stop and think for a moment to replace one or two letters. Mashing a particular movement key (eg. mash forward word until you get where you want to go, instead of using more semantic movements or search) becomes a slightly longer initial sequence followed by mashing the “repeat movement” key.

In the demo configuration, the common vim operations of delete, change, and yank, are left on d, c, and y. In my personal configuration, I’m leaning harder into composing longer sequences, to see how it goes. If there are operations that I find tedious with longer prefixes, I will move those out to shorter bindings. For example, I’ve already put y back on the base map🄯! But I don’t think I will move c or d back.

🄯: With vim, the change, delete, and paste operators all also copy the deleted thing. So you either need to fuss with registers or re-copy things if you don’t want your copied text to be overwritten. So with vim I end up using y a bunch with those others. In my composiphrase config I’ve made it so they have different default registers to copy into, so they don’t overwrite each other by default, but then I often want to explicitly copy before deleting, for example. I could solve this using registers, but as you might have guessed at this point, I somehow just haven’t worked register usage into my habits... So anyway, yeah, I use y a lot when moving code around, and in conjunction with the other operators.

Remix

We can also reconsider the order of things. Kakoune is famous for flipping vim’s order on its head – movement first, then operation. If execution is not bound to a specific part of the grammar, the user (or binding package author) can choose what part to put where. Thus, the same editor can support multiple language grammars. You could imagine highlighting a preview of the movement for the current sentence built up and changed as you go along until you commit to a specific action. (I haven’t implemented that, and am unlikely to, but you can imagine it!) Or just imagine that each movement also activates a visual region, like Kakoune does, which you could do with even less difficulty.

I am used to vim-like movement, always moving to the beginning of things, or explicitly moving to the end when desired. You could also set this up using emacs-style motion – normal emacs motions move to the end of an object when going forward, and to the beginning when moving backward. You could set the default for the position modifier to be emacs-style instead of beginning. For trees, I’ve written forward and backward motions to respect tree structure, and only move between siblings. But you could also write it to ignore that, or change whether the default is to respect or ignore tree structure.
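The difference between the two motion styles comes down to how the position modifier is resolved. A small Python sketch, assuming a plain string buffer, positions counted between characters, and a simple regex notion of “word” (all of these, and the function names, are assumptions for illustration):

```python
import re


def resolve_position(position, direction):
    """Emacs-style means: end when going forward, beginning when going backward."""
    if position == "emacs-style":
        return "end" if direction == "forward" else "beginning"
    return position


def move_word(text, point, direction="forward", position="beginning"):
    """Return the new point after one word movement; stay put if there is no target."""
    position = resolve_position(position, direction)
    spans = [m.span() for m in re.finditer(r"\w+", text)]
    targets = [start if position == "beginning" else end for start, end in spans]
    if direction == "forward":
        return next((t for t in targets if t > point), point)
    return next((t for t in reversed(targets) if t < point), point)


text = "foo bar baz"
print(move_word(text, 0))                          # vim-like default: start of "bar"
print(move_word(text, 0, position="emacs-style"))  # emacs-like: end of "foo"
```

Switching styles then means changing one default value rather than rebinding every movement.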

We might reduce bindings by having more of them be context-aware. Eg. maybe we should just have one tree binding t, and not have s for smartparens s-expression, x for xml, e for indentation-tree, o for outline/org-mode, etc. I’ve thought about that, but also I like having multiple. I once arranged my key bindings so that the bindings I made for s-expression movement and operations were re-purposed inside org-mode to operate on heading trees. But then I found that I wanted to use s-expression operations sometimes inside org-mode, and I wanted to have outline-minor-mode bindings available for comments in source code. And I want some kinds of useful bindings to work when there is some kind of code (probably from a different language) inside a string literal, or similar cases of language nesting.🧀

🧀: Also, as there are currently many holes in what is actually implemented, sometimes something useful works when using a different text object view of the same data. Eg. I have slurp/barf implemented via smartparens for s and outline trees with o, but not for treesitter with t. But s is not clearly superior to t when editing lisp, for example, because all of the movement and selection operations are faster with t, because treesitter caches while smartparens does a bunch of analysis for each operation. Additionally, none of these tools are fully bulletproof, and they all work better or worse in different situations. Treesitter is cool, but really wants more language-specific massaging to get operations to work nicely. Smartparens works great for lisps, and also is useful in other languages, and generally whenever there are matched delimiters like parentheses. But it also gets confused, especially in wonky situations where you need to use parenthesis characters in contexts where they are in strings or character literals that smartparens doesn’t understand. Indentation trees are fairly low-fi ways of looking at semantic tree information. But if things are properly indented, they typically “just work”, even if treesitter and the LSP get borked, making it easy to eg. move to the next definition, the next statement after a big if or while statement, etc.

While there are many ways to put together the burrito, in terms of word ordering, whether to make keys context-dependent, choosing defaults for brevity, etc, importantly all of the ingredients are the same no matter what. All text editors are bottlenecked on implementation bandwidth for features. Many new text editors are remixing things in interesting ways, but throwing out the implementation of features with the bathwater. With a system like Composiphrase, you can do a lot more remixing without having to re-implement so much infrastructure.

Implemented in a Limited Time Only!

I wrote a bunch of code to build this system and some useful movements and operations in it. Some of those pieces are reasonably respectable, but many are very slapdash. Eg. I wrote a library for indentation-tree movement, a library wrapping smartparens to respect tree boundaries and move explicitly to the beginning/end of s-expressions whether going forward/backward, a library for using org-mode and outline-minor-mode with this system, a library for explicit beginning/end movement and selection with various text objects that emacs already supports, a library for modal editing, a library for command recording and repeating, ... But I was in a hurry because I wanted to use these things, and have something to write a blog post about, within a limited time, and I wrote very few tests for these libraries. There were just reams of features and wrappers, etc, that I was churning out to fill out a large and useful subset of the command space of the demo key bindings. There are probably bugs℗ and limitations.

℗: For example, I wrote a library for recording commands, to implement the command repeat key. It was working, and I was happy with it, ... until I realized that I wasn’t thinking about the recursive editing feature in Emacs. So command repetition is fundamentally broken when used with commands like isearch that perform recursive editing... I need to re-think the approach, but probably not right now. So the demo has a slapdash implementation of command repetition...

I haven’t even implemented all of the operations mentioned in this post. For example, open line works, and open s-expression works and is convenient, but I haven’t written “open tree-sitter sibling forward/backward”, and it might be something that needs some finessing for each language. The interface is composable, but much of the implementation is not, unfortunately🪛. I have written a quick and dirty generic treesitter movement framework, which allows moving between siblings, selecting nodes, etc, that works well for s-expressions, and, meh, it’s sort of ok for javascript, and progressively worse for languages whose keywords I have not added to its configuration. But it would be nice to have a tree-sitter setup for each language used, so that operations on t work well in each language. But I’m just one man, trying to write all of this in a hurry during a break between jobs with a new baby. I only have so much time. Writing and using this framework has made me want to use many new operations, but I can’t write them all within a few weeks. So if you were to use this system, you might occasionally write a perfectly cromulent command sentence, but get a message that no match is found to execute it.

🪛: Some pieces are composable. For example, the delete, change, and yank operations a la vim simply compose with a movement or selection. Transposition can typically be implemented as just getting the region of a text object, finding the next one, getting its region, and swapping. So those are pretty solid. And I wrote a library that provides some generic tree operations, given a set of core operations like “move to parent”, “move to sibling”, etc. But the specifics of core tree movement for different trees, slurping, promotion, joining, opening, etc, are more detailed in implementation despite being conceptually similar.
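The transposition recipe above (get a region, find the next one, swap) is generic over the text object. A Python sketch of that recipe, using words as the object; the helper names and the string-based buffer are assumptions for illustration, not the actual implementation:

```python
import re


def word_at(text, point):
    """Region (start, end) of the word containing point, or None."""
    for m in re.finditer(r"\w+", text):
        if m.start() <= point < m.end():
            return m.span()
    return None


def next_word(text, pos):
    """Region of the first word found at or after pos, or None."""
    m = re.compile(r"\w+").search(text, pos)
    return m.span() if m else None


def swap_regions(text, first, second):
    """Swap two non-overlapping regions, keeping the text between them in place."""
    (a0, a1), (b0, b1) = sorted([first, second])
    if a1 > b0:
        raise ValueError("regions overlap")
    return text[:a0] + text[b0:b1] + text[a1:b0] + text[a0:a1] + text[b1:]


def transpose_forward(text, point, object_at, next_object):
    """Generic transpose: works for any object that can report regions."""
    here = object_at(text, point)
    there = next_object(text, here[1]) if here else None
    if here is None or there is None:
        return text
    return swap_regions(text, here, there)


print(transpose_forward("alpha beta gamma", 1, word_at, next_word))
```

Only the two region finders know anything about words, so supplying finders for lines or headings would reuse transpose_forward unchanged.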

I wanted to implement this idea of more composable editing that I’ve been sitting on for years, and use it, and also communicate it with people. But I haven’t had time to really write good software with this, and chances are I’ll get busy with other things and not have time to keep pushing on this. So... I’m releasing these packages as “demo-ware”. They are good enough that I use them for my daily driver editing. But they are not really in a state where people can contribute reasonably (eg. because there are few-to-no tests), and there are a lot of sharp edges.

You can find a demo configuration to try, along with some basic documentation about it, here. For all that I’ve talked up its limitations, it’s a pretty good demo. Many commands are missing, sure, but many are there. If you use it for a while, you will reach for commands whose existence is implied by other commands you’ve learned, and they will just work much of the time. If nothing else, click on the link to look over the full list of (over 20) verbs, (over 30) objects, and (over 10) modifiers implemented (or imagined) so far.

Demonstrations!

So here are some highlights (and lowlights!) to demonstrate:

Of course you can use stlw to transpose a word forward. Or stll to transpose a line forward, or stho to transpose an org-mode heading backward. But how about sthut to transpose up a tree? (The h here is actually meaningless, the matcher for transpose up ignores forward/backward direction, so maybe this key binding scheme is... not perfect. But you need some key there to get into the object map...) Transpose-up transforms this:

(while (predicate-a)
  (when (predicate-b)
    (do-that-thing)))

into this:

(when (predicate-b)
  (while (predicate-a)
    (do-that-thing)))

Ok, that’s a bit of a lie. I wrote transpose-up, but that “after” result is also after re-formatting. My transpose-up implementation is generic, but I need to add some transformation hooks to fix things up... This transformation is not quite the same as sp-convolute-sexp, but related. I never bound sp-convolute-sexp in my old key bindings, unsure where to put it, and people often seem confused about when they would use it. This transpose-up feature is a little less fancy, but maybe if its binding is a clue to what it does, people will be more apt to try it? Maybe.

Using smartparens (or paredit), two of my favorite operations are slurp and barf. Lisp users are spoiled with editing goodness: Lisp’s homoiconicity🔨 is not only awesome for macros, but also makes text editing operations easier and more flexible. Treesitter improves the situation for non-lisps, but still doesn’t close the gap. But you know one of my least favorite languages to edit? Python. I really do prefer having brackets and having the computer automatically indent. It is much easier to move code around with brackets than without. Using Python I have to do a lot more manual moving between indentation levels. Ok, all of that complaining aside, wouldn’t it be nice to have slurp/barf available to move statements in or out of eg. an if or while block in Python? Say we want to transform this:

if (foo()): # with the cursor on the colon
  bar()
baz()

Into this:

if (foo()):
  bar()
  baz()

🔨: Or alternatively described as Bicameral syntax. That’s a pretty good blog post, and I agree with most of it. I prefer a scannerless reader, though, and have some different opinions on abstract syntax that can be viewed as different concrete trees.

So instead of ssls to slurp forward an s-expression, use sslt to slurp with treesitter! Oh, well, erm, except that I haven’t implemented slurping with treesitter, either a generic effort or anything specific for any language. But hey, I did write this library for indentation trees! So let’s go, ssle to slurp based on indentation! Oh, bother, I haven’t implemented that operation yet, either, it was lower on my to-do list than writing a library for repeating commands that would work well with composiphrase. Well, you get the idea. It will work once I write it. The key bindings are ready. So I guess I’ll just use le to move from the if line to the baz line and then indent.
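Since indentation-based slurp does not exist yet, here is a rough Python sketch of what it could do on a list of lines. Everything in it is hypothetical: the function names, the list-of-lines buffer, and the simplifying assumptions that indentation is spaces only and that there are no blank lines inside the block.

```python
def indent_of(line):
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def subtree_end(lines, i):
    """Index one past the last line of the indentation subtree rooted at line i."""
    base = indent_of(lines[i])
    j = i + 1
    while j < len(lines) and indent_of(lines[j]) > base:
        j += 1
    return j


def slurp_forward(lines, i):
    """Pull the next sibling of line i (with its own children) into i's block."""
    base = indent_of(lines[i])
    body_end = subtree_end(lines, i)
    if body_end >= len(lines) or indent_of(lines[body_end]) != base:
        return list(lines)  # no following sibling to slurp
    sibling_end = subtree_end(lines, body_end)
    # Reuse the block's existing child indentation, or guess 2 spaces for an empty block.
    step = indent_of(lines[i + 1]) - base if body_end > i + 1 else 2
    shifted = [" " * step + line for line in lines[body_end:sibling_end]]
    return lines[:body_end] + shifted + lines[sibling_end:]


before = ["if (foo()):", "  bar()", "baz()"]
print("\n".join(slurp_forward(before, 0)))
```

The sibling is moved together with anything indented beneath it, so slurping a whole nested if or while statement would shift all of its lines by the same amount.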

Joining is another operation. I’ve used a key binding to join line forward for a long time, but it would be nice to leverage the knowledge of that binding to join other things. So instead of sjll, I can just swap in sjls to join neighboring s-expressions. Nice! Now let’s do xml! sjlx! Oops, I’ve been meaning to add XML to this system for working with web stuff, but of course that’s yet another thing that I haven’t added yet. It should be a quick wrapper using the emacs nxml library with some support from the tree-walk file that I wrote in the composiphrase-objects library, which given some movement operations provides region expansion, tree-respecting transposition, and in-order tree traversal [abc].

abc: I’m waiting to find a time when in-order traversal during editing will be useful... I don’t know quite why I decided I needed to write a library to provide generic in-order tree traversal, but it was exciting to load it up with indent-tree movements and see in-order indent-tree traversal, which is equivalent to... going down one line at a time... But this one is a fancy move down one line at a time function that is slower than it needs to be! Also it skips blank lines. Worth it. Not a waste of time to write, at all. At least in-order traversal of s-expressions is moderately interesting, if not apparently useful.

Open line is another classic vim operation. You can open line forward, which inserts a newline after the current line, and leaves you in insert state to write the line. The benefit of this, vs moving to the end of the line and inserting a newline, is mostly that you capture the movement for the repeat command (which my Composiphrase demo supports). I think for other text objects it can be even more helpful. For example, with the demo config solo opens a sibling org-mode heading. So you get to skip inserting all of those stars. But also, composiphrase opens the possibility of more composition. soldo adds a key for “down”, and now it opens a new child heading. soluo adds a key for “up”, and now it opens the next sibling of the parent heading. Add a 2 somewhere before the final o and it opens the sibling of the grandparent heading. Replace the last o with s and you’re opening siblings, parents, and children for s-expressions. Oh, except that so far I have only implemented the up/down modifiers for open on org heading trees; for s-expressions I have only implemented sibling opening.
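As an illustration of how the up/down modifiers change where the new heading goes, here is a Python sketch over a list of org-style lines. It is a guess at reasonable semantics, not the demo's code: the function names are invented, the new child is assumed to go at the end of the current subtree, and the cursor is assumed to be on a heading line.

```python
import re


def level(line):
    """Number of leading stars of an org-style heading, or None for body text."""
    m = re.match(r"(\*+) ", line)
    return len(m.group(1)) if m else None


def subtree_end(lines, i):
    """Index one past the last line belonging to the heading at line i."""
    own = level(lines[i])
    j = i + 1
    while j < len(lines) and (level(lines[j]) is None or level(lines[j]) > own):
        j += 1
    return j


def open_heading(lines, i, tree=None, count=1):
    """Return (new_lines, index) with an empty heading opened relative to heading i."""
    if tree == "down":                      # new child, placed at the end of the subtree
        new_level, at = level(lines[i]) + 1, subtree_end(lines, i)
    elif tree == "up":                      # next sibling of the count-th ancestor
        anc = i
        for _ in range(count):
            current = level(lines[anc])
            anc = next(k for k in range(anc - 1, -1, -1)
                       if level(lines[k]) is not None and level(lines[k]) < current)
        new_level, at = level(lines[anc]), subtree_end(lines, anc)
    else:                                   # plain sibling
        new_level, at = level(lines[i]), subtree_end(lines, i)
    return lines[:at] + ["*" * new_level + " "] + lines[at:], at


doc = ["* A", "** B", "*** C", "** D"]
print(open_heading(doc, 2, tree="up")[0])
```

In this sketch a missing ancestor raises StopIteration; a real command would report that there is no parent instead.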

Moving on, it’s really easy to take a command that works on a region and plug it in to the system so that it either operates on the current region or defers to the underlying movement command to delimit a region. That’s how delete, change, copy, upcase, and several more work. But emacs has loads of do-X-on-region commands. Let’s pretend I’ve bound sm to do morse-region as the verb. Then with a buffer like Compositional |editing is awesome! with cursor at the pipe character, we can type sm2ew to get Compositional ./-../../-/../-./--. ../... awesome!. Ok, well, that’s... maybe not very useful. So I didn’t actually bind that command in. But if you can think of an operation on a region that would be useful to put in the composable language, it’s pretty easy!
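The “operate on the region, or let a movement delimit one” pattern can be sketched like this in Python. The names are hypothetical and the buffer is just a string; upcase stands in for any do-X-on-region command.

```python
import re


def forward_word_end(text, point, count=1):
    """Position just past the end of the count-th word end after point."""
    ends = [m.end() for m in re.finditer(r"\w+", text) if m.end() > point]
    return ends[count - 1] if len(ends) >= count else len(text)


def on_region_or_movement(region_command, text, point, movement, region=None):
    """Apply region_command to the active region, or to the span a movement covers."""
    if region is None:
        target = movement(text, point)
        region = (min(point, target), max(point, target))
    start, end = region
    return text[:start] + region_command(text[start:end]) + text[end:]


text = "Compositional editing is awesome!"


def two_word_ends(buffer, point):
    """The movement for "forward, end, word" with a count of 2."""
    return forward_word_end(buffer, point, count=2)


# Cursor just before "editing" (position 14), no active region.
print(on_region_or_movement(str.upper, text, 14, two_word_ends))
```

Any function from string to string can be dropped in as region_command, which is why wrapping existing region commands is cheap.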

This is generally another possible argument against this system: how many operations do you really need, and how often are different compositions useful? Eg. join line is useful, I use it all the time. Join word... seems less so. I haven’t even bothered implementing join operations for non-tree objects besides line. Join s-expression is useful sometimes. But with less enlightened language grammars, are there many opportunities where sjlt, join forward treesitter element, is well defined or useful? So maybe Vim has it right with ~5 composable operations, and it just needs a generic tree object with treesitter support. Maybe the Helix editor is doing the right thing, since that’s basically what it has. But I really want to be able to view the same file as being an indentation tree, a smartparens tree, a treesitter tree, and more, all at the same time, and I want to have a sub-map of infrequent text objects so I can do things like va<some-prefix-within-the-object-map>u to select a url and va<prefix>m to select an email address, etc, even if there aren’t other useful compositional operations on URLs, email addresses, phone numbers, etc, that you can write a thing-at-point detector for. And I want to have slurp and barf and tree movements for each kind of tree, and I want useful fallback text objects to use when my LSP crashes or treesitter has a bug that won’t work for the specific file I’m using. And I want transpose to work on everything. And I want to keep the door open for more compositions if I can think of more operations in the future.

Some may think this is all too much for too little benefit over Vim, that most of these extra operations don’t matter. But... as for myself, I’m loving it, having switched to using it full-time at an early stage in writing it🛃. My system has lots of holes where I haven’t implemented operations, but it also has a lot of useful things filled in. I think the holes in my system are just evidence of how much low hanging fruit there is in developer tooling🪚. I’ve had fun looking at what new operations are implied by the binding scheme, and gradually filling some of them in. I hope I’ve communicated the idea well, and maybe convinced some people that it’s a good idea. I would like to hear people’s opinions, especially anyone brave enough to kick the tires.

🛃: My configuration is not quite the same as the demo, but the demo is a simplified version of what I’m using with a bit less weirdness, that probably makes more sense to Vim and Qwerty keyboard layout users.

🪚: Out of necessity, and a million reasons, we tend to just get used to using bad tools. I have so many ideas for tool and workflow improvements, but I rarely have time to build them. I’m always too busy doing “real work”.