XEmacs: Sorting Key-Value Lines by Value

XEmacs 21.4.24, the latest stable release (2015), is the version I'm personally using. The directions here may well apply to GNU Emacs as well; I don't know.

Most Emacs users are familiar with the command M-x sort-lines, which sorts the lines in the highlighted region of the current buffer alphabetically.

However, I wanted to sort lines of the form key: value by their value, such as these:

foo: 12
baz: 7
bar: 2

As you can see, the values in this instance are numeric, and sorting the lines lexicographically by value results in

foo: 12
bar: 2
baz: 7

that is, alphabetical and not numeric.
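The difference comes down to which comparison is used. As a small sketch (the two helper names are hypothetical, made up for illustration), here are the example values sorted once as strings with string< and once as numbers with <:

```elisp
;; Hypothetical helpers, for illustration only.
;; `sort' is destructive, so each call sorts a freshly built list.
(defun my-values-as-strings ()
  "Return the example values sorted lexicographically."
  (sort (list "12" "7" "2") 'string<))

(defun my-values-as-numbers ()
  "Return the example values sorted numerically."
  (sort (list 12 7 2) '<))

;; (my-values-as-strings) => ("12" "2" "7")
;; (my-values-as-numbers) => (2 7 12)
```

In string order "12" comes before "2" because string< compares character by character, and the first characters, 1 and 2, already decide it.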

In order to “fix this” we have to delve into the sorting internals of XEmacs.

First, let's look at the function sort-lines, which is fairly straightforward.[1]

(defun sort-lines (reverse beg end)
  ;; [documentation string elided]
  (interactive "P\nr")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (sort-subr reverse 'forward-line 'end-of-line))))

The most important thing to note here is the call to sort-subr; the rest of the code is boilerplate that restricts the sort to the highlighted region. The (interactive "P\nr") form reads the raw prefix argument into reverse and the start and end of the region into beg and end. For our purposes we can treat this as "magic": the newline embedded in the string separates the spec for reverse ("P") from the spec for the region ("r"), and it means people can sort in reverse alphabetical order with C-u M-x sort-lines. The code we'll develop here does not have this feature; adding it is left as an exercise for the reader.
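To see what (interactive "P\nr") actually hands to a command, here is a small hypothetical command (not part of XEmacs, the name is made up) that takes the same arguments as sort-lines and only reports them:

```elisp
;; Hypothetical command, for illustration only.
(defun my-show-sort-lines-args (reverse beg end)
  "Report the raw prefix argument REVERSE and the region BEG..END."
  ;; "P" binds REVERSE to the raw prefix argument: nil normally,
  ;; (4) after a single C-u.  The newline ends that spec, and "r"
  ;; binds BEG and END to the start and end of the region.
  (interactive "P\nr")
  (message "reverse=%S beg=%d end=%d" reverse beg end))
```

Selecting a few lines and running C-u M-x my-show-sort-lines-args should show reverse=(4) in the echo area, followed by the region bounds; without C-u it should show reverse=nil.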

Now that we know how sort-lines works, and that sort-subr does the actual sorting, the question is: how do we make it sort by value?

First, and obviously, we read the documentation string with C-h f sort-subr. It tells us that there are two optional arguments we can make use of:[1]

STARTKEYFUN moves from the start of the record to the start of the key. It may return either a non-nil value to be used as the key, or else the key is the substring between the values of point after STARTKEYFUN and ENDKEYFUN are called. If STARTKEYFUN is nil, the key starts at the beginning of the record.

and

COMPAREFUN compares the two keys. It is called with two strings and should return true if the first is “less” than the second, just as for `sort'. If nil or omitted, the default function accepts keys that are numbers (compared numerically) or strings (compared lexicographically).

The first thing to note here is STARTKEYFUN. It lets us limit the sort comparison to the value part of each line. The trick is simply to move point past the : (colon), which we can do with search-forward. Since all the lines in my case, and in the example here, do contain a colon, we won't handle the case where one is missing: we impose no limit on the search (BOUND is nil) and don't suppress errors (NOERROR is nil, so a failed search signals an error). We do, however, set the count to exactly one (COUNT is 1), so the search stops at the first colon.

That leaves us with a call that looks like

  (search-forward ":" nil nil 1)

To apply that to sort-subr, we define an entirely new function, my-sort-key-value-lines, and wrap search-forward in a lambda for simplicity. Notice that the lambda explicitly returns nil; otherwise sort-subr would use the return value of search-forward (the position of point just after the colon) as the key and sort by that.[2]
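The return value is easy to check, and so is one consequence of leaving the search unbounded. A minimal sketch (the helper name is hypothetical) that runs the same search-forward call on a string in a temporary buffer and reports what it returns and where point ends up:

```elisp
;; Hypothetical helper, for illustration only.
(defun my-colon-search-demo (text)
  "Search TEXT for the first colon, as the start-key lambda does.
Return a cons of search-forward's return value and the final point."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; BOUND nil: no limit, so the search may run past the end of the
    ;; current line.  NOERROR nil: signal `search-failed' if there is
    ;; no colon anywhere ahead.  COUNT 1: stop at the first colon.
    (let ((result (search-forward ":" nil nil 1)))
      (cons result (point)))))

;; (my-colon-search-demo "foo: 12")        => (5 . 5)
;; (my-colon-search-demo "foo 12\nbar: 2") => (12 . 12)
```

The first call shows that search-forward returns the new position of point, just after the colon, which is what sort-subr would otherwise take as the key. The second shows that, with no bound, a line lacking a colon does not by itself cause an error: the search simply continues onto the next line, and an error is only signalled when no colon follows anywhere in the region.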

(defun my-sort-key-value-lines (beg end)
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (sort-subr nil 'forward-line 'end-of-line
                 (lambda ()
                   (search-forward ":" nil nil 1)
                   ;; returns point, so we explicitly return
                   nil)))))

This code sorts the values lexicographically, but we want a numeric sort. There are at least two ways to fix that.

First, we can supply a comparison function that compares both keys as numbers. It just has to convert its string arguments to numbers with string-to-number and compare the results.[2]

(defun my-sort-key-value-lines (beg end)
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (sort-subr nil 'forward-line 'end-of-line
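                 ;; STARTKEYFUN: skip past the colon, then return nil so
                 ;; the key is taken from the buffer text.
                 (lambda ()
                   (search-forward ":" nil nil 1)
                   nil)
                 ;; ENDKEYFUN: nil, so the key ends with the record.
                 nil
                 ;; COMPAREFUN: compare the two key strings as numbers.
                 (lambda (a b)
                   (< (string-to-number a)
                      (string-to-number b)))))))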

In the code above, the comparison happens in the second lambda. The nil between the two lambdas is the end-of-key function (ENDKEYFUN), which we don't need to define because the key ends where the record does (represented by 'end-of-line in the above code).
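The comparison lambda can also be tried on its own. Here it is given a hypothetical name so it can be called directly; the keys it receives are the text after the colon, leading space included:

```elisp
;; The second revision's comparison lambda under a hypothetical
;; name, for illustration only.
(defun my-numeric-key-less-p (a b)
  "Return non-nil if the number in string A is less than the one in B."
  ;; string-to-number ignores leading spaces, so keys like " 12"
  ;; convert cleanly; text that is not a number at all converts to 0.
  (< (string-to-number a)
     (string-to-number b)))

;; (my-numeric-key-less-p " 2" " 12") => t
;; (string< " 2" " 12")               => nil
```

The second line is what the default lexicographic comparison would conclude about the same two keys.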

The simpler method is to do the conversion in the start-key lambda and rely on the default comparison function, which compares numeric keys numerically. That results in the third revision.[2]

(defun my-sort-key-value-lines (beg end)
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (sort-subr nil 'forward-line 'end-of-line
                 (lambda ()
                   (search-forward ":" nil nil 1)
                   (string-to-number (buffer-substring (point) (point-at-eol))))))))

You can now drop this into your ~/.xemacs/init.el and run M-x my-sort-key-value-lines whenever you need to sort key: value lines. On the example above, it produces the desired numeric sort order:

bar: 2
baz: 7
foo: 12
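As for the reverse-order exercise mentioned earlier, one possible answer (a sketch, not from the original post) is to take the raw prefix argument the same way sort-lines does and pass it on to sort-subr. It would replace the definition above. It also swaps point-at-eol for an end-of-line excursion, an assumption made only so the same code should run unchanged in XEmacs and GNU Emacs:

```elisp
;; A sketch of the reverse-order exercise; it would replace the
;; definition above.  Not part of XEmacs.
(defun my-sort-key-value-lines (reverse beg end)
  "Sort \"key: value\" lines in the region numerically by value.
With a prefix argument (C-u), sort in descending order."
  ;; Same interactive spec as sort-lines: raw prefix arg, then region.
  (interactive "P\nr")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (sort-subr reverse 'forward-line 'end-of-line
                 (lambda ()
                   ;; Skip past the colon; the rest of the line is the value.
                   (search-forward ":" nil nil 1)
                   ;; Return the value as a number, so the default
                   ;; comparison sorts numerically.
                   (string-to-number
                    (buffer-substring
                     (point)
                     (save-excursion (end-of-line) (point)))))))))
```

Plain M-x my-sort-key-value-lines should still give the ascending order shown above, while C-u M-x my-sort-key-value-lines should give foo: 12, baz: 7, bar: 2.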

[1] This code is GPL.

[2] This code can be considered WTFPL 2.0; at least the parts inside the lambdas can, and the rest is just boilerplate.
