Adding ripgrep and find Support to org-roam

Posted on: June 6, 2020

Despite having used Emacs for a year and a half, both for fun and professionally, I hadn’t had the impetus to advance my understanding of Elisp beyond scavenging others’ dotfiles. This changed two weeks ago with my first Elisp PR. I was chatting with a good friend and ex-colleague of mine, Jethro, about an Emacs package he wrote called org-roam, which has exploded in popularity in the Emacs world. He talked about how much there was to do to maintain the project, and I figured it would be a good opportunity to help out and stop procrastinating on learning Elisp.

I won’t spend this post talking about what org-roam does. Rather, this post is a short commentary about the PR.

The link for the PR is here:

The aim of the PR is to add support for using shell commands when looking for org-roam files located recursively in a directory. In org-roam, this is accomplished with the org-roam--list-files function, which prior to this PR used a pure Elisp implementation, below:

(defun org-roam--list-files (dir)
  "Return all Org-roam files located within DIR, at any nesting level.
Ignores hidden files and directories."
  (let ((regex (concat "\\.\\(?:" (mapconcat #'regexp-quote org-roam-file-extensions "\\|") "\\)\\(?:\\.gpg\\)?\\'"))
        result)
    (dolist (file (directory-files-recursively dir regex) result)
      (when (and (file-readable-p file) (org-roam--org-file-p file))
        (push file result)))))

Here, org-roam-file-extensions is typically a list like '("org"), and what this function does is first construct a regex that matches all files ending with .org or .org.gpg, and then call directory-files-recursively with that regex. Each returned file is kept only if it is readable and passes org-roam--org-file-p.
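To see what that regex looks like in practice, here is a minimal sketch that builds it from a list of extensions and tries it on a few file names. The names my/extensions and my/org-file-regex are hypothetical stand-ins, not part of org-roam, and the sketch assumes the extensions are given without a leading dot.

```elisp
;; Hypothetical stand-in for `org-roam-file-extensions'; assumed to hold
;; extensions without the leading dot.
(defvar my/extensions '("org" "md"))

(defun my/org-file-regex (exts)
  "Build a regex matching files ending in one of EXTS, optionally with .gpg."
  (concat "\\.\\(?:"
          (mapconcat #'regexp-quote exts "\\|")
          "\\)\\(?:\\.gpg\\)?\\'"))

(my/org-file-regex my/extensions)
;; => "\\.\\(?:org\\|md\\)\\(?:\\.gpg\\)?\\'"

(string-match-p (my/org-file-regex my/extensions) "notes.org")      ; non-nil
(string-match-p (my/org-file-regex my/extensions) "notes.org.gpg")  ; non-nil
(string-match-p (my/org-file-regex my/extensions) "notes.org.bak")  ; nil
```

The trailing \\' anchors the match at the end of the string, which is why notes.org.bak is rejected.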

Since we want to delegate the file searching to a shell command, it would be prudent to allow the user to specify the tool used (or not, in which case we would fall back to the pure Elisp implementation). This is accomplished with a new user option variable called org-roam-list-files-commands:

(defcustom org-roam-list-files-commands '(find rg)
  "Commands that will be used to find Org-roam files.
It should be a list of symbols or cons cells representing any of the following
 supported file search methods.
The commands will be tried in order until an executable for a command is found.
The Elisp implementation is used if no command in the list is found.
  `rg'
    Use ripgrep as the file search method.
    Example command: rg /path/to/dir --files -g \"*.org\" -g \"*.org.gpg\"
  `find'
    Use find as the file search method.
    Example command:
    find /path/to/dir -type f \( -name \"*.org\" -o -name \"*.org.gpg\" \)
By default, `executable-find' will be used to look up the path to the
executable. If a custom path is required, it can be specified together with the
method symbol as a cons cell. For example: '(find (rg . \"/path/to/rg\"))."
  :type '(set (const :tag "find" find)
              (const :tag "rg" rg)))

org-roam-list-files-commands is defined as a list of either symbols or cons cells, which are tried in order. If an entry is the symbol rg or find, we attempt to use the respective executable as found by executable-find. Otherwise, if the executable lives in a custom location, it can be specified with a cons cell whose car is the symbol and whose cdr is an absolute path to the executable, e.g. (find . "/path/to/find").

(defun org-roam--list-files (dir)
  "Return all Org-roam files located recursively within DIR.
Use external shell commands if defined in `org-roam-list-files-commands'."
  (let (path exe)
    (cl-dolist (cmd org-roam-list-files-commands)
      (pcase cmd
        (`(,e . ,path)
         (setq path (executable-find path)
               exe  (symbol-name e)))
        ((pred symbolp)
         (setq path (executable-find (symbol-name cmd))
               exe (symbol-name cmd)))
        (wrong-type
         (signal 'wrong-type-argument
                 `((consp symbolp)
                   ,wrong-type))))
      (when path (cl-return)))
    (if path
        (let ((fn (intern (concat "org-roam--list-files-" exe))))
          (unless (fboundp fn) (user-error "%s is not an implemented search method" fn))
          (funcall fn path dir))
      (org-roam--list-files-elisp dir))))

We then update the body of org-roam--list-files to iterate over org-roam-list-files-commands using cl-dolist, pattern matching on each value with pcase. If the value matches a cons cell (the backquote pattern `(,e . ,path)), we use the path specified in the cdr. Otherwise, if the value matches a symbol (pred symbolp), we attempt to find the path of the executable with (executable-find (symbol-name cmd)). We exit the loop early once a path has been found (when path (cl-return)). If a value is neither a cons cell nor a symbol, we signal an error to the user with (signal 'wrong-type-argument ...).
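The three-way match can be tried on its own. Below is a minimal sketch, not the PR's code: my/resolve-command is a hypothetical helper that resolves a single entry and returns a (NAME . PATH) pair, or nil when no executable is found.

```elisp
(defun my/resolve-command (cmd)
  "Return (NAME . PATH) for CMD, or nil if its executable is not found.
CMD is either a symbol or a cons cell (SYMBOL . CUSTOM-PATH).
This is a hypothetical helper written for illustration."
  (pcase cmd
    ;; Cons cell: look up the custom path given in the cdr.
    (`(,name . ,custom-path)
     (let ((found (executable-find custom-path)))
       (when found (cons (symbol-name name) found))))
    ;; Bare symbol: look up an executable with the same name.
    ((pred symbolp)
     (let ((found (executable-find (symbol-name cmd))))
       (when found (cons (symbol-name cmd) found))))
    ;; Anything else is a configuration error.
    (_ (signal 'wrong-type-argument (list '(consp symbolp) cmd)))))

(my/resolve-command 'find)                 ; a ("find" . PATH) pair if find is installed
(my/resolve-command '(rg . "/path/to/rg")) ; nil unless that placeholder path exists
(my/resolve-command 42)                    ; signals wrong-type-argument
```

The order of the clauses matters: the cons cell pattern comes first, and the catch-all clause only fires for values that are neither cons cells nor symbols.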

Once the path is found, we then use a little bit of magic to “reflect” on the method name with intern. If exe is rg, we will invoke org-roam--list-files-rg with the path and given directory using funcall. If exe is find instead, we will invoke org-roam--list-files-find. Note that because of this, adding support for a shell tool is as simple as adding a new org-roam--list-files-$SHELL_TOOL function, and specifying it in org-roam-list-files-commands.
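As a minimal sketch of that dispatch, the example below builds a function name from a string with intern and calls it with funcall. The my/ functions are hypothetical placeholders that only return a description of what they would do. They are not the real search functions.

```elisp
;; Hypothetical search methods; each takes an executable path and a directory.
(defun my/list-files-rg (executable dir)
  "Describe a ripgrep search of DIR using EXECUTABLE."
  (format "%s would search %s with ripgrep" executable dir))

(defun my/list-files-find (executable dir)
  "Describe a find search of DIR using EXECUTABLE."
  (format "%s would search %s with find" executable dir))

(defun my/dispatch (exe path dir)
  "Call the function named my/list-files-EXE with PATH and DIR."
  (let ((fn (intern (concat "my/list-files-" exe))))
    (unless (fboundp fn)
      (user-error "%s is not an implemented search method" fn))
    (funcall fn path dir)))

(my/dispatch "rg" "/usr/bin/rg" "~/notes")
;; => "/usr/bin/rg would search ~/notes with ripgrep"
```

Calling (my/dispatch "fd" ...) raises a user-error until a my/list-files-fd function is defined, which is the extension point described above.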

If no suitable path is found, we fall back to the pure Elisp implementation, which is the first function in this post, renamed as org-roam--list-files-elisp.

The org-roam--list-files-rg and org-roam--list-files-find functions are given below. They are straightforward functions that construct the command strings.

(defun org-roam--list-files-rg (executable dir)
  "Return all Org-roam files located recursively within DIR, using ripgrep, provided as EXECUTABLE."
  (let* ((globs (org-roam--list-files-search-globs org-roam-file-extensions))
         (command (s-join " " `(,executable ,dir "--files"
                                            ,@(mapcar (lambda (glob) (concat "-g " glob)) globs)))))
    (org-roam--shell-command-files command)))

The full shell command used for rg is:

rg /path/to/dir --files -g "*.org" -g "*.org.gpg"

(defun org-roam--list-files-find (executable dir)
  "Return all Org-roam files located recursively within DIR, using find, provided as EXECUTABLE."
  (let* ((globs (org-roam--list-files-search-globs org-roam-file-extensions))
         (command (s-join " " `(,executable ,dir "-type f \\("
                                            ,(s-join " -o " (mapcar (lambda (glob) (concat "-name " glob)) globs)) "\\)"))))
    (org-roam--shell-command-files command)))

The full shell command used for find is:

find /path/to/dir -type f \( -name "*.org" -o -name "*.org.gpg" \)
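Both command builders splice the directory into the command string unquoted, so a directory containing spaces would be split by the shell. One possible variation, which is not what the PR does, is to pass each piece through shell-quote-argument. In the sketch below, my/find-command is a hypothetical name, and it assumes the globs are passed in raw, without the surrounding double quotes.

```elisp
(defun my/find-command (executable dir globs)
  "Return a find command string searching DIR for files matching GLOBS.
EXECUTABLE, DIR and each glob are quoted with `shell-quote-argument'."
  (mapconcat #'identity
             (list (shell-quote-argument executable)
                   (shell-quote-argument dir)
                   "-type f \\("
                   (mapconcat (lambda (glob)
                                (concat "-name " (shell-quote-argument glob)))
                              globs " -o ")
                   "\\)")
             " "))

;; The exact quoting depends on the platform, so inspect the result locally.
(my/find-command "find" "/path/to/my notes" '("*.org" "*.org.gpg"))
```

The trade-off is that the resulting string is harder to read than the hand-quoted version, since the quoting style is chosen by Emacs for the current system.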

On top of the bulk of the implementation above, there are a few utility functions as well:

(defun org-roam--list-files-search-globs (exts)
  "Given EXTS, return a list of search globs.
E.g. (\"org\") => (\"*.org\" \"*.org.gpg\")"
  (append
   (mapcar (lambda (ext) (s-wrap (concat "*." ext) "\"")) exts)
   (mapcar (lambda (ext) (s-wrap (concat "*." ext ".gpg") "\"")) exts)))

as well as:

(defun org-roam--shell-command-files (cmd)
  "Run CMD in the shell and return a list of files. If no files are found, an empty list is returned."
  (seq-filter #'s-present? (split-string (shell-command-to-string cmd) "\n")))
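The filtering step matters because command output usually ends with a newline, which leaves an empty string at the end of the split. As a minimal sketch of the same idea without s.el, split-string accepts an OMIT-NULLS argument that drops empty strings. The name my/output-lines is hypothetical, and the sketch works on a plain string rather than running a real command.

```elisp
(defun my/output-lines (output)
  "Split OUTPUT on newlines and drop empty strings."
  (split-string output "\n" t))

(split-string "a.org\nb.org\n" "\n")  ; => ("a.org" "b.org" "")
(my/output-lines "a.org\nb.org\n")    ; => ("a.org" "b.org")
(my/output-lines "")                  ; => nil
```

In real use the argument would come from shell-command-to-string, as in the function above.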


In the original PR, I did some rudimentary benchmarks, using Jethro’s public braindump.

(benchmark 1000 '(org-roam--list-files-rg "./jethrokuan/braindump"))
"Elapsed time: 9.012230s (0.399595s in 5 GCs)"
(benchmark 1000 '(org-roam--list-files-find "./jethrokuan/braindump"))
"Elapsed time: 5.543965s (0.318566s in 4 GCs)"
(benchmark 1000 '(org-roam--list-files-elisp "./jethrokuan/braindump"))
"Elapsed time: 55.781495s (3.220956s in 41 GCs)"

Since then, others have done more elaborate ones, which you can read about here.

It took me around three days to get the PR to a presentable state, thanks to awesome feedback from Jethro and progfolio. I’ve learned a lot of Elisp, and hope to continue learning more and contributing!