Epistolary Programs

Published Thursday, August 1, 2024

Literate Programming

Donald Knuth introduced the literate programming paradigm in a 1984 essay titled "Literate Programming", later collected in a book of the same name. Literate programming highlights the importance of colocating natural and computer language in a manner unmatched by simple comments in source code. When writing a literate program, human and machine are simultaneously the target audience of the program.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.
- Donald Knuth, "Literate Programming"

The paradigm has gained traction since its inception, as evidenced by its widespread use in scientific computing¹. It is especially useful when the size and scope of the program are kept manageable².

Solving for Personal Projects

For nearly all of my programming outside of work, I'm the sole collaborator, and my projects don't scale to an unmanageable number of source files or lines of code. A problem that I do encounter is that when I revisit old projects, I'm left spending a non-negligible amount of time re-reading my own code wondering why I made certain architectural decisions³.

To fix that, I've made my own literate programming tool: one that helps me write a letter to my future self alongside my code. epistle is a tool for writing epistolary programs.

epistolary
(adjective): contained in or carried on by letters (Merriam-Webster)

Inspired by epistolary novels, which tell stories using letters⁴, an epistolary program is one contained in a letter to the programmer's future self. I do a lot of writing in my Obsidian workflow, so epistle complements that by extracting code from the Obsidian Markdown I write.

In Practice

Epistle is a CLI program written in a little over 100 lines of Rust. It can be installed using Cargo with the following command.

cargo install epistle

Code Walkthrough

Epistle, itself, is an epistolary program. Let's walk through the code, and then we'll see a command we can use to generate machine-readable code for compilation.

First, let's create a Cargo.toml and add some project metadata. The most important things here are dependencies, and I've highlighted them below. clap is used for parsing command line arguments, markdown handles all of the Markdown parsing, and regex is used for parsing output file paths from the metadata of fenced code blocks (more on that below).

[package]
name = "epistle"
description = "Writing letters to both human and machine"
authors = ["Gerald Nash (https://hivoltage.xyz)"]
version = "0.1.0"
edition = "2021"
documentation = "https://hivoltage.xyz/Essays/Epistolary-Programs"
homepage = "https://hivoltage.xyz/Essays/Epistolary-Programs"
license = "MIT"

[dependencies]
clap = "4.5.13"
markdown = "1.0.0-alpha.18"
regex = "1.10.5"

Since we're in a Rust project, let's also add a .gitignore to prevent checking in any build artifacts. Note that Rust projects that use Cargo almost always output to /target.

/target

That's pretty much all of our project configuration. Before we move on to our main.rs, I'll point out that the file's too large to have in one Markdown code block. epistle understands this and will concatenate all blocks that have the same output file path before writing to the file. The order of concatenation is the order in which each block appears in the Markdown source file.

We tell epistle that a code block should be output to a file using a file: attribute in the code block's metadata area.

```mylanguage file:path/to/my/file.ext
```

The above code block will be output to {output_directory}/path/to/my/file.ext, where {output_directory} can be defined at runtime using a CLI flag.

Let's get back to our main.rs. To start, we'll define our imports. Each of these will be used later.

use clap::{Arg, Command};
use markdown::{mdast::Node, to_mdast, Constructs, ParseOptions};
use regex::Regex;
use std::collections::HashMap;
use std::io::Write;
use std::path::Path;
use std::{fs, fs::File};

Now we'll define functions that'll be used in our program entrypoint. extract_file_path(1) accepts a string slice and, if it contains a pattern file:path, returns the path. It accepts a string slice and not an owned String because Regex::captures(1) doesn't need ownership of the value.

fn extract_file_path(input: &str) -> Option<String> {
    let re = Regex::new(r#"file:(?:"(.*?)"|(\S+))"#).expect("Couldn't create file path regex");
    re.captures(input).and_then(|caps| {
        caps.get(1)
            .or_else(|| caps.get(2))
            .map(|m| m.as_str().to_string())
    })
}
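
To make the matching rule concrete, here's a dependency-free sketch of the same lookup. It isn't epistle's implementation (that's the regex above). file_path_from_meta is a hypothetical name, and I'm assuming the metadata is a single line, which is the case for a code fence's info string.

```rust
/// Dependency-free approximation of the `file:` lookup.
/// Hypothetical helper for illustration, not part of epistle.
/// Assumes `meta` is a single line of code-fence metadata.
fn file_path_from_meta(meta: &str) -> Option<String> {
    for (idx, _) in meta.match_indices("file:") {
        let rest = &meta[idx + "file:".len()..];

        // Quoted form, e.g. file:"path with spaces.ext"
        if let Some(quoted) = rest.strip_prefix('"') {
            if let Some(end) = quoted.find('"') {
                return Some(quoted[..end].to_string());
            }
        }

        // Bare form, e.g. file:path/without/spaces.ext
        let bare: String = rest.chars().take_while(|c| !c.is_whitespace()).collect();
        if !bare.is_empty() {
            return Some(bare);
        }
    }
    None
}
```

Under those assumptions, both file:src/main.rs and file:"my notes/main.rs" yield a path, while metadata without a file: attribute yields None.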

extract_output_files(2) traverses a Markdown AST in search of code blocks. When discovering a block, it extracts a file path using the above function. It then "upserts" into the extracted_files map the extracted file path (key) and its contents (value).

fn extract_output_files(node: &Node, extracted_files: &mut HashMap<String, String>) {
    match node {
        Node::Code(code_block) => {
            if let Some(meta) = &code_block.meta {
                if let Some(file_path) = extract_file_path(meta) {
                    if let Some(file_contents) = extracted_files.get_mut(&file_path) {
                        let additional_file_contents = ["\n", &code_block.value, "\n"].concat();
                        file_contents.push_str(&additional_file_contents);
                    } else {
                        extracted_files.insert(file_path, code_block.value.clone());
                    }
                }
            }
        }
        _ => {
            if let Some(children) = node.children() {
                for child in children {
                    extract_output_files(child, extracted_files);
                }
            }
        }
    }
}
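
Here's the upsert on its own, as a minimal sketch that takes (path, code) pairs in document order instead of a Markdown AST. concat_blocks is a hypothetical helper for illustration only; the newline handling is meant to mirror the function above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the upsert step (not part of epistle).
/// `blocks` holds (output path, code) pairs in the order they appear.
fn concat_blocks(blocks: &[(&str, &str)]) -> HashMap<String, String> {
    let mut files: HashMap<String, String> = HashMap::new();
    for (path, code) in blocks {
        match files.get_mut(*path) {
            // Path seen before: append this block, wrapped in newlines.
            Some(contents) => {
                contents.push('\n');
                contents.push_str(code);
                contents.push('\n');
            }
            // First block for this path: store it as-is.
            None => {
                files.insert(path.to_string(), code.to_string());
            }
        }
    }
    files
}
```

Feeding it two blocks for src/main.rs with a .gitignore block in between produces two map entries, with the main.rs blocks joined in the order they appeared.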

merge_paths(2) merges the output directory path with the output file paths. Since merging a directory path with an absolute file path is hard to define behavior for, the function returns an error if such a thing happens. Note that to_string_lossy() below replaces any non-Unicode sequences in the joined path with U+FFFD. Both inputs here are Strings, which are always valid UTF-8, so nothing is actually lost.

fn merge_paths(dir: &String, file: &String) -> Result<String, String> {
    if Path::new(file).is_absolute() {
        return Err(String::from("file is absolute"));
    }
    Ok(Path::new(dir).join(file).to_string_lossy().into_owned())
}
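
A quick way to see both outcomes is to restate the rule in a standalone sketch. join_output_path is a hypothetical name, and I've given it &str parameters only so the block compiles on its own; the logic is intended to match the function above.

```rust
use std::path::Path;

/// Standalone restatement of the merge rule (hypothetical name, not epistle's).
/// Relative file paths are joined onto `dir`; absolute ones are rejected.
fn join_output_path(dir: &str, file: &str) -> Result<String, String> {
    if Path::new(file).is_absolute() {
        return Err(String::from("file is absolute"));
    }
    Ok(Path::new(dir).join(file).to_string_lossy().into_owned())
}
```

On a Unix-like system, join_output_path("out", "src/main.rs") gives out/src/main.rs, while an absolute path such as /etc/hosts gives the Err variant.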

And now the main event. Let's define our program entrypoint. To start, we define our CLI input flags. Clap's ergonomic API makes these self-explanatory.

fn main() {
    let matches = Command::new("epistle")
        .arg(
            Arg::new("input_file")
                .short('i')
                .long("input_file")
                .value_name("FILE")
                .required(true)
                .help("Input Markdown file path"),
        )
        .arg(
            Arg::new("output_dir")
                .short('o')
                .long("output_dir")
                .value_name("DIR")
                .required(true)
                .help("Output project directory path"),
        )
        .get_matches();

Then, we actually parse them from the program's argv. We use expect(1) here because we're in the top-level function and are okay with panicking. We pick expect(1) over unwrap() because we want to provide a helpful error message on panic.

    let input_file = matches
        .get_one::<String>("input_file")
        .expect("Couldn't find input file in CLI args");
    let output_dir = matches
        .get_one::<String>("output_dir")
        .expect("Couldn't find output directory in CLI args");

Now let's load the input file into memory. read_to_string(1) is more concise than going with the manual File::open approach.

    let markdown_content = fs::read_to_string(input_file).expect("Failed to read input file");
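
For comparison, this is roughly what the manual approach looks like next to the one-liner. Both helper names are hypothetical and exist only for this sketch.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// The manual route: open the file, then read it into a String buffer.
/// (Hypothetical helper name, for illustration only.)
fn read_manually(path: &Path) -> io::Result<String> {
    let mut file = File::open(path)?;
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;
    Ok(contents)
}

/// The concise route: a single call handles the open and the read.
/// (Hypothetical helper name, for illustration only.)
fn read_concisely(path: &Path) -> io::Result<String> {
    fs::read_to_string(path)
}
```

Both return the same contents for the same file; the second simply saves the buffer bookkeeping.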

Now that we have everything that we need from the environment, we can get to parsing and extracting. As a matter of fact, we'll do it in one go.

    let options = ParseOptions {
        constructs: Constructs {
            code_fenced: true,
            ..Constructs::default()
        },
        ..Default::default()
    };
    let ast = to_mdast(&markdown_content, &options).expect("Failed to parse Markdown");

    let mut extracted_files: HashMap<String, String> = HashMap::new();
    extract_output_files(&ast, &mut extracted_files);

And to wrap up, we iterate through our previously extracted files and write them to the file system.

create_dir_all(1) is really handy here, as it allows us to create nested parent directories in one line. If dir/sub1/sub2/sub3/file.ext is what we want to write, but only dir exists with no child directories, lines 3-5 of the block below will create sub1, sub2, and sub3 for us.

Note that File::create(1) will create the file if it doesn't exist and truncate it if it does, so we don't have to delete an existing output file ourselves. Its contents will just be overwritten.

    for (file_path, file_contents) in extracted_files {
        if let Ok(ultimate_file_path) = merge_paths(output_dir, &file_path) {
            if let Some(parent_dir) = Path::new(&ultimate_file_path).parent() {
                fs::create_dir_all(parent_dir).expect("Couldn't create file parent directories");
            }

            let mut file = File::create(ultimate_file_path).expect("Couldn't create or open file");
            file.write_all(file_contents.as_bytes())
                .expect("Couldn't write to file");
        }
    }
}
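
To check the two claims above in isolation, here's a small sketch that writes one file under a root directory. write_output is a hypothetical helper, not something epistle defines, and it returns errors instead of panicking so it can be called from a test.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper (not in epistle): writes `contents` to `root/rel`,
/// creating any missing parent directories first.
fn write_output(root: &Path, rel: &str, contents: &str) -> io::Result<PathBuf> {
    let target = root.join(rel);
    if let Some(parent) = target.parent() {
        // Creates every missing directory in the chain; fine if they exist.
        fs::create_dir_all(parent)?;
    }
    // Creates the file, or truncates it to zero length if it already exists.
    let mut file = File::create(&target)?;
    file.write_all(contents.as_bytes())?;
    Ok(target)
}
```

Calling it twice with the same relative path, say sub1/sub2/sub3/file.ext, creates the nested directories on the first call and leaves only the second call's contents in the file.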

Generating and Using the Program

As mentioned above, epistle is self-hosting and generates itself from the source file for this essay. I ran the following command to generate the files in the epistle GitHub repo.

epistle -i "Essays/Epistolary Programs.md" -o ~/projects/epistle

Let's add a README for completeness.

# epistle

A tool for writing [epistolary programs](https://hivoltage.xyz/Essays/Epistolary-Programs).

Copyright (c) Gerald Nash

Conclusion

One pain point shared by almost all literate programming workflows, epistle included, is that it's difficult to map errors back to their source lines, as epistle doesn't keep track of the line mappings. It's not a blocker, since epistle is designed for smaller programs that are less likely to have such errors, but it's a worthwhile feature to add for future scale.

Ultimately, literate programming is a worthwhile paradigm to employ for small scale personal projects like many of my own, and epistle helps me write literate programs by making my programming experience feel more like writing letters to my future self in addition to the machine. This saves me lots of time when looking back at past projects, since I'll have plenty of English available to make sense of the code whose context has left my memory.

1. Jupyter notebooks embody the literate programming paradigm and dominate research-focused industries including, but not limited to, AI / ML, bioinformatics, and mathematics.

2. Literate programs have been known to be difficult to collaborate on. They have weak tooling support (can you name a literate program editor with LSP support?) and can be cumbersome to manage in highly collaborative projects (developers must carefully consider how their VCS workflow works with their literate programming workflow).

3. You may be thinking "that's what comments are for: to explain the 'why' of programming decisions". First, remember that almost all of us think of code first and comments second (if ever). Second, see this StackExchange thread that discusses the matter. Ultimately, if reasonably commenting code in source files works for you (it does for me too in highly collaborative environments), then that's great. For the projects I'm referring to here, I want a code editing environment that encourages me to write natural language more proactively.

4. Bram Stoker's Dracula is an example of an epistolary novel.