Literate Programming
Literate programming is a documentation-first programming style pioneered by Donald Knuth.
But this isn’t a post about my hacked-together attempts at literate programming; that wouldn’t be very interesting. This is more of a small research-gathering exercise on the different literate programming tools and workflows.
Ideal
In my opinion, literate programming is still an incomplete paradigm. The biggest drawback comes in the form of programming resistance, or friction. This resistance comes mostly from the fact that the problem space is unknown ahead of time, and so, as the story goes, the implementation becomes the specification.
That’s the norm, but ideally anyone should be able to jump into a source tree, make changes, and later reconcile those changes correctly with some sort of generated document. The RFC (Request For Comments) document style is a nice example of “for the human” language. The RFC for the Atom Syndication Format (RFC 4287) is one example.
- The workflow should facilitate calculating and visualizing sources of documentation drift. Differences between the specification (documentation) and the implementation (source code) should be easy to reconcile through varied means. This could be by the seat of your pants, through automated diffing and fuzzing algorithms between the source and doc, or from signals that inform specificity and constraints. Test-driven development and type systems are useful signals, but they are exposed as low-level source code rather than high-level, human-readable constraints.
- The initial documentation entry point should not matter. Edit source files directly or enter top down through an overarching literate system. If documentation drift is easy to identify, then there is flexibility to experiment “freely” with the implementation. Primitives could be at the level of a single source file, and ideally any source file can act as an entry point for jump-starting a code repository’s documentation in part or in whole.
- Literate workflows should facilitate producing documents in whole or in part as discrete outputs based on section, topic, idea, or source file. This can be in any output format, such as PDF (Portable Document Format), HTML (HyperText Markup Language), or Markdown. This is important because sometimes only select portions of a program are worth documenting or reading.
- Ideally, multiple formats should be available as inputs and outputs. Markdown is popular, but there is a range of other markup languages that offer different advantages: reStructuredText, AsciiDoc, and Troff/Groff are a few. The shortcut approach is to leverage Pandoc, a universal document converter.
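To make the idea of calculating documentation drift a little more concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the source is a Python file, the document mentions functions using a `name()` convention, and the helper names (`defined_functions`, `documented_functions`, `drift`) are made up for this example rather than taken from any existing tool.

```python
import ast
import re


def defined_functions(source):
    """Names of all functions defined anywhere in a Python source string."""
    tree = ast.parse(source)
    return {node.name for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))}


def documented_functions(doc):
    """Names mentioned in the doc as name() (an assumed convention)."""
    return set(re.findall(r"\b([A-Za-z_]\w*)\(\)", doc))


def drift(source, doc):
    """Report names that appear on only one side of the doc/source divide."""
    code, docs = defined_functions(source), documented_functions(doc)
    return {"undocumented": sorted(code - docs), "stale": sorted(docs - code)}


# Hypothetical inputs: a tiny source file and one sentence of documentation.
SOURCE = """
def parse(text): ...
def render(tree): ...
"""
DOC = "Call parse() to build a tree, then export() to write it out."

print(drift(SOURCE, DOC))
```

Here `render` is implemented but never explained, and `export()` is explained but no longer exists. A real workflow would need far richer signals than names, but even a report like this gives a starting point for reconciling the two sides.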
There are nigh infinite literate programming tools and workflows. I’ve tried some of them, but not in enough depth to write about at great length.
Literate Programming Tools and Programs
The literate programming workflow is based on a simple tangle and weave process. The literate source file interleaves chunks of code with their accompanying explanations. The code chunks are tangled to construct a complete source file or executable. The explanations around the chunks are woven to create a document that fully explains the program.
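The tangle and weave steps can be sketched in a few lines of Python. This is a toy, not any of the real tools: the chunk syntax is a simplified imitation of noweb’s (`<<name>>=` opens a chunk, a lone `@` closes it, `<<name>>` on its own line refers to another chunk), the root chunk is assumed to be named `*`, and the function names are invented for the example. There is no cycle detection, so a chunk that refers to itself would recurse forever.

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")       # e.g. <<say hello>>=
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")   # e.g.     <<say hello>>


def parse(source):
    """Split a literate source into prose lines and named code chunks."""
    prose, chunks, current = [], {}, None
    for line in source.splitlines():
        match = CHUNK_DEF.match(line)
        if match:                       # a definition line opens a code chunk
            current = match.group(1)
            chunks.setdefault(current, [])
        elif line.strip() == "@":       # a lone "@" returns to prose
            current = None
        elif current is None:
            prose.append(line)
        else:
            chunks[current].append(line)
    return prose, chunks


def tangle(chunks, root="*"):
    """Expand chunk references recursively to build the program text."""
    out = []
    for line in chunks[root]:
        ref = CHUNK_REF.match(line)
        if ref:
            indent, name = ref.groups()
            out.extend(indent + sub for sub in tangle(chunks, name).splitlines())
        else:
            out.append(line)
    return "\n".join(out)


def weave(source):
    """Render for humans: prose as-is, code indented under a chunk label."""
    out, in_code = [], False
    for line in source.splitlines():
        match = CHUNK_DEF.match(line)
        if match:
            in_code = True
            out.append(f"[{match.group(1)}]")
        elif line.strip() == "@":
            in_code = False
        else:
            out.append("    " + line if in_code else line)
    return "\n".join(out)


EXAMPLE = """\
The program greets the reader.
<<*>>=
def main():
    <<say hello>>
main()
@
Greeting is a single print call.
<<say hello>>=
print("hello")
@
"""

prose, chunks = parse(EXAMPLE)
print(tangle(chunks))   # the program, in the order the machine needs
print(weave(EXAMPLE))   # the document, in the order the reader needs
```

The point of the split is that the explanation can introduce `say hello` after the code that uses it, while the tangled output still places it where the language requires.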
Web: WEB is Donald Knuth’s system for literate programming. Tangling and weaving macros process different parts of a web file to produce multiple outputs. There’s also CWEB, nuweb, and noweb.
Documented LaTeX Files: LaTeX packages from CTAN (Comprehensive TeX Archive Network) achieve a literate style by weaving and tangling documentation and code with the doc and docstrip packages. Package code is mixed in with commented typesetting inside a documented LaTeX file (dtx).
Babel: Babel converts Emacs’ Org Mode into a powerful workflow more suitable for literate programming. Babel is popular in some circles and has numerous use cases including developer operations and general system crafting and architecture.
Tsoding’s lit: A simple literate implementation based on the Literate Haskell approach. The literate program reads a document written in LaTeX and comments out every line not contained in a \code{} section block. The output is an executable source file, and the literate document takes advantage of LaTeX’s output formats.
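The comment-out approach is simple enough to sketch. The version below is not lit’s actual code or interface: it assumes code sits between `\begin{code}` and `\end{code}` lines, as in Literate Haskell’s LaTeX style, and it assumes `-- ` as the comment prefix. The function name and parameters are hypothetical.

```python
def comment_out_prose(text, comment="-- ",
                      begin=r"\begin{code}", end=r"\end{code}"):
    """Comment out every line that is not inside a code block."""
    out, in_code = [], False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == begin:
            in_code = True
            out.append(comment + line)   # the marker itself is not code
        elif stripped == end:
            in_code = False
            out.append(comment + line)
        elif in_code:
            out.append(line)             # code passes through untouched
        else:
            out.append(comment + line)   # prose becomes a comment
    return "\n".join(out)


# Hypothetical literate document with one Haskell code block.
DOC = r"""\section{Greeting}
The program prints a greeting.
\begin{code}
main = putStrLn "hello"
\end{code}
"""

print(comment_out_prose(DOC))
```

Because prose is commented out rather than deleted, the output has the same number of lines as the input, so a compiler error on a given line points at the same line in the literate document.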
Zyedidia’s Literate: A literate system based on Markdown. The literate document is written inside .lit files and outputs to HTML and CSS. Since it outputs directly to HTML, there’s a lot of leverage in refining and adapting the output.
Jupyter Notebook: A more popular kind of simple literate programming in the form of literate “computation”. These are interactive GUI (Graphical User Interface) documents for explaining and running code from a single view.
lit.sh: A literate programming preprocessor in pure shell because why not?
Conclusion
Literate programming is difficult and hasn’t caught on because, in the majority of cases, programs are built with the specification made up along the way. Put a customer in the mix and your specification changes randomly. This is the status quo because the range of inputs is often unknown, along with the full scope of the problem space.
Quantifying documentation drift and adding flexibility for limited but accurate documentation could make literate programming much easier. Testing and type systems are becoming much more robust, so perhaps Donald Knuth was just a tad too early for literate programming to fully catch on? My hunch is that if a person or a company figures this out, the result will be super simple to use and will look like git, but for documentation.
In terms of popularity, the most used literate programming tools seem to be Babel and noweb.