How to write pandoc plugins in Node.js
Pandoc is a great tool for transforming text formats into other text formats. I use it to create PDF, HTML, EPUB, and MOBI versions of my book from my Markdown files.
It comes with plugins (pandoc filters) that allow transforming the AST (abstract syntax tree) of your text files, so you can build your own syntax and make it do pretty much anything. The idea is similar to other AST transformation frameworks that you might know, such as jscodeshift or remark. While pandoc and its plugins are mainly written in Haskell, you can also write them in JavaScript using Node.js.
We will write and test a plugin that includes source code from your file system in code blocks. The full-featured plugin is available on GitHub as pandoc-code-file-filter.
Development
A pandoc plugin is an executable file that is passed to the pandoc command with the --filter option:
pandoc <args> --filter path/to/pandoc-filter-binary
This means we start by setting up an npm project and creating a bin/filter.js file with the #!/usr/bin/env node preamble. We need to install pandoc-filter-promisified, which includes the pandoc bindings.
Our executable uses this library and applies an action function to whatever pandoc sends to our filter.
#!/usr/bin/env node
const pandoc = require('pandoc-filter-promisified')
const action = require('../src/index.js')
pandoc.stdio(action)
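To make the data flow concrete: pandoc serializes the document as JSON, pipes it to the filter's stdin, and reads the transformed JSON back from the filter's stdout. The sketch below imitates only the traversal step with a hypothetical walk helper. It is a simplification for illustration, not the actual implementation of pandoc-filter-promisified, and the sample document and the shout action are made up.

```javascript
// Hypothetical helper (not the library's code): visit every AST node,
// i.e. every object with a `t` key, and let `action` replace it by
// returning a new node. Returning undefined keeps the node and descends.
async function walk(node, action, format, meta) {
  if (Array.isArray(node)) {
    const result = []
    for (const child of node) {
      result.push(await walk(child, action, format, meta))
    }
    return result
  }
  if (node !== null && typeof node === 'object') {
    if (typeof node.t === 'string') {
      const replacement = await action(node, format, meta)
      if (replacement !== undefined) return replacement
    }
    const copy = {}
    for (const [key, value] of Object.entries(node)) {
      copy[key] = await walk(value, action, format, meta)
    }
    return copy
  }
  return node // strings, numbers, null
}

// Made-up document in the JSON shape pandoc hands to filters.
const sampleDocument = {
  'pandoc-api-version': [1, 22],
  meta: {},
  blocks: [
    { t: 'Para', c: [{ t: 'Str', c: 'Hello' }] },
    { t: 'CodeBlock', c: [['', [], []], 'old content'] },
  ],
}

// Example action: upper-case the text of every code block.
async function shout(elt) {
  if (elt.t === 'CodeBlock') {
    const [attrs, text] = elt.c
    return { t: 'CodeBlock', c: [attrs, text.toUpperCase()] }
  }
}

const transformed = walk(sampleDocument, shout, 'markdown', sampleDocument.meta)
transformed.then((doc) => console.log(JSON.stringify(doc.blocks[1])))
```

The action only returns a value for nodes it wants to change, which is the same contract our filter's action function follows.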
If you want to publish your pandoc-filter through NPM, you need to reference the binary file in the package.json as
"bin": { "pandoc-code-file-filter": "bin/filter.js"}
The main part of our logic will be implemented in our src/index.js file. The entry point of a pandoc plugin is an action function. It is passed each node of the AST along with some meta information about the conversion in progress.
const fs = require('fs')
const path = require('path')
const pandoc = require('pandoc-filter-promisified')
const { CodeBlock } = pandoc
async function action(elt, pandocOutputFormat, meta) {
  if (elt.t === `CodeBlock`) {
    // console.warn(JSON.stringify(elt, null, 4));
    const [headers, content] = elt.c
    const includePath = getIncludeHeader(headers)
    // it's a normal code block, no need to do anything
    if (!includePath) return
    // filter out the include value in case another filter processes this code block
    const newHeaders = filterOutOwnHeaders(headers)
    // read the referenced file; its content replaces the block's content
    const newContent = replaceWithFile(includePath)
    return CodeBlock(newHeaders, newContent)
  }
}
module.exports = action
The elt object describes a node of the syntax tree. Its type can be checked with the t key and its header and content can be read by accessing the c key. Admittedly, these names and keys look like someone who spent too much time in Haskell came up with them. Also, there is no official documentation for these types. I found it easiest to check the source code of pandoc-filter-node and log the results.
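For orientation, this is roughly what a logged node looks like for a fenced code block that carries a class and an include attribute. The layout (an [identifier, classes, keyValuePairs] array followed by the block's text) follows pandoc's JSON serialization of CodeBlock; the concrete values are made up for illustration.

```javascript
// Example node for a code block opened with {.javascript include=./example.js}
// and containing the single line "Replace me". Values are illustrative.
const elt = {
  t: 'CodeBlock',
  c: [
    ['', ['javascript'], [['include', './example.js']]],
    'Replace me',
  ],
}

// `c` holds the headers (attributes) first and the text content second.
const [headers, content] = elt.c
const [identifier, classes, keyValuePairs] = headers

console.log(elt.t) // CodeBlock
console.log(JSON.stringify(classes)) // ["javascript"]
console.log(JSON.stringify(keyValuePairs)) // [["include","./example.js"]]
console.log(content) // Replace me
```

Note that each attribute is a two-element [key, value] array rather than an object, so values are read by index.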
In the action function, we check whether the type is CodeBlock and whether an include is specified in the headers. Two small helpers do the header handling:
function filterOutOwnHeaders(headers) {
  // headers is [identifier, classes, keyValuePairs]
  const [identifier, classes, keyValuePairs] = headers
  const newKeyValuePairs = keyValuePairs.filter(
    ([key, value]) => key !== `include`
  )
  return [identifier, classes, newKeyValuePairs]
}
function getIncludeHeader(headers) {
  const [_, classes, keyValuePairs] = headers
  const keyValuePair = keyValuePairs.find(([key, value]) => key === `include`)
  if (!keyValuePair) return false
  // each pair is a [key, value] array, so the value is the second element
  return keyValuePair[1]
}
This would match the following code block in Markdown:
```{include=test.js}
```
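As a quick check, the sketch below runs both helpers against a hand-written headers array matching that code block. The helpers are repeated so the snippet runs on its own; the lookup returns the second element of the matching pair because each attribute is a [key, value] array. The sample headers are an assumption based on how pandoc serializes code block attributes.

```javascript
function getIncludeHeader(headers) {
  const [, , keyValuePairs] = headers
  const keyValuePair = keyValuePairs.find(([key]) => key === 'include')
  if (!keyValuePair) return false
  return keyValuePair[1]
}

function filterOutOwnHeaders(headers) {
  const [identifier, classes, keyValuePairs] = headers
  return [identifier, classes, keyValuePairs.filter(([key]) => key !== 'include')]
}

// Headers of a block with an include attribute, and of a plain block.
const withInclude = ['', [], [['include', 'test.js']]]
const plain = ['', ['javascript'], []]

console.log(getIncludeHeader(withInclude)) // test.js
console.log(getIncludeHeader(plain)) // false
console.log(JSON.stringify(filterOutOwnHeaders(withInclude))) // ["",[],[]]
```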
After extracting the include information, we can read the file using Node.js and replace the CodeBlock's content with the file content.
function replaceWithFile(include) {
  if (!fs.existsSync(include))
    throw new Error(
      `pandoc-code-file-filter: File does not exist: "${path.resolve(include)}"`
    )
  const fileContent = fs.readFileSync(include, 'utf8')
  return fileContent
}
Replacing a node in the AST is done by simply returning a new node from the action function. We call the CodeBlock constructor with the new headers (old headers minus our include header) and the new file content.
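Putting the pieces together, here is a condensed, dependency-free sketch of the whole transformation. The makeCodeBlock function is a hypothetical stand-in for the library's CodeBlock constructor and is assumed to produce a node in pandoc's { t, c } shape; includeFile and the temporary file are likewise made up so the snippet can run outside the project.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical stand-in for the library's CodeBlock constructor;
// assumed to build a node in pandoc's { t, c } JSON shape.
const makeCodeBlock = (headers, content) => ({ t: 'CodeBlock', c: [headers, content] })

// Simplified action: find `include`, drop it, and swap in the file content.
function includeFile(elt) {
  if (elt.t !== 'CodeBlock') return undefined
  const [[identifier, classes, keyValuePairs]] = elt.c
  const pair = keyValuePairs.find(([key]) => key === 'include')
  if (!pair) return undefined // a normal code block stays untouched
  const remaining = keyValuePairs.filter(([key]) => key !== 'include')
  const fileContent = fs.readFileSync(pair[1], 'utf8')
  return makeCodeBlock([identifier, classes, remaining], fileContent)
}

// Write a throwaway source file so the example does not depend on the project layout.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pandoc-filter-'))
const sourceFile = path.join(tmpDir, 'example.js')
fs.writeFileSync(sourceFile, 'console.log("included")\n')

const before = {
  t: 'CodeBlock',
  c: [['', ['javascript'], [['include', sourceFile]]], 'Replace me'],
}
const after = includeFile(before)

// The class survives, the include attribute is gone, the content is the file's.
console.log(JSON.stringify(after, null, 2))
```

The returned node keeps the identifier and classes, so syntax highlighting for the included file still works in the final output.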
Testing
To test if our filter works, we can write some jest tests.
Our example Markdown file will have the following contents:
``` {.javascript include=./test/examples/example.js}
Replace me
```
The example.js file with the content we want to include is just a normal JS file.
We can now run pandoc using our filter on the Markdown file and verify pandoc’s output using jest’s snapshot testing.
Create the test in test/index.test.js:
const { execSync } = require("child_process");

// Run pandoc with our filter on a file from test/examples and return its output
function execPandocOnFile(fileName) {
  const stdout = execSync(
    `pandoc -s -t markdown test/examples/${fileName} --filter bin/filter.js`
  );
  return String(stdout);
}

test("replaces code block content with file content", () => {
  const output = execPandocOnFile(`example.md`);
  expect(output).toMatchSnapshot();
});
Here, I ran pandoc to create another Markdown file as the output. The new Markdown output is saved as a Jest snapshot, which is much easier for a human to verify than the raw AST.
You can now publish the pandoc filter on npm and people can install it with npm install -g your-pandoc-filter.
And that’s all you need to know to get started with developing your own pandoc plugins. ✨