Cover Image Credit: RFC: Designing Logo of Pandoc

Introduction

Most developers have probably wondered or even asked the question at some point, "What exactly is Pandoc?" or "What do people use Pandoc for?"

I certainly did! So, I have written a short article to help answer that question. Much of the information in this article is sourced from a gist I had put together over the course of several months.



Billed as a universal document converter, Pandoc is an open-source program for file conversion. Pandoc is not a markup language itself; it is a command-line program that reads one markup format and writes another. 🙃

If you need to convert files from one markup format into another, Pandoc is your best friend. It has become popular across many industries and technology sectors.

The value of Pandoc shines brightest when it transforms formats like Markdown, Microsoft Word (.docx), and XML formats such as DocBook into more reader-friendly output, including PDF and HTML.


Installing Pandoc

I will assume at this point, if you are still reading, you likely want to try Pandoc out for yourself.

Listed below are the steps I took to install and configure Pandoc on Windows (10 Pro Edition) and Linux (Ubuntu 20.04 Focal) for Markdown to PDF document conversion.

Please note all disclaimers remain flapping in the breeze. Pandoc is open-source software that carries no warranty of any kind. Likewise, I make no guarantee that you will achieve a favorable result simply by following the steps below. 🙃

Your Mileage Might Vary

--> Pandoc on Windows

I have been using the Chocolatey package manager for Windows for several years now, so for me, the simplest way to get Pandoc up and running was to install it with this command.

choco install pandoc

Then I grabbed the Windows installer for MiKTeX (a TeX distribution that supplies pdflatex and the other LaTeX engines Pandoc can use for PDF output) from the official downloads page and ran the .exe file to install it.



Before-and-after view when running the markdown-to-pdf command in Pandoc

No further configuration was necessary, at least not from a functional point of view. However, Pandoc will let you customize certain behaviors at quite a granular level.

The configuration section toward the end of this article offers a good starting place for those wishing to take a deeper dive.


--> Pandoc on Linux

Getting Pandoc set up with PDF capabilities on Linux proved a bit more challenging.

After wading through a good deal of noise while researching different packages, I found that the consensus favored a Pandoc/TeX Live setup.

Since I am running Ubuntu Linux on WSL2, I opted to set Pandoc up from scratch inside Linux to avoid conflicts with my local Windows environment.



Before-and-after view when running the markdown-to-html command in Pandoc


First, I pulled the latest tarball (version 2.13 at the time) from the Pandoc releases page on GitHub.

There are multiple assets available with each release, so check your OS architecture first rather than blindly copying the snippets. This way, you can ensure you are requesting the correct package for your machine.

wget https://github.com/jgm/pandoc/releases/download/2.13/pandoc-2.13-linux-amd64.tar.gz
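To make the architecture check less error-prone, here is a minimal sketch that maps the output of `uname -m` to a release asset name. It assumes that the 2.13 release ships both a `linux-amd64` and a `linux-arm64` tarball, so confirm that on the release page first. The function name `pandoc_asset_name` is purely illustrative. The script prints the `wget` command rather than running it, and the file name it computes is what `$TGZ` stands for in the extraction step.

```shell
#!/usr/bin/env bash
# Sketch: pick the Pandoc 2.13 Linux tarball that matches this machine.
# Assumption: the 2.13 release ships both linux-amd64 and linux-arm64 tarballs;
# check the release page before relying on this mapping.
set -euo pipefail

PANDOC_VERSION="2.13"

# Map the architecture reported by `uname -m` to the assumed asset file name.
pandoc_asset_name() {
  case "$1" in
    x86_64 | amd64)  echo "pandoc-${PANDOC_VERSION}-linux-amd64.tar.gz" ;;
    aarch64 | arm64) echo "pandoc-${PANDOC_VERSION}-linux-arm64.tar.gz" ;;
    *) echo "no prebuilt Linux tarball assumed for: $1" >&2; return 1 ;;
  esac
}

# Print the download command instead of running it, so nothing is fetched blindly.
if TGZ="$(pandoc_asset_name "$(uname -m)")"; then
  echo "wget https://github.com/jgm/pandoc/releases/download/${PANDOC_VERSION}/${TGZ}"
fi
```

On an x86_64 machine this prints the same URL as the `wget` line above.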

Then, without switching directories, I used a two-step installation process.

sudo tar xvzf $TGZ --strip-components 1 -C '/usr/local'

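Note that $TGZ and the destination folder in the snippet above are placeholders. Replace $TGZ with the path to the tarball you downloaded, and change '/usr/local' if you would rather install somewhere else, such as a folder under your home directory.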

sudo apt-get install texlive texlive-latex-extra

And with that, Pandoc was installed on Linux to convert markdown files to PDF documents!
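Before moving on, a quick smoke test can confirm that both halves of the setup are on your PATH. This is a minimal sketch, assuming placeholder file names `sample.md` and `sample.pdf`; the helper `require_tools` is a hypothetical name, not part of Pandoc. It checks for `pandoc` and for `pdflatex` (which TeX Live provides and which Pandoc uses by default for PDF output) before attempting a conversion.

```shell
#!/usr/bin/env bash
# Post-install smoke test (sketch). sample.md and sample.pdf are placeholder names.
set -euo pipefail

# Report every command in the argument list that is missing from PATH;
# return non-zero if any of them is missing.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# pandoc does the conversion; pdflatex (from TeX Live) is its default PDF engine.
if require_tools pandoc pdflatex; then
  printf '# Hello, Pandoc\n\nSome *emphasis* and a list:\n\n- one\n- two\n' > sample.md
  pandoc sample.md -o sample.pdf
  echo "Wrote sample.pdf"
fi
```

If either tool is reported as missing, revisit the corresponding install step.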


Configuring Pandoc

These configuration options are just examples and may not apply in every situation.

One should always consult the official Pandoc documentation for complete details and the latest changes.

--pdf-engine=PROGRAM
  • Specifies which engine Pandoc should use when producing PDF output.
  • Valid values are pdflatex, lualatex, xelatex, latexmk, tectonic, wkhtmltopdf, weasyprint, prince, context, and pdfroff.
  • If the engine is not in your PATH, you can specify the full path of the engine here.
  • If this option is not specified, Pandoc uses the following defaults, depending on the output format specified with -t:
    • -t latex or none: defaults to pdflatex (other options: xelatex, lualatex, tectonic, latexmk)
    • -t context: defaults to context
    • -t html: defaults to wkhtmltopdf (other options: prince, weasyprint; visit Print-CSS for a good introduction to PDF generation from HTML/CSS)
    • -t ms: defaults to pdfroff
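The defaults listed above can be made explicit in a small wrapper script, which helps when you want a build to use the same engine on every machine. This is a sketch: `default_engine` and `to_pdf` are hypothetical helper names, the mapping simply mirrors the list above, and `sample.md`/`sample.pdf` are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: make the documented --pdf-engine defaults explicit in a script.
# default_engine and to_pdf are illustrative helpers, not part of Pandoc.
set -euo pipefail

# Default engine for each intermediate format passed with -t.
default_engine() {
  case "$1" in
    latex)   echo "pdflatex" ;;
    context) echo "context" ;;
    html)    echo "wkhtmltopdf" ;;
    ms)      echo "pdfroff" ;;
    *) echo "no PDF engine listed for -t $1" >&2; return 1 ;;
  esac
}

# Convert INPUT to OUTPUT (a .pdf) through FORMAT, naming the engine explicitly.
# An optional fourth argument overrides the engine (e.g. xelatex or weasyprint).
to_pdf() {
  local input="$1" output="$2" format="${3:-latex}"
  local engine="${4:-}"
  if [[ -z "$engine" ]]; then
    engine="$(default_engine "$format")"
  fi
  pandoc "$input" -t "$format" --pdf-engine="$engine" -o "$output"
}

# Only convert when pandoc, pdflatex and the sample input are actually present.
if command -v pandoc >/dev/null 2>&1 && command -v pdflatex >/dev/null 2>&1 \
   && [[ -f sample.md ]]; then
  to_pdf sample.md sample.pdf latex
fi
```

For example, `to_pdf notes.md notes.pdf html weasyprint` would route the conversion through HTML and WeasyPrint instead of LaTeX, assuming WeasyPrint is installed.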


Pandoc can even apply custom styling when rendering HTML from Markdown
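The exact command behind that styled output isn't shown here, but one common way to get it is Pandoc's `-s` (standalone) and `--css` options. The sketch below assumes placeholder names `style.css`, `sample.md`, and `sample.html`, and the stylesheet contents are just an example. It writes the stylesheet and then links it into a complete HTML page.

```shell
#!/usr/bin/env bash
# Sketch: render Markdown as a standalone HTML page with a custom stylesheet.
# style.css, sample.md and sample.html are placeholder names.
set -euo pipefail

# A tiny example stylesheet; any CSS file works.
cat > style.css <<'EOF'
body { max-width: 40em; margin: 0 auto; font-family: sans-serif; line-height: 1.5; }
code { background: #f4f4f4; padding: 0 0.2em; }
EOF

if command -v pandoc >/dev/null 2>&1 && [[ -f sample.md ]]; then
  # -s produces a complete HTML document, and --css links style.css from its <head>.
  # --metadata title=... supplies the <title> that standalone HTML expects.
  pandoc sample.md -s --css style.css --metadata title="Sample" -o sample.html
fi
```

Because `--css` only adds a link, keep `style.css` next to `sample.html` when you move or share the page.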

--pdf-engine-opt=STRING
  • Use the given string as a command-line argument to the pdf-engine. For example, to use a persistent directory foo for Latexmk’s auxiliary files, use --pdf-engine-opt=-outdir=foo .
  • Note that no check for duplicate options is done.
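Since Pandoc won't catch a repeated `--pdf-engine-opt`, a script that builds its arguments in several places may want to guard against duplicates itself. This is a sketch: `add_engine_opt` is a hypothetical helper, `sample.md`/`sample.pdf` are placeholders, and `-outdir=foo` is taken from the User's Guide example above.

```shell
#!/usr/bin/env bash
# Sketch: pass extra options to the PDF engine without repeating any of them.
# add_engine_opt is an illustrative helper; sample.md/sample.pdf are placeholders.
set -euo pipefail

args=(sample.md -o sample.pdf --pdf-engine=latexmk)

# Pandoc does not check for duplicate engine options, so skip ones already added.
add_engine_opt() {
  local opt="--pdf-engine-opt=$1" existing
  for existing in "${args[@]}"; do
    if [[ "$existing" == "$opt" ]]; then
      return 0
    fi
  done
  args+=("$opt")
}

add_engine_opt -outdir=foo   # keep latexmk's auxiliary files in ./foo
add_engine_opt -outdir=foo   # duplicate: ignored

if command -v pandoc >/dev/null 2>&1 && command -v latexmk >/dev/null 2>&1 \
   && [[ -f sample.md ]]; then
  pandoc "${args[@]}"
fi
```

The resulting call is equivalent to `pandoc sample.md -o sample.pdf --pdf-engine=latexmk --pdf-engine-opt=-outdir=foo`.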

Credit: Pandoc User's Guide


Conclusion

On a related note, if you find yourself working with markdown, HTML, PDF, or XML files quite often, you should check out a little project of mine called mdEditor for VS Code.

The frameworks installed by mdEditor automate the configurations we just covered in such detail, so you can run a conversion with a simple keybinding or a click in the command palette!

I hope you have found this tutorial useful, and thank you for taking the time to follow along!

Don't forget to 💖 this article and leave a 💭. If you're feeling extra generous, please click my name below to 🎆subscribe🎇!

-- killshot13


A Note on Pandoc

Copyright 2006–2021 John MacFarlane. Released under the GPL, version 2 or greater. This software carries no warranty of any kind. (See COPYRIGHT for full copyright and warranty notices.) For a full list of contributors, see the file AUTHORS.md in the Pandoc source code.

This post is also available on DEV.