Academic Writing Using Pandoc

Posted on November 14, 2018

Markdown has become one of the most powerful tools in my daily research work. Nowadays, everyone faces a lot of writing. LaTeX, which I learned during my university studies, is great for creating beautiful and reproducible documents without all the Microsoft Word hassles, but it has some major drawbacks. Some people claim it is way too hard to learn and not as approachable as WYSIWYG (what you see is what you get) editors; others, like me, are just annoyed by constantly having to type one of \ { }. If you want to know more about LaTeX, I suggest turning to your favorite search engine, which will have plenty of resources for reference. For me, markdown turned out to be the perfect trade-off between these two worlds. Hence, I will show you how you can also benefit from using pandoc for academic writing tasks.

Primer on Markdown

When talking (or writing) about markdown, I need to clarify that I will discuss the pandoc markdown dialect here. To my knowledge, it is the most powerful dialect and conversion tool. Just have a look at the awesome conversion possibilities shown on the pandoc homepage.


If you have not worked with markdown yet, let me briefly introduce some of the basics. This will not be a complete tutorial on pandoc, and there is way more functionality than I could describe in just one article. If you are interested, have a look at the manual or leave me a comment about what you need to know or need help with.

Basic Text / Paragraphs

Text is just plain text. There is nothing special about it. Just write it. That is the actual beauty of markdown: it allows you to focus on the most essential part of your writing. If you need some formatting like italics, bold print or strikethrough text, you have simple commands at hand.

**bold** renders to bold,
*italic* to italic, and
~~strike~~ to struck-through text

Headings

A heading in markdown is indicated by a hash sign # followed by the name of the heading. Deeper heading levels use correspondingly more hash signs. Headings are typically numbered when they end up in a LaTeX template like mine; for other cases, pandoc's --number-sections option requests numbering explicitly. Unnumbered headings can be added by appending {-} to the end of the heading, which is short for {.unnumbered}. Additionally, if you output to LaTeX or PDF files, headings from level 4 onwards become LaTeX paragraphs.

# Headline 1 {.unnumbered}
# Also unnumbered headline {-}
## Headline Level 2
#### Paragraph in Latex
These are generally unnumbered
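To try the numbering behaviour yourself, here is a minimal shell sketch. It writes a small test file and converts it to HTML with numbering switched on. The file names headings.md and headings.html are made up for this example, and the pandoc call is guarded so the script also runs on a machine without pandoc.

```shell
#!/bin/sh
set -eu

# Write a small test document (file name is an assumption for this sketch).
cat > headings.md <<'EOF'
# Introduction {-}

Unnumbered, because of the attribute after the heading text.

# Methods

## Setup

Numbered as a subsection of Methods.
EOF

# Convert with section numbering requested explicitly, if pandoc is available.
if command -v pandoc >/dev/null 2>&1; then
  pandoc headings.md --number-sections -t html -o headings.html
  echo "wrote headings.html"
else
  echo "pandoc not found, skipping conversion"
fi
```

Opening headings.html in a browser should show numbers in front of "Methods" and "Setup" but not in front of "Introduction".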

Tables

Tables in pandoc markdown can easily be written by separating the columns with pipes |. The first line is automatically converted to the table heading. In the second line you indicate the alignment of each column, that is whether its text is left-aligned, right-aligned or centered: :--- means left-aligned, ---: right-aligned and :---: centered. Be aware that the relative number of dashes also indicates how wide each column should be (pandoc applies these widths when a row of the table is too long to fit on one source line) and that you always need at least three dashes! Thus, :---|------: means we have two columns. The first is left-aligned, while the second is right-aligned. Additionally, the second column should take up twice as much width as the first. A caption can be added as a separate paragraph beneath the table beginning with Table: caption goes here.

In my experience, these formatting instructions are really easy and very useful when creating simple documents. For scientific work, however, I mostly rely on the inline LaTeX feature that pandoc provides. It allows you to write arbitrary LaTeX code right inside the markdown file at any place. During conversion, pandoc does not interpret this part and copies it as is to the final LaTeX document. Just keep in mind that you will lose these parts if you do not export to LaTeX in the end.

# Pandoc style tables
head col 1 | head col 2
:---:|:---
centered col | left aligned col

Table: Caption for the table (and yes, it gets converted to real captions ;-))

# Latex style tables
\begin{table}
\begin{tabular}...
\end{tabular}
\end{table}
Does also work.
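If you want to see what pandoc makes of such a pipe table, the following sketch converts one to a LaTeX fragment and lists the environments it finds. The file names are made up for this example. In my experience pandoc usually picks the longtable environment here, but treat that as an assumption and check the output of your own version.

```shell
#!/bin/sh
set -eu

# A pipe table with a caption paragraph (file name is an assumption).
cat > table.md <<'EOF'
head col 1 | head col 2
:---:|:---
centered col | left aligned col

Table: Caption for the table
EOF

if command -v pandoc >/dev/null 2>&1; then
  pandoc table.md -f markdown -t latex -o table.tex
  # List the LaTeX environments pandoc used for the table.
  grep -o 'begin{[a-z]*}' table.tex | sort -u
else
  echo "pandoc not found, skipping conversion"
fi
```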

Mathematics

If you know LaTeX, you probably enjoy typesetting equations in it. It is way easier than in WYSIWYG editors. Pandoc markdown allows you to use the same notation as LaTeX, so you can just write your equations as you are used to.

$$a = \sum_{\forall a_i \in A} a_i^2$$

One of the reasons why I love pandoc markdown so much is that the above code, which renders a beautiful LaTeX formula, can also be converted to a valid MS Word equation. Pandoc does not convert it to an image and include that, but instead creates the corresponding equation object. So far, this feature has saved me several times, and sometimes I start a document for just one equation, convert it and copy it over to some document I am working on. Just amazing, thanks to the developers!
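The one-equation trick can be sketched in a few lines of shell. The file names are made up for this example. The check at the end rests on two assumptions of mine: that a .docx file is a zip archive with the body in word/document.xml, and that pandoc writes equations there as Office Math elements whose names start with m:oMath. A non-zero count suggests the equation arrived as a real equation object rather than an image.

```shell
#!/bin/sh
set -eu

# A document that contains nothing but the equation from above.
cat > equation.md <<'EOF'
$$a = \sum_{\forall a_i \in A} a_i^2$$
EOF

if command -v pandoc >/dev/null 2>&1; then
  pandoc equation.md -o equation.docx
  # Count lines of the document body that contain Office Math markup.
  if command -v unzip >/dev/null 2>&1; then
    unzip -p equation.docx word/document.xml | grep -c 'm:oMath' || true
  fi
else
  echo "pandoc not found, skipping conversion"
fi
```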

Bibliography

For academic writing, you definitely need to know how to reference previous work using pandoc. In markdown this is as easy as it gets. If you are already used to the LaTeX style of using \citet{} or \textcite and \citep or \cite, you will enjoy how easy citing can be. First of all, each markdown document can have a preamble providing some metadata, like a LaTeX document. This preamble (a YAML metadata block) starts with a line of three dashes and ends the same way. A bibliography file can be provided as shown below.

---
bibliography: bibfile.bib
---
# Heading 1

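Actual citing is now as easy as referencing the corresponding BibTeX key. What I really like is that this concept even works if you finally output to MS Word documents. Provided citation processing is enabled (--citeproc in current pandoc versions, --filter pandoc-citeproc in older ones), it will give you correct citations from your BibTeX bibliography.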

As is shown by @Nohl2014 (for in text citing \citet)
The BadUSB publication [@Nohl2014] (for in parentheses citing \citep)
Multiple authors can also be cited [@Nohl2014; @Langner2013].
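Here is a small end-to-end sketch of the citation workflow. The two bibliography entries are placeholders I made up so the example is self-contained; only the keys match the ones used above. The file name cite.md is hypothetical as well. The --citeproc option exists in pandoc 2.11 and later; older releases used --filter pandoc-citeproc instead.

```shell
#!/bin/sh
set -eu

# Placeholder bibliography entries; authors and titles are NOT real data.
cat > bibfile.bib <<'EOF'
@misc{Nohl2014,
  author = {Placeholder, Author},
  title  = {Placeholder title},
  year   = {2014}
}
@misc{Langner2013,
  author = {Placeholder, Other},
  title  = {Another placeholder title},
  year   = {2013}
}
EOF

# A document with a metadata block and both citation styles.
cat > cite.md <<'EOF'
---
bibliography: bibfile.bib
---

As is shown by @Nohl2014, and elsewhere [@Nohl2014; @Langner2013].
EOF

if command -v pandoc >/dev/null 2>&1; then
  # Print the result as plain text so the resolved citations are visible.
  pandoc cite.md --citeproc -t plain \
    || echo "conversion failed, check your pandoc version"
else
  echo "pandoc not found, skipping conversion"
fi
```

Replacing -t plain with -o cite.docx gives the Word output mentioned above.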

Sharing your final work

One of the benefits of writing in markdown is the possibility to export your final document to nearly every format you may require. For example, you can generate a PDF file directly by giving the output file a .pdf extension (-o document.pdf), or export to LaTeX source with -t latex and do the final document creation on your own. The full command for exporting to a Word document looks like this:

pandoc document.md -t docx -o document.docx

For academic writing, however, I prefer to export the pandoc document to a LaTeX document which I then include in the required LaTeX template. Thus, I export to an intermediate file paper.tex and then use \include{paper.tex} inside the main document, e.g. sample.tex. Additionally, experience has shown that pandoc sometimes outputs commands my LaTeX templates do not recognize. These usually concern tables. I therefore replace these commands with my own table styles. Placing the full pipeline in a file called Makefile in the same folder then allows using make on the command line to produce the final output.

pandoc paper.md -t latex -o paper.tex --bibliography bibfile.bib --natbib \
  --top-level-division=section --toc
# Convert longtable to supertabular
# Requires \usepackage{supertabular}
sed -i -e 's/longtable/supertabular/g' paper.tex
# Remove weird endhead and endfirsthead
sed -i -e 's/\\endhead//g' paper.tex
sed -i -e 's/\\endfirsthead//g' paper.tex
# Compile tex document.
# sample.tex contains the preamble, style and command definitions and has a \include{paper.tex} for the actual content
xelatex sample.tex
bibtex sample
# Recompile for bib and toc updates
xelatex sample.tex
xelatex sample.tex
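To see what the sed steps do without touching a real paper, here is a minimal sketch that applies the same substitutions to a hand-written stand-in for pandoc's table output. The input below is my own simplified example, not real pandoc output, and the file names are made up. Unlike the pipeline above, it writes to a new file instead of using -i, because the in-place flag behaves differently in GNU and BSD sed.

```shell
#!/bin/sh
set -eu

# Hand-written stand-in for a pandoc table fragment (not real pandoc output).
cat > sample_table.tex <<'EOF'
\begin{longtable}{cl}
head col 1 & head col 2 \\
\endhead
centered col & left aligned col \\
\end{longtable}
EOF

# Same three substitutions as in the pipeline, chained in one sed call.
sed -e 's/longtable/supertabular/g' \
    -e 's/\\endhead//g' \
    -e 's/\\endfirsthead//g' \
    sample_table.tex > sample_table_fixed.tex

cat sample_table_fixed.tex
```

The line that held \endhead is left behind as an empty line, which is harmless in this fragment.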

Producing Slideshows

As shown in the previous section, you can produce beautiful documents by just writing markdown. The magic of pandoc does not stop at producing documents. You can even create awesome slideshows with it. Just follow the same principles as before and make sure you have headings of all levels from 1 through 3. Each level 3 heading then starts a new slide. By using the -t beamer option, you can render awesome PDF slideshows from LaTeX code.

pandoc -t beamer --listings tool.md > input.tex
# Compile presentation.tex which holds preamble, style and command definitions and has a \include{input.tex} for the actual content
pdflatex presentation.tex
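For a quick test of the heading rule, this sketch writes a tiny slide deck and converts it to a beamer fragment. The file names are made up for this example. I pass --slide-level=3 to state the "level 3 heading starts a slide" rule explicitly rather than relying on pandoc's automatic choice; whether you need it depends on your document.

```shell
#!/bin/sh
set -eu

# A tiny deck with headings on levels 1 through 3 (file name is an assumption).
cat > slides.md <<'EOF'
# Part One

## Section A

### First slide

- point one
- point two

### Second slide

Some text.
EOF

if command -v pandoc >/dev/null 2>&1; then
  pandoc slides.md -t beamer --slide-level=3 -o slides.tex
  # Count the lines on which a beamer frame begins.
  grep -c 'begin{frame}' slides.tex || true
else
  echo "pandoc not found, skipping conversion"
fi
```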

Finally, make sure you use a decent editor for this workflow. If you need inspiration on how to use emacs for pandoc/markdown editing, have a look at my setup for emacs.

Nextlevel v/Peter Schneider
