R Markdown and Pandoc Template: releasing your super powers

This is a written version of my presentation to the R user group at the University of Manchester.

Motivation

Meet Alice and Bob.

Alice and Bob shade
Alice and Bob

Alice and Bob are researchers. Sometimes they have awesome ideas.

Alice and Bob shade with light bulb on top of their heads
Alice and Bob had an idea

And sometimes they collaborate on a shared idea.

Alice and Bob shade with shared light bulb on top of their heads
Alice and Bob share an idea

When they collaborate, they create something concrete faster than when they work alone.

Alice and Bob shade with shared gear on top of their heads
Alice and Bob share an invention

Because collaborating with each other has been very productive, in the last year they have committed to supporting open access, open data and open source software.

Alice and Bob shade with open access logo, open data logo and open source software logo on top of their heads
Alice and Bob and open access, open data and open source software

Among the open source software that Alice and Bob use, their favourites are R and RStudio.

Alice and Bob shade with R logo and RStudio logo on top of their heads
Alice and Bob and R and RStudio

Alice and Bob love to write their research narrative in R Markdown. R Markdown helps a lot to keep each data visualisation in the right place and always correct, because the figures are generated from the data every time the document is built.

Alice and Bob shade with PC running RStudio between them
Alice and Bob using RMarkdown

Unfortunately, they waste time because none of the journals accepts R Markdown documents as an input format, which means Alice and Bob have to tailor their narrative to each journal's style.

Alice and Bob shade with PC visiting some paywalled literature between them
Alice and Bob versus publishers

At some point, Alice and Bob were so frustrated with the time wasted tailoring the same narrative more than once, just because the publisher couldn’t handle R Markdown, that they decided, as good researchers, to investigate a solution to the problem. They put on their x-ray goggles to look at how R Markdown works under the hood, and they discovered that part of R Markdown's magic is powered by Pandoc.

Alice and Bob shade with PC visiting Pandoc's website for the first time between them
Alice and Bob discovering Pandoc

They also discovered that Pandoc allows users to provide custom templates. Now Alice and Bob have a hypothesis to test: can they use Pandoc’s custom templates to save time?

Hypothesis Investigation

Pandoc’s documentation says

A custom template can be specified using the --template option. You can also override the system default templates for a given output format FORMAT by putting a file templates/default.FORMAT in the user data directory.

It also says

--data-dir=DIRECTORY Specify the user data directory to search for pandoc data files. If this option is not specified, the default user data directory will be used. This is, in Unix:

$HOME/.pandoc

(…)

and in Windows Vista or later:

C:\Users\USERNAME\AppData\Roaming\pandoc

Pandoc sets some variables by default when it reads the input document, and those variables can be used in the template by surrounding the variable name with dollar signs, for example $title$.
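
As a minimal sketch of that substitution, the commands below write a three-line template and render a small document through it. The file names demo.template and demo.md and the metadata values are made up for illustration; title, author and body are the Pandoc variables used later in this post.

```shell
# Work in a scratch directory so nothing in the project is touched.
workdir="$(mktemp -d)"

# Quoted heredoc ('EOF') keeps the shell from expanding $title$ itself.
cat > "$workdir/demo.template" <<'EOF'
Title: $title$
Author: $author$
$body$
EOF

# Hypothetical input: the YAML header sets the title and author variables.
cat > "$workdir/demo.md" <<'EOF'
---
title: Our latest paper
author: Alice and Bob
---

Hello from R Markdown.
EOF

# Render only if pandoc is installed; --template implies --standalone.
if command -v pandoc >/dev/null 2>&1; then
  pandoc "$workdir/demo.md" --to plain --template "$workdir/demo.template"
fi
```

If Pandoc is available, the output should show the title and author lines filled in from the YAML header, followed by the body text.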

With that information and all their R knowledge, Alice and Bob wrote in their lab notebook:

  1. Clone the Git repository with the latest paper.

    Run git clone git@gitlab.com:rgaiacs/rmd-and-pandoc-template.git at the command line.

  2. Create the directory _pandoc/templates inside the directory rmd-and-pandoc-template.

    Run cd rmd-and-pandoc-template, mkdir _pandoc and mkdir _pandoc/templates at the command line.

  3. Get a LaTeX template from Overleaf and store it inside _pandoc/templates.

    Visit Overleaf’s APA6 template, copy it and save as _pandoc/templates/apa6.tex.

  4. Change _pandoc/templates/apa6.tex to make use of Pandoc’s variables title, author and body. Also include some necessary LaTeX packages.

    The necessary LaTeX packages are

    % Commonly used
    \usepackage[english]{babel}
    \usepackage[utf8x]{inputenc}
    \usepackage{amsmath}
    
    % Required by knitr
    \usepackage{framed}
    \usepackage{graphicx}
    \usepackage{listings}
    \usepackage{longtable,booktabs}
    \usepackage{textcomp}
    \usepackage{xcolor}
    
  5. Add Pandoc arguments into paper.Rmd.

    output:
     pdf_document:
       pandoc_args: [
         "--data-dir", "_pandoc",
         "--template", "apa6.tex",
         "--output", "apa6.pdf"
       ]
    
  6. Run Knit from the RStudio interface.

    The RStudio log will show

    /usr/lib/rstudio/bin/pandoc/pandoc +RTS -K512m -RTS paper.utf8.md --to latex --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash --output paper.pdf --template /home/raniere/R/x86_64-pc-linux-gnu-library/3.4/rmarkdown/rmd/latex/default-1.17.0.2.tex --highlight-style tango --latex-engine pdflatex --variable graphics=yes --data-dir _pandoc --template apa6.tex --output apa6.pdf --variable 'geometry:margin=1in' 
    output file: paper.knit.md
    
    
    Output created: paper.pdf
    Error in tools::file_path_as_absolute(output_file) : 
      file 'paper.pdf' does not exist
    Calls: <Anonymous> -> <Anonymous>
    Execution halted
    

    It looks like something went wrong, but the error appears only because --output apa6.pdf changed the output file name that RStudio expects. The PDF is still created, as apa6.pdf.

  7. Open apa6.pdf with your favourite PDF reader.
  8. Repeat steps 3-7 with a different LaTeX template.

    You can use Overleaf’s IEEE Photonics Journal Paper template. You will also need to download IEEEphot.cls.

It looks like Alice and Bob can use Pandoc’s custom templates to save time!

Alice and Bob shade with PC showing two PDFs based on two templates between them
Alice and Bob with two PDFs based on two templates
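
The lab-notebook steps can be condensed into a shell session. This is a sketch under assumptions: minimal.tex is a hypothetical stand-in for the Overleaf template (which is not reproduced here), paper.md stands in for the knitted Markdown, and Pandoc is called directly rather than through RStudio. It stops at LaTeX output so that no TeX installation is needed.

```shell
# Steps 1-2: a project-local data directory (inside a scratch dir for safety).
project="$(mktemp -d)"
mkdir -p "$project/_pandoc/templates"

# Steps 3-4: a stand-in LaTeX template using title, author and body.
cat > "$project/_pandoc/templates/minimal.tex" <<'EOF'
\documentclass{article}
\usepackage{graphicx}
\usepackage{longtable,booktabs}
\title{$title$}
\author{$author$}
\begin{document}
\maketitle
$body$
\end{document}
EOF

# Hypothetical input standing in for the knitted paper.
cat > "$project/paper.md" <<'EOF'
---
title: Demo
author: Alice and Bob
---

Some text.
EOF

# Step 5: the same flags as in pandoc_args. The template name is looked up
# in _pandoc/templates because --data-dir points at _pandoc.
if command -v pandoc >/dev/null 2>&1; then
  (cd "$project" && pandoc paper.md --to latex \
      --data-dir _pandoc --template minimal.tex --output minimal-out.tex)
fi
```

Swapping in another journal's template then means adding one more file under _pandoc/templates and changing the --template value.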

--data-dir versus $HOME/.pandoc

Alice and Bob successfully verified that they could use $HOME/.pandoc instead of --data-dir, but they agreed that using --data-dir would make it easier to reproduce their work, at the cost of duplicated files.
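
To make the trade-off concrete, here is a small sketch. install_template is a hypothetical helper, not part of Pandoc; it copies a template into the templates subdirectory of whichever data directory it is given. Recent Pandoc releases may also default to a different user data directory than $HOME/.pandoc, so it is worth checking what pandoc --version reports.

```shell
# install_template FILE DATA_DIR: copy FILE into DATA_DIR/templates.
# Hypothetical helper, for illustration only.
install_template() {
  mkdir -p "$2/templates" && cp "$1" "$2/templates/"
}

# Demo with scratch paths so nothing under $HOME is modified.
scratch="$(mktemp -d)"
echo '$body$' > "$scratch/apa6.tex"

# Project-local copy: travels with the repository, used via --data-dir.
install_template "$scratch/apa6.tex" "$scratch/project/_pandoc"

# Per-user copy: found without --data-dir. In real use the second argument
# would be "$HOME/.pandoc"; a stand-in directory is used here.
install_template "$scratch/apa6.tex" "$scratch/home/.pandoc"

# Show which user data directory the installed pandoc reports, if any.
if command -v pandoc >/dev/null 2>&1; then
  pandoc --version | grep -i 'data directory' || true
fi
```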

HTML

Alice and Bob also tested Pandoc templates for HTML documents. They wrote _pandoc/templates/university-of-manchester.html and added

html_document:
  template: "university-of-manchester.html"
  pandoc_args: [
    "--data-dir", "_pandoc",
  ]

into paper.Rmd.

Alice and Bob shade with PC showing HTML based on templates between them
Alice and Bob with HTML based on templates

Note: As explained in rmarkdown issue #1142, using --template inside pandoc_args will not work for html_document; use the template option instead, as shown above.

PDF from HTML

Chrome 59 or later has headless support, which allows users, among other things, to create a PDF by running chrome --headless --disable-gpu --print-to-pdf https://website.to.print/.
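
A hedged sketch of how this might be scripted: the functions below look for a Chrome or Chromium binary under a few common names (the names vary by platform and are assumptions) and print a page to a named file with --print-to-pdf=FILE. The function names find_chrome and html_to_pdf are made up for illustration.

```shell
# find_chrome: print the first Chrome/Chromium binary found on PATH.
find_chrome() {
  for name in google-chrome chromium chromium-browser chrome; do
    if command -v "$name" >/dev/null 2>&1; then
      echo "$name"
      return 0
    fi
  done
  return 1
}

# html_to_pdf URL_OR_FILE OUTPUT.pdf: print a page to PDF in headless mode.
html_to_pdf() {
  browser="$(find_chrome)" || { echo "no Chrome binary found" >&2; return 1; }
  "$browser" --headless --disable-gpu --print-to-pdf="$2" "$1"
}

# Example (not run here): html_to_pdf paper.html paper.pdf
```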

Microsoft Word and LibreOffice Writer

Pandoc supports custom templates for Microsoft Word and LibreOffice Writer documents. Unfortunately, writing a custom template for these formats can be a little more challenging.
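
For Word output, Pandoc takes its styles from a reference document rather than from a text template. One possible starting point, assuming Pandoc 2.0 or later (where the option is --reference-doc; older releases used --reference-docx), is to export Pandoc's default reference file, restyle it in Word or LibreOffice, and pass it back. The function and file names below are placeholders.

```shell
# Export pandoc's built-in reference document to the given path.
# The exported file can then be restyled in Word or LibreOffice.
make_reference_docx() {
  pandoc --print-default-data-file reference.docx > "$1"
}

# to_docx INPUT REFERENCE.docx OUTPUT.docx: convert using REFERENCE's styles.
to_docx() {
  pandoc "$1" --reference-doc "$2" --output "$3"
}

# Demo in a scratch directory, only if pandoc is installed.
if command -v pandoc >/dev/null 2>&1; then
  out="$(mktemp -d)"
  make_reference_docx "$out/custom-reference.docx"
  echo 'Some text.' > "$out/paper.md"
  to_docx "$out/paper.md" "$out/custom-reference.docx" "$out/paper.docx"
fi
```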

Conclusion

Now it is time for Alice and Bob to put on their red capes and go share their new super power.

Alice and Bob shade with their cape
Alice and Bob with their cape

Appendix

File does not exist

R Markdown/RStudio doesn’t offer great support for the output_file argument (--output as a Pandoc argument), as explained by Yihui Xie on Stack Overflow. You can follow his suggestion and use “the undocumented knit hook”.
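
One alternative outside RStudio's Knit button, sketched here as an assumption rather than as the approach from the Stack Overflow answer, is to drop --output from pandoc_args and pass the file name through the output_file argument of rmarkdown::render from the command line. render_as is a hypothetical wrapper.

```shell
# render_as INPUT.Rmd OUTPUT: knit INPUT and name the result OUTPUT.
# rmarkdown is told the final file name, so it knows where to look for it.
render_as() {
  Rscript -e 'args <- commandArgs(trailingOnly = TRUE)' \
          -e 'rmarkdown::render(args[1], output_file = args[2])' \
          "$1" "$2"
}

# Example (requires R with the rmarkdown package installed):
# render_as paper.Rmd apa6.pdf
```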

Mathematical expressions

As mentioned in Pandoc’s documentation,

To write a literal $ in a template, use $$.
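
This matters for LaTeX templates that contain inline math. A small sketch, with made-up file names: the template below wants a literal $E = mc^2$ in the output, so each dollar sign is doubled, while $body$ remains a Pandoc variable.

```shell
tmp="$(mktemp -d)"

# '$$' stands for one literal dollar sign; '$body$' is still substituted.
cat > "$tmp/math.template" <<'EOF'
Literal math: $$E = mc^2$$
$body$
EOF

echo 'Body text.' > "$tmp/in.md"

# Render only if pandoc is installed.
if command -v pandoc >/dev/null 2>&1; then
  pandoc "$tmp/in.md" --to latex --template "$tmp/math.template"
fi
```

With single dollar signs, Pandoc would instead try to read the text between them as a variable name.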

GitLab, GitLab CI and GitLab Pages

If you host your R Markdown file in GitLab, you can use GitLab CI to convert it to HTML, or any other format, and publish it on GitLab Pages. You can do it for a single document or for more, maybe organised as a blog.

Alice and Bob shade with PC showing GitLab homepage between them
Alice and Bob with GitLab
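
As a rough sketch of such a setup, the script below writes a .gitlab-ci.yml whose pages job renders paper.Rmd into public/, the directory that GitLab Pages publishes. The Docker image rocker/verse and the file name paper.Rmd are assumptions; any image with R, rmarkdown and Pandoc should do. write_gitlab_ci is a hypothetical helper.

```shell
# write_gitlab_ci REPO_DIR: write the CI configuration into REPO_DIR.
write_gitlab_ci() {
  cat > "$1/.gitlab-ci.yml" <<'EOF'
image: rocker/verse

pages:
  script:
    - Rscript -e 'rmarkdown::render("paper.Rmd", output_dir = "public", output_file = "index.html")'
  artifacts:
    paths:
      - public
EOF
}

repo="$(mktemp -d)"   # stand-in for the real repository checkout
write_gitlab_ci "$repo"
cat "$repo/.gitlab-ci.yml"
```

For several documents, the script line could render each file in turn into public/.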