2025-03-11


Introduction

R Markdown is a powerful tool that allows you to:

  • Write and execute code in a single document

  • Generate high-quality, dynamic reports with both code and output embedded

R Markdown is especially useful for reproducibility because it combines both narrative text and computational code in the same document.

The output of your code is automatically included in the final document, which makes it easier to track and verify results.

R Markdown supports a wide variety of output formats:

  • static: such as HTML, PDF, and Word documents

  • dynamic/interactive: such as Shiny apps and interactive web pages

This flexibility makes it an ideal tool for sharing and presenting your work in a reproducible and professional manner.

To explore the full capabilities of R Markdown, visit the official R Markdown website, where you can find introductory videos and comprehensive guides on how to use R Markdown effectively.

Additionally, it is recommended to check out the R Markdown Cheat Sheet, which is an excellent quick reference guide for key functions and syntax when working with R Markdown.




Utility of R Markdown

R Markdown files are versatile and can be used in three main ways:

  1. For communicating with decision makers:
    R Markdown can be used to create reports that focus on the conclusions and results of your analysis, without overwhelming decision makers with the underlying code.
    This allows stakeholders to easily understand the key findings without the need for technical details.

  2. For collaboration with other data scientists:
    R Markdown is also a great tool for collaborating with other data scientists, including your future self! In this case, the file contains not only the conclusions of the analysis but also the full code used to reach those conclusions.
    This promotes transparency, reproducibility, and makes it easier for others to follow and build upon your work.

  3. As a data science environment:
    R Markdown can serve as a modern-day lab notebook for data scientists.
    It is an ideal environment to document both the process and the thinking behind your analysis.
    It allows you to capture not only the steps you took in your analysis but also your thought process, helping you document your work more thoroughly for future reference.



Format of R Markdown files

R Markdown files have the extension .Rmd and are created using the rmarkdown package.

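Important note: In RStudio you usually do not need to install the rmarkdown package manually, because RStudio offers to install it (along with the packages it depends on) the first time you create or knit an R Markdown file. This makes it easy to start working with R Markdown right away. Outside RStudio, you can install it with install.packages("rmarkdown").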



The importance of reproducibility

Since the earliest days of science, reproducibility has been regarded as a cornerstone of scientific inquiry.

Galileo Galilei himself based his scientific method on the principle of reproducibility, emphasizing that experimental results should be replicable by others to ensure their validity and reliability.

This foundational idea remains central to modern scientific practice, where the ability to reproduce results is crucial for the advancement of knowledge.


Scientific editing, Importance of Reproducibility in Science, David Adewusi, 2020



There are several levels of reproducibility that we need to consider:

  • Data Replicability

  • Data Reproducibility

  • Research Reproducibility

Data replicability and data reproducibility refer to the ability to generate or reproduce the same dataset, typically under controlled conditions. These concepts focus on ensuring that the same data can be obtained or verified by others, either through the original process or a similar methodology.

Research reproducibility involves repeating the entire research process, not only the data but also the analyses, methodology, and interpretation of results. It means that another researcher should be able to follow the same steps and arrive at the same conclusions.


If we create a report that includes all the code used in the analysis, along with clear explanations of the choices made (such as parameters or methodologies), and then share this report, we allow others to reproduce the same results using our data. Alternatively, researchers can adapt the code for use with their own data.
This practice promotes transparency and ensures that others can verify and build upon our work.

In the early 2010s, the term “replication crisis” came into use to describe a significant issue in scientific research.
It refers to the increasing difficulty or impossibility of reproducing the results of many published studies. Given that reproducibility is a core principle of the scientific method, this crisis threatens the credibility of theories based on these studies.
The inability to reproduce results undermines trust in scientific knowledge and raises serious concerns about the reliability and validity of research findings. As a result, there has been growing emphasis on improving research practices and ensuring that findings are more robust, transparent, and trustworthy.



A study published in Nature (check it here) identifies six key factors that can impact reproducibility:

  • Lack of access to methodological details, raw data, and research materials
    When researchers do not have access to the precise methods, raw datasets, or research materials used in a study, it becomes impossible to verify or reproduce the results.

  • Use of misidentified, cross-contaminated, or over-passaged cell lines and microorganisms
    Improperly identified or contaminated biological materials can lead to erroneous results, as they no longer represent the original source or research conditions.

  • Inability to manage complex datasets
    Complex datasets may be difficult to handle, especially when they are large, unstructured, or poorly organized. Poor data management can lead to errors in the analysis and hinder reproducibility.

  • Poor research practices and experimental design
    Flawed research methods or experimental designs that don’t account for key variables, controls, or other essential aspects can lead to inaccurate or irreproducible results.

  • Cognitive bias
    Researchers’ inherent biases, such as confirmation bias or selective reporting, can influence the outcomes of a study, often leading to distorted results that are difficult to reproduce by others.

  • A competitive culture that rewards novel findings and undervalues negative results
    The pressure to produce groundbreaking discoveries, coupled with the undervaluing of negative or inconclusive results, can result in selective reporting and a lack of transparency, making it harder for others to replicate findings.



In the same article, the authors suggest several solutions to improve reproducibility:

  • Robust sharing of data, materials, software, and other tools
    Ensuring that data, materials, and software are shared in an accessible and transparent way allows others to verify and reproduce the results.

  • Use of authenticated biomaterials
    Using properly authenticated biomaterials ensures that experiments are conducted with reliable and correct biological materials, reducing the risk of errors or misidentification.

  • Training on statistical methods and study design
    Educating researchers on proper statistical techniques and study design helps improve the reliability of analyses and ensures results can be consistently reproduced.

  • Pre-registration of scientific studies
    Registering the hypotheses, study design, and analysis plan before data collection makes it harder to adjust the analysis after seeing the results, and helps readers distinguish planned analyses from exploratory ones.

  • Publishing negative data
    Publishing results that do not support the hypothesis is crucial to provide a complete picture of scientific understanding and prevent the bias of only showcasing positive or novel findings.

  • Thorough description of methods
    A detailed and transparent description of the methods used in research allows others to replicate the study and verify the results.


Reproducibility in data analysis is just as important as reproducibility in generating data.

Although reporting alone may not be sufficient in bioinformatics (as discussed in this article), it represents an essential first step toward ensuring reproducible analysis.



Components of an R Markdown document

An R Markdown document consists of three primary components:

  • Metadata
    The metadata section is a YAML header that contains information about the document, such as the title, author, date, and other output options (e.g., the type of document to generate: HTML, PDF, Word).
    It is defined at the beginning of the file and helps configure the behavior of the document.

  • Text
    The text section of the document is written in Markdown, a lightweight markup language.
    This is where the main content, such as descriptions, explanations, headings, and paragraphs, is inserted.
    It can be formatted using Markdown syntax (e.g., bold, italics, bullet points).

  • Code
    The code section contains chunks of R code (called “code chunks”) that are executed when generating the document.
    The results of the code execution, such as plots, tables, or calculation outputs, are automatically included in the final document.
    Code is enclosed between the delimiters ```{r} and ```.



Workflow





Creating a new R Markdown file

In RStudio, you can create a new R Markdown (.Rmd) file by following these steps:

  1. Go to the menu: Click on File > New File > R Markdown

  2. A new window will appear, prompting you to specify document details such as:

    • Title: The title of your document

    • Author: Your name (optional)

    • Default output format: Choose between HTML, PDF, or Word


Once you have filled in the required information, click OK to create the new R Markdown file.

The newly created file will include a template with:

  • A YAML header section at the top (enclosed between --- lines)

  • A sample Markdown text section

  • A sample R code chunk




This file is now ready to be edited, executed, and converted into a final report.

When you are ready, you can render the document by clicking the Knit button in the toolbar, generating a fully formatted report in your chosen output format.



Compiling an R Markdown document

To generate and view the final output of your R Markdown document, you need to compile it.
This process transforms your .Rmd file into a formatted document in HTML, PDF, or Word format, depending on your settings.

The easiest way to compile an R Markdown document is by using the Knit button in RStudio.

  1. Open your .Rmd file in RStudio.

  2. Locate the Knit button in the toolbar above the editor window. It looks like a ball of yarn.

  3. Click the Knit button.


By default, this will generate an HTML document and open it in RStudio’s Viewer pane.
You can change the output format by modifying the output: option in the YAML metadata at the beginning of your file.

For example, to generate a PDF or Word document, you can specify:

---
output: pdf_document
---

or

---
output: word_document
---

Alternatively, you can manually select a format by clicking the small downward arrow next to the Knit button and choosing one of the available formats:



When you click one of the output format options in the Knit menu, your YAML header will automatically update to reflect your selection.

We will discuss YAML headers in detail in a later section, but in summary, the YAML header defines key formatting options for your document, including:

  • Output type (HTML, PDF, Word, etc.)

  • Title, author, and date

  • Other document settings

You can also specify multiple output formats in your YAML header if you want to generate more than one file type at the same time.

Besides clicking the Knit button, there are other ways to render your document:

  1. Keyboard shortcut

    • Press Cmd/Ctrl + Shift + K to knit the document quickly.

  2. Using a code command

    • You can programmatically render the document by running the following command in the R console: rmarkdown::render("1-example.Rmd").
      This method is useful when automating reports or running scripts outside RStudio.
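
As a minimal sketch of the programmatic route, the calls below render the file named in the text from the console or a script. The second call is an assumption added for illustration: it overrides the format declared in the YAML header for that call only.

```r
# Render with the output format declared in the YAML header.
rmarkdown::render("1-example.Rmd")

# Assumption for illustration: override the format for this call only
# (here, a Word document), leaving the YAML header unchanged.
rmarkdown::render("1-example.Rmd", output_format = "word_document")
```

Because these are ordinary R function calls, they can be placed in a script that regenerates a report on a schedule or for several input files.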

When you knit an R Markdown document, the process follows these steps:

  1. R Markdown sends the .Rmd file to knitr
    knitr (https://yihui.org/knitr/) executes all the code chunks and generates a new Markdown (.md) document containing the code and its outputs.

  2. Pandoc processes the Markdown file
    The Markdown (.md) file created by knitr is then passed to Pandoc (https://pandoc.org/), which converts it into the final output format (HTML, PDF, Word, etc.).

This two-step workflow allows R Markdown to generate many different types of output while maintaining reproducibility and flexibility.


When you render an R Markdown document, R will automatically generate the output file and display the report.

You can choose how to view the rendered document by clicking on the cog icon next to the Knit button in RStudio. This opens a drop-down menu with different preview options:

  • Viewer Pane (default) → Displays the output in the bottom right pane (shared with Files, Plots, Packages, and Help).

  • New Window → Opens the output in a separate window.

These options allow you to adjust the workflow based on your preferences.



Running a single code chunk

Sometimes, instead of knitting the entire document, you may want to execute only one code chunk to check its output.

You can run an individual code chunk in two ways:

  • Click the “Run” Icon

    • Each code chunk has a small “Run” button (a play icon ▶) at the top-right corner. Clicking this will execute only that chunk.
  • Use a Keyboard Shortcut

    • Windows/Linux: Press Ctrl + Shift + Enter

    • Mac: Press Cmd + Shift + Enter

When you run a chunk, the code inside it is executed immediately and the output appears inline, directly below the chunk in the R Markdown document.

This feature is useful for debugging, testing, or incremental development, as it allows you to check specific parts of your analysis without re-knitting the entire document.



Document Types in R Markdown

R Markdown allows you to render documents in multiple formats, depending on your needs.

The main output types include:

  • HTML

  • PDF

  • Word

  • Templates

  • Presentations



HTML

By default, R Markdown compiles documents into HTML (HyperText Markup Language) format.

HTML documents allow you to include interactive features, such as embedded plots, dynamic tables, and JavaScript-based visualizations.



PDF

If you prefer a Portable Document Format (PDF) output, R Markdown can generate high-quality PDF reports. However, note that:

  • Interactive features cannot be included in PDF documents.

  • Knitting to PDF requires LaTeX to be installed on your system.

  • If LaTeX is missing, you will see an error message like:
    “No LaTeX installation detected (LaTeX is required to create PDF output).”

To install LaTeX, you can choose one of the following options:
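
One commonly used option is TinyTeX, a lightweight LaTeX distribution that can be installed from within R. The sketch below assumes TinyTeX suits your setup; it may not be among the options originally listed here, and a full distribution such as TeX Live or MiKTeX is an alternative.

```r
# Assumption: TinyTeX is an acceptable LaTeX distribution for your system.
install.packages("tinytex")   # R helper package
tinytex::install_tinytex()    # downloads and installs the TinyTeX distribution
```

After installation, restarting RStudio before knitting to PDF is usually a good idea so that the new LaTeX installation is detected.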



Word

R Markdown can also render documents as .docx files. Microsoft Word (or a compatible word processor) is needed only to open the result, not to create it.

However, similar to PDFs, interactive elements (such as interactive plots and dynamic tables) will not be included.



Presentations

R Markdown also supports creating slide presentations in various formats:

  • ioslides (HTML-based slides)

  • Slidy (HTML-based slides)

  • Beamer (PDF-based slides, requires LaTeX)

  • PowerPoint (requires Microsoft PowerPoint)

While R Markdown presentations are useful, they may lack flexibility compared to dedicated presentation tools.



Shiny Applications

Shiny is an R package that enables the creation of interactive web applications.

You can embed a Shiny app within an R Markdown document to allow real-time user interaction.

Shiny applications require an HTML output and cannot be rendered in PDF or Word formats.



Using Templates in R Markdown

R Markdown provides predefined templates for different document types, reducing the need to manually set up formatting. These templates can be accessed when creating a new R Markdown file in RStudio.

Additionally, custom templates can be specified in the YAML header, allowing further customization of document formats.



Key components of an R Markdown document

In the following sections, we will explore in more detail the three key components of an R Markdown document:

  1. Markdown Text (for formatting content)

  2. The YAML Header (for setting document properties)

  3. Code Chunks (for executing R code)



Text formatting

R Markdown uses Markdown, a lightweight markup language designed to be easy to read and write. Markdown allows you to format text in various ways, making it possible to create structured and well-formatted documents.



Headers

# Header 1  
## Header 2  
### Header 3  
#### Header 4  
##### Header 5  
###### Header 6  


Alternatively, for Header 1 and Header 2, an underline style can be used:


Header 1  
========
  
Header 2  
--------



Text emphasis


You can apply italics, bold, and other text styles using the following syntax:

  • Italic → Use one asterisk (*italic*) or one underscore (_italic_).

  • Bold → Use two asterisks (**bold**) or two underscores (__bold__).

  • Bold & Italic → Use three asterisks (***bold and italic***) or three underscores (___bold and italic___).

  • Strikethrough → Use two tildes (~~strikethrough~~).



Lists in Markdown

Unordered lists

To create a bulleted list, use *, +, or -, followed by a space.

To add sub-bullets, insert two spaces and use *, +, or - , followed by a space:

Rendered Output:

  • Superbattito
    • Quella te
    • Nmrpm
  • Punk
  • OK
    • Destri
  • Dentro
    • Idem
    • Flavio
  • INDI
    • Mezzo secondo
    • Noi no



Ordered lists

To create an ordered list, follow the same steps as for unordered lists, but use numbers followed by a period (.) instead of asterisks (*).

To create sub-items within an ordered list, insert three spaces before the sub-item and use either a number (for ordered sub-lists) or a symbol like * (for unordered sub-lists).


  1. One line

  2. Second line

    • Unordered
    1. Ordered sub-list
  3. Third line

Markdown will automatically adjust the numbering when rendered, even if you use 1. for all items.
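
As a sketch, source like the following would give the ordered list shown above, with sub-items indented by three spaces so that they line up with the text of the parent item.

```markdown
1. One line
2. Second line
   * Unordered
   1. Ordered sub-list
3. Third line
```

Switching from * to a number at the same indentation starts a new sub-list, which is why the two sub-items render as separate lists.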



Images

You can insert images into your document using a syntax similar to that of hyperlinks, with the addition of an exclamation mark (!) before the square brackets.

In the example below, you can see the file path to the image. This image will render without a caption:

![](images/hex-rmarkdown.png)


To add a caption for the image, place the caption text between the square brackets, like this:

![Rmarkdown symbol](images/hex-rmarkdown.png)

Rmarkdown symbol

This will render the image with the caption Rmarkdown symbol.


You can also control the size of the image. For example, to set the image width to 100 pixels:

![Rmarkdown symbol](images/hex-rmarkdown.png){width=100px}

Rmarkdown symbol


This will render the image with a width of 100px.

Alternatively, you can use percentages to control the size:

![Rmarkdown symbol](images/hex-rmarkdown.png){width=20%}

Rmarkdown symbol


This will render the image at 20% of the available width (the page or the element that contains it), rather than 20% of the image's own size.


To center the image, you can use HTML tags. For example:

Rmarkdown symbol
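
The HTML used for the centered image is not shown here. One possible way to do it, offered as an assumption rather than the original code, is to wrap an img tag in a div with centered alignment. This applies to HTML output only.

```markdown
<!-- Assumption: one way to center an image in HTML output -->
<div style="text-align: center;">
  <img src="images/hex-rmarkdown.png" alt="Rmarkdown symbol" width="100">
</div>
```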


If you want to insert an image inline with the text, you can simply place the image directly in the line.

For example:

This is the Rmarkdown symbol:



Adding spaces between lines

To start a new paragraph, you need to leave an empty line between the previous paragraph and the new one. Pressing “ENTER” several times to add more blank lines won’t create additional space; use <br /> (or $~$) to create extra line breaks.

For example:

I need


space

\(~\)

I need



more space

This will add the required spaces between the lines.
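
The source behind the rendered example above is not shown. A plausible reconstruction, given as a sketch only, combines the two techniques named in the text:

```markdown
I need

<br />

space

$~$

I need

<br />
<br />

more space
```

Note that <br /> is an HTML tag, so it is mainly useful for HTML output.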




Tabbed Sections

Tabbed sections can be added to an HTML document, which will fold the subsequent sub-sections into separate tabs. To create a tabbed section, specify a section header followed by {.tabset}.

For example:

My Section Header

Plot
Table
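
A minimal sketch of the source for the tabbed example above could look like this. The header levels and the placeholder sentences are assumptions; what matters is that the tabs are sub-sections one level below the header carrying {.tabset}.

```markdown
## My Section Header {.tabset}

### Plot

Content of the first tab (placeholder).

### Table

Content of the second tab (placeholder).
```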



Metadata or Yet Another Markup Language (YAML) header

Whole document settings can be controlled through parameters in the YAML header. YAML (originally “Yet Another Markup Language”, now officially “YAML Ain’t Markup Language”) is a simple, human-readable format used by R Markdown to control various details of the document’s output.

The YAML header sits at the top of the file, demarcated by a line of three dashes (---) at each end, and contains fields such as title, author, and output.

  • The title (e.g., title: "My Title") will appear at the head of the document with a larger font size than the rest of the text.

  • You can also specify a subtitle, which will appear below the title and in a slightly smaller font size.

  • You can specify one or more authors, separating multiple names with commas.

  • You can supply a static date (e.g., date: 2022-01-13) or a dynamic date that updates each time you knit the document using Sys.Date().
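
Putting these fields together, a header might look like the sketch below. The subtitle and author names are hypothetical placeholders; the date line uses inline R code so that it is re-evaluated on every knit.

```yaml
---
title: "My Title"
subtitle: "A hypothetical subtitle"
author: "First Author, Second Author"
date: "`r Sys.Date()`"
output: html_document
---
```

To use a fixed date instead, replace the inline expression with a literal value such as date: 2022-01-13.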




Table of Contents (TOC)

By manipulating the YAML header, you can add a Table of Contents (TOC), which can either appear at the beginning of the document or float.

Important: Be sure to pay attention to indentations in the YAML header.
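
As a sketch for HTML output, the following header adds a table of contents and makes it float beside the text. Note how the options are nested under html_document by indentation; a misaligned line changes the meaning or breaks the header.

```yaml
---
output:
  html_document:
    toc: true         # add a table of contents
    toc_float: true   # let it float alongside the main content
---
```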





Themes

You can change the styling of your document (such as font type, color, and size) by adding the theme parameter. Many theme examples can be found at Data Dreaming’s R Markdown Theme Gallery.
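
For instance, an HTML document could select a built-in theme as sketched below. The choice of cerulean is only an illustration; any theme name supported by html_document can be used in its place.

```yaml
---
output:
  html_document:
    theme: cerulean   # illustrative choice of a built-in theme
---
```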



Output

The output option allows you to specify the type of document you want to create. This will be auto-populated if you generate the .Rmd file in RStudio via the “New R Markdown” file option. You can manually modify the output type, but you must use valid arguments.

Some valid output types include:

  • html_document

  • pdf_document

  • word_document

You can even specify multiple document types to render simultaneously:
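
A sketch of a header that declares two output formats follows; the particular pair chosen here is an assumption for illustration.

```yaml
---
output:
  html_document: default
  word_document: default
---
```

With a header like this, the Knit button renders only the first format listed. To produce every declared format in one go, call rmarkdown::render() with output_format = "all".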



Parameters

R Markdown documents can include one or more parameters, which allow you to re-render the same report with different values for key inputs. For example, you could generate different sales reports by branch or exam results by student.

To declare parameters, use the params field in the YAML header.
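
As a minimal sketch of the sales-report case, the header below declares one parameter with a default value. The title, the parameter name branch, and its values are hypothetical.

```yaml
---
title: "Sales report"
output: html_document
params:
  branch: "North"   # default value, can be overridden at render time
---
```

Inside code chunks the value is available as params$branch. A different value can be supplied when rendering, for example by passing params = list(branch = "South") to rmarkdown::render().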



Bibliographies and Citations

Pandoc can automatically generate citations and a bibliography in various styles. To use this feature, specify a bibliography file in the YAML header:

bibliography: rmarkdown.bib

Supported formats include BibTeX, BibLaTeX, EndNote, and Medline.

To create citations within the .Rmd file, use the citation key from your bibliography file preceded by @. For example:

Blah blah [@smith04; @doe99]

For in-text citations, omit the square brackets:

@smith04 says blah, or @smith04 [p. 33] says blah

You can change the citation and bibliography style by referencing a CSL (Citation Style Language) file in the csl field:

bibliography: rmarkdown.bib
csl: apa.csl



Code Chunks and Inline code

To integrate R code into your document and create reproducible objects (e.g., figures, tables, and text), you need to insert a code chunk. There are three ways to do this:

  1. Use the keyboard shortcut Cmd + Option + I (Mac) or Ctrl + Alt + I (Windows/Linux).

  2. Use the “Insert” button icon in the editor toolbar.

  3. Manually type the chunk delimiters: ```{r} and ```.

You can run the current line or selection using Cmd/Ctrl + Enter, and run the entire chunk with Cmd/Ctrl + Shift + Enter.

The chunk header consists of ```{r, followed by an optional chunk name and any additional comma-separated options, and a closing }. The chunk ends with three backticks.



Chunk name

Chunks can be given an optional name: ```{r by-name}.

This has three advantages:

  1. Easier navigation using the drop-down code navigator in the editor.

  2. Clearer names for graphics produced by the chunk, which makes them easier to use elsewhere.

  3. Reuse of cached chunks, which helps avoid re-performing expensive computations.

One chunk name has special behaviour: in notebook mode, the chunk named setup is run automatically once, before any other code.



Chunk options

Knitr provides nearly 60 options for customizing your code chunks. Some key options include:

  • eval = FALSE: Prevents code from being evaluated.
    This is useful when you want to display the code in your document but don’t want it to be executed. For example, you may want to show the structure of your code as an example but don’t want to actually run it.

  • include = FALSE: Runs the code, but hides the code and its results in the output.
    This is useful for setting up things that don’t need to be visible in the document, like data loading or preprocessing, while ensuring that the code is still executed. The results and the code are hidden, but they affect the final output.

  • echo = FALSE: Prevents code from being shown, but includes the results.
    If you want to hide the R code but display the results (e.g., plots or tables), you can use this option. This is useful when you want the focus to be on the results, not the underlying code.

  • message = FALSE or warning = FALSE: Prevents messages or warnings from appearing.
    Sometimes code produces warnings or messages that are not critical to the document’s understanding, but they can clutter the output. Setting message = FALSE hides any informational messages, and warning = FALSE hides warning messages.

  • results = 'hide': Hides printed output.
    This option is useful when you want to execute code but do not want the printed results (e.g., the result of a summary() or print() function) to appear in the output. The code will be executed, but no results will be displayed.

  • fig.show = 'hide': Hides plots.
    This option is useful when you want to run code that produces a plot but do not want the plot to be displayed in the document. You might want to generate the plot for later use but not show it in the current output.

  • error = TRUE: Allows the render to continue even if code returns an error.
    This option is useful when you want to allow errors to appear in the document or when you’re debugging and want the document to continue rendering even if some code fails.


You can see the full list at http://yihui.name/knitr/options/.

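The following table summarizes which types of output each option suppresses (x = suppressed, - = not affected):

Option             Run code  Show code  Output  Plots  Messages  Warnings
eval = FALSE       x         -          x       x      x         x
include = FALSE    -         x          x       x      x         x
echo = FALSE       -         x          -       -      -         -
results = 'hide'   -         -          x       -      -         -
fig.show = 'hide'  -         -          -       x      -         -
message = FALSE    -         -          -       -      x         -
warning = FALSE    -         -          -       -      -         x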

You can also add fig.cap = "..." to give a caption to graphical results.



Tables

By default, R Markdown displays data frames and matrices as they appear in the console:

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

For more customized formatting, use the knitr::kable function, which produces a formatted table such as:

Kable 1

                    mpg  cyl  disp   hp  drat     wt   qsec  vs  am  gear  carb
Mazda RX4          21.0    6   160  110  3.90  2.620  16.46   0   1     4     4
Mazda RX4 Wag      21.0    6   160  110  3.90  2.875  17.02   0   1     4     4
Datsun 710         22.8    4   108   93  3.85  2.320  18.61   1   1     4     1
Hornet 4 Drive     21.4    6   258  110  3.08  3.215  19.44   1   0     3     1
Hornet Sportabout  18.7    8   360  175  3.15  3.440  17.02   0   0     3     2
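
The call that generated the table captioned Kable 1 is not shown. A call along these lines would produce a comparable table, assuming the data are the first five rows of the built-in mtcars dataset (which matches the values displayed).

```r
# First five rows of the built-in mtcars dataset, rendered as a formatted table.
# The caption text is taken from the table shown in this section.
knitr::kable(head(mtcars, 5), caption = "Kable 1")
```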

For more advanced customization, consider using packages like xtable, stargazer, pander, tables, or ascii.



Caching

By default, each time you knit a document, it starts from scratch.

This is ideal for reproducibility since it ensures all important computations are captured in the code. However, it can be time-consuming if some computations take a long time to run.

The solution is to use cache = TRUE.

When enabled, it saves the output of a chunk to a uniquely named file on disk. On subsequent runs, Knitr checks if the code has changed. If it hasn’t, the cached results are reused, saving time.



Global options

You can change default chunk options globally using knitr::opts_chunk$set().

For example, you can set the following at the beginning of your R Markdown document to hide code, messages, warnings, and errors by default:
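
The original snippet is not shown here, so the following is a sketch of what such a call could look like, usually placed in the first chunk of the document. How the original handled errors cannot be recovered, so no error option is set in this sketch.

```r
# Typically placed in the first chunk of the document (often named "setup").
knitr::opts_chunk$set(
  echo    = FALSE,  # hide the code
  message = FALSE,  # hide messages
  warning = FALSE   # hide warnings
)
```

Options given in an individual chunk header take precedence over these defaults.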

In this case, if you want to display the code for specific chunks, you have to set echo = TRUE in the header of each of those chunks.

Important note: if you set message = FALSE and warning = FALSE, that would make it harder to debug problems because you would not see any messages in the final document.



Inline code

You can also embed R code directly in the text with the following syntax:

`r <expression>` 

This is useful when you want to refer to properties of your data within the narrative.

That results in: