R Markdown is a powerful tool that allows you to:
Write and execute code in a single document
Generate high-quality, dynamic reports with both code and output embedded
R Markdown is especially useful for reproducibility because it combines both narrative text and computational code in the same document.
The output of your code is automatically included in the final document, which makes it easier to track and verify results.
R Markdown supports a wide variety of output formats:
static: such as HTML, PDF, and Word documents
dynamic/interactive: such as Shiny apps and interactive web pages
This flexibility makes it an ideal tool for sharing and presenting your work in a reproducible and professional manner.
To explore the full capabilities of R Markdown, visit the official R Markdown website, where you can find introductory videos and comprehensive guides on how to use R Markdown effectively.
The R Markdown Cheat Sheet is also worth keeping at hand as a quick reference for key functions and syntax.
R Markdown files are versatile and can be used in three main ways:
For communicating with decision makers:
R Markdown can be used to create reports that focus on the
conclusions and results of your
analysis, without overwhelming decision makers with the underlying
code.
This allows stakeholders to easily understand the key findings without
the need for technical details.
For collaboration with other data
scientists:
R Markdown is also a great tool for collaborating with other data
scientists, including your future self! In this case, the file contains
not only the conclusions of the analysis but also the full
code used to reach those conclusions.
This promotes transparency, reproducibility, and makes it easier for
others to follow and build upon your work.
As a data science environment:
R Markdown can serve as a modern-day lab notebook for data
scientists.
It is an ideal environment to document both the process and the thinking
behind your analysis.
It allows you to capture not only the steps you took in your analysis
but also your thought process, helping you document your work more
thoroughly for future reference.
R Markdown files have the extension .Rmd and are created using the rmarkdown package.
Important note: You do not need to manually install the rmarkdown package, as it is automatically included when you use RStudio. This makes it easy to start working with R Markdown right away without any additional setup.
Since the earliest days of modern science, reproducibility has been regarded as a cornerstone of scientific inquiry.
Galileo Galilei himself based his scientific method on the principle of reproducibility, emphasizing that experimental results should be replicable by others to ensure their validity and reliability.
This foundational idea remains central to modern scientific practice, where the ability to reproduce results is crucial for the advancement of knowledge.
There are several levels of reproducibility that we need to consider:
Data Replicability
Data Reproducibility
Research Reproducibility
Data replicability and data reproducibility refer to the ability to generate or reproduce the same dataset, typically under controlled conditions. These concepts focus on ensuring that the same data can be obtained or verified by others, either through the original process or a similar methodology.
Research reproducibility involves repeating the entire research process, not only the data but also the analyses, methodology, and interpretation of results. It means that another researcher should be able to follow the same steps and arrive at the same conclusions.
If we create a report that includes all the code used in the
analysis, along with clear explanations of the choices made (such as
parameters or methodologies), and then share this report, we allow
others to reproduce the same results using our data. Alternatively,
researchers can adapt the code for use with their own data.
This practice promotes transparency and ensures that
others can verify and build upon our work.
In the early 2010s, the term “replication crisis” came into use to describe a significant issue in scientific research.
It refers to the increasing difficulty or impossibility of reproducing
the results of many published studies. Given that reproducibility is a
core principle of the scientific method, this crisis threatens the
credibility of theories based on these studies.
The inability to reproduce results undermines trust in scientific
knowledge and raises serious concerns about the reliability and validity
of research findings. As a result, there has been growing emphasis on
improving research practices and ensuring that findings are more robust,
transparent, and trustworthy.
A study published in Nature identifies six key factors that can impact reproducibility:
Lack of access to methodological details, raw data, and
research materials
When researchers do not have access to the precise methods, raw
datasets, or research materials used in a study, it becomes impossible
to verify or reproduce the results.
Use of misidentified, cross-contaminated, or
over-passaged cell lines and microorganisms
Improperly identified or contaminated biological materials can lead to
erroneous results, as they no longer represent the original source or
research conditions.
Inability to manage complex datasets
Complex datasets may be difficult to handle, especially when they are
large, unstructured, or poorly organized. Poor data management can lead
to errors in the analysis and hinder reproducibility.
Poor research practices and experimental
design
Flawed research methods or experimental designs that don’t account for
key variables, controls, or other essential aspects can lead to
inaccurate or irreproducible results.
Cognitive bias
Researchers’ inherent biases, such as confirmation bias or selective
reporting, can influence the outcomes of a study, often leading to
distorted results that are difficult to reproduce by others.
A competitive culture that rewards novel findings and
undervalues negative results
The pressure to produce groundbreaking discoveries, coupled with the
undervaluing of negative or inconclusive results, can result in
selective reporting and a lack of transparency, making it harder for
others to replicate findings.
In the same article, the authors suggest several solutions to improve reproducibility:
Robust sharing of data, materials, software, and other
tools
Ensuring that data, materials, and software are shared in an accessible
and transparent way allows others to verify and reproduce the
results.
Use of authenticated biomaterials
Using properly authenticated biomaterials ensures that experiments are
conducted with reliable and correct biological materials, reducing the
risk of errors or misidentification.
Training on statistical methods and study
design
Educating researchers on proper statistical techniques and study design
helps improve the reliability of analyses and ensures results can be
consistently reproduced.
Pre-registration of scientific studies
Registering the hypotheses, study design, and analysis plan before data
are collected makes selective reporting easier to detect and helps
separate planned analyses from exploratory ones.
Publishing negative data
Publishing results that do not support the hypothesis is crucial to
provide a complete picture of scientific understanding and prevent the
bias of only showcasing positive or novel findings.
Thorough description of methods
A detailed and transparent description of the methods used in research
allows others to replicate the study and verify the results.
Reproducibility in data analysis is just as important as reproducibility in generating data.
Although reporting alone may not be sufficient in bioinformatics (as discussed in this article), it represents an essential first step toward ensuring reproducible analysis.
An R Markdown document consists of three primary components:
Metadata
The metadata section is a YAML header that contains information about
the document, such as the title, author, date, and other output options
(e.g., the type of document to generate: HTML, PDF, Word).
It is defined at the beginning of the file and helps configure the
behavior of the document.
Text
The text section of the document is written in Markdown, a lightweight
markup language.
This is where the main content, such as descriptions, explanations,
headings, and paragraphs, is inserted.
It can be formatted using Markdown syntax (e.g., bold, italics, bullet
points).
Code
The code section contains chunks of R code (called “code chunks”) that
are executed when generating the document.
The results of the code execution, such as plots, tables, or calculation
outputs, are automatically included in the final document.
Code is enclosed between the delimiters ```{r} and ```.
In RStudio, you can create a new R Markdown (.Rmd) file by following these steps:
Go to the menu: Click on File → New File → R Markdown
A new window will appear, prompting you to specify document details such as:
Title: The title of your document
Author: Your name (optional)
Default output format: Choose between HTML, PDF, or Word
Once you have filled in the required information, click OK to create the new R Markdown file.
The newly created file will include a template with:
A YAML header section at the top (enclosed between --- lines)
A sample Markdown text section
A sample R code chunk
This file is now ready to be edited, executed, and converted
into a final report.
When you are ready, you can render the document by clicking the
Knit button in the toolbar, generating a fully
formatted report in your chosen output format.
To generate and view the final output of your R Markdown document, you need to compile it.
This process transforms your .Rmd file into a formatted document in HTML, PDF, or Word format, depending on your settings.
The easiest way to compile an R Markdown document is by using the Knit button in RStudio.
Open your .Rmd file in RStudio.
Locate the Knit button in the toolbar above the editor window. It looks like a ball of yarn.
Click the Knit button.
By default, this will generate an HTML document and
open it in RStudio’s Viewer pane.
You can change the output format by modifying the output:
option in the YAML metadata at the beginning of your file.
For example, to generate a PDF or Word
document, you can specify:
---
output: pdf_document
---
or
---
output: word_document
---
Alternatively, you can manually select a format by clicking the small downward arrow next to the Knit button and choosing one of the available formats:
When you click one of the output format options in the Knit menu, your YAML header will automatically update to reflect your selection.
We will discuss YAML headers in detail in a later section, but in summary, the YAML header defines key formatting options for your document, including:
Output type (HTML, PDF, Word, etc.)
Title, author, and date
Other document settings
You can also specify multiple output formats in your YAML header if you want to generate more than one file type at the same time.
Besides clicking the Knit button, there are other ways to render your document:
Keyboard shortcut: Cmd/Ctrl + Shift + K to knit the document quickly.
Using a code command: rmarkdown::render("1-example.Rmd").
When you knit an R Markdown document, the process follows these steps:
R Markdown sends the .Rmd file to knitr
knitr (https://yihui.org/knitr/) executes all the code chunks and generates a new Markdown (.md) document containing the code and its outputs.
Pandoc processes the Markdown file
The Markdown (.md) file created by knitr is then passed to Pandoc (https://pandoc.org/), which converts it into the final output format (HTML, PDF, Word, etc.).
This two-step workflow allows R Markdown to generate many different types of output while maintaining reproducibility and flexibility.
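The two steps can also be run by hand, which may help when tracking down where a render fails. The following is a minimal sketch, not what the Knit button literally executes: the file name two-step-demo.Rmd and its contents are made up for illustration, and the Pandoc step is skipped when rmarkdown or Pandoc is not available.

```r
library(knitr)

# Hypothetical demo document, written to a temporary directory
work_dir <- tempdir()
rmd_file <- file.path(work_dir, "two-step-demo.Rmd")
fence <- strrep("`", 3)  # three backticks, the chunk delimiter
writeLines(c(
  "---",
  "title: \"Two-step demo\"",
  "---",
  "",
  "The mean of the numbers 1 to 10:",
  "",
  paste0(fence, "{r}"),
  "mean(1:10)",
  fence
), rmd_file)

# Step 1: knitr runs the chunks and writes a Markdown (.md) file
md_file <- knit(rmd_file,
                output = file.path(work_dir, "two-step-demo.md"),
                quiet = TRUE)

# Step 2: Pandoc converts the Markdown file to the final format.
# Skipped when rmarkdown or Pandoc cannot be found.
html_file <- file.path(work_dir, "two-step-demo.html")
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::pandoc_convert(md_file, to = "html", output = html_file,
                            options = "--standalone")
}
```

In everyday use, rmarkdown::render() performs both steps in a single call.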
When you render an R Markdown document, R will automatically generate the output file and display the report.
You can choose how to view the rendered document by clicking on the cog icon next to the Knit button in RStudio. This opens a drop-down menu with different preview options:
Viewer Pane (default) → Displays the output in the bottom right pane (shared with Files, Plots, Packages, and Help).
New Window → Opens the output in a separate window.
These options allow you to adjust the workflow based on your preferences.
Sometimes, instead of knitting the entire document, you may want to execute only one code chunk to check its output.
You can run an individual code chunk in two ways:
Click the “Run” Icon
Use a Keyboard Shortcut
Windows/Linux: Press Ctrl + Shift + Enter
Mac: Press Cmd + Shift + Enter
When you run a chunk, the code inside it is executed immediately and the output appears inline, directly below the chunk in the R Markdown document.
This feature is useful for debugging, testing, or incremental development, as it allows you to check specific parts of your analysis without re-knitting the entire document.
R Markdown allows you to render documents in multiple formats, depending on your needs.
The main output types include:
HTML
PDF
Word
Presentations
Shiny
Templates
By default, R Markdown compiles documents into HTML (HyperText Markup Language) format.
HTML documents allow you to include interactive features, such as embedded plots, dynamic tables, and JavaScript-based visualizations.
If you prefer a Portable Document Format (PDF) output, R Markdown can generate high-quality PDF reports. However, note that:
Interactive features cannot be included in PDF documents.
Knitting to PDF requires LaTeX to be installed on your system.
If LaTeX is missing, you will see an error message like:
“No LaTeX installation detected (LaTeX is required to create PDF
output).”
To install LaTeX, you can choose one of the following options:
TinyTeX (Recommended for R users):
tinytex::install_tinytex()
MiKTeX (Windows): https://miktex.org
MacTeX (macOS): https://tug.org/mactex/
TeX Live (Linux): https://www.tug.org/texlive/
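Before knitting to PDF, it can be useful to check from R whether a LaTeX engine is reachable. The helper below is a hypothetical sketch (latex_available is not part of any package); it only looks for common engine names on the system PATH, so it may miss installations in non-standard locations.

```r
# Hypothetical helper: TRUE when at least one LaTeX engine is found on the PATH
latex_available <- function(engines = c("pdflatex", "xelatex", "lualatex")) {
  any(nzchar(Sys.which(engines)))
}

if (!latex_available()) {
  message("No LaTeX engine found; tinytex::install_tinytex() is one way to get one.")
}
```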
R Markdown can also render documents as .docx files. Microsoft Word (or a compatible application) is needed only to open the result, not to create it.
However, similar to PDFs, interactive elements (such as interactive plots and dynamic tables) will not be included.
R Markdown also supports creating slide presentations in various formats:
ioslides (HTML-based slides)
Slidy (HTML-based slides)
Beamer (PDF-based slides, requires LaTeX)
PowerPoint (.pptx slides; Microsoft PowerPoint or a compatible application is needed to open them)
While R Markdown presentations are useful, they may lack flexibility compared to dedicated presentation tools.
Shiny is an R package that enables the creation of interactive web applications.
You can embed a Shiny app within an R Markdown document to allow real-time user interaction.
Shiny applications require an HTML output and cannot be rendered in PDF or Word formats.
R Markdown provides predefined templates for different document types, reducing the need to manually set up formatting. These templates can be accessed when creating a new R Markdown file in RStudio.
Additionally, custom templates can be specified in the YAML header, allowing further customization of document formats.
In the following sections, we will explore in more detail the three key components of an R Markdown document:
Markdown Text (for formatting content)
The YAML Header (for setting document properties)
Code Chunks (for executing R code)
R Markdown uses Markdown, a lightweight markup language designed to be easy to read and write. Markdown allows you to format text in various ways, making it possible to create structured and well-formatted documents.
# Header 1
## Header 2
### Header 3
#### Header 4
##### Header 5
###### Header 6
Alternatively, for Header 1 and Header 2, an underline style can be used:
Header 1
========
Header 2
--------
You can apply italics, bold, and other text styles using the following syntax:
Italic → Use one asterisk (*italic*) or one underscore (_italic_).
Bold → Use two asterisks (**bold**) or two underscores (__bold__).
Bold & Italic → Use three asterisks (***bold and italic***) or three underscores (___bold and italic___).
Strikethrough → Use two tildes (~~strikethrough~~).
To create a bulleted list, use *, +, or -, followed by a space.
To add sub-bullets, insert two spaces and use *, +, or -, followed by a space.
To create an ordered list, follow the same steps as for unordered lists, but use numbers followed by a period (.) instead of asterisks (*).
To create sub-items within an ordered list, insert three spaces before the sub-item and use either a number (for ordered sub-lists) or a symbol like * (for unordered sub-lists).
One line
Second line
Third line
Markdown will automatically adjust the numbering when rendered, even if you use 1. for all items.
To create a link to a section within the document, simply enclose the section name in square brackets. For example:
[Introduction]
This will render as:
[Introduction]
Clicking on “Introduction” above will take you directly to the start of that section.
If you paste a full URL into your document it will automatically be recognized as a hyperlink. For example:
https://bookdown.org/yihui/rmarkdown
If you want to hide the full URL and display it as text, enclose the text in square brackets and follow it with the URL in parentheses. For example:
[markdown link](https://bookdown.org/yihui/rmarkdown/)
This will render as a clickable link labeled “markdown link”.
You can insert images into your document using a syntax similar to that of hyperlinks, with the addition of an exclamation mark (!) before the square brackets.
In the example below, you can see the file path to the image. This image will render without a caption:

To add a caption for the image, place the caption text between the square brackets, like this:

This will render the image with the caption Rmarkdown symbol.
You can also control the size of the image. For example, to set the image width to 100 pixels:
{width=100px}
This will render the image with a width of 100px.
Alternatively, you can use percentages to control the size:
{width=20%}
This will render the image at 20% of the width of its container (typically the page or text width), not 20% of the image’s original size.
To center the image, you can use HTML tags. For example:
If you want to insert an image inline with the text, you can simply place the image directly in the line.
For example:
This is the Rmarkdown symbol:
To start a new paragraph, you need to leave an empty line between the previous paragraph and the new one. Pressing “ENTER” several times to add more empty lines won’t create additional space; to create extra line breaks, use <br /> (or $~$).
For example:
I need
space
\(~\)
I need
more space
This will add the required spaces between the lines.
Tabbed sections can be added to an HTML document, which will fold the subsequent sub-sections into separate tabs. To create a tabbed section, specify a section header followed by {.tabset}.
For example:
Whole document settings can be controlled through parameters in the YAML header. YAML (originally “Yet Another Markup Language”, now “YAML Ain’t Markup Language”) is a simple, human-readable format used by R Markdown to control various details of the document’s output.
A YAML header contains fields such as title, author, and output, and the header as a whole is delimited by three dashes (---) on both ends.
The title (e.g., title: "My Title") will appear at the head of the document with a larger font size than the rest of the text.
You can also specify a subtitle, which will appear below the title and in a slightly smaller font size.
You can specify one or more authors, separating multiple names with commas.
You can supply a static date (e.g., date: 2022-01-13) or a dynamic date that updates each time you knit the document using Sys.Date().
By manipulating the YAML header, you can add a Table of Contents (TOC), which can either appear at the beginning of the document or float.
Important: Be sure to pay attention to indentations in the YAML header.
You can change the styling of your document (such as font type, color, and size) by adding the theme parameter. Many theme examples can be found at Data Dreaming’s R Markdown Theme Gallery.
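To see why indentation matters, a header can be parsed from R with the yaml package (normally installed alongside rmarkdown). This is only a sketch: the title is made up, and flatly is used as one example of a built-in html_document theme. The nesting of toc, toc_float, and theme under html_document is expressed purely by indentation.

```r
library(yaml)

# Hypothetical header: options nested under html_document by indentation
header_text <- '
title: "My report"
output:
  html_document:
    toc: true
    toc_float: true
    theme: flatly
'
header <- yaml.load(header_text)
str(header$output)

# Without the indentation, toc becomes an unrelated top-level field
flat <- yaml.load('
output: html_document
toc: true
')
names(flat)
```

In the second case toc is no longer an option of html_document, which is the kind of mistake the note on indentation warns about.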
The output option allows you to specify the type of document you want to create. This will be auto-populated if you generate the .Rmd file in RStudio via the “New R Markdown” file option. You can manually modify the output type, but you must use valid arguments.
Some valid output types include:
html_document
pdf_document
word_document
You can even specify multiple document types to render simultaneously:
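As an illustration, the sketch below builds such a header as an R list and prints it as YAML with the yaml package. The title is made up, and "default" simply means that each format keeps its default options.

```r
library(yaml)

# Hypothetical header listing two output formats
header <- list(
  title  = "My report",
  output = list(html_document = "default", pdf_document = "default")
)

# Print the header the way it would appear at the top of an .Rmd file
cat("---\n", as.yaml(header), "---\n", sep = "")

# With such a header in a file (hypothetical name "report.Rmd"), all listed
# formats can be rendered in one call. Commented out because it needs the file
# and, for PDF, a LaTeX installation:
# rmarkdown::render("report.Rmd", output_format = "all")
```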
R Markdown documents can include one or more parameters, which allow you to re-render the same report with different values for key inputs. For example, you could generate different sales reports by branch or exam results by student.
To declare parameters, use the params field in the YAML header.
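A small sketch of a parameterized report follows. The parameter names branch and year, their values, and the file name are invented for illustration. knitr::knit_params() reads the declared defaults from the header, and rmarkdown::render() accepts a params list to override them; the render step is skipped when rmarkdown or Pandoc is not available.

```r
library(knitr)

# Hypothetical parameterized report; branch and year are made-up parameters
rmd_lines <- c(
  "---",
  "title: \"Sales report\"",
  "params:",
  "  branch: \"north\"",
  "  year: 2024",
  "---",
  "",
  "Sales for the `r params$branch` branch in `r params$year`."
)
rmd_file <- file.path(tempdir(), "sales-report.Rmd")
writeLines(rmd_lines, rmd_file)

# Inspect the parameters declared in the header and their default values
declared <- knit_params(rmd_lines)
defaults <- setNames(
  lapply(declared, function(p) p$value),
  vapply(declared, function(p) p$name, character(1))
)
str(defaults)

# Re-render the same report for a different branch
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_file, params = list(branch = "south"), quiet = TRUE)
}
```

Inside the document, the values are available through the read-only list params, as in params$branch.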
Pandoc can automatically generate citations and a bibliography in various styles. To use this feature, specify a bibliography file in the YAML header:
bibliography: rmarkdown.bib
Supported formats include BibTeX, BibLaTeX, EndNote, and Medline.
To create citations within the .Rmd file, use the citation key from your bibliography file preceded by @. For example:
Blah blah [@smith04; @doe99]
For in-text citations, omit the square brackets:
@smith04 says blah, or @smith04 [p. 33] says blah
You can change the citation and bibliography style by referencing a CSL (Citation Style Language) file in the csl field:
bibliography: rmarkdown.bib
csl: apa.csl
To integrate R code into your document and create reproducible objects (e.g., figures, tables, and text), you need to insert a code chunk. There are three ways to do this:
Use the keyboard shortcut Cmd + Option + I (Mac) or Ctrl + Alt + I (Windows/Linux).
Use the “Insert” button icon in the editor toolbar.
Manually type the chunk delimiters: ```{r} and ```.
You can run the current line or selection using Cmd/Ctrl + Enter, and run the entire chunk with Cmd/Ctrl + Shift + Enter.
The chunk header consists of ```{r, followed by an optional chunk name and any additional comma-separated options, and a closing }. The chunk ends with three backticks.
Chunks can be given an optional name: ```{r by-name}.
This has three advantages:
Clearer names for graphics produced by the chunk, which makes them easier to use elsewhere.
Reuse of cached chunks, which helps avoid
re-performing expensive computations.
A special chunk name, setup, is run automatically once before any other code when in notebook mode.
Knitr provides nearly 60 options for customizing your code chunks. Some key options include:
eval = FALSE: Prevents code from being evaluated.
This is useful when you want to display the code in your document but don’t want it to be executed. For example, you may want to show the structure of your code as an example but don’t want to actually run it.
include = FALSE: Runs the code, but hides the code and its results in the output.
This is useful for setting up things that don’t need to be visible in the document, like data loading or preprocessing, while ensuring that the code is still executed. The results and the code are hidden, but they affect the final output.
echo = FALSE: Prevents code from being shown, but includes the results.
If you want to hide the R code but display the results (e.g., plots or tables), you can use this option. This is useful when you want the focus to be on the results, not the underlying code.
message = FALSE or warning = FALSE: Prevents messages or warnings from appearing.
Sometimes code produces warnings or messages that are not critical to the document’s understanding, but they can clutter the output. Setting message = FALSE hides any informational messages, and warning = FALSE hides warning messages.
results = 'hide': Hides printed output.
This option is useful when you want to execute code but do not want the printed results (e.g., the result of a summary() or print() function) to appear in the output. The code will be executed, but no results will be displayed.
fig.show = 'hide': Hides plots.
This option is useful when you want to run code that produces a plot but do not want the plot to be displayed in the document. You might want to generate the plot for later use but not show it in the current output.
error = TRUE: Allows the render to continue even if code returns an error.
This option is useful when you want to allow errors to appear in the document or when you’re debugging and want the document to continue rendering even if some code fails.
You can see the full list at http://yihui.name/knitr/options/.
The following table summarizes which types of output each option suppresses:
You can also use fig.cap = "..." to add a caption to graphical results.
By default, R Markdown displays data frames and matrices as they appear in the console:
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
For more customized formatting, use the knitr::kable function.
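A minimal sketch of such a call, using the first five rows of the built-in mtcars data shown above; the caption text and the rounding to one digit are arbitrary choices for illustration.

```r
library(knitr)

# First five rows of the built-in mtcars data set
cars_head <- head(mtcars, 5)

# digits and caption are optional arguments of kable()
tbl <- kable(cars_head, digits = 1, caption = "First five rows of mtcars")
tbl
```

Placed in a code chunk, this produces a formatted table in the knitted document instead of raw console output.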
For more advanced customization, consider using packages like xtable, stargazer, pander, tables, or ascii.
By default, each time you knit a document, it starts from scratch.
This is ideal for reproducibility since it ensures all important computations are captured in the code. However, it can be time-consuming if some computations take a long time to run.
The solution is to use cache = TRUE.
When enabled, it saves the output of a chunk to a uniquely named file on disk. On subsequent runs, knitr checks if the code has changed. If it hasn’t, the cached results are reused, saving time.
You can change default chunk options globally using knitr::opts_chunk$set().
For example, you can set the following at the beginning of your R Markdown document to hide code, messages, warnings, and errors by default:
In this case, if you want to display code (with echo = TRUE) for specific chunks you have to specify it.
Important note: if you set message = FALSE and warning = FALSE, debugging becomes harder because you will not see any messages or warnings in the final document.
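A sketch of what such global defaults might look like, typically placed in a chunk named setup near the top of the .Rmd file. Only echo, message, and warning are changed here; error handling is left at its default, which is an assumption about what is wanted rather than a recommendation.

```r
library(knitr)

# Set defaults for every chunk that follows
opts_chunk$set(
  echo    = FALSE,  # hide source code
  message = FALSE,  # hide messages
  warning = FALSE   # hide warnings
)

# Check the current default for one option
opts_chunk$get("echo")
```

An individual chunk can still override a default by setting the option in its own header, for example echo = TRUE.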
You can also embed R code directly in the text with the following syntax:
`r <expression>`
This is useful when you want to refer to properties of your data within the narrative.
That results in:
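One way to try inline expressions without knitting a whole file is to pass a line of text to knitr::knit(). The sentence below is a made-up example based on the built-in mtcars data; it is a sketch for experimenting, not a replacement for knitting the document.

```r
library(knitr)

# A one-line document containing two inline R expressions
text_in <- "The mtcars data set has `r nrow(mtcars)` cars with a mean of `r round(mean(mtcars$mpg), 1)` mpg."

# knit() evaluates the inline expressions and returns the resulting text
text_out <- knit(text = text_in, quiet = TRUE)
cat(text_out, "\n")
```

Because the numbers are computed when the document is knitted, the sentence stays correct if the underlying data change.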