The karyotype is the appearence of the complete set of chromosomes of a species. Often it is useful to visualize data spatially on genome, for example to visualize distribution of genes, copy numbers, ChIP-seq peaks, gene expression results and much more.
All of that is possible just exploiting one package,
karyoploteR
(see https://bernatgel.github.io/karyoploter_tutorial/).
Next, you can see some of the examples the authors did.
In the following we will learn how to create karyoplots, that are
representations of whole genomes with arbitrary data plotted on them. We
will explore building blocks and some more complex examples,
nevertheless if you anhele to keep exploring, all codes from
karyoploteR
authors are available on github.
karyoploteR
is a Bioconductor’s package, so we can
install it by using:
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("karyoploteR")
Suppose we want to plot the hg19
version of human
genome. We have to:
library(karyoploteR)
kp <- plotKaryotype(genome = "hg19") # we choose the genome we are interested in
We can also decide to plot some chromosomes of choice:
kp <- plotKaryotype(genome = "hg19", chromosomes = c("chr1", "chr2") )
Moreover, we can modify the plot type. In fact, seven different visualization types are available. This is useful in case you have different data to plot on.
plot.type=1
(default), horizontal ideograms with a
single data panel above themplot.type = 2
; horizontal ideograms with two data
panels, one above and one below themplot.type = 3
; horizontal ideograms with all
chromosomes in a single line with two data panels, one above and one
below themplot.type = 4
; horizontal ideograms with all
chromosomes in a single line with one data panel aboveplot.type = 5
; horizontal ideograms with all
chromosomes in a single line with one data panel below themplot.type = 6
; horizontal ideograms with NO data
panels. Only plotting in the ideograms is possibleplot.type = 7
; horizontal ideograms with all
chromosomes in a single line with NO data panels. Only plotting in the
ideograms is possible.kp <- plotKaryotype(genome = "hg19", chromosomes = c("chr1", "chr2"),plot.type = 4 )
kp <- plotKaryotype(genome = "hg19", chromosomes = c("chr1", "chr2"),plot.type = 6 )
We can add base numbers information or add labels to cytobands (it can be useful if we plot little chromosomes or subsets of it):
kp <- plotKaryotype(genome = "hg19", chromosomes = c("chrX", "chrY"))
kpAddBaseNumbers(kp)
kpAddCytobandLabels(kp, force.all=TRUE, srt=90, col="purple", cex=1)
If we are not interested in complete chromosomes, we can zoom a
specific region using genomic ranges. We can do this by using a
GenomicRanges
object or by relying on a dependence of
karyotypeR, regioneR::toGRanges()
. Notice that only one
region is allowed at time.
library(IRanges)
library(GenomicRanges)
ranges=IRanges(start = 1, end=100000000)
granges=GRanges(seqnames = c("chr1"), ranges = ranges)
kp <- plotKaryotype(genome = "hg19", zoom=granges)
kp <- plotKaryotype(genome = "hg19", zoom=regioneR::toGRanges("chr1:1-100000000"))
Specify labels to data we are plotting may help the viewer
understanding graph, in particular when we have more than one data
panel. To add labels we can use kpAddLabels( )
function.
Labels will appear on the left side of the plot, in the left margin,
next to a data panel. The data panel linked to each label is specified
using the data.panel parameter.
Let’s make a plot of plot.type=2
.
kp <- plotKaryotype(genome = "hg19", plot.type=2, chromosomes="chr3")
kpDataBackground(kp, data.panel = 1, color = "aliceblue") # we also add background color to make the panel visible
kpDataBackground(kp, data.panel = 2, color = "antiquewhite") # we also add background color to make the panel visible
kpAddLabels(kp, labels = "Label1", data.panel=1 ) # upper panel
kpAddLabels(kp, labels = "Label12", data.panel=2 ) # upper panel
Another fundamental addition are scales. In fact, without them we cannot understand the span of data we have.
By default, if you use kpAxis()
function, karyoploteR
insert a [0,1] scale. Il you want to modify it you can fix
ymin
and ymax
. You can also define number of
thicks, side, color etc.:
kp <- plotKaryotype(genome = "hg19", plot.type=2, chromosomes="chr3")
kpDataBackground(kp, data.panel = 1, color = "aliceblue") # we also add background color to make the panel visible
kpDataBackground(kp, data.panel = 2, color = "antiquewhite") # we also add background color to make the panel visible
kpAddLabels(kp, labels = "Label1", data.panel=1 ) # upper panel
kpAddLabels(kp, labels = "Label12", data.panel=2 ) # upper panel
kpAxis(kp, data.panel=1, side=2, col = "violet", text.col = "navyblue")
kpAxis(kp, data.panel=2, ymin=-8, ymax=4, numticks = 3, side=2)
If you have more than one chromosome and you want to specify
different scales, specify chromosome into kpAxis( )
To plot points in a karyoplot we need to use the
kpPoints
function. We will need a character vector with
chromosome information and two integer vectors with positions or a
GenomicRanges
object.
x=seq(1, 248956422, 9958257)
y=rnorm(n=25, mean = 0.5, sd = 0.2)
kp=plotKaryotype(genome = "hg19", chromosomes = "chr1")
kpPoints(kp,chr="chr1", x=x, y=y) # I do not need a vector for chromosome here, as I fixed "chr1"
Let’s try adding more point levels and modifying graphical parameters:
library(viridis)
x=seq(1, 248956422, 9958257)
y=rnorm(n=25, mean = 0.5, sd = 0.2)
x2=seq(4979129, 234019040, 9543330)
y2=rnorm(n=24, mean = 0.5, sd = 0.25)
x3=seq(2489564, 246466858, 4879546)
y3=rnorm(n=50, mean = 0.5, sd = 0.1)
kp=plotKaryotype(genome = "hg19", chromosomes = "chr1")
kpPoints(kp,chr="chr1", x=x, y=y, col=rainbow(length(x)), cex=2) # change color and point dimension
kpPoints(kp,chr="chr1", x=x2, y=y2, col=viridis(n = length(x2), alpha = 0.5), cex=4) # change color (with transparency) and point dimension
kpPoints(kp,chr="chr1", x=x3, y=y3, cex=0.5, pch=1:23) # change dimension and shape
Instead of points or coupled with them we may want to print text. We
just need to add to our coordinates a vector with labels and use
kpText( )
function. We can specify the family font,
rotation, colour etc.
x=seq(1, 248956422, 9958257)
y=rnorm(n=25, mean = 0.5, sd = 0.2)
labels = paste0("pos", c(1:25))
kp=plotKaryotype(genome = "hg19", chromosomes = "chr1")
kpText(kp,chr="chr1", x=x, y=y, labels = labels, family="sans",col="antiquewhite3", srt=90) # I do not need a vector for chromosome here, as I fixed "chr1"
Using the same information we used for plotting points, we can plot
lines using kpLines
:
x=seq(1, 248956422, 9958257)
y=rnorm(n=25, mean = 0.5, sd = 0.2)
kp=plotKaryotype(genome = "hg19",chromosomes = "chr1")
kpLines(kp,chr="chr1", x=x, y=y) # I do not need a vector for chromosome here, as I fixed "chr1"
Also here we can customize plot by playing with graphical parameters:
x=seq(1, 248956422, 9958257)
y=rnorm(n=25, mean = 0.5, sd = 0.2)
x2=seq(4979129, 234019040, 9543330)
y2=rnorm(n=24, mean = 0.5, sd = 0.25)
x3=seq(2489564, 246466858, 4879546)
y3=rnorm(n=50, mean = 0.5, sd = 0.1)
kp=plotKaryotype(genome = "hg19", chromosomes = "chr1")
kpLines(kp,chr="chr1", x=x, y=y, col="magenta", lwd=2) # change color and thickness
kpLines(kp,chr="chr1", x=x2, y=y2, col="forestgreen", lty=2, lwd=3) # change color, appearence and thickness
kpLines(kp,chr="chr1", x=x3, y=y3,col="cornflowerblue", lty=3, lwd=5) # change color, appearence and thickness
Usually, when you infer copy numbers the final result are large
segments that together cover almost all genome and you want visualize
them. For this example (and more others) you can use
kpSegments()
. Let’s make a Granges
object that
contains both chromosomal coordinates and y values for start of the
segment (y0
) and end (y1
).As we want to
simulate copy numbers data y0
and y1
will be
the same. If you want the function to use automatically Granges columns
for y0
and y1
you have to call them like
that.
values=c(0,1,2,3,4,5,2,2,2,2)
df=cbind.data.frame(
chr=c(rep("chr1", 23), rep("chr2", 23), rep("chr3", 18)),
start=c(rep(1:23*10e6, 2), 1:18*10e6),
end=c(rep(2:24*10e6, 2), 2:19*10e6),
y0=sample(values, 64, replace=TRUE)
)
df$y1=df$y0
df$col=c(rep("red", 23), rep("orange", 23), rep("darkgoldenrod1",18))
gr=makeGRangesFromDataFrame(df, keep.extra.columns = T)
kp <- plotKaryotype(genome = "hg19", plot.type=3, chromosomes = c("chr1", "chr2", "chr3"))
kpSegments(kp,data = gr, col=gr$col, ymin = 0, ymax=5) # we have to specify ymin and ymax as our data exceed [0,1]
kpAddLabels(kp, labels = "Invented CNV" , srt=90) # rotate label
kpAxis(kp, data.panel=1, side=2, ymin=0, ymax=5, tick.pos = c(0,1,2,3,4,5))
If we want to show values across the genome we can choose to use a barplot. We can always use GRanges objects or create vectors ad hoc. We can:
x0 <- 1:23*10e6
x1 <- 2:24*10e6
y1 <- rnorm(23, mean=0.5, sd=0.1)
kp <- plotKaryotype(chromosomes="chr1", genome="hg19")
kpBars(kp, chr="chr1", x0=x0, x1=x1, y1=y1, col=heat.colors(23), border="blue")
In the Bioconductor module we learnt how to extract informations from
a txDb
object. Let’s load
TxDb.Hsapiens.UCSC.hg19.knownGene
and extract genes.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
genes <- genes(txdb)
Now, we can make a karyotype plot with gene density by simply:
kp <- plotKaryotype(genome="hg19")
kpAddBaseNumbers(kp)
kpPlotDensity(kp, genes, col="lavender")
We can also modify window size used to perform density. Default is
10^6. Notice that if you decrease the window the evaluation of density
will be more precise but the computation time will increase. Let’s make
an example on chrY
(little chromosome with a little number
of genes).
genes_chrY=genes[seqnames(genes)=="chrY"]
kp <- plotKaryotype(genome="hg19", chromosomes = "chrY")
kpAddBaseNumbers(kp)
kpPlotDensity(kp, genes, col="yellow", window.size = 100)
As we saw in the previous paragraph, we can use different plot types:
genes_chr1=genes[seqnames(genes)=="chr1"]
kp <- plotKaryotype(genome="hg19", plot.type=6, chromosome="chr1")
kpAddBaseNumbers(kp)
kpPlotDensity(kp, genes_chr1, col="pink")
genes_chr1_2=genes[seqnames(genes)%in%c("chr1", "chr2")]
kp <- plotKaryotype(genome="hg19", plot.type=4, chromosome=c("chr1", "chr2"))
kpAddBaseNumbers(kp)
kpPlotDensity(kp, genes_chr1_2, col="pink")
Obviously, we can make the same plot using transcripts, exons, promoters and so on.
We can add to our plot markers indicating gene positions or regions of interest. We can add them by using both a GRanges object or specifying vector of coordinates:
ranges=IRanges(start = c(38058757, 7571720, 65541842), end=c(38064325, 7590868, 65569262))
granges=GRanges(seqnames = c("chr14", "chr17", "chr14"), ranges = ranges) # genomic positions in hg19 assembly of FOXA1, TP53 and MAX genes. You can find them by inspecting TxDb object or interrogating the Genome Browser from UCSC https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr2%3A25383722%2D25391559&hgsid=1560392251_nXl2BBNq5FsALF9e2zfvvesc01vw
granges$labels<-c("FOXA1", "TP53", "MAX")
kp <- plotKaryotype(genome="hg19", chromosomes = c("chr14", "chr17"))
kpAddBaseNumbers(kp)
kpPlotMarkers(kp, data = granges, text.orientation = "horizontal")
# or
kp <- plotKaryotype(genome="hg19", chromosomes = c("chr14", "chr17"))
kpAddBaseNumbers(kp)
kpPlotMarkers(kp,chr=c("chr14", "chr17", "chr14"), x= c(38058757, 7571720, 65541842), labels=c("FOXA1", "TP53", "MAX"),
text.orientation = "horizontal", # these time I put them in horizontal
line.color = "grey",
label.color = "black")
# we can also modify distance from the karyotype inserting a y value or a vector of them
kp <- plotKaryotype(genome="hg19", chromosomes = c("chr14", "chr17"))
kpAddBaseNumbers(kp)
kpPlotMarkers(kp,chr=c("chr14", "chr17", "chr14"), x= c(38058757, 7571720, 65541842), y=0.1, labels=c("FOXA1", "TP53", "MAX"),
text.orientation = "horizontal", # I put them in horizontal
line.color = "grey",
label.color = "black")
Another information we can add are links between chromosomal regions, both intra-chromosomal and inter-chromosomal. This can be useful, for example, if we want to link enhancers to genes they regulate.
For example:
# Let's build two Granges object of the same length, representing start and end ranges for the links
reg1=GRanges(seqnames = c("chr1", "chr1", "chr1", "chr2"), ranges = IRanges(start = c(30000,19888888, 3, 6000000), end=c(30030,19888888, 40000000, 6000300)))
reg2=GRanges(seqnames = c("chr1","chr1", "chr2", "chr2"), ranges = IRanges(start = c(248387320,3000000,20000900, 80000000),end=c(248387321,3300000,20010900, 100700000)))
reg1
## GRanges object with 4 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 30000-30030 *
## [2] chr1 19888888 *
## [3] chr1 3-40000000 *
## [4] chr2 6000000-6000300 *
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
reg2
## GRanges object with 4 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 248387320-248387321 *
## [2] chr1 3000000-3300000 *
## [3] chr2 20000900-20010900 *
## [4] chr2 80000000-100700000 *
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
kp <- plotKaryotype(genome="hg19", chromosomes = c("chr1", "chr2"))
kpAddBaseNumbers(kp)
kpPlotLinks(kp,data=reg1, data2=reg2,col="#FAC7FFAA", border = "violet")
The function kpPlotRegions
can be used to plot regions
along the genome. It can be useful to plot CpG island positions,
transcription factor’s bindings and so on. Regions are plotted as
rectangles and the function includes an option (active by default) to
plot overlapping regions as a stack. This function only accepts regions
as GRanges objects as input.
Let’s create a set of random regions using regioneR’s
createRandomRegions
function and plot them on the
genome.
regions <- createRandomRegions(nregions=800, length.mean = 3e6, mask = NA, non.overlapping = FALSE)
kp <- plotKaryotype()
kpPlotRegions(kp, data=regions, col="magenta", border = "black")
If you want an alternative to barplot that can be plotted onto a karyotype, you can build a lollipop plot by combining segments and points.
x=seq(1, 248956422, 9958257)
y=rnorm(n=25, mean = 0.5, sd = 0.3)
y[y>1]=1
kp=plotKaryotype(chromosomes = "chr1")
kpPoints(kp,chr="chr1", x=x, y=y, col=rainbow(length(x)), cex=2)
kpSegments(kp,chr="chr1", x0=x, x1=x, y0=0, y1=y,col=rainbow(length(x)))
kpAxis(kp, side=2, cex=0.7)
kpAddLabels(kp, "Measure X", srt=90, r0=0.9)
An immediate application of plotting points is to visualize copy number calling results from SNP array. In fact, to have a general overwiew, we need to visualize distribution of values across all the genome.
In this case we choose plot.type=4
to have all
information in one line. Notice that the panel above the karyotype span
from r0=0
to r1=1
. As we want to plot both
Beta-allele frequency (BAF) and log R ratio (LRR) we have to split the
panel in two parts.
Beta-allele frequency (BAF). The B-Allele Frequency is a normalized measure of the allelic intensity ratio of two alleles (A and B), such that a BAF of 1 or 0 indicates the complete absence of one of the two alleles (e.g. AA or BB), and a BAF of 0.5 indicates the equal presence of both alleles (e.g. AB).
log R ratio (LRR). The log R ratio (LRR) is the normalized measure of signal intensity for each SNP marker, in array chips. It is calculated taking the log2 of the ratio between the observed and expected signal for two copies of the genome.
Let’s make the figure! We will use the example file available in karyoploteR, https://bernatgel.github.io/karyoploter_tutorial//Examples/SNPArray/SNPArray.html, excluding sexual chromosomes.
df <-read.table(file = "Data/SNPArray.26T.txt", sep="\t", header=TRUE, stringsAsFactors = FALSE)
df=subset(df, !Chr%in%c("X", "Y", "XY"))
head(df)
## SNP.Name Sample.ID Allele1...Top Allele2...Top GC.Score Sample.Name
## 1 rs11152336 34 A C 0.7622 26T
## 2 rs4458740 34 A C 0.8454 26T
## 3 rs4264565 34 C C 0.8317 26T
## 4 rs4474684 34 G G 0.7729 26T
## 5 rs7677662 34 A A 0.7739 26T
## 6 rs4682434 34 A C 0.4395 26T
## Sample.Group Sample.Index SNP.Index SNP.Aux Allele1...Forward
## 1 26T 32 81073 0 A
## 2 26T 32 443708 0 A
## 3 26T 32 434421 0 C
## 4 26T 32 444477 0 C
## 5 26T 32 613842 0 A
## 6 26T 32 456548 0 T
## Allele2...Forward Allele1...Design Allele2...Design Allele1...AB Allele2...AB
## 1 C A C A B
## 2 C T G A B
## 3 C G G B B
## 4 C C C B B
## 5 A A A A A
## 6 G A C A B
## Allele1...Plus Allele2...Plus Chr Position GT.Score Cluster.Sep SNP
## 1 A C 18 59864223 0.7679 0.7687 [A/C]
## 2 A C 7 156089537 0.8203 0.7304 [T/G]
## 3 C C 2 215283259 0.8107 1.0000 [T/G]
## 4 C C 16 951045 0.7734 0.9690 [T/C]
## 5 A A 4 145513053 0.7741 0.9954 [A/G]
## 6 T G 3 112534069 0.7903 0.6738 [A/C]
## ILMN.Strand Customer.Strand Top.Genomic.Sequence Plus.Minus.Strand Theta
## 1 TOP TOP NA + 0.651
## 2 BOT TOP NA - 0.615
## 3 BOT TOP NA - 0.972
## 4 BOT BOT NA + 0.974
## 5 TOP TOP NA + 0.033
## 6 TOP BOT NA - 0.330
## R X Y X.Raw Y.Raw B.Allele.Freq Log.R.Ratio CNV.Value
## 1 0.642 0.243 0.398 1266 1753 0.6157 -0.3307 NA
## 2 1.137 0.465 0.672 2350 2903 0.4573 0.1754 NA
## 3 1.266 0.053 1.213 486 5697 1.0000 -0.0784 NA
## 4 1.161 0.046 1.115 432 4710 1.0000 -0.1139 NA
## 5 1.369 1.300 0.068 6213 460 0.0000 -0.0547 NA
## 6 0.465 0.296 0.169 1744 866 0.3379 -0.1655 NA
## CNV.Confidence GC_probe GCC_LRR
## 1 NA 0.31 -0.26
## 2 NA 0.44 0.18
## 3 NA 0.31 -0.01
## 4 NA 0.62 -0.15
## 5 NA 0.36 -0.01
## 6 NA 0.30 -0.10
Let’s make a GenomicRanges object with information we need, i.e. BAF and LRR. Notice: in this dataframe chromosomes are written according to the Ensembl convention (only number or letter), so we have to transform them by adding “chr” (UCSC style).
library(GenomicRanges)
library(IRanges)
snp=GRanges(seqnames = paste0("chr", df$Chr), ranges = IRanges(start = df$Position, end=df$Position), BAF=df$B.Allele.Freq, LRR=df$Log.R.Ratio)
Now, we will create a scatter plot using the kpPoints
function, putting BAF in the upper part of the data panel
(r0=0.5
and r1=1
) and the LRR in the lower
part (r0=0
and r1=0.5
). As we said we will use
plot.type=4
. Notice that to make a good visualization all
data you use have to be between 0 and 1 (centered around 0.5), otherwise
they can overlap with karyogram or other panels. If our data are not in
this range we have to fix min and max values for y to make the plot
rescaled. We know that BAF is a frequency, so its values are certainly
between 0 and 1. Let’s check LRR.
rg=range(snp$LRR, na.rm = T)
rg # we have to fix ymin and ymax
## [1] -4.0722 1.7004
kp=plotKaryotype(genome = "hg19",chromosomes=paste0("chr", 1:22), plot.type = 4)
kpPoints(kp, data=snp, y=snp$BAF, cex=0.3, r0=0.5, r1=1)
kpPoints(kp, data=snp, y=snp$LRR, cex=0.3, r0=0, r1=0.5, ymax=rg[2], ymin=rg[1])
kpAxis(kp, side=2, cex=0.7)
We can modify our graph to make it more beautiful and meaningful (notice, for example, that values on Y axis are not correct).
pp <- getDefaultPlotParams(plot.type = 4) # retrieve parameters that control plot.type number 4
pp$data1inmargin <- 2 # reduce the margin between the data and the ideograms
kp <- plotKaryotype(genome = "hg19",chromosomes=paste0("chr", 1:22),plot.type = 4, ideogram.plotter = NULL, plot.params = pp,labels.plotter = NULL)
kpAddCytobandsAsLine(kp)
kpAxis(kp, r0=0.4, r1=0.75)
kpPoints(kp, data=snp, y=snp$BAF, cex=0.3, r0=0.4, r1=0.75)
kpAxis(kp, tick.pos = c(rg[1], 0, rg[2]), r0=0, r1=0.35, ymax=rg[2], ymin=rg[1])
kpPoints(kp, data=snp, y=snp$LRR, cex=0.3, r0=0, r1=0.35, ymax=rg[2], ymin=rg[1])
kpAbline(kp, h=0, r0=0, r1=0.35, ymax=rg[2], ymin=rg[1], col="blue")