2025-03-19

Karyotype

The karyotype is the appearence of the complete set of chromosomes of a species. Often it is useful to visualize data spatially on genome, for example to visualize distribution of genes, copy numbers, ChIP-seq peaks, gene expression results and much more.

All of that is possible just exploiting one package, karyoploteR (see https://bernatgel.github.io/karyoploter_tutorial/).

Next, you can see some of the examples the authors did.

In the following we will learn how to create karyoplots, that are representations of whole genomes with arbitrary data plotted on them. We will explore building blocks and some more complex examples, nevertheless if you anhele to keep exploring, all codes from karyoploteR authors are available on github.

Installing karyoploteR

karyoploteR is a Bioconductor’s package, so we can install it by using:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("karyoploteR")

Make a plain karyotype

Suppose we want to plot the hg19 version of human genome. We have to:

library(karyoploteR)

kp <- plotKaryotype(genome = "hg19") # we choose the genome we are interested in

We can also decide to plot some chromosomes of choice:

kp <- plotKaryotype(genome = "hg19", chromosomes = c("chr1", "chr2") )

Moreover, we can modify the plot type. In fact, seven different visualization types are available. This is useful in case you have different data to plot on.

plot.type=1 (default), horizontal ideograms with a single data panel above them
plot.type = 2; horizontal ideograms with two data panels, one above and one below them
plot.type = 3; horizontal ideograms with all chromosomes in a single line with two data panels, one above and one below them
plot.type = 4; horizontal ideograms with all chromosomes in a single line with one data panel above
plot.type = 5; horizontal ideograms with all chromosomes in a single line with one data panel below them
plot.type = 6; horizontal ideograms with NO data panels. Only plotting in the ideograms is possible
plot.type = 7; horizontal ideograms with all chromosomes in a single line with NO data panels. Only plotting in the ideograms is possible.

kp <- plotKaryotype(genome = "hg19", chromosomes = c("chr1", "chr2"),plot.type = 4 )

kp <- plotKaryotype(genome = "hg19", chromosomes = c("chr1", "chr2"),plot.type = 6 )

We can add base numbers information or add labels to cytobands (it can be useful if we plot little chromosomes or subsets of it):

kp <- plotKaryotype(genome = "hg19", chromosomes = c("chrX", "chrY"))
kpAddBaseNumbers(kp)

kpAddCytobandLabels(kp, force.all=TRUE, srt=90, col="purple", cex=1)

If we are not interested in complete chromosomes, we can zoom a specific region using genomic ranges. We can do this by using a GenomicRanges object or by relying on a dependence of karyotypeR, regioneR::toGRanges(). Notice that only one region is allowed at time.

library(IRanges)
library(GenomicRanges)
ranges=IRanges(start = 1, end=100000000)
granges=GRanges(seqnames = c("chr1"), ranges = ranges)

kp <- plotKaryotype(genome = "hg19", zoom=granges)

kp <- plotKaryotype(genome = "hg19", zoom=regioneR::toGRanges("chr1:1-100000000"))

Labels and axes

Specify labels to data we are plotting may help the viewer understanding graph, in particular when we have more than one data panel. To add labels we can use kpAddLabels( ) function. Labels will appear on the left side of the plot, in the left margin, next to a data panel. The data panel linked to each label is specified using the data.panel parameter.

Let’s make a plot of plot.type=2.

kp <- plotKaryotype(genome = "hg19", plot.type=2, chromosomes="chr3")
kpDataBackground(kp, data.panel = 1, color = "aliceblue") # we also add background color to make the panel visible
kpDataBackground(kp, data.panel = 2, color = "antiquewhite") # we also add background color to make the panel visible
kpAddLabels(kp, labels = "Label1", data.panel=1 ) # upper panel
kpAddLabels(kp, labels = "Label12", data.panel=2 ) # upper panel

Another fundamental addition are scales. In fact, without them we cannot understand the span of data we have.

By default, if you use kpAxis() function, karyoploteR insert a [0,1] scale. Il you want to modify it you can fix ymin and ymax. You can also define number of thicks, side, color etc.:

kp <- plotKaryotype(genome = "hg19", plot.type=2, chromosomes="chr3")
kpDataBackground(kp, data.panel = 1, color = "aliceblue") # we also add background color to make the panel visible
kpDataBackground(kp, data.panel = 2, color = "antiquewhite") # we also add background color to make the panel visible
kpAddLabels(kp, labels = "Label1", data.panel=1 ) # upper panel
kpAddLabels(kp, labels = "Label12", data.panel=2 ) # upper panel
kpAxis(kp,  data.panel=1, side=2, col = "violet", text.col = "navyblue") 
kpAxis(kp,  data.panel=2, ymin=-8, ymax=4, numticks = 3, side=2)

If you have more than one chromosome and you want to specify different scales, specify chromosome into kpAxis( )

Low-level plot functions

Plot points

To plot points in a karyoplot we need to use the kpPoints function. We will need a character vector with chromosome information and two integer vectors with positions or a GenomicRanges object.

x=seq(1, 248956422, 9958257)
y=rnorm(n=25, mean = 0.5, sd = 0.2)


kp=plotKaryotype(genome = "hg19", chromosomes = "chr1")
kpPoints(kp,chr="chr1", x=x, y=y) # I do not need a vector for chromosome here, as I fixed "chr1"

Let’s try adding more point levels and modifying graphical parameters:

library(viridis)
x=seq(1, 248956422, 9958257)
y=rnorm(n=25, mean = 0.5, sd = 0.2)

x2=seq(4979129, 234019040, 9543330)
y2=rnorm(n=24, mean = 0.5, sd = 0.25)

x3=seq(2489564, 246466858, 4879546)
y3=rnorm(n=50, mean = 0.5, sd = 0.1)


kp=plotKaryotype(genome = "hg19", chromosomes = "chr1")
kpPoints(kp,chr="chr1", x=x, y=y, col=rainbow(length(x)), cex=2) # change color and point dimension
kpPoints(kp,chr="chr1", x=x2, y=y2, col=viridis(n = length(x2), alpha = 0.5), cex=4) # change color (with transparency) and point dimension
kpPoints(kp,chr="chr1", x=x3, y=y3, cex=0.5, pch=1:23) # change dimension and shape

Plot text

Instead of points or coupled with them we may want to print text. We just need to add to our coordinates a vector with labels and use kpText( ) function. We can specify the family font, rotation, colour etc.

x=seq(1, 248956422, 9958257)
y=rnorm(n=25, mean = 0.5, sd = 0.2)
labels = paste0("pos", c(1:25))

kp=plotKaryotype(genome = "hg19", chromosomes = "chr1")
kpText(kp,chr="chr1", x=x, y=y, labels = labels, family="sans",col="antiquewhite3",  srt=90) # I do not need a vector for chromosome here, as I fixed "chr1"

Plot lines

Using the same information we used for plotting points, we can plot lines using kpLines:

x=seq(1, 248956422, 9958257)
y=rnorm(n=25, mean = 0.5, sd = 0.2)


kp=plotKaryotype(genome = "hg19",chromosomes = "chr1")
kpLines(kp,chr="chr1", x=x, y=y) # I do not need a vector for chromosome here, as I fixed "chr1"

Also here we can customize plot by playing with graphical parameters:

x=seq(1, 248956422, 9958257)
y=rnorm(n=25, mean = 0.5, sd = 0.2)

x2=seq(4979129, 234019040, 9543330)
y2=rnorm(n=24, mean = 0.5, sd = 0.25)

x3=seq(2489564, 246466858, 4879546)
y3=rnorm(n=50, mean = 0.5, sd = 0.1)


kp=plotKaryotype(genome = "hg19", chromosomes = "chr1")
kpLines(kp,chr="chr1", x=x, y=y, col="magenta", lwd=2) # change color and thickness
kpLines(kp,chr="chr1", x=x2, y=y2, col="forestgreen", lty=2, lwd=3)  # change color, appearence and thickness
kpLines(kp,chr="chr1", x=x3, y=y3,col="cornflowerblue", lty=3, lwd=5) # change color, appearence and thickness

Plot segments

Usually, when you infer copy numbers the final result are large segments that together cover almost all genome and you want visualize them. For this example (and more others) you can use kpSegments(). Let’s make a Granges object that contains both chromosomal coordinates and y values for start of the segment (y0) and end (y1).As we want to simulate copy numbers data y0 and y1 will be the same. If you want the function to use automatically Granges columns for y0 and y1 you have to call them like that.

values=c(0,1,2,3,4,5,2,2,2,2)
df=cbind.data.frame(
  chr=c(rep("chr1", 23), rep("chr2", 23), rep("chr3", 18)),
  start=c(rep(1:23*10e6, 2), 1:18*10e6),
  end=c(rep(2:24*10e6, 2), 2:19*10e6),
  y0=sample(values, 64, replace=TRUE)
)
df$y1=df$y0
df$col=c(rep("red", 23), rep("orange", 23), rep("darkgoldenrod1",18))

gr=makeGRangesFromDataFrame(df, keep.extra.columns = T)


kp <- plotKaryotype(genome = "hg19", plot.type=3, chromosomes = c("chr1", "chr2", "chr3"))
kpSegments(kp,data = gr, col=gr$col, ymin = 0, ymax=5) # we have to specify ymin and ymax as our data exceed [0,1]
kpAddLabels(kp, labels = "Invented CNV" , srt=90) # rotate label
kpAxis(kp,  data.panel=1, side=2, ymin=0, ymax=5, tick.pos = c(0,1,2,3,4,5))

Plot bars

If we want to show values across the genome we can choose to use a barplot. We can always use GRanges objects or create vectors ad hoc. We can:

x0 <- 1:23*10e6
x1 <- 2:24*10e6
y1 <- rnorm(23, mean=0.5, sd=0.1)

kp <- plotKaryotype(chromosomes="chr1", genome="hg19")

kpBars(kp, chr="chr1", x0=x0, x1=x1, y1=y1, col=heat.colors(23), border="blue")

High-level plot functions

Plotting density

In the Bioconductor module we learnt how to extract informations from a txDb object. Let’s load TxDb.Hsapiens.UCSC.hg19.knownGene and extract genes.

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
genes <- genes(txdb)

Now, we can make a karyotype plot with gene density by simply:

kp <- plotKaryotype(genome="hg19")
kpAddBaseNumbers(kp)
kpPlotDensity(kp, genes, col="lavender")

We can also modify window size used to perform density. Default is 10^6. Notice that if you decrease the window the evaluation of density will be more precise but the computation time will increase. Let’s make an example on chrY (little chromosome with a little number of genes).

genes_chrY=genes[seqnames(genes)=="chrY"]
kp <- plotKaryotype(genome="hg19", chromosomes = "chrY")
kpAddBaseNumbers(kp)
kpPlotDensity(kp, genes, col="yellow", window.size = 100)

As we saw in the previous paragraph, we can use different plot types:

genes_chr1=genes[seqnames(genes)=="chr1"]
kp <- plotKaryotype(genome="hg19", plot.type=6, chromosome="chr1")
kpAddBaseNumbers(kp)
kpPlotDensity(kp, genes_chr1, col="pink")

genes_chr1_2=genes[seqnames(genes)%in%c("chr1", "chr2")]
kp <- plotKaryotype(genome="hg19", plot.type=4, chromosome=c("chr1", "chr2"))
kpAddBaseNumbers(kp)
kpPlotDensity(kp, genes_chr1_2, col="pink")

Obviously, we can make the same plot using transcripts, exons, promoters and so on.

Plot markers

We can add to our plot markers indicating gene positions or regions of interest. We can add them by using both a GRanges object or specifying vector of coordinates:

ranges=IRanges(start = c(38058757, 7571720, 65541842), end=c(38064325, 7590868, 65569262))
granges=GRanges(seqnames = c("chr14", "chr17", "chr14"), ranges = ranges) # genomic positions in hg19 assembly of FOXA1, TP53 and MAX genes. You can find them by inspecting TxDb object or interrogating the Genome Browser from UCSC https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr2%3A25383722%2D25391559&hgsid=1560392251_nXl2BBNq5FsALF9e2zfvvesc01vw
granges$labels<-c("FOXA1", "TP53", "MAX")


kp <- plotKaryotype(genome="hg19", chromosomes = c("chr14", "chr17"))
kpAddBaseNumbers(kp)
kpPlotMarkers(kp, data = granges, text.orientation = "horizontal")

# or

kp <- plotKaryotype(genome="hg19", chromosomes = c("chr14", "chr17"))
kpAddBaseNumbers(kp)

kpPlotMarkers(kp,chr=c("chr14", "chr17", "chr14"), x= c(38058757, 7571720, 65541842), labels=c("FOXA1", "TP53", "MAX"),
              text.orientation = "horizontal", # these time I put them in horizontal
              line.color = "grey",
              label.color = "black")

# we can also modify distance from the karyotype inserting a y value or a vector of them

kp <- plotKaryotype(genome="hg19", chromosomes = c("chr14", "chr17"))
kpAddBaseNumbers(kp)

kpPlotMarkers(kp,chr=c("chr14", "chr17", "chr14"), x= c(38058757, 7571720, 65541842), y=0.1, labels=c("FOXA1", "TP53", "MAX"),
              text.orientation = "horizontal", #  I put them in horizontal
              line.color = "grey",
              label.color = "black")

Plot links

Another information we can add are links between chromosomal regions, both intra-chromosomal and inter-chromosomal. This can be useful, for example, if we want to link enhancers to genes they regulate.

For example:

# Let's build two Granges object of the same length, representing start and end ranges for the links

reg1=GRanges(seqnames = c("chr1", "chr1", "chr1", "chr2"), ranges = IRanges(start = c(30000,19888888, 3, 6000000), end=c(30030,19888888, 40000000, 6000300)))

reg2=GRanges(seqnames = c("chr1","chr1", "chr2", "chr2"), ranges = IRanges(start = c(248387320,3000000,20000900, 80000000),end=c(248387321,3300000,20010900, 100700000)))

reg1

## GRanges object with 4 ranges and 0 metadata columns:
##       seqnames          ranges strand
##          <Rle>       <IRanges>  <Rle>
##   [1]     chr1     30000-30030      *
##   [2]     chr1        19888888      *
##   [3]     chr1      3-40000000      *
##   [4]     chr2 6000000-6000300      *
##   -------
##   seqinfo: 2 sequences from an unspecified genome; no seqlengths

reg2

## GRanges object with 4 ranges and 0 metadata columns:
##       seqnames              ranges strand
##          <Rle>           <IRanges>  <Rle>
##   [1]     chr1 248387320-248387321      *
##   [2]     chr1     3000000-3300000      *
##   [3]     chr2   20000900-20010900      *
##   [4]     chr2  80000000-100700000      *
##   -------
##   seqinfo: 2 sequences from an unspecified genome; no seqlengths

kp <- plotKaryotype(genome="hg19", chromosomes = c("chr1", "chr2"))
kpAddBaseNumbers(kp)
kpPlotLinks(kp,data=reg1, data2=reg2,col="#FAC7FFAA", border = "violet")

Plot regions

The function kpPlotRegions can be used to plot regions along the genome. It can be useful to plot CpG island positions, transcription factor’s bindings and so on. Regions are plotted as rectangles and the function includes an option (active by default) to plot overlapping regions as a stack. This function only accepts regions as GRanges objects as input.

Let’s create a set of random regions using regioneR’s createRandomRegions function and plot them on the genome.

regions <- createRandomRegions(nregions=800, length.mean = 3e6, mask = NA, non.overlapping = FALSE)
kp <- plotKaryotype()
kpPlotRegions(kp, data=regions, col="magenta", border = "black")

Slightly more complex examples

Lollipop plot

If you want an alternative to barplot that can be plotted onto a karyotype, you can build a lollipop plot by combining segments and points.

x=seq(1, 248956422, 9958257)
y=rnorm(n=25, mean = 0.5, sd = 0.3)
y[y>1]=1

kp=plotKaryotype(chromosomes = "chr1")
kpPoints(kp,chr="chr1", x=x, y=y, col=rainbow(length(x)), cex=2)
kpSegments(kp,chr="chr1", x0=x, x1=x, y0=0, y1=y,col=rainbow(length(x)))
kpAxis(kp, side=2, cex=0.7) 
kpAddLabels(kp, "Measure X", srt=90, r0=0.9)

SNP-array plot

An immediate application of plotting points is to visualize copy number calling results from SNP array. In fact, to have a general overwiew, we need to visualize distribution of values across all the genome.

In this case we choose plot.type=4 to have all information in one line. Notice that the panel above the karyotype span from r0=0 to r1=1. As we want to plot both Beta-allele frequency (BAF) and log R ratio (LRR) we have to split the panel in two parts.

Fundamental quantities for defining Copy Number Variations

Beta-allele frequency (BAF). The B-Allele Frequency is a normalized measure of the allelic intensity ratio of two alleles (A and B), such that a BAF of 1 or 0 indicates the complete absence of one of the two alleles (e.g. AA or BB), and a BAF of 0.5 indicates the equal presence of both alleles (e.g. AB).

log R ratio (LRR). The log R ratio (LRR) is the normalized measure of signal intensity for each SNP marker, in array chips. It is calculated taking the log2 of the ratio between the observed and expected signal for two copies of the genome.

Plot

Let’s make the figure! We will use the example file available in karyoploteR, https://bernatgel.github.io/karyoploter_tutorial//Examples/SNPArray/SNPArray.html, excluding sexual chromosomes.

df <-read.table(file = "Data/SNPArray.26T.txt", sep="\t", header=TRUE, stringsAsFactors = FALSE)
df=subset(df, !Chr%in%c("X", "Y", "XY"))
head(df)

##     SNP.Name Sample.ID Allele1...Top Allele2...Top GC.Score Sample.Name
## 1 rs11152336        34             A             C   0.7622         26T
## 2  rs4458740        34             A             C   0.8454         26T
## 3  rs4264565        34             C             C   0.8317         26T
## 4  rs4474684        34             G             G   0.7729         26T
## 5  rs7677662        34             A             A   0.7739         26T
## 6  rs4682434        34             A             C   0.4395         26T
##   Sample.Group Sample.Index SNP.Index SNP.Aux Allele1...Forward
## 1          26T           32     81073       0                 A
## 2          26T           32    443708       0                 A
## 3          26T           32    434421       0                 C
## 4          26T           32    444477       0                 C
## 5          26T           32    613842       0                 A
## 6          26T           32    456548       0                 T
##   Allele2...Forward Allele1...Design Allele2...Design Allele1...AB Allele2...AB
## 1                 C                A                C            A            B
## 2                 C                T                G            A            B
## 3                 C                G                G            B            B
## 4                 C                C                C            B            B
## 5                 A                A                A            A            A
## 6                 G                A                C            A            B
##   Allele1...Plus Allele2...Plus Chr  Position GT.Score Cluster.Sep   SNP
## 1              A              C  18  59864223   0.7679      0.7687 [A/C]
## 2              A              C   7 156089537   0.8203      0.7304 [T/G]
## 3              C              C   2 215283259   0.8107      1.0000 [T/G]
## 4              C              C  16    951045   0.7734      0.9690 [T/C]
## 5              A              A   4 145513053   0.7741      0.9954 [A/G]
## 6              T              G   3 112534069   0.7903      0.6738 [A/C]
##   ILMN.Strand Customer.Strand Top.Genomic.Sequence Plus.Minus.Strand Theta
## 1         TOP             TOP                   NA                 + 0.651
## 2         BOT             TOP                   NA                 - 0.615
## 3         BOT             TOP                   NA                 - 0.972
## 4         BOT             BOT                   NA                 + 0.974
## 5         TOP             TOP                   NA                 + 0.033
## 6         TOP             BOT                   NA                 - 0.330
##       R     X     Y X.Raw Y.Raw B.Allele.Freq Log.R.Ratio CNV.Value
## 1 0.642 0.243 0.398  1266  1753        0.6157     -0.3307        NA
## 2 1.137 0.465 0.672  2350  2903        0.4573      0.1754        NA
## 3 1.266 0.053 1.213   486  5697        1.0000     -0.0784        NA
## 4 1.161 0.046 1.115   432  4710        1.0000     -0.1139        NA
## 5 1.369 1.300 0.068  6213   460        0.0000     -0.0547        NA
## 6 0.465 0.296 0.169  1744   866        0.3379     -0.1655        NA
##   CNV.Confidence GC_probe GCC_LRR
## 1             NA     0.31   -0.26
## 2             NA     0.44    0.18
## 3             NA     0.31   -0.01
## 4             NA     0.62   -0.15
## 5             NA     0.36   -0.01
## 6             NA     0.30   -0.10

Let’s make a GenomicRanges object with information we need, i.e. BAF and LRR. Notice: in this dataframe chromosomes are written according to the Ensembl convention (only number or letter), so we have to transform them by adding “chr” (UCSC style).

library(GenomicRanges)
library(IRanges)

snp=GRanges(seqnames = paste0("chr", df$Chr), ranges = IRanges(start = df$Position, end=df$Position), BAF=df$B.Allele.Freq, LRR=df$Log.R.Ratio)

Now, we will create a scatter plot using the kpPoints function, putting BAF in the upper part of the data panel (r0=0.5 and r1=1) and the LRR in the lower part (r0=0 and r1=0.5). As we said we will use plot.type=4. Notice that to make a good visualization all data you use have to be between 0 and 1 (centered around 0.5), otherwise they can overlap with karyogram or other panels. If our data are not in this range we have to fix min and max values for y to make the plot rescaled. We know that BAF is a frequency, so its values are certainly between 0 and 1. Let’s check LRR.

rg=range(snp$LRR, na.rm = T) 
rg # we have to fix ymin and ymax

## [1] -4.0722  1.7004

kp=plotKaryotype(genome = "hg19",chromosomes=paste0("chr", 1:22), plot.type = 4)

kpPoints(kp, data=snp, y=snp$BAF, cex=0.3, r0=0.5, r1=1) 
kpPoints(kp, data=snp, y=snp$LRR, cex=0.3, r0=0, r1=0.5, ymax=rg[2], ymin=rg[1]) 

kpAxis(kp, side=2, cex=0.7)

We can modify our graph to make it more beautiful and meaningful (notice, for example, that values on Y axis are not correct).

pp <- getDefaultPlotParams(plot.type = 4) # retrieve parameters that control plot.type number 4
pp$data1inmargin <- 2 # reduce the margin between the data and the ideograms 

kp <- plotKaryotype(genome = "hg19",chromosomes=paste0("chr", 1:22),plot.type = 4, ideogram.plotter = NULL, plot.params = pp,labels.plotter = NULL)
kpAddCytobandsAsLine(kp)
kpAxis(kp, r0=0.4, r1=0.75)
kpPoints(kp, data=snp, y=snp$BAF, cex=0.3, r0=0.4, r1=0.75)
kpAxis(kp, tick.pos = c(rg[1], 0, rg[2]), r0=0, r1=0.35, ymax=rg[2], ymin=rg[1])
kpPoints(kp, data=snp, y=snp$LRR, cex=0.3, r0=0, r1=0.35, ymax=rg[2], ymin=rg[1])
kpAbline(kp, h=0, r0=0, r1=0.35, ymax=rg[2], ymin=rg[1], col="blue")

Genome Visualization