GenomicRanges package (Genomic ranges)

Read the file Arabidopsis_thaliana_TSSs.rds, located in the Datasets folder, and assign it to a variable named tss.
This file contains a GenomicRanges object representing the transcription start site (TSS) coordinates of Arabidopsis thaliana transcripts.

tss = readRDS("Exercises/Datasets/Arabidopsis_thaliana_TSSs.rds")

Which chromosomes are present in Arabidopsis thaliana?

library(GenomicRanges)
seqnames(tss)

## factor-Rle of length 41671 with 7 runs
##   Lengths: 10646  6855  8219  6384  9288   146   133
##   Values :    1     2     3     4     5     Mt    Pt
## Levels(7): 1 2 3 4 5 Mt Pt

Subset tss, keeping only sequences from sequence 1, and reassign the result to the variable tss.

tss = tss[seqnames(tss) == 1]

Extend the ranges: we want each range to be 1050 bases wide, but only by extending the start position.
Hint: Pay attention to which position should remain fixed. Reassign the result to the variable tss.

tss = resize(tss,width = 1050, fix = "end")

Shift the ranges by 50 bases and reassign the result to the variable tss.

tss = shift(tss, shift = 50)

Merge overlapping ranges while considering strand information. Reassign the result to the variable tss.

tss = reduce(tss)

Import the BED file Arabidopsis_casual_regions.bed, located in the Datasets folder.

library(rtracklayer)
bed = import.bed("Exercises/Datasets/Arabidopsis_casual_regions.bed")

Find overlaps between the extended and shifted regions obtained earlier and the intervals in Arabidopsis_casual_regions.bed. Then:

Extract the overlapping regions from the first dataset.
Extract the overlapping regions from the second dataset.
Concatenate the results from the previous steps and export them as a BED file.

OV = findOverlaps(tss, bed)

OV_TSS = tss[queryHits(OV)]

OV_BED = bed[subjectHits(OV)]

OV_MERGE = c(OV_TSS,OV_BED)

export.bed(OV_MERGE, "Bed_file_overlaps.bed")

Install the BSgenome Object for Arabidopsis thaliana (use BSgenome.Athaliana.TAIR.TAIR9).
Create a GenomicRanges Object with the following values:

chr = c("Chr5", "Chr1", "Chr4")
start = c(129999, 340, 23456)
end = c(130000, 400, 23470)

Extract the base sequences of the genomic intervals contained in this GenomicRanges object and assign them to the variable sequences.What type of object do you obtain?

#BiocManager::install("BSgenome.Athaliana.TAIR.TAIR9")
library(BSgenome.Athaliana.TAIR.TAIR9)


intervals = IRanges(start = c(129999, 340, 23456)
                    , end = c(130000, 400, 23470))

GR = GRanges(seqnames = c("Chr5", "Chr1", "Chr4")
             , ranges = intervals)
GR

## GRanges object with 3 ranges and 0 metadata columns:
##       seqnames        ranges strand
##          <Rle>     <IRanges>  <Rle>
##   [1]     Chr5 129999-130000      *
##   [2]     Chr1       340-400      *
##   [3]     Chr4   23456-23470      *
##   -------
##   seqinfo: 3 sequences from an unspecified genome; no seqlengths

sequences = getSeq(BSgenome.Athaliana.TAIR.TAIR9, GR)
sequences

## DNAStringSet object of length 3:
##     width seq
## [1]     2 AT
## [2]    61 TTGTAACCTTAGGGTTGGTTTATCTCAAGAATCTTATTAATTGTTTGGACTGTTTATGTTT
## [3]    15 AGGAAGTTGTCCCAG

Compute the reverse complement of the genomic sequences stored in the variable sequences.

reverseComplement(sequences)

## DNAStringSet object of length 3:
##     width seq
## [1]     2 AT
## [2]    61 AAACATAAACAGTCCAAACAATTAATAAGATTCTTGAGATAAACCAACCCTAAGGTTACAA
## [3]    15 CTGGGACAACTTCCT

Create a GenomicRanges Object using the following coordinates:

chrom = c("chr1", "chr2", "chrM")
start = c(1000000; 120000; 340000)
end = c(10000000; 560000; 98000)

What do you obtain? Why?

library(IRanges)

intervals = IRanges(start = c(1000000, 120000, 340000)
                    , end = c(10000000, 560000, 98000)
                    )

library(GenomicRanges)

GR = GRanges(seqnames = c("chr1", "chr2", "chrM")
             , ranges = intervals)
GR

Create a GenomicRanges object using the following coordinates:

chrom = c("chr1", "chr2", "chrM")
start = c(1000000, 120000, 340000)
end = c(10000000, 560000, 9,800000)

intervals = IRanges(start = c(1000000, 120000, 340000)
                    , end = c(10000000, 560000, 9800000))

GR = GRanges(seqnames = c("chr1", "chr2", "chrM")
             , ranges = intervals)
GR

## GRanges object with 3 ranges and 0 metadata columns:
##       seqnames           ranges strand
##          <Rle>        <IRanges>  <Rle>
##   [1]     chr1 1000000-10000000      *
##   [2]     chr2    120000-560000      *
##   [3]     chrM   340000-9800000      *
##   -------
##   seqinfo: 3 sequences from an unspecified genome; no seqlengths

Add strand information to the GenomicRanges Object, choosing among the allowed symbols.

strand(GR) <- c("*", "+", "-")

Add names to the intervals: Gene_1, Gene_2, and Gene_3.

names(GR) <- c("Gene_1", "Gene_2" , "Gene_3")

2025-03-18

GenomicRanges package (Genomic ranges)