GenomicRanges package (Genomic ranges)

  1. Create a GRanges object using the following information:
library(IRanges)
library(GenomicRanges)

intervals = IRanges(start = c(10000, 11100, 200000)
                    , end = c(10300, 11500, 200300))

GR = GRanges(seqnames = c("chr1","chr1", "chr2")
             , ranges = intervals
             , strand = c("+", "-", "-")
             , score=c(10,20,15)
             )

  1. Use the start(), end(), strand(), seqnames() and width() functions on the GRanges object you created. What information do you obtain?
start(GR)
## [1]  10000  11100 200000
end(GR)
## [1]  10300  11500 200300
strand(GR)
## factor-Rle of length 3 with 2 runs
##   Lengths: 1 2
##   Values : + -
## Levels(3): + - *
seqnames(GR)
## factor-Rle of length 3 with 2 runs
##   Lengths:    2    1
##   Values : chr1 chr2
## Levels(2): chr1 chr2
width(GR)
## [1] 301 401 301
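
The widths above can be checked by hand. The coordinates are inclusive at both ends, so each width is end - start + 1. A minimal base-R sketch (the vector names `starts`, `ends` and `widths` are illustrative and not part of the exercise):

```r
# Same coordinates as the GRanges object above, as plain numeric vectors.
starts <- c(10000, 11100, 200000)
ends   <- c(10300, 11500, 200300)

# Inclusive intervals: both the first and the last position are counted.
widths <- ends - starts + 1
widths
```

This gives the same three values that width(GR) reports.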

  1. Add the width of ranges as a metadata column named dimension.
    Hint: use $ as you would with a data frame.
GR$dimension = width(GR)

  1. Extract metadata columns.
mcols(GR)
## DataFrame with 3 rows and 2 columns
##       score dimension
##   <numeric> <integer>
## 1        10       301
## 2        20       401
## 3        15       301

  1. Create a subset of the GRanges object containing only intervals on the + strand and another subset with intervals located on chr1.
    Hint: GRanges objects can be subset using the [ ] operator, similar to data frames.
    However, you may need to use start(), end(), strand(), and seqnames() within [ ].
GR[strand(GR) == "+"]
## GRanges object with 1 range and 2 metadata columns:
##       seqnames      ranges strand |     score dimension
##          <Rle>   <IRanges>  <Rle> | <numeric> <integer>
##   [1]     chr1 10000-10300      + |        10       301
##   -------
##   seqinfo: 2 sequences from an unspecified genome; no seqlengths
GR[seqnames(GR) == "chr1"]
## GRanges object with 2 ranges and 2 metadata columns:
##       seqnames      ranges strand |     score dimension
##          <Rle>   <IRanges>  <Rle> | <numeric> <integer>
##   [1]     chr1 10000-10300      + |        10       301
##   [2]     chr1 11100-11500      - |        20       401
##   -------
##   seqinfo: 2 sequences from an unspecified genome; no seqlengths
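
The two filters above can also be combined, since comparisons on strand() and seqnames() return logical vectors that work with & and |. The sketch below rebuilds the example object so that it runs on its own; the names `minus_chr1` and `high_score` are illustrative only:

```r
library(GenomicRanges)

# Rebuild the example object from the first exercise.
GR <- GRanges(seqnames = c("chr1", "chr1", "chr2"),
              ranges = IRanges(start = c(10000, 11100, 200000),
                               end = c(10300, 11500, 200300)),
              strand = c("+", "-", "-"),
              score = c(10, 20, 15))

# Keep only ranges that are on the minus strand AND on chr1.
minus_chr1 <- GR[strand(GR) == "-" & seqnames(GR) == "chr1"]
minus_chr1

# Metadata columns can be used the same way, here via the $ accessor.
high_score <- GR[GR$score > 12]
high_score
```

Only the second interval (chr1, minus strand) satisfies the first condition, and the two intervals with scores 20 and 15 satisfy the second.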

  1. Modify the chromosome of the second interval in the GRanges object to chr20.
seqlevels(GR)
## [1] "chr1" "chr2"
seqlevels(GR) <- c("chr1", "chr2", "chr20")
seqnames(GR)[2] = "chr20"

  1. Try creating a GRanges object using the following data.
    Are you able to create it successfully? Why or why not?
intervals2 = IRanges(start = c(1, 679, 7000)
                     , end = c(7, 666, 34000))
## Error in .width_as_unnamed_integer(width, msg = "an end that is greater or equal to its start minus one"): each range must have an end that is greater or equal to its start minus one
GR2 = GRanges(seqnames = c("chr4", "chr6", "chr9", "chr22")
              , ranges = intervals2)
## Error: object 'intervals2' not found
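
The object cannot be created. The first error says that every range needs an end no smaller than its start minus one, and the second range (start 679, end 666) breaks that rule. Because IRanges() fails, intervals2 is never defined, which causes the second error. The data also supply four chromosome names for only three ranges, a mismatch that would likely be rejected as well. A minimal base-R sketch of a pre-check (the names `starts`, `ends`, `chroms`, `bad` and `same_length` are hypothetical helpers, not part of either package):

```r
# Input data from the exercise, as plain vectors.
starts <- c(1, 679, 7000)
ends   <- c(7, 666, 34000)
chroms <- c("chr4", "chr6", "chr9", "chr22")

# Positions where the end is smaller than start - 1, the condition
# named in the IRanges error message.
bad <- which(ends < starts - 1)
bad

# Check that there is one chromosome name per range.
same_length <- length(chroms) == length(starts)
same_length
```

Here `bad` points at the second range and `same_length` is FALSE, so both problems are visible before any constructor is called.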

  1. Create a GRanges object using the following data:
intervals2 = IRanges(start = c(1, 679, 7000, 999)
                     , end = c(7, 6666, 34000, 1000)
                     )

GR2 = GRanges(seqnames = c("chr4", "chr6", "chr9", "chr22")
              , ranges = intervals2)

  1. Assign names to the GRanges object you created: Sequence1, Sequence2, Sequence3, and Sequence4.
names(GR2) <- c("Sequence1", "Sequence2", "Sequence3", "Sequence4")

  1. Extract the range named Sequence3 from the GRanges object.
GR2["Sequence3"]
## GRanges object with 1 range and 0 metadata columns:
##             seqnames     ranges strand
##                <Rle>  <IRanges>  <Rle>
##   Sequence3     chr9 7000-34000      *
##   -------
##   seqinfo: 4 sequences from an unspecified genome; no seqlengths