Theoretical intro: CpG islands

The CpG sites (i.e. 5’C—phosphate—G3’) are regions of DNA where a cytosine is followed by a guanine nucleotide in the linear sequence of bases along its 5’ → 3’ direction.

Often, the cytosines in CpG dinucleotides are methylated (5-methylcytosines). In mammals, 70% to 80% of CpG cytosines are methylated. Methylating the cytosine within a gene can influence its expression.

CpG islands (also called CG-Rich Islands) are regions with a high frequency of CpG sites. In humans, about 70% of promoters located near the transcription start site of a gene contain a CpG island.
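
As a minimal sketch of these definitions, the snippet below counts CpG sites in a short made-up sequence and derives the same kind of summary statistics as the cpgNum, gcNum, perGc and obsExp columns of the table used in the exercises. The sequence and the variable names are illustrative assumptions. The observed/expected ratio uses the commonly cited definition, (number of CpG × sequence length) / (number of C × number of G); the document does not state that the dataset's obsExp column was computed exactly this way.

```r
# Toy sequence, invented for illustration only
dna <- "TTCGCGACGGCGTACGCA"
bases <- strsplit(dna, "")[[1]]
n <- length(bases)

# A CpG site is a C immediately followed by a G along 5' -> 3'
cpg_num <- sum(bases[-n] == "C" & bases[-1] == "G")

# Number and percentage of bases that are C or G
gc_num <- sum(bases %in% c("C", "G"))
per_gc <- 100 * gc_num / n

# Observed/expected CpG ratio (commonly used definition, assumed here)
obs_exp <- cpg_num * n / (sum(bases == "C") * sum(bases == "G"))

c(cpgNum = cpg_num, gcNum = gc_num, perGc = per_gc, obsExp = obs_exp)
```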


Input/Output

  1. Read the CpG island (CpGi) data contained in CpGi.table.hg18.csv (you can find it in the Datasets folder). This comma-separated file was downloaded from the compGenomRData package.
  • Store it in a variable called cpgi
  • By applying the str( ) function, explore the cpgi dataframe.
# DATA_DIR must point to your local copy of the repository; for me DATA_DIR= '/Users/tbecchi/Desktop/repository/BDSB/'
cpgi=read.csv(paste0(DATA_DIR,"Exercises/Ex_per_date/Datasets/CpGi.table.hg18.csv"))
str(cpgi)
## 'data.frame':    28226 obs. of  10 variables:
##  $ chrom     : chr  "chr1" "chr1" "chr1" "chr1" ...
##  $ chromStart: int  18598 124987 317653 427014 439136 523082 534601 703847 752279 778726 ...
##  $ chromEnd  : int  19673 125426 318092 428027 440407 523977 536512 704410 753308 779074 ...
##  $ name      : chr  "CpG: 116" "CpG: 30" "CpG: 29" "CpG: 84" ...
##  $ length    : int  1075 439 439 1013 1271 895 1911 563 1029 348 ...
##  $ cpgNum    : int  116 30 29 84 99 94 171 60 115 28 ...
##  $ gcNum     : int  787 295 295 734 777 570 1405 385 673 192 ...
##  $ perCpg    : num  21.6 13.7 13.2 16.6 15.6 21 17.9 21.3 22.4 16.1 ...
##  $ perGc     : num  73.2 67.2 67.2 72.5 61.1 63.7 73.5 68.4 65.4 55.2 ...
##  $ obsExp    : num  0.83 0.64 0.62 0.64 0.84 1.04 0.67 0.92 1.07 1.06 ...

  1. Evaluate the dimensions of the dataframe.
dim(cpgi)
## [1] 28226    10

  1. Visualize the first rows of the dataset by using head( ) function.
head(cpgi)
##   chrom chromStart chromEnd     name length cpgNum gcNum perCpg perGc obsExp
## 1  chr1      18598    19673 CpG: 116   1075    116   787   21.6  73.2   0.83
## 2  chr1     124987   125426  CpG: 30    439     30   295   13.7  67.2   0.64
## 3  chr1     317653   318092  CpG: 29    439     29   295   13.2  67.2   0.62
## 4  chr1     427014   428027  CpG: 84   1013     84   734   16.6  72.5   0.64
## 5  chr1     439136   440407  CpG: 99   1271     99   777   15.6  61.1   0.84
## 6  chr1     523082   523977  CpG: 94    895     94   570   21.0  63.7   1.04

  1. What happens if you set stringsAsFactors=TRUE in the reading function? Use the str( ) function again.
cpgi=read.csv("Datasets/CpGi.table.hg18.csv", stringsAsFactors=TRUE )
str(cpgi)
## 'data.frame':    28226 obs. of  10 variables:
##  $ chrom     : Factor w/ 45 levels "chr1","chr1_random",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ chromStart: int  18598 124987 317653 427014 439136 523082 534601 703847 752279 778726 ...
##  $ chromEnd  : int  19673 125426 318092 428027 440407 523977 536512 704410 753308 779074 ...
##  $ name      : Factor w/ 439 levels "CpG: 100","CpG: 101",..: 18 222 213 421 439 432 80 382 17 202 ...
##  $ length    : int  1075 439 439 1013 1271 895 1911 563 1029 348 ...
##  $ cpgNum    : int  116 30 29 84 99 94 171 60 115 28 ...
##  $ gcNum     : int  787 295 295 734 777 570 1405 385 673 192 ...
##  $ perCpg    : num  21.6 13.7 13.2 16.6 15.6 21 17.9 21.3 22.4 16.1 ...
##  $ perGc     : num  73.2 67.2 67.2 72.5 61.1 63.7 73.5 68.4 65.4 55.2 ...
##  $ obsExp    : num  0.83 0.64 0.62 0.64 0.84 1.04 0.67 0.92 1.07 1.06 ...
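
The difference between the two str( ) outputs comes down to how a factor is stored: a table of distinct levels plus one integer code per element. A small sketch on a made-up vector (the values are illustrative, not taken from the file):

```r
# Character vector: each element is stored as a string
chrom_chr <- c("chr1", "chr10", "chr2", "chr1")

# Factor: distinct values become levels, elements become integer codes
chrom_fac <- factor(chrom_chr)

levels(chrom_fac)       # the distinct values, sorted
as.integer(chrom_fac)   # the code of each element within levels()
nlevels(chrom_fac)      # number of distinct values
```

This is why str( ) reports chrom as "Factor w/ 45 levels" followed by integers: the integers are positions in the level table, not the strings themselves.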

  1. Read only the first 10 rows of the CpGi table.
cpgi=read.csv("Datasets/CpGi.table.hg18.csv", nrows=10 )
dim(cpgi)
## [1] 10 10

  1. Read the file skipping the first 10 lines of the CpGi table.
cpgi=read.csv("Datasets/CpGi.table.hg18.csv", skip =10 )
head(cpgi)
##   chr1 X778726 X779074  CpG..28 X348 X28 X192 X16.1 X55.2 X1.06
## 1 chr1  791838  792201  CpG: 24  363  24  243  13.2  66.9  0.79
## 2 chr1  795061  795491  CpG: 50  430  50  316  23.3  73.5  0.87
## 3 chr1  829557  830482  CpG: 83  925  83  525  17.9  56.8  1.11
## 4 chr1  834162  835746 CpG: 153 1584 153 1083  19.3  68.4  0.85
## 5 chr1  844628  844836  CpG: 16  208  16  140  15.4  67.3  0.68
## 6 chr1  848833  851495 CpG: 257 2662 257 1642  19.3  61.7  1.02

  1. Try reading the file setting header=FALSE. What happens?
cpgi=read.csv("Datasets/CpGi.table.hg18.csv", header=FALSE )
head(cpgi)
##      V1         V2       V3       V4     V5     V6    V7     V8    V9    V10
## 1 chrom chromStart chromEnd     name length cpgNum gcNum perCpg perGc obsExp
## 2  chr1      18598    19673 CpG: 116   1075    116   787   21.6  73.2   0.83
## 3  chr1     124987   125426  CpG: 30    439     30   295   13.7  67.2   0.64
## 4  chr1     317653   318092  CpG: 29    439     29   295   13.2  67.2   0.62
## 5  chr1     427014   428027  CpG: 84   1013     84   734   16.6  72.5   0.64
## 6  chr1     439136   440407  CpG: 99   1271     99   777   15.6  61.1   0.84

  1. Read the file again using the optimal options to import the data correctly.
cpgi=read.csv("Datasets/CpGi.table.hg18.csv", header=TRUE)
head(cpgi)
##   chrom chromStart chromEnd     name length cpgNum gcNum perCpg perGc obsExp
## 1  chr1      18598    19673 CpG: 116   1075    116   787   21.6  73.2   0.83
## 2  chr1     124987   125426  CpG: 30    439     30   295   13.7  67.2   0.64
## 3  chr1     317653   318092  CpG: 29    439     29   295   13.2  67.2   0.62
## 4  chr1     427014   428027  CpG: 84   1013     84   734   16.6  72.5   0.64
## 5  chr1     439136   440407  CpG: 99   1271     99   777   15.6  61.1   0.84
## 6  chr1     523082   523977  CpG: 94    895     94   570   21.0  63.7   1.04
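
The effect of nrows, skip and header can be reproduced on a tiny file, which makes it easier to see what each option does. The sketch below writes a three-row CSV to a temporary file; the file content is made up for illustration.

```r
# Small CSV written to a temporary file (illustrative content)
tmp <- tempfile(fileext = ".csv")
writeLines(c("chrom,chromStart,length",
             "chr1,18598,1075",
             "chr1,124987,439",
             "chr2,317653,439"), tmp)

full    <- read.csv(tmp)                  # header line gives the column names
first2  <- read.csv(tmp, nrows = 2)       # only the first two data rows
skipped <- read.csv(tmp, skip = 2)        # skips 2 lines, next line is taken as header
nohead  <- read.csv(tmp, header = FALSE)  # header line is read as a data row

names(skipped)   # a data row was promoted to column names
str(nohead)      # all columns are character, because row 1 holds text
```

With skip, the first line that is actually read is treated as the header, which is why column names such as X778726 appear in the output of that step. With header=FALSE, the real header becomes the first data row, so the numeric columns are imported as text.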

  1. Write the CpG islands to an RDS file. Set the output folder as you prefer. If you want to write the file to your home folder you can use file="~/filename.rds", since on Linux and macOS ~/ denotes the home folder. On Windows, R also accepts forward slashes (/) in paths; if you prefer backslashes, they must be doubled (\\) inside an R string.
saveRDS(cpgi, "cpgi.rds")

  1. Save CpG islands in a txt file. Make sure to use the quote=FALSE , sep="\t" and row.names=FALSE arguments. What do these arguments do?
write.table(cpgi, "cpgi.txt",quote=FALSE,sep="\t",row.names=FALSE)
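
To see what the three arguments do, it helps to compare the text that write.table( ) produces with and without them. The sketch below prints to the console instead of a file and captures the lines; the toy data frame is an illustrative assumption.

```r
toy <- data.frame(chrom = c("chr1", "chr2"),
                  name  = c("CpG: 116", "CpG: 30"))

# Defaults: strings are quoted, fields are separated by a space,
# and row names are written as an extra first column
default_lines <- capture.output(write.table(toy))

# quote=FALSE drops the quotes, sep="\t" uses a tab between fields,
# row.names=FALSE omits the row-name column
clean_lines <- capture.output(
  write.table(toy, quote = FALSE, sep = "\t", row.names = FALSE)
)

default_lines
clean_lines
```

Note that the name values contain a space, so writing them unquoted with the default space separator would make the columns ambiguous; a tab separator avoids that.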

  1. Read the RDS file you created. Extract CpG islands only on chr1 and assign them to a variable called chr1. HINT: subset cpgi using both [] (creating a logical vector with the == operator) and the subset() function.
cpgi=readRDS("cpgi.rds")
chr1=cpgi[cpgi$chrom=="chr1",]
chr1=subset(cpgi, chrom=="chr1")
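
The two approaches give the same result here, but they differ when the column being compared contains missing values. A small sketch on a made-up data frame (not the CpGi table) shows the difference:

```r
toy <- data.frame(chrom  = c("chr1", NA, "chr2", "chr1"),
                  length = c(100, 200, 300, 400))

# Logical indexing keeps a row of NAs wherever the comparison is NA
by_bracket <- toy[toy$chrom == "chr1", ]

# which() drops the NA comparisons before indexing
by_which <- toy[which(toy$chrom == "chr1"), ]

# subset() treats NA in the condition as FALSE
by_subset <- subset(toy, chrom == "chr1")

nrow(by_bracket)
nrow(by_which)
nrow(by_subset)
```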

  1. Create a variable called chr2 with CpG islands only on chr2. Save both chr1 and chr2 in an RData file. Then, remove both chr1 and chr2 variables from R environment using rm(). What happens if you try to visualize chr1 now? Inspect the “Environment” tab.
chr2=subset(cpgi, chrom=="chr2")

save(chr1, chr2, file = "cpgi_chr1_chr2.RData")

rm(chr1)
rm(chr2)

chr1 # fails with an "object 'chr1' not found" error, because rm() deleted it

  1. Load the RData file you created and inspect the content. What happens if you try to visualize the first rows of chr1?
load("cpgi_chr1_chr2.RData")

head(chr1) # load() restores the objects under the names they were saved with
##   chrom chromStart chromEnd     name length cpgNum gcNum perCpg perGc obsExp
## 1  chr1      18598    19673 CpG: 116   1075    116   787   21.6  73.2   0.83
## 2  chr1     124987   125426  CpG: 30    439     30   295   13.7  67.2   0.64
## 3  chr1     317653   318092  CpG: 29    439     29   295   13.2  67.2   0.62
## 4  chr1     427014   428027  CpG: 84   1013     84   734   16.6  72.5   0.64
## 5  chr1     439136   440407  CpG: 99   1271     99   777   15.6  61.1   0.84
## 6  chr1     523082   523977  CpG: 94    895     94   570   21.0  63.7   1.04
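
The practical difference between the two storage formats can be sketched as follows: an RDS file holds a single object without its name, while an RData file holds one or more objects together with their names. File paths below are temporary files chosen for illustration.

```r
x <- data.frame(a = 1:3)

rds_file <- tempfile(fileext = ".rds")
rda_file <- tempfile(fileext = ".RData")

saveRDS(x, rds_file)       # stores the object only
save(x, file = rda_file)   # stores the object and its name
rm(x)

y <- readRDS(rds_file)       # you choose the name when reading
restored <- load(rda_file)   # recreates 'x'; returns the restored names

restored
identical(x, y)
```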

  1. In your environment you have “chr1” and “chr2” again. Let’s work on them!

    • Create a vector length_chr1 that contains values in column length in chr1
    • Create a vector length_chr2 that contains values in column length in chr2
    • Evaluate quantile distribution with steps of 0.1 of length_chr1 and length_chr2 (HINT: use quantile() function).
    • Evaluate mean, median and standard deviation of length_chr1 and length_chr2. Are there differences between the two? Comment on this, also considering quantiles.
    • Create three different dataframes from chr1: chr1_small, chr1_medium and chr1_large, using the quantile values as thresholds (<=30%, >30% and <=60%, >60%)
    • Create three different dataframes from chr2: chr2_small, chr2_medium and chr2_large, using the quantile values as thresholds (<=30%, >30% and <=60%, >60%)
    • How many rows do the dataframes you created have?
    • Evaluate how many times each name is repeated in chr1_small
    • Evaluate how many times each name is repeated in chr2_large
    • Considering the whole cpgi dataframe, how many CpG islands do you have per chromosome?
    • Create a vector casual of normally distributed numbers whose length equals the number of unique values of chr1_small$name
    • Name the elements of casual after the unique values of chr1_small$name
    • Order casual from the largest to the smallest value
    • Add a column Gaussian to chr1_small by matching the column name with the names of casual.
    • Add a column Gaussian to chr2_small by matching the column name with the names of casual. Are there missing values? If yes, how many? How many unique names do they correspond to?
    • Transform chr2_small by excluding the rows that have missing values in Gaussian
    • Save chr2_small and chr1_small in an RData file
    • Create a list dat that contains:
      • The subset of chr1_large for which length is greater than 25000
      • The number of rows of chr2_large whose length is greater than 25000
      • A matrix you create by selecting only the last three columns from chr1_small
    • Save the list you created in an RDS file
    • What is the type of the vector contained in the list you created?
    • Evaluate if c("CpG: 45","CpG: 52", "CpG: 108") are present in the name column of chr2_medium
    • Extract row indexes of the elements in name column in chr2_medium that correspond to c("CpG: 45","CpG: 52", "CpG: 108")
    • Extract from the matrix in the dat list all columns except the third and rows that contain perGc values between 53 and 62
    • Extract the indices of the rows of chr2_large whose name column contains a 2
    • Create a new logical vector lo evaluating whether the values in the name column of chr2_large contain the number 3
    • Transform lo into a matrix with 7 columns. You will obtain a warning… why?
    • Add the matrix you created to the list dat and overwrite the file you saved before with the new list
length_chr1=chr1$length
length_chr2=chr2$length

q1=quantile(length_chr1, probs=seq(0,1,0.1))
q2=quantile(length_chr2, probs=seq(0,1,0.1))

mean(length_chr1)
## [1] 764.1153
mean(length_chr2)
## [1] 816.2726
median(length_chr1)
## [1] 562
median(length_chr2)
## [1] 626
sd(length_chr1)
## [1] 1197.294
sd(length_chr2)
## [1] 697.7922
# chr1 has shorter CpG islands in mean and median, but larger fluctuations (its standard deviation is bigger)
# Looking at the quantiles, the minimum length is the same on both chromosomes, suggesting a cutoff used to identify islands. Up to the 90% quantile chr2 contains longer islands, but chr1 has some longer ones (its 100% value is bigger)
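
A pattern like the one described in these comments (mean well above the median, large standard deviation) is typical of a distribution with a few very large values. The sketch below uses made-up lengths to show how quantile( ) names its output and how a single long element pulls the mean and the standard deviation up while the median barely moves.

```r
# Illustrative lengths with one very long element
len <- c(200, 250, 300, 400, 500, 650, 800, 1200, 2000, 9000)

q <- quantile(len, probs = seq(0, 1, 0.1))
names(q)      # "0%", "10%", ..., "100%": usable as q["30%"]
q["30%"]

mean(len)     # pulled up by the 9000 value
median(len)   # robust to the single large value
sd(len)

# Same statistics without the largest element
mean(len[-10])
sd(len[-10])
```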


chr1_small=subset(chr1, length<=q1["30%"])
nrow(chr1_small)
## [1] 741
chr1_medium=subset(chr1, length>q1["30%"] & length<=q1["60%"])
nrow(chr1_medium)
## [1] 738
chr1_large=subset(chr1, length>q1["60%"])
nrow(chr1_large)
## [1] 984
chr2_small=subset(chr2, length<=q2["30%"])
nrow(chr2_small)
## [1] 506
chr2_medium=subset(chr2, length>q2["30%"] & length<=q2["60%"])
nrow(chr2_medium)
## [1] 503
chr2_large=subset(chr2, length>q2["60%"])
nrow(chr2_large)
## [1] 671
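
As an alternative to three separate subset( ) calls, the same three-way split can be sketched with cut( ), which assigns each value to an interval. This is only one possible approach, shown on made-up lengths; with the default right = TRUE the intervals are closed on the right, matching the <= comparisons used above.

```r
len <- c(210, 250, 300, 420, 560, 640, 800, 1200, 2500, 9000)

# Thresholds at the 30% and 60% quantiles
q <- quantile(len, probs = c(0.3, 0.6))

# (-Inf, q30] -> small, (q30, q60] -> medium, (q60, Inf) -> large
size <- cut(len, breaks = c(-Inf, q, Inf),
            labels = c("small", "medium", "large"))

table(size)                 # group sizes
groups <- split(len, size)  # list with one element per group
groups$large
```

The groups are not exactly 30%, 30% and 40% of the rows in the real data because several islands share the same length, so ties at a threshold all fall on the same side.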
table(chr1_small$name)
## 
## CpG: 14 CpG: 15 CpG: 16 CpG: 17 CpG: 18 CpG: 19 CpG: 20 CpG: 21 CpG: 22 CpG: 23 
##       2      18      26      30      35      47      48      49      49      58 
## CpG: 24 CpG: 25 CpG: 26 CpG: 27 CpG: 28 CpG: 29 CpG: 30 CpG: 31 CpG: 32 CpG: 33 
##      50      51      40      38      27      21      28      13      13      18 
## CpG: 34 CpG: 35 CpG: 36 CpG: 37 CpG: 38 CpG: 39 CpG: 40 CpG: 41 CpG: 42 CpG: 43 
##      14      14      12       8      13       3       2       3       1       2 
## CpG: 44 CpG: 45 CpG: 47 CpG: 49 CpG: 50 
##       2       2       2       1       1
table(chr2_large$name)
## 
## CpG: 100 CpG: 101 CpG: 102 CpG: 103 CpG: 104 CpG: 105 CpG: 106 CpG: 107 
##        5        6        6        7        7        9        8        6 
## CpG: 108 CpG: 109 CpG: 110 CpG: 111 CpG: 112 CpG: 113 CpG: 114 CpG: 115 
##        5        5        5        6        6        6        6       10 
## CpG: 116 CpG: 117 CpG: 118 CpG: 119 CpG: 120 CpG: 121 CpG: 122 CpG: 123 
##        9        5        6        7        7        6        4        4 
## CpG: 124 CpG: 125 CpG: 126 CpG: 127 CpG: 128 CpG: 129 CpG: 130 CpG: 131 
##        6        5        4        5        7        4        1        7 
## CpG: 132 CpG: 133 CpG: 134 CpG: 135 CpG: 136 CpG: 137 CpG: 138 CpG: 139 
##        4        6        2        5        6        2        2        8 
## CpG: 140 CpG: 141 CpG: 142 CpG: 143 CpG: 144 CpG: 145 CpG: 146 CpG: 147 
##        1        3        3        4        2        1        5        2 
## CpG: 148 CpG: 149 CpG: 150 CpG: 151 CpG: 152 CpG: 153 CpG: 154 CpG: 155 
##        5        1        3        3        1        5        2        3 
## CpG: 157 CpG: 158 CpG: 159 CpG: 160 CpG: 161 CpG: 163 CpG: 164 CpG: 166 
##        2        1        2        2        4        4        2        3 
## CpG: 167 CpG: 168 CpG: 169 CpG: 170 CpG: 171 CpG: 172 CpG: 173 CpG: 174 
##        1        1        4        2        2        3        1        4 
## CpG: 175 CpG: 176 CpG: 177 CpG: 179 CpG: 180 CpG: 181 CpG: 182 CpG: 183 
##        1        4        2        2        4        2        2        3 
## CpG: 184 CpG: 185 CpG: 186 CpG: 187 CpG: 188 CpG: 190 CpG: 191 CpG: 192 
##        3        5        3        2        1        2        1        1 
## CpG: 194 CpG: 195 CpG: 196 CpG: 197 CpG: 198 CpG: 201 CpG: 203 CpG: 206 
##        1        1        2        2        4        1        1        1 
## CpG: 207 CpG: 208 CpG: 210 CpG: 212 CpG: 213 CpG: 214 CpG: 221 CpG: 222 
##        1        2        2        2        1        2        1        1 
## CpG: 223 CpG: 224 CpG: 227 CpG: 228 CpG: 229 CpG: 230 CpG: 235 CpG: 236 
##        1        1        1        2        1        1        1        1 
## CpG: 238 CpG: 242 CpG: 243 CpG: 244 CpG: 246 CpG: 248 CpG: 249 CpG: 251 
##        3        1        1        1        1        1        1        1 
## CpG: 253 CpG: 256 CpG: 257 CpG: 263 CpG: 266 CpG: 275 CpG: 284 CpG: 285 
##        1        1        1        1        1        1        1        1 
## CpG: 288 CpG: 289 CpG: 290 CpG: 295 CpG: 300 CpG: 308 CpG: 309 CpG: 329 
##        2        1        1        1        1        1        1        1 
## CpG: 331 CpG: 334 CpG: 339 CpG: 341 CpG: 354 CpG: 406 CpG: 423 CpG: 468 
##        1        1        1        1        1        1        1        1 
##  CpG: 50 CpG: 504 CpG: 518  CpG: 53  CpG: 55  CpG: 56  CpG: 57  CpG: 59 
##        1        1        1        1        1        2        1        3 
##  CpG: 60  CpG: 61 CpG: 615  CpG: 62  CpG: 63  CpG: 64  CpG: 65  CpG: 66 
##        4        2        1        2        2        4        3        2 
##  CpG: 67  CpG: 68  CpG: 69  CpG: 70  CpG: 71  CpG: 72  CpG: 73  CpG: 74 
##        3        4        5        4        4        5        3        7 
##  CpG: 75  CpG: 76  CpG: 77  CpG: 78  CpG: 79  CpG: 80  CpG: 81  CpG: 82 
##       10        7        8        5       10        4        8        7 
##  CpG: 83  CpG: 84  CpG: 85  CpG: 86  CpG: 87  CpG: 88  CpG: 89  CpG: 90 
##        7       10        5        4        8       16       10        5 
##  CpG: 91  CpG: 92  CpG: 93  CpG: 94  CpG: 95  CpG: 96  CpG: 97  CpG: 98 
##        5        8        9        7       12       10        4        7 
##  CpG: 99 
##        5
table(cpgi$chrom)
## 
##          chr1   chr1_random         chr10  chr10_random         chr11 
##          2463            31          1150             1          1371 
##  chr11_random         chr12         chr13  chr13_random         chr14 
##             3          1221           605             2           788 
##         chr15  chr15_random         chr16  chr16_random         chr17 
##           787            20          1491             3          1622 
##  chr17_random         chr18         chr19          chr2   chr2_random 
##            67           508          2544          1680            10 
##         chr20         chr21  chr21_random         chr22  chr22_random 
##           799           356            19           716             5 
##          chr3   chr3_random          chr4   chr4_random          chr5 
##          1159             2          1019            21          1227 
##  chr5_h2_hap1   chr5_random          chr6 chr6_cox_hap1 chr6_qbl_hap2 
##            13             2          1251           142           137 
##   chr6_random          chr7   chr7_random          chr8   chr8_random 
##            19          1552            23          1028             8 
##          chr9   chr9_random          chrX   chrX_random          chrY 
##          1230            17           891            42           181
casual=rnorm(length(unique(chr1_small$name)))

names(casual)=unique(chr1_small$name)

casual=casual[order(casual, decreasing = TRUE)]

chr1_small$Gaussian=casual[match(chr1_small$name, names(casual))]

chr2_small$Gaussian=casual[match(chr2_small$name, names(casual))]

unique(is.na(chr2_small$Gaussian))
## [1] FALSE  TRUE
length(unique(subset(chr2_small, is.na(Gaussian))$name))
## [1] 5
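
The lookup used for the Gaussian column, and the count of missing values asked for in the task, can be sketched on a small example. The scores and names below are made up for illustration.

```r
# Named vector acting as a lookup table
scores <- c("CpG: 14" = 0.5, "CpG: 15" = -1.2, "CpG: 16" = 2.0)

# Names to look up; "CpG: 99" is not in the table
name <- c("CpG: 15", "CpG: 99", "CpG: 14", "CpG: 15")

# match() gives the position of each name in the table, NA if absent
via_match <- scores[match(name, names(scores))]

# Indexing a named vector by character gives the same values
via_name <- unname(scores[name])

n_missing     <- sum(is.na(via_match))           # how many rows are NA
missing_names <- unique(name[is.na(via_match)])  # which names caused them
```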
# keep only the rows whose Gaussian value is NOT missing
chr2_small=subset(chr2_small, !is.na(Gaussian))

chr2_small$Gaussian<-NULL

save(chr2_small, chr1_small, file = "chr1Small_chr2Small.RData")

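# Note: (ncol(chr1_small)-3):ncol(chr1_small) spans four columns (perCpg, perGc, obsExp, Gaussian);
# strictly the last three would be (ncol(chr1_small)-2):ncol(chr1_small). The outputs below refer to the four-column matrix.
dat=list(subs=subset(chr1_large,length>25000 ), num=sum(chr2_large$length>25000), mat=as.matrix(chr1_small[, (ncol(chr1_small)-3):ncol(chr1_small)]))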

saveRDS(dat, "Lista.rds")

typeof(dat$num)
## [1] "integer"
c("CpG: 45","CpG: 52", "CpG: 108")%in%chr2_medium$name
## [1]  TRUE  TRUE FALSE
match(chr2_medium$name, c("CpG: 45","CpG: 52", "CpG: 108"))
##   [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
##  [26] NA NA NA NA NA NA NA NA NA NA NA  1 NA NA NA NA NA NA NA NA NA NA NA NA NA
##  [51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
##  [76] NA  2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  1 NA
## [101] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [126] NA NA NA NA NA NA NA NA  2 NA  2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [151] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [176] NA NA NA NA NA NA  1 NA NA  2 NA NA NA NA  1 NA NA NA NA NA NA NA NA NA NA
## [201] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  1 NA NA NA NA NA NA NA
## [226] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [251] NA NA  2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [276] NA NA NA NA NA NA NA NA NA NA  2 NA NA NA NA NA  2 NA NA NA NA NA NA NA  1
## [301] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [326] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  1 NA NA NA NA NA NA NA
## [351] NA NA NA NA NA NA NA NA NA  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [376] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [401] NA NA NA  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [426] NA NA NA NA NA NA NA  1 NA NA NA NA NA NA NA NA NA NA  1 NA NA NA NA NA NA
## [451] NA NA NA NA NA NA NA NA NA NA NA NA  2 NA NA NA NA NA NA NA NA NA NA NA NA
## [476] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [501] NA NA NA
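
The match( ) call above returns one value per row, mostly NA. If only the row indexes are wanted, a more compact form is to combine %in% with which( ). A sketch on a short made-up vector:

```r
name    <- c("CpG: 20", "CpG: 45", "CpG: 31", "CpG: 52", "CpG: 45")
targets <- c("CpG: 45", "CpG: 52", "CpG: 108")

# Is each target present at least once?
present <- targets %in% name

# Row indexes whose name is one of the targets
rows <- which(name %in% targets)

# Equivalent, starting from the match() result
rows_from_match <- which(!is.na(match(name, targets)))

rows
```

Note the direction of the test: targets %in% name answers "which targets occur", while name %in% targets answers "which rows hold a target".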
dat$mat[which(dat$mat[,"perGc"]>=53 & dat$mat[,"perGc"]<=62),-3]
##      perCpg perGc    Gaussian
## 10     16.1  55.2  1.03209945
## 25     17.0  61.6  0.23240908
## 47     16.9  61.1  0.76342535
## 48     19.7  58.4 -1.23314906
## 80     17.9  56.1  1.29513448
## 88     17.9  60.4  0.76342535
## 110    17.9  60.0  1.13433839
## 115    14.2  57.3 -0.05187486
## 129    15.7  61.7  0.80137753
## 135    15.7  61.7  0.80137753
## 155    14.6  61.7 -0.95276891
## 169    15.8  60.0  1.05810217
## 194    14.3  61.5  0.23240908
## 207    17.0  60.8 -0.59044507
## 228    15.4  58.7 -0.05187486
## 230    16.2  56.8  1.74230946
## 231    13.8  59.9 -0.28435228
## 238    14.9  60.7 -0.95276891
## 242    18.2  59.7  1.13433839
## 244    15.9  60.3  0.80137753
## 245    16.2  62.0  1.28946852
## 289    15.3  61.9  1.74230946
## 293    14.4  61.3  1.13433839
## 297    15.2  61.2  1.74230946
## 314    16.6  59.3  1.29513448
## 317    15.5  58.6  1.05810217
## 364    14.6  59.2 -0.95276891
## 407    17.4  58.9  1.74230946
## 421    14.2  56.3  1.13433839
## 468    14.7  60.8  1.13433839
## 472    15.0  62.0  1.29513448
## 490    18.8  60.8  0.80137753
## 491    15.4  61.8 -0.82738477
## 509    14.8  61.9  1.74230946
## 543    17.1  61.9  1.74230946
## 544    17.3  61.7 -0.59044507
## 564    17.9  54.7  0.23240908
## 662    13.0  54.6 -1.32768870
## 711    21.0  54.8 -0.82738477
## 752    16.8  61.1  0.80137753
## 754    17.9  58.3  0.80137753
## 761    22.0  61.4 -0.95874310
## 779    14.8  61.7 -0.28435228
## 822    20.9  59.5  0.03592213
## 825    22.7  61.8  0.80137753
## 837    18.9  60.6  0.76342535
## 876    15.9  59.7 -0.05187486
## 902    16.0  59.1  0.23240908
## 950    14.4  60.6  1.29513448
## 956    17.2  61.3 -1.32768870
## 991    14.4  56.5 -0.95276891
## 1081   14.7  60.8  1.29513448
## 1092   16.1  61.8  1.29513448
## 1156   15.2  59.8  1.03209945
## 1183   15.4  56.3 -1.32768870
## 1188   15.3  56.8 -0.74235710
## 1221   16.6  60.0  0.80137753
## 1226   16.6  60.8 -0.59044507
## 1227   17.7  62.0  1.13433839
## 1235   16.2  53.5  1.13433839
## 1239   13.5  55.6  1.29513448
## 1243   16.0  58.7  1.03209945
## 1253   18.0  60.9 -0.28435228
## 1259   15.2  54.6 -0.28435228
## 1274   15.6  57.5  0.76342535
## 1286   21.7  61.3 -0.28435228
## 1293   15.2  61.0  1.29513448
## 1310   15.0  61.0  1.29513448
## 1311   14.9  56.9 -0.74235710
## 1314   14.6  59.4  0.23240908
## 1324   15.0  60.6 -1.32768870
## 1326   16.4  61.2 -0.82738477
## 1341   16.1  58.8  0.76342535
## 1350   19.3  59.0  1.28946852
## 1420   17.9  62.0 -1.23314906
## 1427   15.2  61.4 -0.05187486
## 1493   18.5  61.0  0.23240908
## 1541   15.2  57.7 -0.74235710
## 1546   15.3  60.6  0.23240908
## 1549   14.4  60.5 -0.74235710
## 1558   18.4  61.5 -0.74235710
## 1559   16.1  61.2  1.74230946
## 1575   14.4  60.5 -0.74235710
## 1584   18.4  61.5 -0.74235710
## 1585   16.1  61.2  1.74230946
## 1594   14.1  60.4  0.23240908
## 1595   14.5  61.1 -0.05187486
## 1608   15.6  59.9 -0.82738477
## 1609   16.8  58.0  1.03209945
## 1611   15.6  59.9 -0.82738477
## 1616   15.1  60.0  1.05810217
## 1620   14.9  62.0  0.23240908
## 1657   16.2  58.5 -0.82738477
## 1659   16.4  55.1  0.76342535
## 1668   19.1  61.4  1.13433839
## 1678   20.8  58.9  1.13433839
## 1682   17.0  62.0 -0.28435228
## 1697   17.2  61.1  0.23240908
## 1706   18.0  55.9  0.23240908
## 1727   16.2  56.8  1.74230946
## 1734   15.1  60.8  1.13433839
## 1737   14.6  55.2  0.23240908
## 1761   19.4  58.8  1.13433839
## 1765   18.5  61.3 -1.23314906
## 1772   16.8  61.5  0.80137753
## 1806   15.5  60.6  0.80137753
## 1823   19.3  61.1 -0.74235710
## 1831   17.4  60.3  1.13433839
## 1858   14.7  54.3  1.05810217
## 1884   17.8  60.0  1.28946852
## 1907   14.1  59.2 -0.95276891
## 1913   14.7  60.5  0.76342535
## 1917   16.1  60.5  1.29513448
## 1972   15.1  61.8  1.74230946
## 1994   23.3  60.5 -0.74235710
## 2028   18.3  61.5 -1.32768870
## 2056   17.8  61.3 -1.32768870
## 2074   14.2  58.4 -0.05187486
## 2089   13.2  61.0 -0.95276891
## 2137   18.1  57.5  0.80137753
## 2172   21.2  57.6  0.76342535
## 2173   15.6  58.7  1.05810217
## 2183   22.1  57.1  0.76342535
## 2190   16.3  59.5  0.76342535
## 2196   19.2  59.4  1.13433839
## 2198   18.4  61.4  1.13433839
## 2209   14.4  57.2 -1.32768870
## 2212   19.0  61.1  1.13433839
## 2218   17.7  55.8 -0.82738477
## 2221   18.6  61.8  0.23240908
## 2223   21.5  58.4 -0.08993420
## 2286   22.5  61.7 -1.11316424
## 2330   15.2  60.1  1.13433839
## 2333   13.8  61.7 -1.32768870
## 2355   16.2  61.4  1.05810217
## 2372   14.4  58.7 -0.95276891
## 2416   14.4  59.0 -0.05187486
## 2427   16.1  61.1  1.05810217
grep("2", chr2_large$name)
##   [1]   1   4   6  24  25  26  37  43  45  47  51  52  53  55  60  63  66  68
##  [19]  69  75  79  86  91  94  96 108 113 115 119 122 126 130 133 141 142 143
##  [37] 150 152 156 157 182 183 186 190 191 201 208 215 222 226 232 239 248 258
##  [55] 261 262 263 264 266 267 269 273 274 279 285 287 289 292 298 303 306 313
##  [73] 315 320 321 323 324 326 327 339 345 348 351 352 357 359 365 367 368 369
##  [91] 377 384 398 406 417 421 425 426 430 433 436 437 443 452 457 458 467 475
## [109] 481 484 489 492 503 507 524 527 535 541 544 546 552 556 558 569 571 574
## [127] 576 577 584 587 589 592 595 599 604 607 608 614 619 623 628 632 637 640
## [145] 646 651 661 666 669
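
grep( ) and grepl( ) search for the pattern anywhere in the string, so "2" matches names such as "CpG: 12" as well as "CpG: 205". The sketch below, on made-up names, shows the relation between the two functions and how an anchored pattern narrows the match.

```r
name <- c("CpG: 12", "CpG: 30", "CpG: 205", "CpG: 47")

idx  <- grep("2", name)    # positions of the matching elements
flag <- grepl("2", name)   # one TRUE/FALSE per element

identical(idx, which(flag))     # grep() is which() applied to grepl()
grep("2", name, value = TRUE)   # the matching strings themselves

# Anchored pattern: only counts that start with the digit 2
starts_with_2 <- grep("^CpG: 2", name)
```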
lo=grepl("3", chr2_large$name)

lo=matrix(lo, ncol = 7)
## Warning in matrix(lo, ncol = 7): data length [671] is not a sub-multiple or
## multiple of the number of rows [96]
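
The warning appears because 671 values cannot fill 7 columns evenly: R allocates 96 rows (672 cells) and recycles values from the start of the vector to fill the remaining cell. The same behaviour can be sketched on a short vector, together with one possible way (an assumption, not part of the exercise) to pad with NA instead of recycling.

```r
v <- 1:10

# 10 values in 3 columns: R allocates 4 rows (12 cells) and recycles v
m <- suppressWarnings(matrix(v, ncol = 3))
dim(m)
m[, 3]   # the last two cells hold recycled values from the start of v

# Padding with NA up to the next multiple of 3 avoids silent recycling
n_pad  <- (-length(v)) %% 3
padded <- c(v, rep(NA, n_pad))
m_na   <- matrix(padded, ncol = 3)
m_na[, 3]
```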
dat$lo=lo

saveRDS(dat, "Lista.rds")