2025-03-24

Visualize mutations on a gene

It is often helpful to visualize point mutations in a spatial context on a gene, creating an image like:


Lollipops in the Clinic: Information Dense Mutation Plots for Precision Medicine, 2016


where the regions of the gene are shown together with the variants.

We can make these plots with the G3viz package, which, thanks to its dependencies, also lets us retrieve data directly from cBioPortal.

What’s cBioPortal?

The cBioPortal for Cancer Genomics is an open-access, open-source resource for interactive exploration of multidimensional cancer genomics data sets. The goal of cBioPortal is to significantly lower the barriers between complex genomic data and cancer researchers by providing rapid, intuitive, and high-quality access to molecular profiles and clinical attributes from large-scale cancer genomics projects, and therefore to empower researchers to translate these rich data sets into biologic insights and clinical applications.

Let’s explore it @ cBioPortal for Cancer Genomics

A complete user guide can also be found at https://docs.cbioportal.org

Introduction to G3viz

The G3viz package allows us to create lollipop diagrams annotated with the detailed translational effect of each genetic mutation.

Using G3viz we can:

  • make interactive plots

  • label positional mutations

  • use different themes

  • save charts in PNG or high-quality SVG format

  • retrieve protein domain information and resolve gene isoforms

  • map variant classification to mutation class

You can find the tutorial at https://g3viz.github.io/g3viz/ and a more detailed guide to themes at https://g3viz.github.io/g3viz/chart_themes.html

install.packages("g3viz", repos = "http://cran.us.r-project.org")

Once the installation has completed, you can load the package as usual:

library(g3viz)

In the next steps we will walk through the package examples, explaining each step.

Visualize genetic mutation data from MAF file

Mutation Annotation Format (MAF) is a commonly used tab-delimited text format for storing aggregated mutation information. It can be generated from a VCF file using tools like vcf2maf. The translational effect of variant alleles in MAF files is usually stored in the column named Variant_Classification or Mutation_Type (e.g., Frame_Shift_Del, Splice_Site). In this example, the somatic mutation data of the TCGA-BRCA study was originally downloaded from the GDC Data Portal.

maf.file <- system.file("extdata", "TCGA.BRCA.varscan.somatic.maf.gz", package = "g3viz")
mutation.dat <- readMAF(maf.file)

head(mutation.dat)
##   Hugo_Symbol Chromosome Start_Position End_Position Strand
## 1        TP53      chr17        7676564      7676564      +
## 2        TP53      chr17        7676399      7676399      +
## 3        TP53      chr17        7676267      7676280      +
## 4        TP53      chr17        7676273      7676273      +
## 5        TP53      chr17        7676215      7676215      +
## 6        TP53      chr17        7676203      7676203      +
##   Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1
## 1      Missense_Mutation          SNP                C                 C
## 2      Missense_Mutation          SNP                G                 G
## 3            Splice_Site          DEL   GGGGGACTGTAGAT    GGGGGACTGTAGAT
## 4            Splice_Site          SNP                C                 G
## 5      Nonsense_Mutation          SNP                G                 G
## 6      Nonsense_Mutation          SNP                C                 C
##   Tumor_Seq_Allele2      HGVSp  HGVSp_Short
## 1                 T p.Glu11Lys       p.E11K
## 2                 A p.Pro27Ser       p.P27S
## 3                 -            p.X33_splice
## 4                 G            p.X33_splice
## 5                 A p.Gln52Ter       p.Q52*
## 6                 A p.Glu56Ter       p.E56*
##                                                                    COSMIC
## 1 COSM3820734;COSM3820735;COSM3820736;COSM3820737;COSM3820738;COSM3820739
## 2 COSM1167900;COSM1167901;COSM1167902;COSM1167903;COSM3522716;COSM3522717
## 3                                                                        
## 4                 COSM2745171;COSM29761;COSM4272163;COSM437642;COSM437643
## 5                   COSM1750375;COSM3932748;COSM44041;COSM99948;COSM99949
## 6                 COSM12168;COSM126989;COSM126990;COSM2745104;COSM4272098
##   Mutation_Class AA_Position
## 1       Missense          11
## 2       Missense          27
## 3     Truncating          33
## 4     Truncating          33
## 5     Truncating          52
## 6     Truncating          56
str(mutation.dat)
## 'data.frame':    828 obs. of  15 variables:
##  $ Hugo_Symbol           : chr  "TP53" "TP53" "TP53" "TP53" ...
##  $ Chromosome            : chr  "chr17" "chr17" "chr17" "chr17" ...
##  $ Start_Position        : int  7676564 7676399 7676267 7676273 7676215 7676203 98890391 179199035 15872946 7676140 ...
##  $ End_Position          : int  7676564 7676399 7676280 7676273 7676215 7676203 98890391 179199035 15872947 7676141 ...
##  $ Strand                : chr  "+" "+" "+" "+" ...
##  $ Variant_Classification: chr  "Missense_Mutation" "Missense_Mutation" "Splice_Site" "Splice_Site" ...
##  $ Variant_Type          : chr  "SNP" "SNP" "DEL" "SNP" ...
##  $ Reference_Allele      : chr  "C" "G" "GGGGGACTGTAGAT" "C" ...
##  $ Tumor_Seq_Allele1     : chr  "C" "G" "GGGGGACTGTAGAT" "G" ...
##  $ Tumor_Seq_Allele2     : chr  "T" "A" "-" "G" ...
##  $ HGVSp                 : chr  "p.Glu11Lys" "p.Pro27Ser" "" "" ...
##  $ HGVSp_Short           : chr  "p.E11K" "p.P27S" "p.X33_splice" "p.X33_splice" ...
##  $ COSMIC                : chr  "COSM3820734;COSM3820735;COSM3820736;COSM3820737;COSM3820738;COSM3820739" "COSM1167900;COSM1167901;COSM1167902;COSM1167903;COSM3522716;COSM3522717" "" "COSM2745171;COSM29761;COSM4272163;COSM437642;COSM437643" ...
##  $ Mutation_Class        : chr  "Missense" "Missense" "Truncating" "Truncating" ...
##  $ AA_Position           : num  11 27 33 33 52 56 69 70 73 77 ...
chart.options <- g3Lollipop.theme(theme.name = "default",
                                  title.text = "PIK3CA gene (default theme)")

g3Lollipop(mutation.dat,
           gene.symbol = "PIK3CA",
           plot.options = chart.options,
           output.filename = "default_theme")
## Factor is set to Mutation_Class
## legend title is set to Mutation_Class
# Notice that this is an interactive plot. You have to save it directly from the "Viewer" panel.
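
The Mutation_Class column in the output above is a coarser grouping of Variant_Classification. The idea can be sketched in base R as follows. The lookup vector `class_lookup`, the helper `map_to_class`, and the "Other" fallback are hypothetical names and choices made for illustration; the lookup covers only the three classifications visible in the output above and is not g3viz's own mapping table.

```r
# Hypothetical lookup: only the classifications shown in the output above.
class_lookup <- c(
  Missense_Mutation = "Missense",
  Splice_Site       = "Truncating",
  Nonsense_Mutation = "Truncating"
)

# Map each variant classification to a class.
# Values missing from the lookup fall back to "Other" (an assumption of this sketch).
map_to_class <- function(variant_classification, lookup = class_lookup) {
  mapped <- unname(lookup[variant_classification])
  mapped[is.na(mapped)] <- "Other"
  mapped
}

variants <- c("Missense_Mutation", "Splice_Site", "Nonsense_Mutation", "Some_New_Type")
map_to_class(variants)
```

A named character vector keeps the mapping readable and easy to extend when a file uses additional classifications.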

Visualize genetic mutation data from CSV or TSV file

In this example, we read genetic mutation data from a CSV or TSV file and visualize it using customized chart options. Note that these options are equivalent to the dark chart theme.

mutation.csv <- system.file("extdata", "ccle.csv", package = "g3viz")

head(mutation.csv)
## [1] "/Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library/g3viz/extdata/ccle.csv"
#   "gene.symbol.col"    : column of gene symbol
#   "variant.class.col"  : column of variant class
#   "protein.change.col" : column of protein change

mutation.dat <- readMAF(mutation.csv,
                        gene.symbol.col = "Hugo_Symbol",  # name of the column in which each piece of information is stored
                        variant.class.col = "Variant_Classification",
                        protein.change.col = "amino_acid_change",
                        sep = ",")  # column-separator of csv file

head(mutation.dat)
##   Hugo_Symbol Chromosome Start_Position End_Position Strand
## 1         APC          5      112090592    112090592      +
## 2        TP53         17        7579882      7579882      +
## 3        TP53         17        7579882      7579882      +
## 4        TP53         17        7579882      7579882      +
## 5        TP53         17        7579882      7579882      +
## 6         APC          5      112090633    112090633      +
##   Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1
## 1      Missense_Mutation          SNP                C                 C
## 2      Missense_Mutation          SNP                C                 C
## 3      Missense_Mutation          SNP                C                 C
## 4      Missense_Mutation          SNP                C                 C
## 5      Missense_Mutation          SNP                C                 C
## 6                 Silent          SNP                C                 C
##   Tumor_Seq_Allele2 amino_acid_change Mutation_Class AA_Position
## 1                 T             p.A2V       Missense           2
## 2                 G            p.E11Q       Missense          11
## 3                 G            p.E11Q       Missense          11
## 4                 G            p.E11Q       Missense          11
## 5                 G            p.E11Q       Missense          11
## 6                 T            p.L16L        Inframe          16
str(mutation.dat)
## 'data.frame':    1535 obs. of  13 variables:
##  $ Hugo_Symbol           : chr  "APC" "TP53" "TP53" "TP53" ...
##  $ Chromosome            : int  5 17 17 17 17 5 3 17 5 3 ...
##  $ Start_Position        : int  112090592 7579882 7579882 7579882 7579882 112090633 178916661 7579866 112090638 178916673 ...
##  $ End_Position          : int  112090592 7579882 7579882 7579882 7579882 112090633 178916661 7579866 112090638 178916673 ...
##  $ Strand                : chr  "+" "+" "+" "+" ...
##  $ Variant_Classification: chr  "Missense_Mutation" "Missense_Mutation" "Missense_Mutation" "Missense_Mutation" ...
##  $ Variant_Type          : chr  "SNP" "SNP" "SNP" "SNP" ...
##  $ Reference_Allele      : chr  "C" "C" "C" "C" ...
##  $ Tumor_Seq_Allele1     : chr  "C" "C" "C" "C" ...
##  $ Tumor_Seq_Allele2     : chr  "T" "G" "G" "G" ...
##  $ amino_acid_change     : chr  "p.A2V" "p.E11Q" "p.E11Q" "p.E11Q" ...
##  $ Mutation_Class        : chr  "Missense" "Missense" "Missense" "Missense" ...
##  $ AA_Position           : num  2 11 11 11 11 16 16 16 17 20 ...
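
Because the column names have to be passed explicitly for a custom CSV or TSV file, it can help to check the header before reading the file with readMAF. The following is a minimal base-R sketch of such a check; the helper `missing_columns` and the small in-memory tab-separated table are hypothetical, with values copied from the output above.

```r
# Tiny tab-separated example held in memory; values copied from the output above.
tsv_text <- paste(
  "Hugo_Symbol\tVariant_Classification\tamino_acid_change",
  "APC\tMissense_Mutation\tp.A2V",
  "TP53\tMissense_Mutation\tp.E11Q",
  sep = "\n"
)

# Hypothetical helper: report which required columns are absent from a table.
missing_columns <- function(df, required) {
  setdiff(required, colnames(df))
}

# read.delim() reads tab-separated text; read.csv() would be the CSV counterpart.
dat <- read.delim(text = tsv_text, stringsAsFactors = FALSE)

required <- c("Hugo_Symbol", "Variant_Classification", "amino_acid_change")
missing_columns(dat, required)                    # nothing is missing here
missing_columns(dat, c(required, "HGVSp_Short"))  # reports the absent column
```

If the check reports nothing missing, the same column names can be supplied to the gene.symbol.col, variant.class.col and protein.change.col arguments shown above.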
# set up chart options
plot.options <- g3Lollipop.options(
  # Chart settings
  chart.width = 600,
  chart.type = "pie",
  chart.margin = list(left = 30, right = 20, top = 20, bottom = 30),
  chart.background = "#d3d3d3",
  transition.time = 300,
  # Lollipop track settings
  lollipop.track.height = 200,
  lollipop.track.background = "#d3d3d3",
  lollipop.pop.min.size = 1,
  lollipop.pop.max.size = 8,
  lollipop.pop.info.limit = 5.5,
  lollipop.pop.info.dy = "0.24em",
  lollipop.pop.info.color = "white",
  lollipop.line.color = "#a9A9A9",
  lollipop.line.width = 3,
  lollipop.circle.color = "#ffdead",
  lollipop.circle.width = 0.4,
  lollipop.label.ratio = 2,
  lollipop.label.min.font.size = 12,
  lollipop.color.scheme = "dark2",
  highlight.text.angle = 60,
  # Domain annotation track settings
  anno.height = 16,
  anno.margin = list(top = 0, bottom = 0),
  anno.background = "#d3d3d3",
  anno.bar.fill = "#a9a9a9",
  anno.bar.margin = list(top = 4, bottom = 4),
  domain.color.scheme = "pie5",
  domain.margin = list(top = 2, bottom = 2),
  domain.text.color = "white",
  domain.text.font = "italic 8px Serif",
  # Y-axis label
  y.axis.label = "# of TP53 gene mutations",
  axis.label.color = "#303030",
  axis.label.alignment = "end",
  axis.label.font = "italic 12px Serif",
  axis.label.dy = "-1.5em",
  y.axis.line.color = "#303030",
  y.axis.line.width = 0.5,
  y.axis.line.style = "line",
  y.max.range.ratio = 1.1,
  # Chart title settings
  title.color = "#303030",
  title.text = "TP53 gene (customized chart options)",
  title.font = "bold 12px monospace",
  title.alignment = "start",
  # Chart legend settings
  legend = TRUE,
  legend.margin = list(left=20, right = 0, top = 10, bottom = 5),
  legend.interactive = TRUE,
  legend.title = "Variant classification",
  # Brush selection tool
  brush = TRUE,
  brush.selection.background = "#F8F8FF",
  brush.selection.opacity = 0.3,
  brush.border.color = "#a9a9a9",
  brush.border.width = 1,
  brush.handler.color = "#303030",
  # tooltip and zoom
  tooltip = TRUE,
  zoom = TRUE
)

g3Lollipop(mutation.dat,
           gene.symbol = "TP53",
           protein.change.col = "amino_acid_change",
           btn.style = "blue", # blue-style chart download buttons
           plot.options = plot.options,
           output.filename = "customized_plot")
## Factor is set to Mutation_Class
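
The y-axis label used above ("# of TP53 gene mutations") suggests that each lollipop summarizes how many mutations fall on one amino acid position. The sketch below illustrates that counting step in base R, using the six rows shown in the head(mutation.dat) output above. The helper `parse_aa_position` is hypothetical and simply takes the first run of digits in the protein change string; it is not necessarily how g3viz fills the AA_Position column.

```r
# Gene symbols and protein changes copied from the head(mutation.dat) output above.
hugo_symbol    <- c("APC", "TP53", "TP53", "TP53", "TP53", "APC")
protein_change <- c("p.A2V", "p.E11Q", "p.E11Q", "p.E11Q", "p.E11Q", "p.L16L")

# Hypothetical helper: use the first run of digits as the amino acid position.
# Strings without any digits yield NA.
parse_aa_position <- function(x) {
  m <- regexpr("[0-9]+", x)
  pos <- rep(NA_real_, length(x))
  hit <- !is.na(m) & m > 0
  pos[hit] <- as.numeric(regmatches(x, m))
  pos
}

aa_position <- parse_aa_position(protein_change)

# Keep one gene, then count mutations per amino acid position.
tp53_counts <- table(aa_position[hugo_symbol == "TP53"])
tp53_counts
```

Filtering by gene before counting matters because positions are only comparable within the same protein.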

Visualize genetic mutation data from cBioPortal

cBioPortal offers downloadable data for numerous cancer genomics datasets. g3viz has a convenient way to retrieve data directly from this portal.

In this example, we first retrieve genetic mutation data for the TP53 gene from the msk_impact_2017 study, and then visualize it using the built-in cbioportal theme to mimic cBioPortal’s mutation_mapper.

# Retrieve mutation data of "msk_impact_2017" from cBioPortal
mutation.dat <- getMutationsFromCbioportal("msk_impact_2017", "TP53")
## The Entrez Gene ID for TP53 is: 7157
## Found mutation dataset for msk_impact_2017: msk_impact_2017_mutations
plot.options <- g3Lollipop.theme(theme.name = "cbioportal",
                                 title.text = "TP53 gene (cbioportal theme)",
                                 y.axis.label = "# of TP53 Mutations")

g3Lollipop(mutation.dat,
           gene.symbol = "TP53",
           btn.style = "gray", # gray-style chart download buttons
           plot.options = plot.options,
           output.filename = "cbioportal_theme")
## Factor is set to Mutation_Class
## legend title is set to Mutation_Class

But how can we know which datasets are available in cBioPortal?

G3viz depends on the cBioPortalData package from Bioconductor.

You can see all available datasets by using:

library(cBioPortalData)
cbio <- cBioPortal()
studies <- getStudies(cbio)
studies <- as.data.frame(studies)
head(studies)
##                                                            name
## 1        Acute Lymphoblastic Leukemia (St Jude, Nat Genet 2015)
## 2 Hypodiploid Acute Lymphoid Leukemia (St Jude, Nat Genet 2013)
## 3         Adenoid Cystic Carcinoma (FMI, Am J Surg Pathl. 2014)
## 4          Adenoid Cystic Carcinoma (JHU, Cancer Prev Res 2016)
## 5          Adenoid Cystic Carcinoma (MDA, Clin Cancer Res 2015)
## 6                  Adenoid Cystic Carcinoma (MGH, Nat Gen 2016)
##                                                                                  description
## 1  Comprehensive profiling of infant MLL-rearranged acute lymphoblastic leukemia (MLL-R ALL)
## 2 Whole genome or exome sequencing of 44 (20 whole genome, 20 exome) ALL tumor/normal pairs.
## 3                     Targeted Sequencing of 28 metastatic Adenoid Cystic Carcinoma samples.
## 4  Whole-genome or whole-exome sequencing of 25 adenoid cystic carcinoma tumor/normal pairs.
## 5 WGS of 21 salivary ACCs and targeted molecular analyses of a validation set (81 patients).
## 6                                        Whole-genome/exome sequencing of 10 ACC PDX models.
##   publicStudy     pmid                           citation      groups status
## 1        TRUE 25730765    Andersson et al. Nat Genet 2015      PUBLIC      0
## 2        TRUE 23334668    Holmfeldt et al. Nat Genet 2013                  0
## 3        TRUE 24418857   Ross et al. Am J Surg Pathl 2014 ACYC;PUBLIC      0
## 4        TRUE 26862087 Rettig et al, Cancer Prev Res 2016 ACYC;PUBLIC      0
## 5        TRUE 26631609 Mitani et al. Clin Cancer Res 2015 ACYC;PUBLIC      0
## 6        TRUE 26829750  Drier et al. Nature Genetics 2016        ACYC      0
##            importDate allSampleCount readPermission         studyId
## 1 2024-12-03 11:48:34             93           TRUE all_stjude_2015
## 2 2024-12-03 11:50:01             44           TRUE all_stjude_2013
## 3 2024-12-03 11:50:37             28           TRUE   acyc_fmi_2014
## 4 2024-12-03 11:50:39             25           TRUE   acyc_jhu_2016
## 5 2024-12-03 11:50:44            102           TRUE   acyc_mda_2015
## 6 2024-12-03 11:50:49             10           TRUE   acyc_mgh_2016
##   cancerTypeId referenceGenome
## 1          bll            hg19
## 2      myeloid            hg19
## 3         acyc            hg19
## 4         acyc            hg19
## 5         acyc            hg19
## 6         acyc            hg19
unique(studies$referenceGenome)
## [1] "hg19" "hg38"
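
With several hundred studies available, filtering the table is usually easier than reading it by eye. The sketch below shows one way to do that in base R. It runs on `studies_demo`, a small stand-in built from three of the columns and the six rows printed by head(studies) above; the helper `find_studies` is hypothetical. The same function should work on the full `studies` data frame, assuming it has the name, allSampleCount and studyId columns shown above.

```r
# Stand-in for `studies`: three columns, six rows copied from head(studies) above.
studies_demo <- data.frame(
  name = c(
    "Acute Lymphoblastic Leukemia (St Jude, Nat Genet 2015)",
    "Hypodiploid Acute Lymphoid Leukemia (St Jude, Nat Genet 2013)",
    "Adenoid Cystic Carcinoma (FMI, Am J Surg Pathl. 2014)",
    "Adenoid Cystic Carcinoma (JHU, Cancer Prev Res 2016)",
    "Adenoid Cystic Carcinoma (MDA, Clin Cancer Res 2015)",
    "Adenoid Cystic Carcinoma (MGH, Nat Gen 2016)"
  ),
  allSampleCount = c(93, 44, 28, 25, 102, 10),
  studyId = c("all_stjude_2015", "all_stjude_2013", "acyc_fmi_2014",
              "acyc_jhu_2016", "acyc_mda_2015", "acyc_mgh_2016"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep studies whose name matches a keyword
# (case-insensitive) and that have at least `min_samples` samples.
find_studies <- function(studies, keyword, min_samples = 0) {
  keep <- grepl(keyword, studies$name, ignore.case = TRUE) &
    studies$allSampleCount >= min_samples
  studies[keep, c("studyId", "allSampleCount")]
}

hits <- find_studies(studies_demo, "adenoid cystic", min_samples = 25)
hits
```

The studyId values returned this way are the identifiers expected as the first argument of getMutationsFromCbioportal, as in the msk_impact_2017 example above.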
unique(studies$studyId)
##   [1] "all_stjude_2015"                   "all_stjude_2013"                  
##   [3] "acyc_fmi_2014"                     "acyc_jhu_2016"                    
##   [5] "acyc_mda_2015"                     "acyc_mgh_2016"                    
##   [7] "acyc_sanger_2013"                  "all_stjude_2016"                  
##   [9] "appendiceal_msk_2022"              "blca_plasmacytoid_mskcc_2016"     
##  [11] "bcc_unige_2016"                    "brca_broad"                       
##  [13] "blca_mskcc_solit_2014"             "blca_nmibc_2017"                  
##  [15] "bfn_duke_nus_2015"                 "brca_jup_msk_2020"                
##  [17] "brca_mapk_hp_msk_2021"             "brca_hta9_htan_2022"              
##  [19] "biliary_tract_summit_2022"         "bowel_colitis_msk_2022"           
##  [21] "bladder_columbia_msk_2018"         "bladder_msk_2023"                 
##  [23] "bm_nsclc_mskcc_2023"               "breast_msk_2018"                  
##  [25] "brca_mskcc_2019"                   "breast_alpelisib_2020"            
##  [27] "cfdna_msk_2019"                    "ccrcc_dfci_2019"                  
##  [29] "breast_ink4_msk_2021"              "brca_pareja_msk_2020"             
##  [31] "cervix_msk_2023"                   "chol_jhu_2013"                    
##  [33] "chol_nccs_2013"                    "chol_nus_2012"                    
##  [35] "coadread_mskcc"                    "cllsll_icgc_2011"                 
##  [37] "coad_caseccc_2015"                 "chol_msk_2018"                    
##  [39] "chol_icgc_2017"                    "coadread_mskresistance_2022"      
##  [41] "cscc_dfarber_2015"                 "ctcl_columbia_2015"               
##  [43] "crc_msk_2017"                      "crc_eo_2020"                      
##  [45] "crc_apc_impact_2020"               "crc_nigerian_2020"                
##  [47] "crc_dd_2022"                       "difg_msk_2023"                    
##  [49] "escc_ucla_2014"                    "esca_broad"                       
##  [51] "gct_msk_2016"                      "egc_msk_2017"                     
##  [53] "hcc_mskimpact_2018"                "hcc_msk_venturaa_2018"            
##  [55] "dlbcl_duke_2017"                   "glioma_mskcc_2019"                
##  [57] "glioma_msk_2018"                   "gbc_msk_2018"                     
##  [59] "gbm_columbia_2019"                 "gct_msk_2020"                     
##  [61] "egc_mskcc_2020"                    "egc_msk_tp53_ccr_2022"            
##  [63] "gbc_mskcc_2022"                    "gist_msk_2022"                    
##  [65] "egc_msk_2023"                      "hcc_jcopo_msk_2023"               
##  [67] "es_dsrct_msk_2023"                 "kirc_bgi"                         
##  [69] "hnsc_jhu"                          "hnsc_tcga_pub"                    
##  [71] "hnc_mskcc_2016"                    "ihch_smmu_2014"                   
##  [73] "histiocytosis_cobi_msk_2019"       "ihch_mskcc_2020"                  
##  [75] "ihch_ismms_2015"                   "ihch_msk_2021"                    
##  [77] "hgsoc_msk_2021"                    "ilc_msk_2023"                     
##  [79] "lihc_riken"                        "liad_inserm_fr_2014"              
##  [81] "lcll_broad_2013"                   "lgsoc_mapk_msk_2022"              
##  [83] "luad_tsp"                          "lung_msk_2017"                    
##  [85] "lung_msk_pdx"                      "lymphoma_cellline_msk_2020"       
##  [87] "lung_msk_mind_2020"                "mbc_msk_2021"                     
##  [89] "luad_mskimpact_2021"               "lung_pdx_msk_2021"                
##  [91] "lung_nci_2022"                     "mbl_broad_2012"                   
##  [93] "mbl_icgc"                          "mbl_pcgp"                         
##  [95] "mcl_idibips_2013"                  "mds_tokyo_2011"                   
##  [97] "mbl_dkfz_2017"                     "mds_iwg_2022"                     
##  [99] "alal_target_gdc"                   "aml_target_gdc"                   
## [101] "bll_target_gdc"                    "nbl_target_gdc"                   
## [103] "os_target_gdc"                     "wt_target_gdc"                    
## [105] "mpn_cimr_2013"                     "mnm_washu_2016"                   
## [107] "metastatic_solid_tumors_mich_2017" "mixed_selpercatinib_2020"         
## [109] "mixed_cfdna_msk_2020"              "mel_mskimpact_2020"               
## [111] "mixed_kunga_msk_2022"              "mixed_impact_subset_2022"         
## [113] "nbl_amc_2012"                      "msk_ch_2020"                      
## [115] "msk_access_2021"                   "msk_ch_ped_2021"                  
## [117] "msk_spectrum_tme_2022"             "mtnn_msk_2022"                    
## [119] "msk_ch_2023"                       "npc_nusingapore"                  
## [121] "odg_msk_2017"                      "nsclc_unito_2016"                 
## [123] "nsclc_pd1_msk_2018"                "nsclc_mskcc_2015"                 
## [125] "nsclc_ctdx_msk_2022"               "paac_msk_jco_2023"                
## [127] "panet_jhu_2011"                    "pcnsl_mayo_2015"                  
## [129] "panet_shanghai_2013"               "plmeso_nyu_2015"                  
## [131] "past_dkfz_heidelberg_2013"         "pediatric_dkfz_2017"              
## [133] "paired_bladder_2022"               "panet_msk_erc_2023"               
## [135] "prad_cpcg_2017"                    "prad_mskcc_2017"                  
## [137] "prad_msk_2019"                     "prad_mcspc_mskcc_2020"            
## [139] "scco_mskcc"                        "rms_nih_2014"                     
## [141] "sarc_mskcc"                        "rectal_msk_2019"                  
## [143] "sarcoma_mskcc_2022"                "rbl_cfdna_msk_2020"               
## [145] "rbl_mskcc_2020"                    "prostate_pcbm_swiss_2019"         
## [147] "rms_msk_2023"                      "sarcoma_msk_2023"                 
## [149] "skcm_tcga"                         "skcm_yale"                        
## [151] "skcm_vanderbilt_mskcc_2015"        "soft_tissue_msk_2023"             
## [153] "thyroid_mskcc_2016"                "summit_2018"                      
## [155] "ucec_msk_2018"                     "uccc_nih_2017"                    
## [157] "tmb_mskcc_2018"                    "ucec_ccr_msk_2022"                
## [159] "ucec_ccr_cfdna_msk_2022"           "ucec_ancestry_cds_msk_2023"       
## [161] "ucec_msk_2024"                     "um_qimr_2016"                     
## [163] "urcc_mskcc_2016"                   "utuc_mskcc_2015"                  
## [165] "utuc_msk_2019"                     "utuc_pdx_msk_2019"                
## [167] "usarc_msk_2020"                    "utuc_igbmc_2021"                  
## [169] "plmeso_msk_2024"                   "acc_tcga_gdc"                     
## [171] "blca_tcga_gdc"                     "brca_tcga_gdc"                    
## [173] "cesc_tcga_gdc"                     "chol_tcga_gdc"                    
## [175] "dlbclnos_tcga_gdc"                 "esca_tcga_gdc"                    
## [177] "gbm_tcga_gdc"                      "hnsc_tcga_gdc"                    
## [179] "chrcc_tcga_gdc"                    "ccrcc_tcga_gdc"                   
## [181] "prcc_tcga_gdc"                     "aml_tcga_gdc"                     
## [183] "difg_tcga_gdc"                     "hcc_tcga_gdc"                     
## [185] "luad_tcga_gdc"                     "lusc_tcga_gdc"                    
## [187] "plmeso_tcga_gdc"                   "hgsoc_tcga_gdc"                   
## [189] "paad_tcga_gdc"                     "mnet_tcga_gdc"                    
## [191] "prad_tcga_gdc"                     "read_tcga_gdc"                    
## [193] "soft_tissue_tcga_gdc"              "skcm_tcga_gdc"                    
## [195] "stad_tcga_gdc"                     "nsgct_tcga_gdc"                   
## [197] "thpa_tcga_gdc"                     "thym_tcga_gdc"                    
## [199] "ucec_tcga_gdc"                     "ucs_tcga_gdc"                     
## [201] "um_tcga_gdc"                       "brain_cptac_gdc"                  
## [203] "breast_cptac_gdc"                  "coad_cptac_gdc"                   
## [205] "luad_cptac_gdc"                    "lusc_cptac_gdc"                   
## [207] "ohnca_cptac_gdc"                   "ovary_cptac_gdc"                  
## [209] "pancreas_cptac_gdc"                "rcc_cptac_gdc"                    
## [211] "uec_cptac_gdc"                     "msk_impact_2017"                  
## [213] "coad_tcga_gdc"                     "pancreas_msk_2024"                
## [215] "brca_aurora_2023"                  "crc_orion_2024"                   
## [217] "lms_msk_2024"                      "prostate_msk_2024"                
## [219] "ucs_msk_2024"                      "panet_msk_2018"                   
## [221] "kirp_tcga"                         "ntrk_msk_2019"                    
## [223] "heme_msk_impact_2022"              "makeanimpact_ccr_2023"            
## [225] "pancan_mimsi_msk_2024"             "acc_tcga"                         
## [227] "blca_tcga"                         "ampca_bcm_2016"                   
## [229] "blca_dfarber_mskcc_2014"           "blca_mskcc_solit_2012"            
## [231] "brca_bccrc_xenograft_2014"         "blca_bgi"                         
## [233] "blca_tcga_pub"                     "brca_bccrc"                       
## [235] "brca_igr_2015"                     "acbc_mskcc_2015"                  
## [237] "acyc_mskcc_2013"                   "angs_project_painter_2018"        
## [239] "all_phase2_target_2018_pub"        "aml_target_2018_pub"              
## [241] "blca_tcga_pub_2017"                "blca_tcga_pan_can_atlas_2018"     
## [243] "blca_cornell_2016"                 "aml_ohsu_2018"                    
## [245] "acc_2019"                          "blca_bcan_hcrn_2022"              
## [247] "angs_painter_2020"                 "blca_msk_tcga_2020"               
## [249] "brain_cptac_2020"                  "brca_cptac_2020"                  
## [251] "brca_mbcproject_2022"              "aml_ohsu_2022"                    
## [253] "asclc_msk_2024"                    "pcnsl_msk_2024"                   
## [255] "msk_ctdna_vte_2024"                "brca_dfci_2020"                   
## [257] "brca_tcga"                         "cesc_tcga"                        
## [259] "chol_tcga"                         "brca_sanger"                      
## [261] "brca_tcga_pub2015"                 "brca_tcga_pub"                    
## [263] "cellline_ccle_broad"               "ccrcc_irc_2014"                   
## [265] "ccrcc_utokyo_2013"                 "coadread_genentech"               
## [267] "cellline_nci60"                    "cll_iuopa_2015"                   
## [269] "brca_metabric"                     "coadread_dfci_2016"               
## [271] "cll_broad_2015"                    "brca_mbcproject_wagle_2017"       
## [273] "brca_tcga_pan_can_atlas_2018"      "cesc_tcga_pan_can_atlas_2018"     
## [275] "chol_tcga_pan_can_atlas_2018"      "ccle_broad_2019"                  
## [277] "coad_cptac_2019"                   "brca_smc_2018"                    
## [279] "coadread_cass_2020"                "cll_broad_2022"                   
## [281] "coadread_tcga"                     "dlbc_tcga"                        
## [283] "coadread_tcga_pub"                 "desm_broad_2015"                  
## [285] "dlbc_broad_2012"                   "cscc_hgsc_bcm_2014"               
## [287] "coadread_tcga_pan_can_atlas_2018"  "dlbc_tcga_pan_can_atlas_2018"     
## [289] "dlbcl_dfci_2018"                   "difg_glass_2019"                  
## [291] "cscc_ucsf_2021"                    "crc_hta11_htan_2021"              
## [293] "cscc_ranson_2022"                  "difg_glass"                       
## [295] "esca_tcga"                         "escc_icgc"                        
## [297] "es_dfarber_broad_2014"             "es_iocurie_2014"                  
## [299] "egc_tmucih_2015"                   "esca_tcga_pan_can_atlas_2018"     
## [301] "egc_trap_msk_2020"                 "egc_trap_ccr_msk_2023"            
## [303] "gbm_tcga"                          "gbc_shanghai_2014"                
## [305] "gbm_tcga_pub"                      "gbm_tcga_pub2013"                 
## [307] "gbm_tcga_pan_can_atlas_2018"       "gbm_mayo_pdx_sarkaria_2019"       
## [309] "gbm_cptac_2021"                    "gist_msk_2023"                    
## [311] "kirc_tcga"                         "kich_tcga"                        
## [313] "hnsc_tcga"                         "kirc_tcga_pub"                    
## [315] "kich_tcga_pub"                     "hnsc_broad"                       
## [317] "hnsc_mdanderson_2013"              "hcc_inserm_fr_2015"               
## [319] "hccihch_pku_2019"                  "hcc_meric_2021"                   
## [321] "hcc_clca_2024"                     "hcc_msk_2024"                     
## [323] "laml_tcga"                         "lgg_tcga"                         
## [325] "lihc_tcga"                         "luad_tcga"                        
## [327] "laml_tcga_pub"                     "lgg_ucsf_2014"                    
## [329] "lgggbm_tcga_pub"                   "lihc_amc_prv"                     
## [331] "luad_mskcc_2015"                   "luad_broad"                       
## [333] "luad_mskcc_2020"                   "luad_mskcc_2023_met_organotropism"
## [335] "luad_oncosg_2020"                  "luad_msk_npjpo_2021"              
## [337] "luad_cptac_2020"                   "lusc_tcga"                        
## [339] "meso_tcga"                         "luad_tcga_pub"                    
## [341] "lusc_tcga_pub"                     "mm_broad"                         
## [343] "mpnst_mskcc"                       "mbl_sickkids_2016"                
## [345] "mrt_bcgsc_2016"                    "mel_tsam_liang_2017"              
## [347] "mel_ucla_2016"                     "mixed_pipseq_2017"                
## [349] "mixed_allen_2018"                  "mbn_mdacc_2013"                   
## [351] "mds_mskcc_2020"                    "mel_dfci_2019"                    
## [353] "lung_smc_2016"                     "mixed_msk_tcga_2021"              
## [355] "mng_utoronto_2021"                 "mpcproject_broad_2021"            
## [357] "lusc_cptac_2021"                   "mbn_sfu_2023"                     
## [359] "mbn_msk_2024"                      "ov_tcga"                          
## [361] "paad_tcga"                         "nccrcc_genentech_2014"            
## [363] "ov_tcga_pub"                       "paac_jhu_2014"                    
## [365] "paad_icgc"                         "paad_utsw_2015"                   
## [367] "nepc_wcm_2016"                     "nbl_ucologne_2015"                
## [369] "nsclc_tcga_broad_2016"             "paad_qcmg_uq_2016"                
## [371] "pact_jhu_2011"                     "nbl_broad_2013"                   
## [373] "nhl_bcgsc_2011"                    "nhl_bcgsc_2013"                   
## [375] "nsclc_tracerx_2017"                "nbl_target_2018_pub"              
## [377] "nsclc_mskcc_2018"                  "paad_cptac_2021"                  
## [379] "pcpg_tcga"                         "prad_tcga"                        
## [381] "prad_fhcrc"                        "prad_broad"                       
## [383] "prad_broad_2013"                   "prad_mich"                        
## [385] "prad_mskcc"                        "prad_mskcc_2014"                  
## [387] "prad_su2c_2015"                    "prad_tcga_pub"                    
## [389] "panet_arcnet_2017"                 "pcpg_tcga_pub"                    
## [391] "prad_eururol_2017"                 "prad_p1000"                       
## [393] "prad_su2c_2019"                    "prostate_dkfz_2018"               
## [395] "pptc_2019"                         "prad_cdk12_mskcc_2020"            
## [397] "prad_mskcc_cheny1_organoids_2014"  "pan_origimed_2020"                
## [399] "prad_msk_stopsack_2021"            "pancan_pcawg_2020"                
## [401] "prad_pik3r1_msk_2021"              "pog570_bcgsc_2020"                
## [403] "prad_organoids_msk_2022"           "ptad_msk_2024"                    
## [405] "prad_msk_mdanderson_2023"          "sarc_tcga"                        
## [407] "stad_tcga"                         "tgct_tcga"                        
## [409] "thym_tcga"                         "thca_tcga"                        
## [411] "ucec_tcga"                         "ucs_tcga"                         
## [413] "uvm_tcga"                          "sclc_clcgp"                       
## [415] "sclc_jhu"                          "skcm_broad"                       
## [417] "skcm_broad_dfarber"                "stad_pfizer_uhongkong"            
## [419] "stad_tcga_pub"                     "stad_uhongkong"                   
## [421] "stad_utokyo"                       "tet_nci_2014"                     
## [423] "thca_tcga_pub"                     "ucs_jhu_2014"                     
## [425] "ucec_tcga_pub"                     "stes_tcga_pub"                    
## [427] "skcm_broad_brafresist_2012"        "sarc_tcga_pub"                    
## [429] "skcm_mskcc_2014"                   "sclc_cancercell_gardner_2017"     
## [431] "rt_target_2018_pub"                "wt_target_2018_pub"               
## [433] "skcm_tcga_pub_2015"                "vsc_cuk_2018"                     
## [435] "utuc_cornell_baylor_mdacc_2019"    "skcm_dfci_2015"                   
## [437] "sclc_ucologne_2015"                "stad_oncosg_2018"                 
## [439] "rectal_msk_2022"                   "ucec_cptac_2020"                  
## [441] "stmyec_wcm_2022"                   "sarcoma_msk_2022"                 
## [443] "hnsc_tcga_pan_can_atlas_2018"      "kich_tcga_pan_can_atlas_2018"     
## [445] "kirc_tcga_pan_can_atlas_2018"      "kirp_tcga_pan_can_atlas_2018"     
## [447] "laml_tcga_pan_can_atlas_2018"      "lgg_tcga_pan_can_atlas_2018"      
## [449] "lihc_tcga_pan_can_atlas_2018"      "luad_tcga_pan_can_atlas_2018"     
## [451] "lusc_tcga_pan_can_atlas_2018"      "meso_tcga_pan_can_atlas_2018"     
## [453] "ov_tcga_pan_can_atlas_2018"        "paad_tcga_pan_can_atlas_2018"     
## [455] "pcpg_tcga_pan_can_atlas_2018"      "prad_tcga_pan_can_atlas_2018"     
## [457] "sarc_tcga_pan_can_atlas_2018"      "nst_nfosi_ntap"                   
## [459] "skcm_tcga_pan_can_atlas_2018"      "stad_tcga_pan_can_atlas_2018"     
## [461] "tgct_tcga_pan_can_atlas_2018"      "thca_tcga_pan_can_atlas_2018"     
## [463] "thym_tcga_pan_can_atlas_2018"      "ucec_tcga_pan_can_atlas_2018"     
## [465] "ucs_tcga_pan_can_atlas_2018"       "uvm_tcga_pan_can_atlas_2018"      
## [467] "coad_silu_2022"                    "acc_tcga_pan_can_atlas_2018"      
## [469] "msk_chord_2024"                    "pancan_mappyacts_2022"            
## [471] "msk_met_2021"                      "pdac_msk_2024"                    
## [473] "rectal_radiation_msk_2024"         "normal_skin_fibroblast_2024"      
## [475] "normal_skin_keratinocytes_2024"    "breast_msk_2025"                  
## [477] "blca_msk_2024"                     "normal_skin_melanocytes_2024"     
## [479] "ovary_geomx_gray_foundation_2024"  "brca_fuscc_2020"                  
## [481] "braf_msk_archer_2024"              "braf_msk_impact_2024"             
## [483] "thyroid_gatci_2024"
sort(unique(studies$cancerTypeId))
##   [1] "acbc"          "acc"           "acyc"          "alal"         
##   [5] "aml"           "ampca"         "angs"          "apad"         
##   [9] "bcc"           "bfn"           "biliary_tract" "bladder"      
##  [13] "blca"          "bll"           "bowel"         "brain"        
##  [17] "brca"          "breast"        "ccrcc"         "cervix"       
##  [21] "cesc"          "chol"          "chrcc"         "cllsll"       
##  [25] "coad"          "coadread"      "cscc"          "desm"         
##  [29] "difg"          "dlbclnos"      "egc"           "es"           
##  [33] "esca"          "escc"          "gbc"           "gbm"          
##  [37] "gist"          "hcc"           "hccihch"       "hdcn"         
##  [41] "head_neck"     "hgsoc"         "hnsc"          "ihch"         
##  [45] "lgsoc"         "liad"          "luad"          "lung"         
##  [49] "lusc"          "lymph"         "mbc"           "mbl"          
##  [53] "mbn"           "mcl"           "mds"           "mel"          
##  [57] "mixed"         "mnet"          "mng"           "mnm"          
##  [61] "mpn"           "mpnst"         "mrt"           "mtnn"         
##  [65] "myeloid"       "nbl"           "nccrcc"        "nhl"          
##  [69] "npc"           "nsclc"         "nsgct"         "nst"          
##  [73] "odg"           "ohnca"         "os"            "ovary"        
##  [77] "paac"          "paad"          "pact"          "pancreas"     
##  [81] "panet"         "past"          "pcm"           "pcnsl"        
##  [85] "plmeso"        "prad"          "prcc"          "prostate"     
##  [89] "ptad"          "rbl"           "rcc"           "read"         
##  [93] "rms"           "scco"          "sclc"          "skcm"         
##  [97] "skin"          "soft_tissue"   "stad"          "stmyec"       
## [101] "stomach"       "testis"        "tet"           "thpa"         
## [105] "thym"          "thyroid"       "uccc"          "ucec"         
## [109] "ucs"           "uec"           "um"            "urcc"         
## [113] "usarc"         "utuc"          "vsc"           "wt"
# Suppose we are interested in prostate cancer. We can list the available datasets:

prad <- subset(studies, cancerTypeId == "prad")

# To find out which genes have mutation data, look the study up on cBioPortal by its "name".


prad2 <- subset(prad, name == "Prostate Adenocarcinoma (MSK, Clin Cancer Res. 2022)")

Searching for this study on cBioPortal and clicking the Explore Selected Study button takes us to:

https://www.cbioportal.org/study/summary?id=prad_pik3r1_msk_2021

The Mutated Genes panel shows mutation statistics for the study, from which we can choose our gene of interest.

Note that each study is linked to a scientific publication (you can search for it!).

Conversely, if we start from a specific study found on cBioPortal, we can check whether it is available for download and retrieve its studyId, which is the identifier we pass to the getMutationsFromCbioportal() function.
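
Both directions of this lookup can be sketched in base R. This is a minimal sketch, assuming `studies` is a data frame with `name`, `studyId` and `cancerTypeId` columns; the `studyId` column name is an assumption based on the identifier mentioned above. The toy table below stands in for the real one so the snippet runs offline: the first row follows the name and ID pairing used in this section, and the second row is a made-up placeholder.

```r
# Toy stand-in for the `studies` table (hypothetical, for illustration only).
# Row 1 pairs the study name and ID as they appear in this section;
# row 2 is a placeholder and does not correspond to a real study.
studies_toy <- data.frame(
  name = c("Prostate Adenocarcinoma (MSK, Clin Cancer Res. 2022)",
           "Placeholder study"),
  studyId = c("prad_pik3r1_msk_2021", "placeholder_id"),
  cancerTypeId = c("prad", "prad"),
  stringsAsFactors = FALSE
)

# Direction 1: from a study name to its studyId
target_name <- "Prostate Adenocarcinoma (MSK, Clin Cancer Res. 2022)"
study_id <- studies_toy$studyId[studies_toy$name == target_name]

# Direction 2: from a studyId seen on the website to an availability check
id_is_listed <- "prad_pik3r1_msk_2021" %in% studies_toy$studyId

print(study_id)
print(id_is_listed)
```

With the real `studies` object, the same two expressions should apply unchanged, provided the column names match.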

mutation.dat <- getMutationsFromCbioportal("prad_pik3r1_msk_2021", "FOXA1")
## The Entrez Gene ID for FOXA1 is: 3169
## Found mutation dataset for prad_pik3r1_msk_2021: prad_pik3r1_msk_2021_mutations
# "nature2" chart theme
plot.options <- g3Lollipop.theme(theme.name = "nature2",
                                 title.text = "FOXA1 mutations",
                                 y.axis.label = "# of FOXA1 Mutations")

g3Lollipop(mutation.dat,
           gene.symbol = "FOXA1",
           btn.style = "gray", # gray-style chart download buttons
           plot.options = plot.options)
## Factor is set to Mutation_Class
## legend title is set to Mutation_Class
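
The height of each lollipop reflects how many mutations fall at a given amino-acid position. As a rough sketch of that tally, assume the mutation table has an `AA_Position` column and a `Mutation_Class` column. `Mutation_Class` is the factor named in the message above, while `AA_Position` is an assumption here. The rows below are made-up toy values, not FOXA1 data.

```r
# Hypothetical toy mutation table (values invented for illustration)
toy_mutations <- data.frame(
  AA_Position = c(10, 10, 25, 40, 40, 40),
  Mutation_Class = c("Missense", "Missense", "Truncating",
                     "Missense", "Inframe", "Missense"),
  stringsAsFactors = FALSE
)

# Number of mutations observed at each position
height_by_position <- table(toy_mutations$AA_Position)

# Number of mutations per position and mutation class
counts_by_class <- as.data.frame(
  table(toy_mutations$AA_Position, toy_mutations$Mutation_Class),
  stringsAsFactors = FALSE
)
names(counts_by_class) <- c("AA_Position", "Mutation_Class", "Count")

# Drop position/class combinations that never occur
counts_by_class <- counts_by_class[counts_by_class$Count > 0, ]

print(height_by_position)
print(counts_by_class)
```

Running the same tally on `mutation.dat` before plotting is a quick way to check that the data contain the positions and classes you expect.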

Now, we move directly to the tutorial.