Proteogenomics research – on the frontier of precision medicine
Posted: 14 December 2017 | Alicia Landeira, Javier Carabias, Jonatan García, Manuel Fuentes (Cancer Research Centre), Maria Gonzalez-Gonzalez, Paula Díez (Cancer Research Centre), Rafael Góngora, Rodrigo Garcia-Valiente
Proteogenomics is the systematic and comprehensive integration of proteomics with genomics and transcriptomics, and it is opening new avenues in biomedical research. Recently, several studies have demonstrated the relevance of proteogenomics in cancer research. This article provides a brief review of the advantages of proteogenomics in precision medicine.
‘Omics’ approaches rest on the high-throughput analysis of the genes, mRNA, proteins or metabolites present in a biological sample; they are called genomics, transcriptomics, proteomics and metabolomics, respectively. These approaches are becoming more relevant because of their applications in biomedicine (such as novel drugs, novel biomarkers, earlier diagnosis, novel therapeutic targets, etc).1,2,3,4,5
In general, genomics is the field concerned with the large-scale characterisation of the genetic content present within a cell of an organism,2 whether for the targeted investigation of selected genes or for the analysis of coding sequences or whole genomes, starting from minimal amounts of DNA.6
In a similar manner, proteomics is related to the comprehensive characterisation of a cell at the protein level.2 Currently, proteomics is based on a set of techniques to simultaneously analyse the presence and relative abundance of proteins in a particular biological sample,7,8,9 which will allow us to develop a complete and quantitative map of the proteome of a species, including cellular localisation of proteins; reconstruction of its networks and complexes; and tracing signalling pathways and protein modifications.10
During the last decade, proteomics has experienced huge development, mainly due to:
- Biological relevance: owing to better knowledge of the expression levels of proteins, changes in subcellular localisation and protein-protein interactions, and their post-translational modification – bearing in mind that the therapeutic targets are mostly proteins.11,12
- Development of high-throughput, large-scale analysis that allows the simultaneous detection of multiple proteins (including post-translational modifications, PTMs) in a single analysis. The Human Proteome Project (HPP) supports this systematic characterisation in order to help personalised medicine meet five criteria: right patient/target, right diagnosis, right treatment, right drug/target and right dose/time.13
Recently, a new omics term, proteogenomics, has been coined as a consequence of all the developments in these fields. The word was used for the first time in the literature in 2004, in a study published by Jaffe et al. The field consists of the integration of proteomics with other omics, such as genomics and transcriptomics. Initially, proteogenomics was used to improve genomic annotation and the characterisation of protein-coding potential. Nowadays, it provides a unified, global view of cellular functions.10,13,14,15
The potential of applied proteogenomics has been discussed and demonstrated in several studies, both in humans and in other organisms such as Plasmodium falciparum, Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana and Anopheles gambiae.10 These studies show the great potential of this cutting-edge technology in biological and biomedical research: it could generate a better understanding of the correlation between genotypes and phenotypes, which could be useful for accurate diagnosis and therapy, and reveal other correlations that could aid the understanding of the mechanisms underlying antibiotic resistance, the tumour microenvironment, etc.2,6,16
Proteogenomics and the Human Proteome Project
There is a well-known discrepancy between the level of an mRNA and the predicted level of the encoded protein in a particular cell. This was confirmed by a global transcriptomic and proteomic analysis, which showed that only approximately 30% of changes in mRNA levels could be correlated with protein levels. This discrepancy between transcriptomics and proteomics emphasises the relevance of post-translational modifications.5,14,17
In addition, the presence or absence of some post-translational modifications, such as glycosylation, phosphorylation, acetylation or ubiquitinylation, has a significant impact on protein stability (altering the half-life of proteins) and adds more complexity to the protein component of a cell.
Bearing this in mind, the content of the proteome is highly complex and highly dynamic; thus, proteomics analysis is required because this information cannot be deduced from genomics analysis.2,9,13,14
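The mRNA/protein discrepancy described above is usually quantified with a correlation coefficient computed across genes. The following is a minimal sketch of that calculation, not the method of the studies cited: the gene names and fold-change values are invented for illustration, and the `pearson` helper is a hypothetical name written from the textbook definition.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical log2 fold changes for five genes (invented values, not real data).
mrna_change = {"GENE_A": 2.0, "GENE_B": 1.5, "GENE_C": -1.0, "GENE_D": 0.5, "GENE_E": -2.0}
protein_change = {"GENE_A": 0.4, "GENE_B": 1.2, "GENE_C": 0.3, "GENE_D": -0.6, "GENE_E": -0.5}

# Only genes measured in both layers can be compared.
genes = sorted(mrna_change.keys() & protein_change.keys())
r = pearson([mrna_change[g] for g in genes], [protein_change[g] for g in genes])
print(f"Pearson r across {len(genes)} genes: {r:.2f}")
```

A value of r well below 1 for such paired measurements is what the text means by a partial correlation between transcript and protein levels; in practice a rank-based coefficient is often preferred because abundance data are rarely normally distributed.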
Recently, the first draft of the Human Proteome was published.2 The Human Proteome Organization (HUPO) began discussing the project in 2008, but it did not start until 2010. Its mission is to provide a protein-based map of the molecular architecture of the cells of the human body. To this end, the project was divided into two programmes: a chromosome-based programme (C-HPP), which allows characterisation of the human proteome, and a biology/disease-based programme (B/D-HPP).18
One of the main conclusions concerned the protein complexity that arises from compartmentalisation in cells, tissues and organs (around 200 types of cells form the tissues and organs of a body).2,13,19,20 In general, the advances in this project are directly related to progress in mass spectrometry and protein microarrays, because both methodologies have increased the sensitivity with which proteins can be identified and evaluated in high-throughput format.2,18,21,22 Moreover, novel bioinformatics tools have been designed and developed to meet the data analysis requirements of these methodologies. However, there is also growing concern about the capacity to process such large-scale data and about determining the false positive rate, particularly for new peptides.6,10
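The article does not say how the false positive rate is determined. One widely used option in mass spectrometry is the target-decoy strategy, in which spectra are searched against real (target) sequences and reversed or shuffled (decoy) sequences, and the decoy hits estimate the number of false target hits. The sketch below assumes that strategy; the function names, scores and threshold are hypothetical.

```python
def estimate_fdr(psms, score_threshold):
    """Estimate the false discovery rate as decoys / targets at or above a score.

    `psms` is a list of (score, is_decoy) tuples, one per peptide-spectrum match.
    """
    targets = sum(1 for score, is_decoy in psms if score >= score_threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= score_threshold and is_decoy)
    if targets == 0:
        # No accepted targets: report 0 if nothing passed at all, else the worst case.
        return 0.0 if decoys == 0 else 1.0
    return min(1.0, decoys / targets)


def lowest_threshold_at_fdr(psms, max_fdr):
    """Return the lowest observed score whose estimated FDR is <= max_fdr, or None."""
    for score in sorted({score for score, _ in psms}):
        if estimate_fdr(psms, score) <= max_fdr:
            return score
    return None


# Hypothetical search results: (score, is_decoy). Values are invented.
psms = [
    (95, False), (90, False), (88, False), (80, True), (75, False),
    (70, False), (60, True), (55, False), (50, True), (45, True),
]

threshold = lowest_threshold_at_fdr(psms, max_fdr=0.2)
print("score threshold:", threshold)
print("estimated FDR at threshold:", estimate_fdr(psms, threshold))
```

For novel peptides, such as those matching only a variant or unannotated sequence, a common refinement is to estimate the rate separately for that class of matches, because they are far rarer than known peptides and a global estimate tends to understate their error rate.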
Proteogenomics integration from multi-omics datasets
Regarding the integration of multi-omics datasets, it is important to highlight a few aspects of proteomics that are quite different from genomics and/or transcriptomics:
- Proteins require isolation or purification steps, which can be tedious and inefficient. In addition, there is no specific amplification step for proteins comparable to DNA/RNA amplification.
- The availability of selective and specific affinity reagents for all proteins is limited, and post-translational modifications can alter antibody recognition.2,14
In addition to the aspects mentioned above, the environment and external stimuli play a critical role in protein expression patterns. As a consequence, in cells with similar or identical DNA content, the set of expressed proteins differs according to the environmental conditions.8,19
Thus, the essential first step in proteogenomics is the creation of curated databases of protein/peptide sequences from well-established genomic databases for specific models.5,13,14,17
At present, this approach is based on the systematic analysis of datasets related to:2,5,17,23
a) Epigenetic regulation and DNA
b) miRNAs and RNA expression
c) Proteins and PTMs.
Consequently, proteogenomics requires software tools that integrate information from the different omics and are easy to use. Furthermore, such tools must support intensive calculations on large datasets and be capable of addressing false positives, as well as confirming the novelty of the putative variants identified.17
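The first step named above, deriving a peptide sequence database from genomic sequence, can be illustrated with a six-frame translation followed by an in silico tryptic digest. This is a minimal sketch under stated assumptions: it uses the standard genetic code, a simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), and an invented DNA string. All function names are hypothetical, and real pipelines typically start from annotated transcripts or variant calls rather than raw translation.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame=0):
    """Translate one reading frame; stop codons become '*', unknown codons 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def tryptic_peptides(protein, min_length=6):
    """Digest each stop-free segment with the simplified trypsin rule."""
    peptides = set()
    for segment in protein.split("*"):
        for peptide in re.split(r"(?<=[KR])(?!P)", segment):
            if len(peptide) >= min_length and "X" not in peptide:
                peptides.add(peptide)
    return peptides


def build_peptide_database(dna, min_length=6):
    """Collect tryptic peptides from all six reading frames of a sequence."""
    database = set()
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            database |= tryptic_peptides(translate(strand, frame), min_length)
    return database


# Invented sequence for illustration only.
example_dna = "ATGGCTGAAAGCCTGAAACCGGAAACCATTGATCGTTAAGGCTAGCATTGA"
for peptide in sorted(build_peptide_database(example_dna, min_length=4)):
    print(peptide)
```

Because six-frame translation inflates the search space considerably, the resulting database makes careful control of false positives even more important, which is one reason curated, sample-specific databases are preferred where they exist.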
So far, most of the annotations in genomic datasets have been based mainly on predictions, and many genomic sequences are in the public domain thanks to resources developed by the National Center for Biotechnology Information (NCBI) and Ensembl. In the case of Homo sapiens, whether annotations are obtained manually or automatically, the three main annotation groups are NCBI, Ensembl and HAVANA. However, while such efforts provide more specific gene annotations, the correspondence between genes and proteins is still not ensured. For this reason, mass spectrometry is a powerful tool for analysing proteins and for providing information that can be correlated with the genome annotation.24
In this context, and considering the time required for the thorough analysis of a particular proteome, dedicated software designed for proteogenomic analyses is required. Such software allows us to search protein datasets24 in order to provide an overall view of the molecular landscape, from genes to proteins, in a specific biological situation.
The proteogenomics revolution in personalised oncology medicine
After completion of the human genome, a great number of genes were mapped and their specific roles in disease progression described. Moreover, these descriptions included the role of the environment.25
In relation to cancer, these types of studies revealed the great complexity and heterogeneity of the genome, helping to trace the dysfunctional profile of the transcriptome.12 Thus, genetic analysis can indicate the type or subtype of cancer that a person has. All this has been an important step forward in the area of precision medicine through next-generation sequencing and bioinformatics analysis.25
In addition, the great advances in genomic and proteomic technologies that have emerged from this field of medicine show promise for diagnostic and treatment options for such diseases. Currently, genomic and proteomic biomarkers are being used to determine which individualised treatment is best suited for each patient.25
In this context, proteogenomics has emerged as a useful tool in cancer research because it integrates genomic and transcriptomic datasets with mass spectrometry. It also allows the identification of variant protein sequences that may have functional roles in cancer.17 This approach, also known as onco-proteogenomics, takes advantage of how the information as a whole can aid the understanding of pathophysiology through validation and refinement of known genes, identification of novel genes and splice isoforms, validation of exons, assignment of correct start sites, etc. Furthermore, proteogenomics informs us about the impact of existing genomic modifications on biochemical cascades through the identification of variant proteins that would cause the pathology.6,10,12,24
With regard to cancer progression, it could be a promising tool because genomic alterations (mutations, methylation, copy number aberrations and/or translocations, etc) are directly related to tumour pathologies.2,4,12,14 As a consequence of these genomic changes, proteins are directly and indirectly affected in these cells and could be active drivers of disease initiation, progression and/or response to treatment.2,12,13
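The identification of variant protein sequences mentioned above depends on finding peptides that exist only in the altered protein. The sketch below illustrates the idea for a single amino-acid substitution: it digests the reference and the variant protein in silico and keeps the peptides unique to the variant, which are the ones a mass spectrometer would need to detect. The protein sequence and the substitution are invented, the function names are hypothetical, and the trypsin rule is simplified (cleave after K or R unless followed by P, no missed cleavages).

```python
import re


def digest(protein):
    """Simplified in silico tryptic digest of a protein sequence."""
    return [peptide for peptide in re.split(r"(?<=[KR])(?!P)", protein) if peptide]


def apply_substitution(protein, position, ref_aa, alt_aa):
    """Apply a single amino-acid substitution at a 1-based position."""
    if not 1 <= position <= len(protein):
        raise ValueError("position is outside the protein sequence")
    if protein[position - 1] != ref_aa:
        raise ValueError(
            f"expected {ref_aa} at position {position}, found {protein[position - 1]}"
        )
    return protein[:position - 1] + alt_aa + protein[position:]


def variant_specific_peptides(reference, variant):
    """Peptides produced by the variant protein but not by the reference."""
    return set(digest(variant)) - set(digest(reference))


# Invented reference sequence and substitution, for illustration only.
reference = "MSTEYKLVGAGGVGKSALTR"
variant = apply_substitution(reference, position=12, ref_aa="G", alt_aa="D")

print("reference peptides:", digest(reference))
print("variant-specific peptides:", sorted(variant_specific_peptides(reference, variant)))
```

A substitution that creates or removes a K or R changes the cleavage pattern itself, so more than one peptide can differ between the two digests; the set difference handles that case without special treatment.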
Currently, the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) are projects that have demonstrated the correlation between genotypes and phenotypes with protein profiles through deep sequencing of the genome. The advent of high-throughput proteomics approaches (such as shotgun MS/MS and protein microarrays) enables this correlation to be revealed and characterised for a specific tumour. Moreover, this helps to improve the global molecular knowledge of cancer cells.2
In this context, the National Cancer Institute (NCI) initiated the Clinical Proteomic Tumor Analysis Consortium (CPTAC) in 2011 to accelerate basic molecular knowledge of cancer. Its main aim is to improve the ability to prevent, diagnose and treat disease through proteomic studies of the main human cancers characterised by TCGA.5,14
One of the achievements of the CPTAC has been deciphering novel molecular relationships within colorectal and breast cancer using these proteogenomics approaches. This allows the detection and quantification of proteins that are related and/or correlated to genomic abnormalities.2
The advances in proteomics have led to increased sensitivity, viability and accuracy. Nevertheless, the detection and quantification of abnormal proteins still face several challenges, including the following:
- Statistical models that can distinguish between ‘passenger’ and ‘driver’ mutations are needed12
- Integration of all omics data is necessary to understand the cancer cell phenotype and to discover specific biomarkers12
- Not all peptides contained in a sample can be detected, especially those from low-abundance proteins, for two main reasons:
  - The abundance of proteins depends on the dynamics of cellular and subcellular processes. mRNA and protein abundance are known to be related, but only partially.
  - When proteins lose their function (for example, because of misfolding), they are degraded first, which in turn reduces protein abundance.6,12
For these reasons, onco-proteogenomics is a promising tool, not only for the understanding of pathologies like cancer, but also in the field of molecular biology.
References
1. Dasilva, Noelia, Paula Díez, Sergio Matarraz, María González-González, Sara Paradinas, Alberto Orfao, and Manuel Fuentes. «Biomarker Discovery by Novel Sensors Based on Nanoproteomics Approaches». Sensors 12 (2):2284-2308. https://doi.org/10.3390/s120202284.
2. Sajjad, Wasim, Muhammad Rafiq, Barkat Ali, Muhammad Hayat, Sahib Zada, Wasim Sajjad, and Tanweer Kumar. 2016. «Proteogenomics: New Emerging Technology». HAYATI Journal of Biosciences 23 (3):97-100. https://doi.org/10.1016/j.hjb.2016.11.002.
3. Rodríguez-Cerdeira, Carmen, Alberto Molares-Vila, Miguel Carnero-Gregorio, and Alberte Corbalán-Rivas. «Recent advances in melanoma research via “omics” platforms». Journal of Proteomics, November. https://doi.org/10.1016/j.jprot.2017.11.005.
4. Galicia N, Dégano R, Díez P, González-González M, Góngora R, Ibarrola N, Fuentes M. «CSF analysis for protein biomarker identification in patients with leptomeningeal metastases from CNS lymphoma». Expert Rev Proteomics. 2017 Apr;14(4):363-372. doi: 10.1080/14789450.2017.1307106.
5. Vasaikar, Suhas V, Peter Straub, Jing Wang, and Bing Zhang. 2017. «LinkedOmics: Analyzing Multi-Omics Data within and across 32 Cancer Types». Nucleic Acids Research, November. https://doi.org/10.1093/nar/gkx1090.
6. Roos, Andreas, Rachel Thompson, Rita Horvath, Hanns Lochmüller, and Albert Sickmann. 2017. «Intersection of Proteomics and Genomics to “Solve the Unsolved” in Rare Disorders Such as Neurodegenerative and Neuromuscular Diseases». Clinical Applications, October. https://doi.org/10.1002/prca.201700073.
7. Wouters, Bradly G. 2008. «Proteomics: Methodologies and Applications in Oncology». Seminars in Radiation Oncology, Prognostic and Predictive Markers in Oncology, 18 (2):115-25. https://doi.org/10.1016/j.semradonc.2007.10.008.
8. Jara, A, and J. J. Kopchick. 2013. «Proteomics: a comprehensive approach». Anales De Pediatria (Barcelona, Spain: 2003) 78 (3):137-39. https://doi.org/10.1016/j.anpedi.2012.10.007.
9. Díez, Paula, Rafael Góngora, Alberto Orfao, and Manuel Fuentes. «Functional proteomic insights in B-cell chronic lymphocytic leukemia». Expert Review of Proteomics 14 (2):137-46. https://doi.org/10.1080/14789450.2017.1275967.
10. Nesvizhskii AI. «Proteogenomics: concepts, applications and computational strategies». Nature Methods. 2014;11:1114-1125. doi: 10.1038/nmeth.3144.
11. Sánchez-Carbayo M. 2007. «Use of antibodies arrays in the study of bladder cancer». Actas urologicas espanolas 31 (9):1082-88. https://doi.org/10.1016/S0210-4806(07)73769-5.
12. Alfaro, Javier A, Ankit Sinha, Thomas Kislinger, and Paul C. Boutros. 2014. «Onco Proteogenomics: Cancer Proteomics Joins Forces with Genomics». Nature Methods 11 (11):1107-13. https://doi.org/10.1038/nmeth.3138.
13. Díez, Paula, Conrad Droste, Rosa M Dégano, María González-Muñoz, Nieves Ibarrola, Martín Pérez-Andrés, Alba Garin-Muga, et al. 2015. «Integration of Proteomics and Transcriptomics Data Sets for the Analysis of a Lymphoma B-Cell Line in the Context of the Chromosome-Centric Human Proteome Project». Journal of Proteome Research 14 (9):3530-40. https://doi.org/10.1021/acs.jproteome.5b00474.
14. Faulkner, Sam, Matt Dun, and Hubert Hondermarck. 2015. «Proteogenomics: Emergence and promise». Cellular and Molecular Life Sciences: CMLS 72 (January). https://doi.org/10.1007/s00018-015-1837-y.
15. Díez P, Fuentes M. «Proteogenomics for the comprehensive analysis of human cellular and serum antibody repertories». Adv Exp Med Biol. 2016;926:153-162.
16. Song, Ehwang, Yuqian Gao, Chaochao Wu, Tujin Shi, Song Nie, Thomas L. Fillmore, Athena A. Schepmoes, et al. 2017. «Targeted proteomic assays for quantitation of proteins identified by proteogenomic analysis of ovarian cancer». Scientific Data 4 (July). https://doi.org/10.1038/sdata.2017.91.
17. Chambers, Matthew C, Pratik D Jagtap, James E Johnson, Thomas McGowan, Praveen Kumar, Getiria Onsongo, Candace R Guerrero, et al. 2017. «An Accessible Proteogenomics Informatics Resource for Cancer Researchers». Cancer Research 77 (21):e43-46. https://doi.org/10.1158/0008-5472.CAN-17-0331.
18. Tabas-Madrid, Daniel, Joao Alves-Cruzeiro, Victor Segura, Elizabeth Guruceaga, Vital Vialas, Gorka Prieto, Carlos García, Fernando J Corrales, Juan Pablo Albar, and Alberto Pascual-Montano. «Proteogenomics Dashboard for the Human Proteome Project». Journal of Proteome Research 14 (9):3738-49. https://doi.org/10.1021/acs.jproteome.5b00466.
19. Kim MS, et al. 2014. «A draft of the human proteome». Nature. 2014 May 29;509(7502):575-81. doi: 10.1038/nature13302.
20. Wilhelm M, et al. «Mass-spectrometry-based draft of the human proteome». Nature. 2014 May 29;509(7502):582-7. doi: 10.1038/nature13319.
21. Gonzalez-Gonzalez, María, Ricardo Jara-Acevedo, Sergio Matarraz, María Jara-Acevedo, Sara Paradinas, J M Sayagües, Alberto Orfao, and Manuel Fuentes. «Nanotechniques in proteomics: Protein microarrays and novel detection platforms». European Journal of Pharmaceutical Sciences, 3rd ESF-UB Conference on Nanomedicine, 45 (4):499-506. https://doi.org/10.1016/j.ejps.2011.07.009.
22. Dasilva, Noelia, Paula Díez, María González-González, Sergio Matarraz, J M Sayagués, Alberto Orfao, and Manuel Fuentes. «Protein Microarrays: Technological Aspects, Applications and Intellectual Property». Recent Patents on Biotechnology 7 (2):142-52.
23. Hwang, Heeyoun, Gun Wook Park, Ji Yeong Park, Hyun Kyoung Lee, Ju Yeon Lee, Ji Eun Jeong, Sung-Kyu Robin Park, et al. 2017. «Next Generation Proteomic Pipeline for Chromosome-Based Proteomic Research Using NeXtProt and GENCODE Databases». Journal of Proteome Research, October. https://doi.org/10.1021/acs.jproteome.7b00223.
24. Renuse, Santosh, Raghothama Chaerkady, and Akhilesh Pandey. «Proteogenomics». PROTEOMICS 11 (4):620-30. https://doi.org/10.1002/pmic.201000615.
25. Shukla, Hem D. 2017. «Comprehensive Analysis of Cancer-Proteogenome to Identify Biomarkers for the Early Diagnosis and Prognosis of Cancer». Proteomes 5 (4). https://doi.org/10.3390/proteomes5040028.
MANUEL FUENTES is Group Leader of the Proteomics Facility at the Cancer Research Center, University of Salamanca, where his research is focused on biomarkers and drug discovery in haematological diseases, mainly for personalised medicine. He was previously at the Harvard Institute of Proteomics at Harvard Medical School. Dr Fuentes is co-author of 105 peer-reviewed papers in international journals, nine licensed international patents, 20 book chapters, and more than 50 invited lectures at national and international meetings.
ALICIA LANDEIRA completed her MSc in Cellular and Molecular Biology at the University of Salamanca, Spain. Her Masters final project, undertaken in the Group of Proteomics of the Cancer Research Center in Salamanca, focused on the differential characterisation of B Cell protein expression profiles in chronic lymphocytic leukaemia in order to identify biomarker candidates.