Proteogenomics research – on the frontier of precision medicine

Fuentes, Manuel; Landeira, Alicia

Posted: 14 December 2017 | Alicia Landeira, Javier Carabias, Jonatan García, Manuel Fuentes (Cancer Research Centre), Maria Gonzalez-Gonzalez, Paula Díez (Cancer Research Centre), Rafael Góngora, Rodrigo Garcia-Valiente

Proteogenomics is the systematic and comprehensive integration of proteomics with genomics and transcriptomics, and it is opening new avenues in biomedical research. Several recent studies have demonstrated its relevance in cancer research. This article briefly reviews the advantages of proteogenomics in precision medicine.

Proteogenomics research – on the frontier of precision medicine

‘Omics’ approaches rest on the high-throughput analysis of the genes, mRNAs, proteins or metabolites present in a biological sample; the corresponding fields are called genomics, transcriptomics, proteomics and metabolomics, respectively. These approaches are becoming increasingly relevant because of their applications in biomedicine, such as novel drugs, novel biomarkers, earlier diagnosis and novel therapeutic targets.^1,2,3,4,5

In general, genomics is the field concerned with the large-scale characterisation of the genetic content present within a cell of an organism,² whether through targeted investigation of selected genes or through analysis of coding sequences or whole genomes from minimal amounts of DNA.⁶

In a similar manner, proteomics is related to the comprehensive characterisation of a cell at the protein level.² Currently, proteomics is based on a set of techniques to simultaneously analyse the presence and relative abundance of proteins in a particular biological sample,^7,8,9 which will allow us to develop a complete and quantitative map of the proteome of a species, including cellular localisation of proteins; reconstruction of its networks and complexes; and tracing signalling pathways and protein modifications.¹⁰

During the last decade, proteomics has experienced huge development, mainly due to:

Biological relevance: better knowledge of protein expression levels, changes in subcellular localisation, protein-protein interactions and post-translational modifications (PTMs) is essential, bearing in mind that most therapeutic targets are proteins.^11,12
Development of high-throughput, large-scale analysis that allows the simultaneous detection of multiple proteins (including PTMs) in a single experiment. The Human Proteome Project (HPP) supports this systematic characterisation in order to advance personalised medicine according to five criteria: right patient/target, right diagnosis, right treatment, right drug/target and right dose/time.¹³

Recently, a new omics term, proteogenomics, has been coined as a consequence of the developments in these fields. The word first appeared in the literature in 2004, in a study published by Jaffe et al. Proteogenomics consists of the integration of proteomics with other omics, such as genomics and transcriptomics. Initially, it was used to improve genome annotation and to characterise protein-coding potential. Nowadays, it provides a unified view of cellular functions.^10,13,14,15

The potential of applied proteogenomics has been discussed and demonstrated in several studies, both in humans and in other organisms such as Plasmodium falciparum, Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana and Anopheles gambiae.¹⁰ These studies show the great potential of this cutting-edge technology in biological and biomedical research: it could provide a better understanding of the correlation between genotypes and phenotypes, which could support accurate diagnosis and therapy, as well as of other correlations that could help to explain the mechanisms underlying antibiotic resistance, the tumour microenvironment, etc.^2,6,16

Proteogenomics and the Human Proteome Project

There is a well-known discrepancy between the level of an mRNA and the predicted level of the encoded protein in a particular cell. This was confirmed by a global transcriptomic and proteomic analysis, which showed that only approximately 30% of changes in mRNA levels could be correlated with changes in protein levels. This discrepancy between transcriptomics and proteomics emphasises the relevance of post-translational modifications.^5,14,17
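As a rough illustration of how such a comparison between transcript and protein levels can be quantified, the sketch below computes a Pearson correlation between paired mRNA and protein changes. The gene names and values are invented for illustration only, and the function `pearson` is a hypothetical helper rather than part of any cited pipeline.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical log2 fold changes (treated vs control) for five made-up genes.
mrna_log2fc = {"GENE_A": 2.1, "GENE_B": -1.4, "GENE_C": 0.3, "GENE_D": 1.8, "GENE_E": -0.9}
protein_log2fc = {"GENE_A": 0.4, "GENE_B": -1.1, "GENE_C": 0.9, "GENE_D": -0.2, "GENE_E": 0.1}

# Only genes measured at both levels can be compared.
genes = sorted(mrna_log2fc.keys() & protein_log2fc.keys())
r = pearson([mrna_log2fc[g] for g in genes], [protein_log2fc[g] for g in genes])
print(f"Pearson r = {r:.2f}, r^2 = {r * r:.2f} over {len(genes)} genes")
```

Here r² can be read as the share of variance in protein changes that is explained linearly by mRNA changes; a real analysis would typically also consider rank-based correlation and measurement noise.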

In addition, the presence or absence of post-translational modifications such as glycosylation, phosphorylation, acetylation or ubiquitination has a significant impact on protein stability (altering the half-life of proteins) and adds further complexity to the protein component of a cell.

Bearing this in mind, the proteome is highly complex and highly dynamic; proteomic analysis is therefore required, because this information cannot be deduced from genomic analysis alone.^2,9,13,14

Recently, the first draft of the human proteome was published.² The Human Proteome Organization (HUPO) began discussing the project in 2008, but it did not start until 2010. Its mission is to provide a protein-based map of the molecular architecture of the cells of the human body. To this end, the project was divided into two programmes: a chromosome-based programme (C-HPP), which allows characterisation of the human proteome, and a biology/disease-based programme (B/D-HPP).¹⁸

One of the main conclusions concerned the protein complexity that arises from compartmentalisation in cells, tissues and organs (around 200 cell types form the tissues and organs of the body).^2,13,19,20 In general, the advances in this project are directly related to progress in mass spectrometry and protein microarrays, because both methodologies have increased the sensitivity of protein identification and evaluation in high-throughput format.^2,18,21,22 Moreover, novel bioinformatics tools have been designed and developed to meet the data analysis requirements of these methodologies. However, there is also growing concern about the capacity to process such large-scale data and about determining the false positive rate, particularly for novel peptides.^6,10

Proteogenomics integration from multi-omics datasets

Regarding the integration of multi-omics datasets, it is important to highlight a few aspects of proteomics that differ markedly from genomics and/or transcriptomics:

Proteins require isolation or purification steps, which can be tedious and inefficient. In addition, there is no amplification step for proteins comparable to DNA/RNA amplification.
Selective and specific affinity reagents are not available for all proteins, and antibody recognition can be altered by post-translational modifications.^2,14

In addition to the aspects mentioned above, the environment and external stimuli play a critical role in protein expression patterns. As a consequence, cells with similar or identical DNA content express different sets of proteins depending on the environmental conditions.^8,19

Thus, the essential first step in proteogenomics is the creation of curated databases of protein/peptide sequences from well-established genomic databases for specific models.^5,13,14,17
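One common way to derive a peptide search space from nucleotide sequence is six-frame translation followed by in silico tryptic digestion. The sketch below is a minimal illustration under stated assumptions: it uses the standard genetic code, a simple trypsin rule (cleave after K or R unless followed by P, no missed cleavages), and hypothetical function names; it is not the procedure of any specific tool cited here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Return the three forward and three reverse-strand translations."""
    dna = dna.upper()
    return [translate(strand[offset:])
            for strand in (dna, reverse_complement(dna))
            for offset in range(3)]


def tryptic_peptides(protein, min_length=6):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if is_last or (aa in "KR" and protein[i + 1] != "P"):
            peptide = protein[start:i + 1]
            if len(peptide) >= min_length:
                peptides.append(peptide)
            start = i + 1
    return peptides


def build_peptide_database(dna, min_length=6):
    """Collect tryptic peptides from every stop-to-stop segment of all six frames."""
    database = set()
    for frame in six_frame_translation(dna):
        for segment in frame.split("*"):
            database.update(tryptic_peptides(segment, min_length))
    return database


# Hypothetical sequence encoding M-A-A-A-A-A-K followed by a stop codon.
example_dna = "ATGGCCGCCGCCGCCGCCAAATGA"
print(sorted(build_peptide_database(example_dna)))
```

In practice, such databases are usually restricted with transcriptomic evidence or gene models, since an unfiltered six-frame search space inflates the number of candidate peptides.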

At present, this approach is based on the systematic analysis of datasets related to:^2,5,17,23

a) Epigenetic regulation and DNA

b) miRNAs and RNA expression

c) Proteins and PTMs.

Consequently, proteogenomics requires software tools that integrate information from the different omics and are easy to use. Furthermore, such tools must support intensive calculations on large datasets, address false positives and confirm the novelty of the putative variants identified.¹⁷

So far, most annotations of genomic datasets have been based mainly on predictions, and many genomic sequences are in the public domain thanks to resources developed by the National Center for Biotechnology Information (NCBI) and Ensembl. In the case of Homo sapiens, the three main annotation groups, combining manual and automated annotation, are NCBI, Ensembl and HAVANA. However, while such efforts provide more specific gene annotations, the correspondence between genes and proteins is still not guaranteed. For this reason, mass spectrometry is a powerful tool for analysing proteins and providing information that can be correlated with the genome annotation.²⁴
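The text does not name a specific method for controlling false positives. One widely used option in mass spectrometry searches is the target-decoy strategy, sketched below as an assumption: each peptide-spectrum match (PSM) is a hypothetical `(score, is_decoy)` pair, and the false discovery rate (FDR) at a score threshold is estimated as the number of decoy hits divided by the number of target hits.

```python
def estimate_fdr(psms, score_threshold):
    """Estimate FDR as decoys / targets among PSMs scoring at or above the threshold."""
    targets = sum(1 for score, is_decoy in psms if score >= score_threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= score_threshold and is_decoy)
    if targets == 0:
        return 1.0 if decoys else 0.0
    return decoys / targets


def threshold_for_fdr(psms, max_fdr=0.01):
    """Return the lowest observed score whose estimated FDR is within max_fdr, or None."""
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if estimate_fdr(psms, threshold) <= max_fdr:
            best = threshold
    return best


# Hypothetical PSMs: (search score, matched a decoy sequence?)
example_psms = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, True)]
print(threshold_for_fdr(example_psms, max_fdr=0.25))
```

For novel peptides, a stricter or separately estimated FDR is often advisable, because the much larger search space makes chance matches more likely than for known sequences.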

In this context, and considering the time required for a thorough analysis of a particular proteome, dedicated proteogenomics software is needed to search protein datasets²⁴ and to provide an overall view of the molecular landscape linking genes to proteins in a specific biological situation.

The proteogenomics revolution in personalised oncology medicine

After completion of the human genome, a great number of genes were mapped and their specific roles in disease progression were described, including the role of the environment.²⁵

In relation to cancer, these types of studies revealed the great complexity and heterogeneity of the genome, helping to trace the dysfunctional profile of the transcriptome.¹² Thus, genetic analysis can indicate the type or subtype of cancer that a person has. This has been an important step forward in precision medicine, made possible by next-generation sequencing and bioinformatics analysis.²⁵

In addition, the great advances in genomic and proteomic technologies that have emerged from this field of medicine show promise for diagnostic and treatment options for such diseases. Currently, genomic and proteomic biomarkers are being used to determine which individualised treatment is best suited for each patient.²⁵

In this context, proteogenomics has emerged as a useful tool in cancer research because it integrates genomic and transcriptomic datasets with mass spectrometry data. It also allows the identification of variant protein sequences that may have functional roles in cancer.¹⁷ This approach, also known as onco-proteogenomics, exploits the combined information to aid the understanding of pathophysiology through validation and refinement of known genes, identification of novel genes and splice isoforms, validation of exons, assignment of correct start sites, etc. Furthermore, proteogenomics reveals the impact of genomic modifications on biochemical cascades through the identification of variant proteins that could cause the pathology.^6,10,12,24
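To make the idea of variant peptide identification concrete, the sketch below separates identified peptides into those found in a reference proteome and those that are not, and then checks whether an unmatched peptide differs from a reference protein by a single amino acid substitution. All sequences and function names are hypothetical, and real pipelines would also handle isoleucine/leucine ambiguity, insertions, deletions and splice junctions.

```python
def classify_peptides(identified, reference_proteins):
    """Split peptides into those present in any reference protein and those absent."""
    known, novel = [], []
    for peptide in identified:
        if any(peptide in sequence for sequence in reference_proteins.values()):
            known.append(peptide)
        else:
            novel.append(peptide)
    return known, novel


def single_substitution_sites(peptide, protein):
    """Return (1-based position, reference residue, variant residue) for every
    alignment where the peptide differs from the protein by exactly one residue."""
    hits = []
    for start in range(len(protein) - len(peptide) + 1):
        window = protein[start:start + len(peptide)]
        mismatches = [(start + i + 1, ref, var)
                      for i, (ref, var) in enumerate(zip(window, peptide))
                      if ref != var]
        if len(mismatches) == 1:
            hits.append(mismatches[0])
    return hits


# Hypothetical reference protein and two peptides "identified" by mass spectrometry.
reference = {"PROT_1": "MKTAYIAKQR"}
peptides = ["TAYIAK", "TAYVAK"]

known, novel = classify_peptides(peptides, reference)
for peptide in novel:
    print(peptide, single_substitution_sites(peptide, reference["PROT_1"]))
```

A substitution reported this way is only a candidate: it would still need support from the matching genomic or transcriptomic variant call and from the quality of the spectrum.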

Proteogenomics could also be a promising tool for studying cancer progression, because genomic alterations (mutations, methylation, copy number aberrations, translocations, etc) are directly related to tumour pathologies.^2,4,12,14 As a consequence of these genomic changes, proteins are directly and indirectly affected in these cells and could be active drivers of disease initiation, progression and/or response to treatment.^2,12,13

Currently, the Cancer Genome Consortium and The Cancer Genome Atlas (TCGA) are projects that have demonstrated, through deep sequencing of the genome, the correlation of genotypes and phenotypes with protein profiles. The advent of high-throughput proteomics approaches (such as shotgun MS/MS and protein microarrays) enables this correlation to be revealed and characterised for a specific tumour, which helps to improve the global molecular knowledge of cancer cells.²

In this context, the National Cancer Institute (NCI) initiated the Clinical Proteomic Tumor Analysis Consortium (CPTAC) in 2011 to accelerate basic molecular knowledge of cancer. Its main aim is to improve the ability to prevent, diagnose and treat disease through proteomic studies of the main human cancers characterised by TCGA.^5,14

One of the achievements of the CPTAC has been deciphering novel molecular relationships within colorectal and breast cancer using these proteogenomics approaches. This allows the detection and quantification of proteins that are related and/or correlated to genomic abnormalities.²

The advances in proteomics have led to increased sensitivity, viability and accuracy. Nevertheless, the detection and quantification of abnormal proteins still face several challenges, including the following:

Statistical models are needed that can discern between ‘passenger’ and ‘driver’ mutations.¹²
Integration of all omics data is necessary to understand the cancer cell phenotype and to discover specific biomarkers.¹²
Not all peptides contained in a sample can be detected, especially those from low-abundance proteins, for two main reasons:
- The abundance of proteins depends on the dynamics of cellular and subcellular processes. mRNA and protein abundance are known to be related, but only partially.
- When a protein loses its function (for example, because of misfolding), it is degraded first, which in turn reduces its abundance.^6,12

For these reasons, onco-proteogenomics is a promising tool, not only for the understanding of pathologies like cancer, but also in the field of molecular biology.

References

1. Dasilva, Noelia, Paula Díez, Sergio Matarraz, María González-González, Sara Paradinas, Alberto Orfao, and Manuel Fuentes. «Biomarker Discovery by Novel Sensors Based on Nanoproteomics Approaches». Sensors 12 (2):2284-2308. https://doi.org/10.3390/s120202284.
2. Sajjad, Wasim, Muhammad Rafiq, Barkat Ali, Muhammad Hayat, Sahib Zada, Wasim Sajjad, and Tanweer Kumar. 2016. «Proteogenomics: New Emerging Technology». HAYATI Journal of Biosciences 23 (3):97-100. https://doi.org/10.1016/j.hjb.2016.11.002.
3. Rodríguez-Cerdeira, Carmen, Alberto Molares-Vila, Miguel Carnero-Gregorio, and Alberte Corbalán-Rivas. «Recent advances in melanoma research via “omics” platforms». Journal of Proteomics, November. https://doi.org/10.1016/j.jprot.2017.11.005.
4. Galicia N, Dégano R, Díez P, González-González M, Góngora R, Ibarrola N, Fuentes M. «CSF analysis for protein biomarker identification in patients with leptomeningeal metastases from CNS lymphoma». Expert Rev Proteomics. 2017 Apr;14(4):363-372. doi: 10.1080/14789450.2017.1307106.
5. Vasaikar, Suhas V, Peter Straub, Jing Wang, and Bing Zhang. 2017. «LinkedOmics: Analyzing Multi-Omics Data within and across 32 Cancer Types». Nucleic Acids Research, November. https://doi.org/10.1093/nar/gkx1090.
6. Roos, Andreas, Rachel Thompson, Rita Horvath, Hanns Lochmüller, and Albert Sickmann. 2017. «Intersection of Proteomics and Genomics to “Solve the Unsolved” in Rare Disorders Such as Neurodegenerative and Neuromuscular Diseases». PROTEOMICS – Clinical Applications, October. https://doi.org/10.1002/prca.201700073.
7. Wouters, Bradly G. 2008. «Proteomics: Methodologies and Applications in Oncology». Seminars in Radiation Oncology, Prognostic and Predictive Markers in Oncology, 18 (2):115-25. https://doi.org/10.1016/j.semradonc.2007.10.008.
8. Jara, A, and J. J. Kopchick. 2013. «Proteomics: a comprehensive approach». Anales De Pediatria (Barcelona, Spain: 2003) 78 (3):137-39. https://doi.org/10.1016/j.anpedi.2012.10.007.
9. Díez, Paula, Rafael Góngora, Alberto Orfao, and Manuel Fuentes. «Functional proteomic insights in B-cell chronic lymphocytic leukemia». Expert Review of Proteomics 14 (2):137-46. https://doi.org/10.1080/14789450.2017.1275967.
10. Nesvizhskii AI. «Proteogenomics: concepts, applications and computational strategies». Nature Methods. 2014;11:1114-1125. doi: 10.1038/nmeth.3144.
11. Sánchez-Carbayo M. 2007. «Use of antibodies arrays in the study of bladder cancer». Actas urologicas espanolas 31 (9):1082-88. https://doi.org/10.1016/S0210-4806(07)73769-5.
12. Alfaro, Javier A, Ankit Sinha, Thomas Kislinger, and Paul C. Boutros. 2014. «Onco Proteogenomics: Cancer Proteomics Joins Forces with Genomics». Nature Methods 11 (11):1107-13. https://doi.org/10.1038/nmeth.3138.
13. Díez, Paula, Conrad Droste, Rosa M Dégano, María González-Muñoz, Nieves Ibarrola, Martín Pérez-Andrés, Alba Garin-Muga, et al. 2015. «Integration of Proteomics and Transcriptomics Data Sets for the Analysis of a Lymphoma B-Cell Line in the Context of the Chromosome-Centric Human Proteome Project». Journal of Proteome Research 14 (9):3530-40. https://doi.org/10.1021/acs.jproteome.5b00474.
14. Faulkner, Sam, Matt Dun, and Hubert Hondermarck. 2015. «Proteogenomics: Emergence and promise». Cellular and molecular life sciences : CMLS 72 (January). https://doi.org/10.1007/s00018-015-1837-y.
15. Díez P, Fuentes M. «Proteogenomics for the comprehensive analysis of human cellular and serum antibody repertoires». Adv Exp Med Biol. 2016;926:153-162.
16. Song, Ehwang, Yuqian Gao, Chaochao Wu, Tujin Shi, Song Nie, Thomas L. Fillmore, Athena A. Schepmoes, et al. 2017. «Targeted proteomic assays for quantitation of proteins identified by proteogenomic analysis of ovarian cancer». Scientific Data 4 (July). https://doi.org/10.1038/sdata.2017.91.
17. Chambers, Matthew C, Pratik D Jagtap, James E Johnson, Thomas McGowan, Praveen Kumar, Getiria Onsongo, Candace R Guerrero, et al. 2017. «An Accessible Proteogenomics Informatics Resource for Cancer Researchers». Cancer Research 77 (21):e43-46. https://doi.org/10.1158/0008-5472.CAN-17-0331.
18. Tabas-Madrid, Daniel, Joao Alves-Cruzeiro, Victor Segura, Elizabeth Guruceaga, Vital Vialas, Gorka Prieto, Carlos García, Fernando J Corrales, Juan Pablo Albar, and Alberto Pascual-Montano. «Proteogenomics Dashboard for the Human Proteome Project». Journal of Proteome Research 14 (9):3738-49. https://doi.org/10.1021/acs.jproteome.5b00466.
19. Kim MS, et al. «A draft of the human proteome». Nature. 2014 May 29;509(7502):575-81. doi: 10.1038/nature13302.
20. Wilhelm M, et al. «Mass-spectrometry-based draft of the human proteome». Nature. 2014 May 29;509(7502):582-7. doi: 10.1038/nature13319.
21. Gonzalez-Gonzalez, María, Ricardo Jara-Acevedo, Sergio Matarraz, María Jara-Acevedo, Sara Paradinas, J M Sayagües, Alberto Orfao, and Manuel Fuentes. «Nanotechniques in proteomics: Protein microarrays and novel detection platforms». European Journal of Pharmaceutical Sciences, 3rd ESF-UB Conference on Nanomedicine, 45 (4):499-506. https://doi.org/10.1016/j.ejps.2011.07.009.
22. Dasilva, Noelia, Paula Díez, María González-González, Sergio Matarraz, J M Sayagués, Alberto Orfao, and Manuel Fuentes. «Protein Microarrays: Technological Aspects, Applications and Intellectual Property». Recent Patents on Biotechnology 7 (2):142-52.
23. Hwang, Heeyoun, Gun Wook Park, Ji Yeong Park, Hyun Kyoung Lee, Ju Yeon Lee, Ji Eun Jeong, Sung-Kyu Robin Park, et al. 2017. «Next Generation Proteomic Pipeline for Chromosome-Based Proteomic Research Using NeXtProt and GENCODE Databases». Journal of Proteome Research, October. https://doi.org/10.1021/acs.jproteome.7b00223.
24. Renuse, Santosh, Raghothama Chaerkady, and Akhilesh Pandey. «Proteogenomics». PROTEOMICS 11 (4):620-30. https://doi.org/10.1002/pmic.201000615.
25. Shukla, Hem D. 2017. «Comprehensive Analysis of Cancer-Proteogenome to Identify Biomarkers for the Early Diagnosis and Prognosis of Cancer». Proteomes 5 (4). https://doi.org/10.3390/proteomes5040028.

Biography

MANUEL FUENTES is Group Leader of the Proteomics Facility at the Cancer Research Center, University of Salamanca, where his research is focused on biomarkers and drug discovery in haematological diseases, mainly for personalised medicine. He was previously at the Harvard Institute of Proteomics at Harvard Medical School. Dr Fuentes is co-author of 105 peer-reviewed papers in international journals, nine licensed international patents, 20 book chapters, and more than 50 invited lectures at national and international meetings.

ALICIA LANDEIRA completed her MSc in Cellular and Molecular Biology at the University of Salamanca, Spain. Her Master's final project, undertaken in the Proteomics Group of the Cancer Research Center in Salamanca, focused on the differential characterisation of B-cell protein expression profiles in chronic lymphocytic leukaemia in order to identify biomarker candidates.
