Designing proteins from scratch with computer science

Baiget-Francesch, Marc

Posted: 17 September 2019 | Marc Baiget-Francesch (Pharmaceutical Engineer)

Marc Baiget-Francesch highlights interesting developments in the field of protein drug design and explains how continual software improvements are speeding up the process.

PROTEINS ARE among the most versatile biomolecules that exist. From structural roles – the most abundant class of proteins – to pathogen destruction (antibodies) and metabolic activity (enzymes), proteins are responsible for a wide array of functions. Consequently, protein malfunctions can cause severe disorders in a host organism. Alzheimer’s and Parkinson’s diseases, for example, result from the presence of misfolded proteins;¹ Becker muscular dystrophy and Crohn’s disease are caused by the production of an abnormally short protein;²,³ and phenylketonuria is the consequence of a missing protein.⁴

As proteins are encoded by genes, one of the most common approaches to tackling these kinds of disease is to focus on the defective genes. In the long term, this idea offers one of the most promising solutions; however, manipulating genes is rather complicated and has so far presented significant challenges, such as unwanted immune responses, complicated gene release, unstable expression, upstream processing and a lack of sufficient facilities for viral vector production.⁵,⁶ Given these complications, dealing with proteins – while by no means easy – presents a more straightforward solution: using proteins to deal with protein problems seems a logical approach.

Protein design has not always entailed designing new proteins from scratch. As with so many scientific fields, nature has always been a major source of inspiration. Antibodies are an example of this: in order to make antibodies that resemble the ones our bodies produce, antibodies from other animal species have been slightly modified, as is the case with chimeric and humanised antibodies (the latter being the most similar to human antibodies). Alemtuzumab and Mepolizumab are two examples of humanised antibodies that have reached the market. Alemtuzumab has been commercialised by Sanofi under the name Lemtrada for the treatment of multiple sclerosis, and Mepolizumab, from GlaxoSmithKline, has been launched under the name Nucala to treat eosinophilic asthma.⁷,⁸ While Nucala, first authorised in 2015, is still being monitored by the European Medicines Agency (EMA) to further assess its safety, Lemtrada, which received its first authorisation in 2013, was temporarily restricted in April 2019 by the EMA while it investigates some unexpected side effects.⁹

The temporary restriction of Lemtrada shows that designing new molecules, even slightly modified ones, can be more complicated than it seems. However, advances in computer science have revolutionised this field. Whereas mimicking natural proteins was once the main focus of new protein design, the in silico approach is increasingly becoming the go-to mechanism for designing synthetic biomolecules.

Computer simulations, while perhaps not as accurate as in vitro or in vivo experimentation, facilitate the exploration of thousands of molecular interactions in a short amount of time, saving significant sums of money and resources. With respect to antibodies, in 2017 a research group from the University of Texas at Austin, led by Dr Jennifer A Maynard, used computer science to design new antibody complementarity determining regions – the part responsible for the antibody-antigen interaction.¹⁰ As with humanised antibodies, only a small fragment of the antibody was changed. To design their sequences, Maynard’s group used PyMOL, an open-source molecular visualisation tool that did not exist when Campath-1, the precursor of Alemtuzumab, was designed.¹¹

PyMOL allows the user to visualise the structure of small to large biomolecules and simulate interactions between different molecules.¹² Aside from PyMOL, which is popular among molecular biologists, other software packages have been developed to aid the design of new biomolecules. In many cases, researchers use a combination of different software packages. EvoDesign, for instance, is a computational algorithm that is also used to design new proteins. From an initial protein scaffold, EvoDesign helps researchers identify protein families with similar three-dimensional (3D) structures and folds.¹³ This is a powerful tool with which to preview protein-protein interactions and protein folds of newly designed structures. One of its principal advantages is the use of evolutionary designs in contrast to physics-based approaches, which are less accurate at picturing atomic interactions and folds.¹⁴ In addition, the algorithm is continually being improved and new servers are created to enhance its functionality.¹⁵
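To make the idea of evolution-based scoring more concrete, here is a minimal Python sketch that builds a position-specific amino acid profile from a few aligned sequences and scores candidate designs against it. It is not EvoDesign's algorithm or interface: the toy alignment, the pseudocount value and the function names `build_profile` and `score_sequence` are all assumptions made purely for illustration.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_seqs, pseudocount=1.0):
    """Return, for each alignment column, a dict of log-probabilities per amino acid.

    Assumes gap-free sequences of equal length; the pseudocount keeps
    unseen residues from getting a probability of zero.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must have the same length")

    total = len(aligned_seqs) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned_seqs)
        column = {
            aa: math.log((counts.get(aa, 0) + pseudocount) / total)
            for aa in AMINO_ACIDS
        }
        profile.append(column)
    return profile


def score_sequence(profile, candidate):
    """Sum the log-probabilities of the candidate's residues, column by column."""
    if len(candidate) != len(profile):
        raise ValueError("candidate length must match the profile length")
    return sum(column[aa] for column, aa in zip(profile, candidate))


# Hypothetical toy "family" of related sequences (not real protein data).
family = ["MKVLA", "MKILA", "MRVLA", "MKVLG"]
profile = build_profile(family)

family_like = score_sequence(profile, "MKVLA")  # resembles the family
unrelated = score_sequence(profile, "WWWWW")    # residues never seen in the family
print(family_like > unrelated)
```

A candidate that resembles the family scores higher than one built from residues the family never uses. That is the intuition behind favouring sequences that evolution has already "tested"; a real tool combines such profiles with structural information and energy terms.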

Rosetta is another popular software package among those in the field of synthetic biology and one of the most extensively developed for de novo design. Rosetta facilitates 3D structure prediction of proteins, redesigns existing structures and models new proteins from scratch.¹⁶ However, there are few well-established protocols in the de novo design of proteins, so its success (as with all protein design software) relies mostly on the user’s knowledge of protein design principles. It is predominantly the combination of extensive protein science knowledge and the use of bioinformatics tools that has sped up the process of de novo protein design.

Some research groups specialising in protein design have written papers about its principles, including David Baker’s and Daniel-Adriano Silva’s groups, both from the Institute for Protein Design at the University of Washington.¹⁷,¹⁸ In fact, David Baker’s group recently achieved an interesting feat in this field: the creation of a bioactive protein switch.¹⁹,²⁰ This complex system, named LOCKR (Latching Orthogonal Cage/Key pRotein), responds to environmental stimuli and has been used for many purposes, from inducing cell death to moving material in both yeast and human cells. One of the main characteristics of this newly designed protein is that it only activates its mechanism if a key molecule interacts with the protein.

The implications of this new design are extensive. In addition to the function of the LOCKR protein itself, the significance of this discovery is that it highlights what can be achieved by combining computer science with protein design. This discovery sets a precedent, marking the transition from just imitating what already exists to designing proteins with unique functions. Furthermore, advances in protein design software are enabling scientists to overcome the limitations of traditional methods for structural determination, such as X-ray crystallography and nuclear magnetic resonance spectroscopy, which are usually very time-consuming.

This is especially relevant in the pharmaceutical field, where many proteins are used as pharmaceuticals. The drug discovery process is very long, especially given that drugs must undergo strict, lengthy clinical trials before reaching the market – and even then, there is no guarantee that the product will not cause side effects, as we have seen with Lemtrada. For this reason, enhancing the efficacy of new drugs and reducing the time they take to reach the clinical phase are vital to bringing new pharmaceuticals to the market as quickly as possible. Generating several new molecular models at the same time and simulating protein interactions in silico will certainly aid this endeavour: it appears that we are approaching the dawn of a revolution in synthetic biology.

About the author

Marc Baiget-Francesch graduated with an MSc in pharmaceutical engineering and design in 2017 from the Technical University of Denmark (DTU). He participated in the SensUs competition twice as a student team co-ordinator, designing biosensors for creatinine and NT-proBNP.

References

1. Irvine GB, et al. Protein Aggregation in the Brain: The Molecular Basis for Alzheimer’s and Parkinson’s Diseases. Molecular Medicine, vol. 14, no. 7-8, 2008, pp. 451–464, doi:10.2119/2007-00100.irvine.
2. Becker Muscular Dystrophy (BMD). Muscular Dystrophy Association, Muscular Dystrophy Association, 31 Jan. 2018, www.mda.org/disease/becker-muscular-dystrophy.
3. Ogura Y, et al. A Frameshift Mutation in NOD2 Associated with Susceptibility to Crohn’s Disease. Nature, vol. 411, no. 6837, 2001, pp. 603–606, doi:10.1038/35079114.
4. Phenylketonuria: MedlinePlus Medical Encyclopedia. MedlinePlus, U.S. National Library of Medicine, medlineplus.gov/ency/article/001166.htm.
5. Carbonell R, et al. A Technology Roadmap For Today’s Gene Therapy Manufacturing Challenges. www.cellandgene.com, 18 Apr. 2019, www.cellandgene.com/doc/a-technology-roadmap-for-today-s-gene-therapy-manufacturing-challenges-0001.
6. Gonçalves GAR, Paiva RdMA. Gene Therapy: Advances, Challenges and Perspectives. Einstein (São Paulo), vol. 15, no. 3, 2017, pp. 369–375, doi:10.1590/s1679-45082017rb4024.
7. Lemtrada Product Information. European Medicines Agency, 2018, www.ema.europa.eu/en/documents/product-information/lemtrada-epar-product-information_en.pdf.
8. Nucala Product Information. European Medicines Agency, 2015, www.ema.europa.eu/en/documents/product-information/nucala-epar-product-information_en.pdf.
9. Francisco EM. Use of Multiple Sclerosis Medicine Lemtrada Restricted While EMA Review Is Ongoing. European Medicines Agency, 12 Apr. 2019, www.ema.europa.eu/en/news/use-multiple-sclerosis-medicine-lemtrada-restricted-while-ema-review-ongoing.
10. Entzminger KC, et al. De Novo Design of Antibody Complementarity Determining Regions Binding a FLAG Tetra-Peptide. Scientific Reports, vol. 7, no. 1, 2017, doi:10.1038/s41598-017-10737-9.
11. Riechmann L, et al. Reshaping Human Antibodies for Therapy. Nature, vol. 332, no. 6162, 1988, pp. 323–327, doi:10.1038/332323a0.
12. PyMOL, pymol.org/2/.
13. EvoDesign: De Novo Protein Design Based on Structural and Evolutionary Profiles. Zhang Lab, zhanglab.ccmb.med.umich.edu/EvoDesign/.
14. Mitra P, et al. EvoDesign: De Novo Protein Design Based on Structural and Evolutionary Profiles. Nucleic Acids Research, vol. 41, no. W1, 2013, doi:10.1093/nar/gkt384.
15. Pearce R, et al. EvoDesign: Designing Protein–Protein Binding Interactions Using Evolutionary Interface Profiles in Conjunction with an Optimized Physical Energy Function. Journal of Molecular Biology, vol. 431, no. 13, 2019, pp. 2467–2476, doi:10.1016/j.jmb.2019.02.028.
16. The Rosetta Software. RosettaCommons, www.rosettacommons.org/software.
17. Marcos E, Silva DA. Essentials of De Novo Protein Design: Methods and Applications. Wiley Interdisciplinary Reviews: Computational Molecular Science, vol. 8, no. 6, 2018, doi:10.1002/wcms.1374.
18. Koga N, et al. Principles for Designing Ideal Protein Structures. Nature, vol. 491, no. 7423, 2012, pp. 222–227, doi:10.1038/nature11600.
19. Langan RA, et al. De Novo Design of Bioactive Protein Switches. Nature, vol. 572, no. 7768, 2019, pp. 205–210, doi:10.1038/s41586-019-1432-8.
20. Ng AH, et al. Modular and Tunable Biological Feedback Control Using a De Novo Protein Switch. Nature, vol. 572, no. 7768, 2019, pp. 265–269, doi:10.1038/s41586-019-1425-7.
