Part three: pragmatic guidelines to getting the best out of LLMs
Posted: 24 July 2024 | Dr Raminderpal Singh (Hitchhikers AI and 20/15 Visioneers)
Recent months have seen a slew of announcements from AI-led biotechs about the potential of large language models (LLMs) in early drug discovery. In the third of a three-part series, Dr Raminderpal Singh presents pragmatic guidelines for scientists on accessing, and obtaining value from, LLMs.
In our previous article, published Monday 15 July, we presented a simple case example that readers can download and practise with using ChatGPT,1 or another accessible LLM system. In this article, we address the challenging topic of guidelines that will lead to useful scientific insights. Thank you to Nina Truter2 for her support.
The guidelines are separated into two sections: those specific to using ChatGPT, and those for deciding which LLM system to adopt. The former allows the reader to begin now, with the recommendation not to share any confidential information with the tool. The latter helps the reader think ahead to an affordable, private and trustworthy system for the next couple of years.
Pragmatic guidelines when using ChatGPT
Note that many of these recommendations reflect the need to upload your own .pdf and .csv files when using ChatGPT.
- Where measurement data appear in your uploaded documents, label the measurements as named variables in your queries.
- In your queries, do not embed statements within statements; bring everything out as simple, single-purpose statements.
- Where ChatGPT stops partway through a long answer, break the prompt into smaller pieces and ask it to export the output to .csv or .pdf instead of printing it on screen. If it still stops partway through, type: “Please continue from where you left off and finish the answer.”
- When the results are inaccurate and you need to re-run the query, try switching to a new ChatGPT window/chat. Another trick is to clear short-term memory within a prompt workflow, using a command such as: “From now on, assume [new context] without considering our previous conversation.” For example, “From now on, assume we are talking about human-only clinical trials without considering our previous conversation.” (A minimal scripted version of these prompt patterns is sketched after this list.)
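For readers who prefer to script these prompt patterns rather than work in the ChatGPT web interface, the sketch below shows how the first two guidelines might look using the OpenAI Python client. The file name, column names, variable labels and model name are illustrative assumptions, not part of the guidelines; note also that API calls carry no chat memory, so any context statement must be included explicitly in each request.

```python
import pandas as pd
from openai import OpenAI

# Hypothetical file and column names, used only for illustration.
df = pd.read_csv("assay_results.csv")

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Guideline: label measurement columns as named variables in the query itself.
variable_map = {"X": "IC50_nM", "Y": "cell_viability_pct"}
labelling = "; ".join(f"{name} = column '{col}'" for name, col in variable_map.items())

# Guideline: simple, single-purpose statements rather than nested ones.
prompt = (
    f"The data below come from assay_results.csv. {labelling}.\n"
    "Task 1: report the correlation between X and Y.\n"
    "Task 2: return the result as CSV text that can be saved to a file.\n\n"
    f"{df[list(variable_map.values())].to_csv(index=False)}"
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name; use whichever model your account provides
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```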
Pragmatic guidelines when deciding what LLM system to use
- LLM systems are made up of LLM models (for example, OpenAI3 and Claude4) and software frameworks (for example, Autogen Studio5 and Open Web UI6). Both technology types are evolving rapidly, with multiple offerings. Any system you select needs to be adaptable to technology change, for example if an effective drug discovery LLM model is released as open source next year.
- In addition to a rapidly evolving technology landscape, it is important to understand the shifting balance between commercial and open-source offerings. As in other industries, the adage holds (relatively) true: “Wait a year, and someone will offer it for free!” It is important to stay agile and respond to changes in this balance.
- Your LLM system may need to support both research-intensive activities, such as extracting new insights from large numbers of (.pdf) research publications, and workflow-driven tasks, such as designing a lab experiment. These are different problems that need different approaches. A useful tip here is to design your system with a human performing the proposed LLM functions, and then swap the LLM technology in for the human once the design is ready (a minimal sketch of this swap-in pattern follows this list).
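The “design with a human first, then swap in the LLM” tip, together with the need to stay adaptable as models and frameworks change, lends itself to a thin interface layer. The sketch below is a minimal Python illustration under assumed names (AnswerBackend, HumanBackend, OpenAIBackend and the example question are all hypothetical); the point is that the workflow depends only on the interface, so the backend, whether human or LLM, commercial or open source, can be replaced without redesigning the workflow.

```python
from typing import Protocol


class AnswerBackend(Protocol):
    """Anything that can answer a question: a human during design, an LLM later."""

    def answer(self, question: str) -> str: ...


class HumanBackend:
    """Design phase: a scientist answers each query at the keyboard."""

    def answer(self, question: str) -> str:
        return input(f"[human] {question}\n> ")


class OpenAIBackend:
    """Later phase: swap the human out for an LLM behind the same interface.

    The model name is an assumption; any other provider could sit behind this class.
    """

    def __init__(self, model: str = "gpt-4o") -> None:
        from openai import OpenAI  # deferred import so the design phase needs no API key
        self._client = OpenAI()
        self._model = model

    def answer(self, question: str) -> str:
        response = self._client.chat.completions.create(
            model=self._model,
            messages=[{"role": "user", "content": question}],
        )
        return response.choices[0].message.content or ""


def design_experiment(backend: AnswerBackend) -> str:
    """The workflow only sees the interface, so backends can be swapped freely."""
    return backend.answer("Propose a plate layout for a 3-dose cytotoxicity screen.")


if __name__ == "__main__":
    # Start with a human in the loop; replace with OpenAIBackend() once the design settles.
    print(design_experiment(HumanBackend()))
```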
The above guidelines are the tip of the iceberg and will hopefully help you get started. Please reach out to the author if you have any questions.
References
1 ChatGPT. Available at: https://chatgpt.com/
2 Nina Truter. LinkedIn. Available at: https://www.linkedin.com/in/nina-truter/
3 OpenAI. Available at: https://openai.com/
4 Claude. Available at: https://claude.ai/
5 Autogen Studio. Available at: https://microsoft.github.io/autogen/blog/2023/12/01/AutoGenStudio/
6 Open Web UI. Available at: https://docs.openwebui.com/
About the author
Dr Raminderpal Singh
Dr Raminderpal Singh is a recognised key opinion leader in the techbio industry. He has over 30 years of global experience leading and advising teams on building computational modelling systems that are both cost-efficient and have significant IP value. His passion is to help early to mid-stage life sciences companies achieve novel biological breakthroughs through the effective use of computational modelling.
Raminderpal is currently leading the HitchhikersAI.org open-source community, accelerating the adoption of AI technologies in early drug discovery. He is also CEO and co-Founder of Incubate Bio – a techbio providing a service to life sciences companies who are looking to accelerate their research and lower their wet lab costs through in silico modelling.
Raminderpal has extensive experience building businesses in both Europe and the US. As a business executive at IBM Research in New York, Dr Singh led the go-to-market for IBM Watson Genomics Analytics. He was also Vice President and Head of the Microbiome Division at Eagle Genomics Ltd, in Cambridge. Raminderpal earned his PhD in semiconductor modelling in 1997. He has published several papers and two books and has twelve issued patents. In 2003, he was selected by EE Times as one of the top 13 most influential people in the semiconductor industry.
For more: http://raminderpalsingh.com; http://hitchhikersAI.org; http://incubate.bio