Dr Raminderpal Singh interviews Dr Jack Scannell on his six-week transformation from coding spectator to genomic data practitioner.

For decades, many scientists have found themselves in an uncomfortable position: they understand the value of computational analysis, they’ve dabbled in coding at various points in their careers, but the activation energy required to actually do new computational work today feels insurmountable. Dr Jack Scannell, CEO of a small early-stage biotech and renowned for his work on R&D productivity, recently experienced a transformation that challenges this paradigm. In just six weeks, he added practical bioinformatics capabilities to his scientific toolkit – not through months of coursework, but through strategic use of AI coding assistants.
The activation energy problem
Scannell’s coding journey mirrors that of many scientists. During his PhD in neuroscience at UC Davis and Oxford in the early 1990s, he learnt BASIC and C, writing code for visual stimulus generation and data analysis. Through his academic career, he used MATLAB, Mathematica, SPSS and SAS for various analytical tasks. Then came a long hiatus.
“When I went to work for consulting firms and later in drug and biotech investment, if we analysed things, it would almost always be in Excel,” Scannell explains. While he occasionally returned to Mathematica for academic papers and simple mathematical simulations, the pattern was consistent: “I did it so infrequently that there was always a huge activation energy problem. If you don’t do any coding for a couple of years, when you start again, it’s not quite like starting from scratch, but it’s still a slow and painful process.”
This is the crux of the problem for many experienced scientists. They have a conceptual understanding and often some experience of coding, but the gap between understanding coding requirements and actually doing it feels insurmountable. As Scannell notes: “For the first few days, you don’t really get anything done, other than trying to vaguely remember how coding works.”
The emotional dimension matters too. “Why am I reading about simple syntax here?” Scannell asks, articulating a common frustration. “How is that a good use of my time? I know I should do some coding, but there are lots of twelve-year-old kids who are a hundred times better than me.”
The initial awakening
Scannell’s first ‘aha’ moment came 18-24 months ago with ChatGPT. His interest was scientific rather than recreational: he wanted to model iterative search processes through complicated chemical spaces for drug discovery. While the problem had always interested him, he lacked the activation energy to get started.
“I thought, well, hang on. I can describe the chemical search problem that I would like to simulate,” he recalls. He wrote a couple of paragraphs giving a formal verbal description of the task and asked ChatGPT to generate Mathematica code to implement it. “It got it right first or second time, or at least it produced something that looked like the sort of thing I wanted. I knew if I had tried to do it myself, having not done any Mathematica coding for a year or two, it would have taken me a week to get past the syntax errors.”
This initial success was promising but did not immediately transform his practice; it would take a concrete business need to push him into sustained engagement with AI coding tools.
The real-world challenge
That business need emerged from Scannell’s work running a small biotech that consisted of five founders, $1.8 million raised capital and roughly 2.5 FTEs. Human genetic validation has become increasingly important in drug R&D. If you can show that humans with genetic variations in a target protein have different disease-relevant phenotypes, your drug target is far more likely to be valid, rather than an illusion from epidemiological studies or animal models.
However, the best information is no longer found in academic papers but in large data repositories like the MRC-IEU OpenGWAS or the NHGRI-EBI GWAS Catalog. Accessing this information requires building a complete analytical workflow: downloading large datasets (tens of millions of genetic polymorphisms), handling a variety of data formats where columns are labelled differently and genes are annotated or mapped differently, then applying standard analytical packages to extract meaningful insights.
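The harmonisation step this implies can be sketched in a few lines. Scannell’s pipeline was built in R, but the idea is language-agnostic; the Python sketch below uses illustrative column aliases, not the actual naming conventions of any particular repository:

```python
# Sketch: map differently-labelled GWAS summary-statistic columns onto one
# canonical schema before analysis. Alias lists are illustrative, not exhaustive.

# Canonical field -> plausible column-name variants (assumptions for illustration)
CANONICAL = {
    "snp": ["snp", "rsid", "markername", "variant_id"],
    "pval": ["pval", "p", "p_value", "p-value"],
    "beta": ["beta", "effect", "b"],
}

def harmonise_header(header):
    """Return a mapping from canonical name -> column index, or raise if a
    required column cannot be found under any known alias."""
    lower = [h.strip().lower() for h in header]
    mapping = {}
    for canon, aliases in CANONICAL.items():
        for alias in aliases:
            if alias in lower:
                mapping[canon] = lower.index(alias)
                break
        else:
            raise ValueError(f"no column found for '{canon}' in {header}")
    return mapping
```

Downstream scripts then read fields by canonical name rather than by whatever label a given dataset happened to use, which is what makes the modular steps interoperable.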
“I basically needed to build a workflow to do that,” recalls Scannell. “That was a more systematic and larger effort and is where I got into what you might call ‘vibe coding’.”
The tool journey: ChatGPT to Claude
Scannell initially attempted this work with ChatGPT, given his earlier positive experience with simple Mathematica simulations. However, the results were frustrating.
“ChatGPT seemed quite good at understanding the bioinformatics literature and having a general idea about what needs to be done,” he notes. “But it wasn’t very good at organising the different parts of the project.” His approach was to make things modular – downloading data before creating a series of scripts and intermediate data files for each analysis step. This makes each step easier to understand, debug and maintain.
Scannell describes the failure mode: “ChatGPT would get stuck. You’d be optimising one particular script, but ChatGPT would forget the prior or subsequent data structures. By fiddling with one script, you would then make the output incompatible with things happening downstream.”
Following my recommendation, Scannell switched to Claude Opus 4.5. Over the past six weeks, the difference has been immediately apparent.
“I found Claude much better at structuring the workflow as a whole, so that you’ve got sensible intermediate data files and scripts that were interoperable and which made it easy to do the stepwise analyses,” he explains. “It seemed to be able to hold that structure much better so that you didn’t have this problem where you edit one script and everything falls over.”
Beyond structural integrity, Claude demonstrated domain understanding: “I found Claude actually very good and thoughtful around the practical issues of data analytics and bioinformatics and running biological datasets and what the pros and cons are. It seemed to know the academic literature quite well, as well as simply being able to write code.”
The Venn diagram of understanding
This touches on a critical concept we often discuss: the three-way intersection of understanding the data, understanding the computational engines and understanding the science. Claude appeared to operate across all three dimensions.
“You could have quite thoughtful conversations with Claude about scientific pros and cons and technical pros and cons and how these things interact,” Scannell notes. “There may be three or four different ways of solving a given problem, which may have different pros and cons – this one may be the quickest and easiest to implement but won’t be as scientifically rigorous, or this one could be very rigorous but complicated and difficult. Claude seemed to be quite good at navigating those kinds of choices and helping me decide which path to choose.”
The practical workflow
Scannell’s working method was surprisingly straightforward. He used the Claude web interface or app, having the R console open separately. Claude would produce R code files, which he’d download and move into his R scripts folder before executing. Scannell notes: “My cutting and pasting between Claude and R was relatively quick compared to Claude’s thinking time – and compared to the time the R scripts would take to run.”
His prompting approach evolved but became relatively systematic: “Here’s the kind of analysis I want to do. Here are the datasets I want to exploit. Here are some academic papers that show you what the dataset is and the sorts of analyses that can be done on it. I’d like you to make it stepwise and modular so we can fix what’s going on as we go through it. Now give me a proposal about what I should do.”
Claude would respond with a multipage plan. Then: “What packages do I need to download? How do I check that I’ve got the software installed that I need to do this? Let’s run some diagnostics on my setup.”
Critical lessons: planning, structures and validation
Four key learnings emerged from Scannell’s six-week crash course:
- Upfront planning matters – “I think you should think quite hard about what you’re trying to do and what a sensible workflow would be,” Scannell emphasises. “I sort of let the machine design it a bit too much initially. I think I could have been a bit more proactive there.”
- Data structures are fundamental – “I didn’t think enough about data structures at first,” he admits. “In the old days when I was taught how to program, it was drilled into you to think about the data structures. This time, I didn’t do enough of that early on.”
- Build validation from the start – This was perhaps the most important insight. When analysing genomic datasets for disease associations, a null result could mean either “there’s nothing there” or “my analysis was wrong.” Scannell’s solution: “I’d always have a set of positive genes – ones I know were strongly associated with the disease and some negative ones – genes that should not be associated. So I’d always have positive and negative controls so I would get a sense as to whether I was getting sensible-looking output. If the positives weren’t positive or the negatives weren’t negatives, I knew something had gone wrong.”
- Data diagnostics before data analysis – “By the end of the project, I’d realised that the first thing you want to do is to automate the process to characterise the data before you try to analyse it.” This is because the bioinformatics datasets he was working with – GWAS data – proved surprisingly non-standard across different analyses. Building upfront data standardisation and diagnostic scripts became essential. “By the end of the project, things were running much quicker and more effectively because the analytical scripts knew what kind of data structures and data annotation they should expect.”
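Scannell’s control strategy translates directly into a simple automated check. His pipeline was in R; the Python sketch below uses invented gene names and an illustrative significance threshold purely to show the shape of the idea:

```python
# Sketch of the positive/negative control check: before trusting an analysis
# run, confirm that known-associated genes come out significant and
# known-unrelated genes do not. Gene names and threshold are illustrative.

POSITIVE_CONTROLS = ["GENE_A", "GENE_B"]   # assumed strong disease associations
NEGATIVE_CONTROLS = ["GENE_X", "GENE_Y"]   # assumed no association

def controls_pass(pvalues, alpha=5e-8):
    """pvalues: dict mapping gene -> association p-value from the analysis run.
    Missing genes default to p=1.0, so an absent positive control fails."""
    positives_ok = all(pvalues.get(g, 1.0) < alpha for g in POSITIVE_CONTROLS)
    negatives_ok = all(pvalues.get(g, 1.0) >= alpha for g in NEGATIVE_CONTROLS)
    return positives_ok and negatives_ok
```

If the check fails, the fault is in the pipeline rather than the biology – which is exactly the distinction a null result cannot make on its own.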
The deeper transformation
The immediate benefit was access to valuable information. “I found some really interesting stuff that I wouldn’t have seen if I just looked at academic papers,” Scannell notes. “It’s genuinely the case that there’s lots of useful stuff in these public genomic datasets that doesn’t really make it into academic papers. Academic papers are no longer a good medium for communicating the information being collected in large population studies of genetics or bioinformatics.”
But something more profound happened: actual learning through doing.
“I’ve read quite a lot of genomics literature over the years, but I think by trying to do it myself, I understand it much better than I did,” Scannell reflects. “Of course, I have found out things about the genetics of some drug targets that I could not have easily found out in other ways, but I will also understand more of the genomics papers I read in future.”
This is learning from the inside rather than the outside. “Lots of things that I never bothered understanding, or that I imagined I understood – some of them I now understand and some of them I realise I don’t understand. But doing that through experience is different from just reading papers.”
The six-week scientist-bioinformatician
In summary: a scientist with decades of experience but sporadic coding exposure gained practical bioinformatics capabilities in six weeks. Not through formal training, not through months of tutorials, but through strategically leveraging AI tools to overcome activation energy barriers.
Scannell is appropriately modest about his transformation: “Some people spend 40 years doing genomics; I spent a couple of weeks.” But the reality is significant: “Now at least I have some practical experience of genomics-based bioinformatics and I can get more if I want. A lot of the standard stuff – and mostly you’re going to want to do the standard stuff – is not terribly difficult once the activation energy of creating the tools has been reduced.”
This is the real message for the pharmaceutical and biotech community: the barrier between understanding what computational analyses could tell you and actually performing those analyses has fundamentally shifted. The activation energy hasn’t disappeared – you still need to think carefully about workflows, data structures and validation – but it has dropped from ‘insurmountable’ to ‘achievable in weeks.’
For an industry increasingly dependent on genetic validation, large-scale data analysis and computational approaches, that’s a transformation worth understanding.
About the author
Dr Raminderpal Singh
Dr Raminderpal Singh is a recognised visionary in the implementation of AI across technology and science-focused industries. He has over 30 years of global experience leading and advising teams, helping early- to mid-stage companies achieve breakthroughs through the effective use of computational modelling. Raminderpal is currently the Global Head of AI and GenAI Practice at 20/15 Visioneers. He also founded and leads the HitchhikersAI.org open-source community and is Co-founder of the techbio company Incubate Bio.
Raminderpal has extensive experience building businesses in both Europe and the US. As a business executive at IBM Research in New York, Dr Singh led the go-to-market for IBM Watson Genomics Analytics. He was also Vice President and Head of the Microbiome Division at Eagle Genomics Ltd, in Cambridge. Raminderpal earned his PhD in semiconductor modelling in 1997 and has published several papers and two books and has twelve issued patents. In 2003, he was selected by EE Times as one of the top 13 most influential people in the semiconductor industry.
