Solving the disconnect between lab and data scientists: part 2

Posted: 26 June 2025 | Dr Raminderpal Singh (Hitchhikers AI and 20/15 Visioneers)

As the lab–data science divide continues, Ian Kerman looks ahead to a future of deeper collaboration – one where shared skills, smarter tools and a shift in mindset could finally break down the barriers. In this second interview, he shares his vision, practical ideas and advice for the next generation of scientists.

In the rapidly evolving landscape of life sciences, the communication gap between laboratory scientists and data scientists remains a significant challenge. This ‘silo problem’ can impede innovation and slow down research progress. Ian Kerman, Data Science and AI Solutions Architect at Certara and co-chair of the Society for Laboratory Automation and Screening (SLAS) Data Science & AI Topical Interest Group, is working to bridge this divide. In this second of two interviews, he shares his vision for the future of collaboration between lab and data scientists.

A vision for the future

Raminderpal Singh (RS): Looking 3-5 years ahead, what would be your ideal vision for this relationship between lab scientists and data scientists?

Ian Kerman (IK): Having worked in the life sciences industry for a long time, I’m sure you’re familiar with the concept of silos – data silos and other types of organisational barriers. There are both logical and physical barriers that can prevent collaboration and hinder data integration.

While I try to avoid corporate jargon like ‘breaking down silos,’ I genuinely believe that’s precisely what needs to happen for data science and life science professionals to collaborate more effectively.

Does the data scientist need to be physically in the lab? Not necessarily, but I don’t think it would hurt either. From personal experience, I believe I’m a more effective data scientist in the life sciences industry because I gained laboratory experience as an undergraduate and in graduate school.

That experience gives me insight into how experiments are run, how data is collected, why data is messy, and why there are missing values. I understand what happens in the lab, which helps on the data science side because I can better appreciate the work that goes into collecting the data. This gives me a deeper understanding of why the data looks the way it does and how to handle it. As a life scientist, you have to deal with missing data – whether by conducting further experiments or through extrapolation – and taking those concepts into data science is valuable.
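To make the point about missing values concrete, here is a minimal sketch in Python using only the standard library. The sample names, readings and the choice of mean-filling are illustrative assumptions, not a method described in the interview; whether to fill a gap, rerun the experiment or leave the value out remains a scientific judgement.

```python
from statistics import mean

# Hypothetical assay readings; None marks a well that was never measured.
readings = {
    "sample_A": [0.82, 0.79, None, 0.85],
    "sample_B": [1.10, None, None, 1.04],
}


def summarise_missing(values):
    """Return (number of missing entries, mean of the observed entries)."""
    observed = [v for v in values if v is not None]
    return len(values) - len(observed), mean(observed)


def fill_with_mean(values):
    """Replace each missing entry with the mean of the observed ones.

    This is one simple option among many and is assumed here only for
    illustration; it can hide real variability in the data.
    """
    _, observed_mean = summarise_missing(values)
    return [observed_mean if v is None else v for v in values]


for name, values in readings.items():
    n_missing, observed_mean = summarise_missing(values)
    print(f"{name}: {n_missing} missing, observed mean {observed_mean:.2f}")
```

Reporting how many values are missing before filling anything keeps the gaps visible to whoever analyses the data next.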

The evolution of scientific data analysis

RS: In the past, scientists often did their own data analysis using tools like R or spreadsheets with macros. What are your thoughts on what might have been lost with increasing specialisation and the potential to bring some of that integration back?

IK: From what I’ve observed, there’s significant potential for that integration to return. Far more coding now happens in labs than when I was in school. Languages like Python and R are not only powerful but relatively easy to learn. It has become simpler for someone in the lab without formal coding training to create scripts that meet their specific needs.

Even if these scripts don’t adhere to formal coding standards or lack proper documentation – things software developers prioritise – they work for the scientists and provide access to the data they need. Tools like Jupyter Notebooks can easily generate the plots and graphs they require.

What I’m less sure about is whether this integration continues when students transition from academia to industry. Is there a silo effect where lab work and coding remain separate? Or is it simply that scientists in industry are so busy running experiments and handling other lab responsibilities that they don’t have time for their own data analysis?

Company size is also a factor. Larger organisations may have policies regarding which groups handle specific tasks, whereas in smaller companies, everyone might do a bit of everything. Early in my career, I worked for a biotech startup with only seven to eight employees, and it was definitely that kind of environment where everybody contributed across different areas.

The role of AI and LLMs

RS: You mentioned large language models (LLMs) earlier. Do you see potential for tools that provide prompts allowing lab scientists to create code without having to write it themselves?

IK: I think there’s definite potential there, and it’s already happening. For example, Anthropic has positioned itself well in this space with Claude Code, where you can feed it your GitHub repository and it will analyse your codebase, helping you develop further code that matches your style and utilises your existing classes.

I can absolutely envision something like that paired with Jupyter notebooks, where everything executes directly in your browser. There are two different areas of expertise required here: understanding the code to ensure it’s doing what you want (because an AI might generate code to calculate statistical tables, but you need to verify they’re the right statistics) and also prompt engineering – giving the right commands and phrasing things effectively. This represents another skill we’re asking laboratory scientists to develop.

The question becomes: is there a lower-friction approach to help lab scientists without requiring them to learn coding? Reviewing code is far easier than writing it, especially with tools that check syntax and other elements automatically.
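The need to verify that generated code calculates "the right statistics" can be illustrated with a small sketch. The measurements below are made up, and the scenario (a generated script that reports a standard deviation without saying which kind) is an assumption for illustration. Both functions come from Python's standard `statistics` module.

```python
from statistics import pstdev, stdev

# Hypothetical replicate measurements from a single experiment.
measurements = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation: divides by n. Appropriate only when the
# values are the entire population of interest.
population_sd = pstdev(measurements)

# Sample standard deviation: divides by n - 1. Usually the right choice
# when the replicates are a sample from a larger population.
sample_sd = stdev(measurements)

print(f"population SD: {population_sd:.3f}")
print(f"sample SD:     {sample_sd:.3f}")
```

Both lines of code run without error and both produce a plausible number, which is why the reviewer still needs enough statistical background to know which one the analysis calls for.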

RS: What about AI assistants that could help with gathering additional data or metadata?

IK: That’s a fascinating idea I hadn’t considered. It reminds me of algorithms and machine learning models used in design of experiments (DOE), which suggest the next experiment to run in order to fill knowledge gaps. I could absolutely see something similar from a data perspective, where an AI model could suggest different experimental conditions or additional data that should be collected to enhance the overall dataset.

Even simpler implementations could be effective – like a chatbot that prompts scientists with questions such as, ‘Did you collect this?’ or ‘Did you collect that?’ based on commonly useful parameters. It could serve as a virtual data scientist in the lab or a virtual lab scientist in the data centre.

I think it could be relatively straightforward to implement a virtual lab scientist for data centres, as much of that would involve Q&A about experiment details and methodologies. The reverse might require more training and direction to be effective.
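The simpler "Did you collect this?" idea could be sketched as a plain checklist, without any AI model involved. The field names and questions below are hypothetical examples, not parameters named in the interview.

```python
# Hypothetical metadata fields mapped to the question a prompt would ask.
REQUIRED_FIELDS = {
    "temperature_c": "Did you record the incubation temperature?",
    "instrument_id": "Did you record which instrument was used?",
    "reagent_lot": "Did you record the reagent lot number?",
}


def follow_up_questions(record):
    """Return the questions for every required field that is absent or empty."""
    return [
        question
        for field, question in REQUIRED_FIELDS.items()
        if record.get(field) in (None, "")
    ]


# Example experiment record with one empty and one absent field.
record = {"temperature_c": 37, "instrument_id": ""}

for question in follow_up_questions(record):
    print(question)
```

A real assistant would presumably draw its questions from the experiment type rather than a fixed dictionary, but even a static list like this catches gaps at the moment the data is recorded.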

Advice for early career scientists

RS: What advice would you give to graduate students or early career professionals entering this field?

IK: Not to sound too corporate, but networking and meeting people – especially beyond your immediate circle – is invaluable. Connecting with people in different departments or groups who have diverse perspectives will help you think in new ways and enhance your innovation capabilities.

Along those same lines, become really skilled at your primary discipline – whether that’s laboratory science or data science – and then cross-train. This could mean practical experience, like lab scientists doing coding projects or data scientists spending time in a lab. Alternatively, it could be as simple as watching educational videos to familiarise yourself with the terminology and methods used by professionals on ‘the other side of the wall.’

It’s generally easier to gain additional experience in data science because much of the work can be done on a computer – such as downloading datasets from platforms like Kaggle, building models and participating in challenges. Gaining lab experience is more difficult without institutional access, which is why I’d encourage computer science students interested in pharma or life sciences to intern or volunteer in campus labs early in their education. That hands-on experience helps create a more well-rounded professional.

Meet the interviewee

Ian Kerman is a data science and AI client solutions architect at Certara, where he leads initiatives at the intersection of artificial intelligence, data science and life sciences. With over 15 years of industry experience, Ian brings deep expertise in machine learning, MLOps and scientific informatics, helping life sciences organisations translate complex data into actionable insights. At Certara, he spearheads advanced R&D efforts in large language models, user experience design and integrated biological and chemical data knowledge systems.

Before joining Certara, Ian held leadership roles at LabVoice and BIOVIA (Dassault Systèmes), where he led cross-functional teams to deliver AI-powered solutions, voice-enabled lab assistants, and custom data platforms for pharma and biotech customers. His work has contributed to innovations in computational drug discovery, antibody developability prediction and laboratory automation.

Ian is also an experienced educator and advocate for scientific collaboration. He has developed and delivered technical training programmes, mentored students on AI-focused research projects, and co-founded the Data Science and AI Topical Interest Group with the Society for Laboratory Automation and Screening (SLAS). A frequent speaker at industry conferences, Ian combines technical depth with a passion for advancing AI in the life sciences.

Ian earned an MS in computer science, focusing on machine learning, from the Georgia Institute of Technology, as well as an MS in biology, alongside undergraduate degrees in bioinformatics and molecular biology, from the University of California, San Diego.

Related topics
Artificial Intelligence, Assays, Lab Automation

Related organisations
Certara, Hitchhikers AI and 20/15 Visioneers
