
Part four: an industry leader’s perspective on managing data quality

In this four-part series, Dr Raminderpal Singh discusses the challenges surrounding limited data quality and offers some pragmatic solutions. In this fourth article, he talks to John Conway, Chief Visioneer Officer at 20/15 Visioneers, for an expert perspective.


The first article in this series, published on Wednesday 14 August, discussed the significance of data quality for the effectiveness of machine learning (ML) and artificial intelligence (AI) analyses. In this article, John Conway, Chief Visioneer Officer at 20/15 Visioneers1 and an industry leader, shares his perspective on addressing data quality issues in the life sciences industry with Dr Raminderpal Singh. Read on to discover their key discussion points.

The state of data quality in drug discovery

In the realm of drug discovery, data quality is paramount. However, despite the advancements in technology and data generation tools, the quality of scientific data has not kept pace. Conway notes that while the scientific community has made significant strides in generating large volumes of data through high-throughput technologies, issues such as missing metadata, poor contextualisation, and inconsistent data management practices continue to plague the field.

The rapid increase in data generation has also led to a proliferation of what Conway refers to as “cheap data,” which is often collected without the necessary rigour to ensure its long-term usability. This has resulted in a landscape where vast amounts of data are underutilised or rendered useless due to a lack of proper management, leading to wasted resources and delayed drug discovery processes.

Addressing data quality challenges

Improving data quality in drug discovery requires a comprehensive approach that addresses the technical, procedural, and cultural aspects of data management. See below for strategies to tackle these challenges:

  1. Adopting a unified data strategy

A critical first step is for organisations to develop and implement a unified scientific data strategy, which should ensure that all data generated within the organisation is findable, accessible, interoperable, and reusable (FAIR). By aligning all departments and research teams to a common set of data management standards, organisations can prevent the fragmentation and inconsistency that often arise when different teams operate in silos.
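
As a rough illustration of what such a strategy can enforce in practice, the short Python sketch below checks a dataset record against an assumed, simplified mapping of the FAIR principles to required fields; the field names are illustrative, not a formal FAIR specification.

# Minimal sketch: checking a dataset record against an assumed, simplified
# mapping of the FAIR principles to required fields. All field names
# (dataset_id, access_url, licence, ...) are illustrative assumptions.

FAIR_CHECKS = {
    "findable": ["dataset_id", "title", "keywords"],
    "accessible": ["access_url"],
    "interoperable": ["file_format", "vocabulary"],
    "reusable": ["licence", "provenance"],
}


def fair_gaps(record: dict) -> dict:
    """Return, per FAIR principle, the fields still missing from the record."""
    return {
        principle: [field for field in fields if not record.get(field)]
        for principle, fields in FAIR_CHECKS.items()
    }


if __name__ == "__main__":
    record = {"dataset_id": "DS-0042", "title": "HTS assay run", "file_format": "csv"}
    print(fair_gaps(record))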

  2. Investing in data governance

Effective data governance is essential to maintaining high data quality. This involves setting up governance structures that oversee the collection, management, and use of data across the organisation. Data governance should include the creation of standardised protocols for data capture, metadata generation, and data storage, ensuring that data is managed consistently and with the necessary context to be useful in the future.

  3. Automating metadata capture

One of the significant challenges in data quality is the manual effort required to capture metadata. Conway points out that scientists are often too busy to dedicate extra time to this task, which can lead to incomplete or inaccurate metadata. To address this, organisations should invest in technologies that automate the capture of metadata at the point of data generation. By embedding metadata requirements into the design of experiments and data capture systems, organisations can ensure that data is accompanied by the necessary contextual information without placing additional burdens on researchers.
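
As a minimal Python sketch of capture at the point of data generation, the snippet below writes a metadata sidecar file automatically whenever results are saved; the field names (instrument_id, protocol_version, and so on) are assumptions for illustration rather than a prescribed schema.

# Minimal sketch: metadata is captured automatically at the moment results
# are saved, so the researcher never has to backfill it. Field names are
# illustrative assumptions, not a prescribed schema.
import getpass
import json
from datetime import datetime, timezone
from pathlib import Path


def save_with_metadata(results: dict, out_dir: str,
                       instrument_id: str, protocol_version: str) -> Path:
    """Write the results plus an auto-generated metadata sidecar file."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    # Contextual information recorded with no extra effort from the researcher.
    metadata = {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "operator": getpass.getuser(),
        "instrument_id": instrument_id,
        "protocol_version": protocol_version,
    }

    (out_path / "results.json").write_text(json.dumps(results, indent=2))
    (out_path / "results.metadata.json").write_text(json.dumps(metadata, indent=2))
    return out_path / "results.json"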

  4. Implementing Standard Operating Procedures (SOPs)

SOPs are essential for ensuring consistency in data generation and management. These procedures should be designed to integrate seamlessly with the scientific workflow, providing clear guidelines on how data should be captured, stored, and analysed. SOPs should also be flexible enough to accommodate the specific needs of different research projects while maintaining a consistent approach to data quality.

  5. Encouraging reproducibility through rigorous methodology

Reproducibility is a cornerstone of scientific research, and improving data quality is key to achieving it. Organisations should prioritise rigorous experimental design and methodology, ensuring that all experiments are conducted in a way that enables accurate replication. This includes thorough documentation of experimental conditions, data collection methods, and any variables that may affect the outcomes.

  6. Leveraging technology for data quality control

Advances in AI and ML can play a significant role in improving data quality. These technologies can be used to monitor data in real time, flagging potential issues such as inconsistencies, missing metadata, or errors in data capture. By integrating these tools into the research process, organisations can proactively address data quality issues before they impact research outcomes.
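
A production system might rely on trained models, but even a simple rule-based gate illustrates the idea. The Python sketch below flags records with missing metadata or implausible values before they enter analysis; the field names and the expected range are assumptions for illustration.

# Minimal sketch: a rule-based quality gate that flags records with missing
# metadata or out-of-range values before analysis. Field names and the
# expected range are illustrative assumptions.

REQUIRED_FIELDS = ("sample_id", "instrument_id", "protocol_version", "captured_at")


def flag_quality_issues(record: dict) -> list:
    """Return human-readable descriptions of the issues found in one record."""
    issues = []

    # Flag missing or empty metadata fields.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"missing metadata field: {field}")

    # Flag readouts outside an assumed plausible range.
    readout = record.get("readout")
    if readout is not None and not 0.0 <= readout <= 1.0:
        issues.append(f"readout {readout} outside expected range 0-1")

    return issues


if __name__ == "__main__":
    example = {"sample_id": "S-001", "readout": 1.7}
    for issue in flag_quality_issues(example):
        print("FLAG:", issue)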

  7. Continuous training and development

Finally, ensuring data quality requires ongoing education and training for all members of the research team. This includes not only formal training in data management best practices but also continuous professional development to keep pace with new technologies and methodologies. By fostering a culture of learning and adaptation, organisations can ensure that their teams are equipped to maintain high standards of data quality in an ever-evolving scientific landscape.

Reference

1 https://www.20visioneers15.com/ 

About the author

Dr Raminderpal Singh

Dr Raminderpal Singh is a recognised visionary in the implementation of AI across technology and science-focused industries. He has over 30 years of global experience leading and advising teams, helping early to mid-stage companies achieve breakthroughs through the effective use of computational modelling.

Raminderpal is currently the Global Head of AI and GenAI Practice at 20/15 Visioneers. He also founded and leads the HitchhikersAI.org open-source community, and is a co-founder of Incubate Bio, a techbio company that helps life sciences companies accelerate their research and lower their wet lab costs through in silico modelling.

Raminderpal has extensive experience building businesses in both Europe and the US. As a business executive at IBM Research in New York, Dr Singh led the go-to-market for IBM Watson Genomics Analytics. He was also Vice President and Head of the Microbiome Division at Eagle Genomics Ltd, in Cambridge. Raminderpal earned his PhD in semiconductor modelling in 1997. He has published several papers and two books and has twelve issued patents. In 2003, he was selected by EE Times as one of the top 13 most influential people in the semiconductor industry.

For more: http://raminderpalsingh.com; http://20visioneers15.com; http://hitchhikersAI.org; http://incubate.bio 
