Scientific data management: the core of quality
Gone are the days when raw data alone was considered king. We now understand that the real value lies in proper data management; but how best to harness it? This article surveys current market options for companies that want to extract every ounce of knowledge from their raw data.
COMPANIES use the word “quality” to communicate that their products or services can satisfy their customers’ needs both at the time of purchase and long into the future. For those companies focused on delivering high-level products such as medicines, quality represents the ability to deliver continuously at the highest standards, so that trust is built with consumers. This article focuses on the core of the quality concept: the scientific data.
Nowadays, all companies must rely on the data produced throughout the product lifecycle, from R&D to production. At any phase of product development and delivery, the data can tell companies whether their processes are reliable, reproducible and repeatable. Scientific data are generated across departments, including R&D and quality operations, and help companies identify short-term issues, trends, anomalies and potential risks. Too often, however, companies simply store scientific data and react only to issues flagged as “deviations”, yet continuous analysis of scientific data is key to anticipating difficulties, correcting processes and improving reliability.
The limited use of software systems creates gaps across the information ecosystem and, consequently, the data are never visualised as a whole. Deviations can only be linked to a root cause by interrogating different systems, often manually, and this requires sufficient time to investigate properly. The reality is that processes are delayed, product delivery stops and efficiency is compromised.
One single system, perhaps composed of different software applications, can be the data management solution for scientific data. This requires sophisticated data analysis tools that can search in different systems in order to generate trends and identify root causes by aggregating information stored in various databases.
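As a minimal sketch of this aggregation idea, the snippet below joins records from two hypothetical systems, a LIMS export and a deviation log, by batch number, so that each deviation is automatically enriched with the analytical result that may explain it. The record layout and field names are illustrative assumptions, not any vendor's actual schema.

```python
# Hypothetical exports from two separate systems (illustrative data only).
lims_results = [
    {"batch": "B-101", "assay_pct": 99.1},
    {"batch": "B-102", "assay_pct": 97.2},
    {"batch": "B-103", "assay_pct": 99.4},
]
deviation_log = [
    {"batch": "B-102", "deviation": "OOS assay", "root_cause": None},
]

def link_deviations(results, deviations):
    """Aggregate across systems: attach the LIMS result to each deviation."""
    by_batch = {r["batch"]: r for r in results}
    linked = []
    for d in deviations:
        record = dict(d)
        # Pull the matching analytical result, if the batch exists in the LIMS.
        record["assay_pct"] = by_batch.get(d["batch"], {}).get("assay_pct")
        linked.append(record)
    return linked

linked = link_deviations(lims_results, deviation_log)
print(linked)  # each deviation now carries the matching analytical result
```

In a real platform the two inputs would come from database queries or APIs rather than in-memory lists, but the principle is the same: a shared key (here, the batch number) is what lets trends and root causes emerge from otherwise isolated systems.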
The use of state-of-the-art applications is a cornerstone of this platform. Too often, however, systems are chosen solely for their functionality. Functionality remains important, as day-to-day activities should be handled properly, efficiently and in a way that reduces the human effort required to store information. Nevertheless, the system structure and IT architecture are equally critical. Modern systems are more open, flexible and agile, while older systems often struggle with integration and data aggregation. We still encounter customers using applications designed more than 20 years ago, whose built-in limitations prevent any real use of the scientific data.
Internet of Lab Things
The second aspect that is proving to be a limiting factor for good quality is the reduced or non-existent implementation of the Internet of Lab Things (IoLT). This concept is the laboratory interpretation of the well-known Internet of Things (IoT), now widespread even in our private lives. The ability to collect raw data from the source, with the proper level of quality, is key to creating a solid ecosystem of scientific data.
Most scientific data are still transcribed manually from the source into the first application in the infrastructure. The limited quality inherent in manual actions prevents the creation of a truly digitalised quality management environment. The time spent manually logging information, checking it and then reviewing it to detect the errors that inevitably accompany manual processing significantly reduces the ability to provide solid information to data consumers. Yet it is not only laboratory instruments that can now be connected; any other device can be too, and this is essential for generating quality. Companies need to learn how to automate data collection, even from old devices.
The correct use of scientific data hinges on the ability to collect the raw data. Most quality departments still rely on manual activities. Instrument interfacing, automated collection of sensor data and the use of barcodes are still very limited, largely because of the cost of implementing these technologies. Too often the costs associated with employees’ time are hidden or not properly evaluated; a fraction of these costs could be invested in modernising the instrumentation or implementing software systems that read the raw data. In some instances there is a feeling that introducing IoLT may reduce control over the data; in reality, a solid software solution is more reliable than double or triple checks by human beings.
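To make the contrast with manual transcription concrete, here is a small sketch of what automated instrument capture can look like: parsing a raw line as a lab balance might stream it over a serial port, with built-in validation replacing the human double-check. The line format, stability flags and units are assumptions for illustration; real instruments each have their own protocol.

```python
import re

# One raw line as a balance might stream it (hypothetical format):
# 'S S' = stable reading, 'S D' = dynamic/unstable, then value and unit.
RAW_LINE = "S S     12.3456 g"

BALANCE_PATTERN = re.compile(r"^S (S|D)\s+(-?\d+\.\d+)\s+(g|mg|kg)$")

def parse_balance_line(line):
    """Turn one raw instrument line into a structured, validated record.

    Returns None when the reading is unstable or malformed, so bad data
    never enters the system: the checks a human transcriber would perform,
    done automatically and consistently for every single reading.
    """
    match = BALANCE_PATTERN.match(line.strip())
    if match is None:
        return None
    stability, value, unit = match.groups()
    if stability != "S":  # reject dynamic (unstable) readings
        return None
    return {"value": float(value), "unit": unit, "stable": True}

print(parse_balance_line(RAW_LINE))
print(parse_balance_line("S D     12.3456 g"))  # unstable reading rejected
```

Even an instrument designed decades ago often exposes some serial or file output; a thin parser like this is frequently all that stands between manual retyping and fully automated capture.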
The final element in the equation is data standardisation. Scientific data management requires a structured approach to standardisation. The wealth of information generated in quality departments is useless if it cannot be transformed into clustered data that can be analysed and investigated to help businesses make the decisions that will allow them to become more efficient and ultimately more successful. Research projects and production activities can only be improved by using their scientific data efficiently. If the data universe consists of tonnes of numbers, experiments and annotations that cannot somehow be linked, it will remain an unexplored vastness. The ability to categorise information is key to associating scientific data from different departments, research studies or physical locations.
Internally, companies have realised that a common vocabulary is an essential element of data management. Only by creating naming conventions that are valid throughout the entire organisation is it possible to generate meaningful information. The chaotic situation that results from merely storing data does not deliver the business value information systems promise. Creating truly searchable systems allows companies to make the correct business decisions related to quality.
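A common vocabulary can be as simple as a mapping from local synonyms to one canonical term, applied at the point of entry or search. The sketch below shows the idea; the technique names and mappings are illustrative assumptions, not an established standard.

```python
# A minimal controlled vocabulary: local synonyms map to one canonical term.
# All mappings here are illustrative, not an official nomenclature.
CONTROLLED_VOCABULARY = {
    "hplc": "HPLC",
    "liquid chromatography": "HPLC",
    "lc": "HPLC",
    "kf": "Karl Fischer titration",
    "karl fischer": "Karl Fischer titration",
    "water content": "Karl Fischer titration",
}

def normalise_term(raw):
    """Map a free-text technique name onto its canonical vocabulary term."""
    key = raw.strip().lower()
    # Flag unmapped terms instead of guessing, so gaps in the vocabulary
    # surface for review rather than silently fragmenting the data.
    return CONTROLLED_VOCABULARY.get(key, "UNMAPPED:" + raw.strip())

# Three sites naming the same technique three different ways...
site_terms = ["HPLC", "liquid chromatography", "LC"]
print({normalise_term(t) for t in site_terms})  # ...collapse to {'HPLC'}
```

Once every record carries the canonical term, results from different departments, studies or sites become directly comparable and searchable, which is exactly what mere storage never delivers.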
The technological improvements in the IT market are visible to all of us. We can use technologies that were unimaginable just a few years ago, and we can now store on any device a quantity of information thousands of times larger than 20 years ago. The power of information remains one of the greatest opportunities for all companies. Everything starts at the very beginning of the data lifecycle, so proper data collection is key: there is no other way to achieve the right quality than by taking data from the source. Modern systems are now an essential part of the equation. IT solutions have evolved in every respect in just a few years, so relying on old-fashioned systems is no longer an option, and technological updates are necessary to be ready for the future. Finally, an internal effort to create standards generates the ability to aggregate, analyse and bring value to the company.
About the author
Roberto Castelnovo has a degree in Computer Science from Milan University and has worked for 30 years in laboratory information management. He built a large body of experience in managing complex sales and services organisations to provide solutions at an international level and had the opportunity to work for several years in multi-cultural and global environments. In 2013, Roberto co-founded the consulting firm NL42 to provide specialised services dedicated to paperless projects, specifically focused on the areas of operations, quality assurance and quality control. In 2016, NL42 acquired the rights to the international event Paperless Lab Academy® and organises an annual European edition.