Data-Driven Research and the Next Generation of Informatics
A wave of new and emerging technologies is significantly changing how data is captured and used in life sciences and drug development research. With the growing popularity of wearable devices and mobile health applications, coupled with growth in the use of social media, more data streams are available to researchers looking to extract meaningful information. As companies like Apple and Google enter the life sciences market, directly or collaboratively, with direct-to-consumer products, services and partnerships (e.g. 23andMe, ResearchKit), the potential for gathering vast amounts of data grows even further. While this wealth of data gives scientists more opportunity to understand individual patient stratification and to prepare therapeutic approaches accordingly, it also poses a significant data management challenge.
A critical factor is that irrespective of how data is sourced, be it from a scientific journal, an Electronic Medical Record (EMR), a social media post or a wearable device, the data can only be analyzed effectively if it has been semantically organized. After all, what’s the use in having all the information in the world if you can’t make sense of it? This is where the next generation of informatics solutions comes in. To derive actionable information from this sea of data, life sciences and pharmaceutical companies need a clear data management strategy for harmonizing data in a way that supports this critical need.
Big data – from real world to real time
In pharmaceutical and life science R&D, big data is seen more and more simply as today’s data. In addition to the newer data sources mentioned above, consider just some of the typical streams: EMRs, genomics and screening data, clinical trial data, and mobile diagnostics and monitoring. The compounding growth of information should lead one to recognize that data is transitioning from ‘real world’ to ‘real time’, as data becomes available directly from the individual.
The challenge is in the diversity and quality of the data. Data is captured with differing levels of detail, with little coordination and few standards. As a result, trust levels in the information vary, and incomplete, unconfirmed, expired or retracted information is often found. Added to this is the speed at which data is being generated at the individual level. This lack of consistency, coupled with the velocity of generation, leads us to a major challenge: how do we utilize the data meaningfully and quickly to make confident clinical drug development decisions, or predict outcomes of research or therapies with accuracy?
New strategy needed – apply within
Given that getting a single drug to market costs on average over $2 billion and takes up to 15 years, researchers must be able to gain trusted insights from data in order to reduce risk as far as possible. Pharmaceutical and life science companies need a new mindset that approaches data with a view to harmonizing it across the enterprise, while consuming and parsing new data near constantly and consistently. Data handled in this collective and consistent way has the potential to inform research; data left in inaccessible silos is of little use.
Many organizations rely on manual curation of their data, with individual scientists and research functions looking for answers in the data in different ways and at different times. Harmonizing data instead allows researchers to cross-connect across different platforms, use similar algorithms to search different data sources and make connections between seemingly unrelated information. This harmonization should begin right at the point of collection, where it is vital that the context of the data is retained. Only then can it be distilled into something useful to researchers.
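As a rough illustration of harmonizing at the point of collection, the sketch below maps records from two differently shaped sources into one common structure while keeping the source, the collection time and the original payload as context. The payload field names (`device_user`, `hr`, `ts`, `pid`, `heart_rate_bpm`, `recorded`), the source labels and the `HarmonizedRecord` type are all hypothetical and chosen only for illustration; a real pipeline would map to an agreed standard vocabulary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class HarmonizedRecord:
    """Common shape for an observation, whatever its origin (hypothetical schema)."""
    subject_id: str
    measure: str             # canonical measure name, e.g. "heart_rate"
    value: float
    unit: str
    source: str              # context: where the data came from
    collected_at: datetime   # context: when it was captured
    raw: dict[str, Any] = field(default_factory=dict)  # context: original payload


def from_wearable(payload: dict[str, Any]) -> HarmonizedRecord:
    # Assumed wearable payload: {"device_user": ..., "hr": ..., "ts": epoch seconds}
    return HarmonizedRecord(
        subject_id=str(payload["device_user"]),
        measure="heart_rate",
        value=float(payload["hr"]),
        unit="beats/min",
        source="wearable",
        collected_at=datetime.fromtimestamp(payload["ts"], tz=timezone.utc),
        raw=dict(payload),
    )


def from_emr(payload: dict[str, Any]) -> HarmonizedRecord:
    # Assumed EMR export: {"pid": ..., "heart_rate_bpm": ..., "recorded": ISO 8601 string}
    return HarmonizedRecord(
        subject_id=str(payload["pid"]),
        measure="heart_rate",
        value=float(payload["heart_rate_bpm"]),
        unit="beats/min",
        source="emr",
        collected_at=datetime.fromisoformat(payload["recorded"]),
        raw=dict(payload),
    )


records = [
    from_wearable({"device_user": "P001", "hr": 72, "ts": 1700000000}),
    from_emr({"pid": "P001", "heart_rate_bpm": "75", "recorded": "2023-11-14T09:30:00+00:00"}),
]

# One query now works across both sources.
heart_rates = [r.value for r in records if r.subject_id == "P001" and r.measure == "heart_rate"]
print(heart_rates)
```

Keeping the untouched payload in `raw` is one possible way to retain context: if the mapping rules later change, the record can be re-derived without going back to the source system.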
Life sciences and technology – the grey area
The need to harmonize data applies not only to the multiple sources generated and held internally, but also to external, third-party and commercial data, where new technologies are adding to the information deluge. We are already seeing pharmaceutical companies make great strides in using technology in research, for example Roche’s use of smartphones to gather continuous data on participants in a Parkinson’s clinical trial. Yet as companies like Google and Apple begin to muscle into traditional life science territories, pharmaceutical companies face a significant challenge. They don’t want to be technology companies, but Apple and Google have raised the general public’s expectations of rapid advancement. The challenge is how utilization of data can grow in parallel with the generation of data.
The public’s understanding is that their smartwatch, for example, can be sending critical real-time data about a patient with a specific disease and drug therapy to an important source. This data can be captured and viewed over time, but can also be utilized in real time. The public, in the ‘Google’ age, blurs the distinction between making that data available and making that data actionable. People therefore expect the recipients, from pharmaceutical companies to care providers, to be able to do something with that data. And the complexity multiplies with the diversity of data and the broader population contributing. Google and Apple take a very broad-based approach to genotyping and phenotyping, and have begun to consumerize the process of DNA sequencing and screening. But it falls to pharmaceutical companies to really make sense of this massive, ancillary data set that is being generated.
Getting data into the hands of scientists
Ultimately, all of these new technologies and methods of gathering data need scientific expertise to organize information and make it accessible to researchers. Once data is collected and harmonized, semantic organization and analytics, combined with accurate text-mining, are the keys to unlocking the potential insights hiding in this sea of data. To organize information successfully and make it discoverable through next-generation informatics solutions, data must be governed by expertly designed taxonomies and ontologies. Through thoughtful taxonomies, data can be parsed and indexed in a way that allows researchers to uncover novel associations and trends across disparate data sets.
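To make the idea of taxonomy-driven indexing concrete, here is a minimal sketch, assuming a tiny hand-written taxonomy and three invented text snippets. It maps different surface forms to one canonical concept and builds an index from concept to document, so that a journal sentence, an EMR note and a social media post can be found through the same term. The taxonomy entries, document ids and texts are hypothetical; production systems would rely on curated ontologies and far more robust text-mining than a regular expression.

```python
import re
from collections import defaultdict

# Hypothetical mini-taxonomy: canonical concept -> surface forms seen in text.
# Note: short abbreviations such as "pd" are ambiguous in real data.
TAXONOMY = {
    "parkinson_disease": ["parkinson's disease", "parkinson disease", "pd"],
    "tremor": ["tremor", "tremors", "shaking"],
    "levodopa": ["levodopa", "l-dopa"],
}

# Invert so each surface form points at its canonical concept.
TERM_TO_CONCEPT = {
    term: concept for concept, terms in TAXONOMY.items() for term in terms
}

# Longest terms first so multi-word forms are tried before shorter ones.
_TERMS = sorted(TERM_TO_CONCEPT, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in _TERMS) + r")\b",
    re.IGNORECASE,
)


def tag(text: str) -> set[str]:
    """Return the canonical concepts mentioned in a piece of text."""
    return {TERM_TO_CONCEPT[m.group(1).lower()] for m in _PATTERN.finditer(text)}


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each concept to the ids of the documents that mention it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for concept in tag(text):
            index[concept].add(doc_id)
    return dict(index)


# Invented snippets standing in for three very different sources.
documents = {
    "journal:1": "Levodopa remains a mainstay therapy in Parkinson's disease.",
    "emr:42": "Patient with PD reports increased tremors after dose change.",
    "social:7": "Started L-DOPA last month and the shaking is much better.",
}

index = build_index(documents)
print(sorted(index["levodopa"]))
print(sorted(index["tremor"]))
```

Because every source is indexed against the same concepts, a search for one canonical term reaches documents that never use that exact wording, which is the kind of cross-source association the paragraph above describes.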
Extracting meaningful answers via next-generation informatics solutions will enable researchers to make confident, data-driven decisions. When data is more usable, researchers can efficiently derive insights from it; those insights then become actionable, can influence business-critical decisions and can alert businesses when those decisions need to be re-evaluated. The prospective power of data in the hands of life science researchers is enormous: given the right tools, researchers can use that data to boost not only productivity but, potentially, discovery.