Data Science for Healthcare & Life Sciences

The digital transformation of the last decade has paved the way for new business models that rely on data science as a tool for building strategy. To meet new-age business challenges, organizations need to use data, computing power, and technology breakthroughs to modernize their traditional business models and treat data as the new gold.

Data science is playing a pivotal role in redefining life science businesses, from the discovery phase through to commercial activities. Diverse data sources, very large volumes, and varied data types such as text, audio, images, and video demand a range of algorithms and models for data processing and analysis to aid decision making.

FIG 1: PHARMA INDUSTRY VALUE CHAIN

The use cases illustrated below are a few examples where the benefits of Data Science tools and techniques can be leveraged in the healthcare industry.

FIG 2: SAMPLE USE CASES 

Data science brings innovative solutions to every stage of the business life cycle. The repertoire of data and modeling techniques provides the opportunity to accelerate drug discovery, literature mining, adverse event prediction, and even personalized medication using EHR and genomics data. Manufacturing and distribution can benefit from use cases such as predictive maintenance, demand forecasting, and fraud analytics in the supply chain. Clinical and pharmacovigilance studies can be supported by sentiment analysis, patient stratification, and information extraction and analysis from ADR systems. Marketing and sales activities can use tailored marketing, key opinion leader identification, and cohort analysis. In workforce analytics, attrition analysis, regional and personal risk scoring, and lifetime prediction can add value to a healthcare firm.


Details pertaining to a few of these use cases:
1.    Adverse Events Prediction: Predicting and preventing adverse drug reactions (ADRs) early in the drug development pipeline, before clinical trials, can enhance drug safety and reduce costs. Combining EHR, clinical, social media, and literature data with target, pathway, side-effect profile, and structural details from various resources could support a comprehensive analysis and an effective pipeline for adverse event (AE) prediction.

2.    Product Safety or Sentiment Analysis: This can act as an early warning signal for pharmaceutical companies about product safety issues and public sentiment toward the product or company. ML models can be built to analyze historical as well as live-streaming social media feeds and web-scraped data for sentiment analysis.

3.    Pharmacovigilance in Phase IV or Post-Marketing Surveillance: Surveillance of spontaneously reported adverse events continues for as long as a product is marketed, but whether the reported AEs or claims are valid often remains questionable. Machine learning-based classification models can offer a solution that covers data acquisition, integration, and extraction of unstructured data from adverse event reporting (AER) tools, emails, and text transcribed from telephone conversations.
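
The early-warning idea in item 2 can be illustrated with a very small sketch. It is not the ML model described above: it assumes a hypothetical hand-written word lexicon (NEGATIVE, POSITIVE) and a hypothetical threshold (max_negative_share), and simply flags the days on which negative posts about a product dominate. The posts are made-up examples.

```python
from collections import defaultdict

# Hypothetical mini-lexicon for illustration; a real system would use a trained model.
NEGATIVE = {"nausea", "rash", "recall", "worse", "dizzy", "headache"}
POSITIVE = {"better", "relief", "effective", "improved"}


def post_score(text):
    """Return (#positive hits - #negative hits) for one post."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def flag_days(posts, max_negative_share=0.5):
    """posts: iterable of (day, text). Return days where the share of
    negative posts exceeds max_negative_share (an assumed threshold)."""
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for day, text in posts:
        totals[day] += 1
        if post_score(text) < 0:
            negatives[day] += 1
    return sorted(d for d in totals if negatives[d] / totals[d] > max_negative_share)


# Made-up posts, for illustration only.
posts = [
    ("2021-03-01", "Feeling better, real relief after a week"),
    ("2021-03-01", "Slight headache but it passed"),
    ("2021-03-02", "Bad rash and nausea since starting this"),
    ("2021-03-02", "Dizzy all day, feeling worse"),
    ("2021-03-02", "Very effective for me"),
]
flagged = flag_days(posts)
print(flagged)  # ['2021-03-02']
```

In practice the lexicon would be replaced by a trained sentiment classifier, but the aggregation step (share of negative posts per period against a threshold) could stay much the same.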

Drug Repurposing 

Repositioning failed drugs, or existing drugs nearing patent expiration, can provide tremendous benefits, such as a >90% reduction in development cost, a 50% reduction in time, and a 1000 times better success rate. Most drugs fail in clinical studies due to a lack of efficacy or unexpected toxicities, which is attributed to an inadequate understanding of drug action given the complexity of human disease biology. Identifying new diseases for failed or patent-expired drugs needs a systems biology approach.

Our solution offers automated data ingestion and integration using drug identifiers from pharma, partner, CRO, third-party, and authenticated open data sources. Phenotypic, therapeutic, structural, and genomic details are used to construct numerical features using statistical measures. Our machine learning pipeline offers easy integration with the client's proprietary and external data, with a dynamic selector for the best algorithms. It generates a comprehensive drug-drug interaction network overlaid on the drug-disease network to compute a confidence score for the mapped diseases. Readily available dashboards provide quick exploration of drug descriptors, the most similar drugs, and the network of top recommended diseases.
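
As a rough illustration of how a drug-drug network can be overlaid on a drug-disease network, the sketch below scores candidate diseases for a drug by its similarity to drugs already indicated for them. This is an assumed, simplified stand-in and not the solution's actual pipeline: Jaccard similarity over descriptor sets, the "maximum similarity" scoring rule, and all drug, target, and disease names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disease_scores(query, features, indications):
    """Rank diseases for `query` by its highest similarity to any other
    drug already indicated for that disease (an assumed scoring rule)."""
    known = indications.get(query, set())
    scores = {}
    for drug, diseases in indications.items():
        if drug == query:
            continue
        sim = jaccard(features[query], features[drug])
        for disease in diseases:
            if disease in known:
                continue  # skip diseases the query drug already treats
            scores[disease] = max(scores.get(disease, 0.0), sim)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up descriptors (targets and side effects) and indications.
features = {
    "drug_A": {"target_1", "target_2", "se_nausea"},
    "drug_B": {"target_1", "target_2", "se_rash"},
    "drug_C": {"target_9", "se_nausea"},
}
indications = {
    "drug_A": {"disease_X"},
    "drug_B": {"disease_Y"},
    "drug_C": {"disease_Z"},
}

ranked = disease_scores("drug_A", features, indications)
print(ranked)  # [('disease_Y', 0.5), ('disease_Z', 0.25)]
```

Here drug_A shares two of four descriptors with drug_B, so disease_Y ranks first. A production pipeline would combine several similarity measures (phenotypic, therapeutic, structural, genomic) rather than a single set overlap.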

FIG 3: DRUG REPURPOSING FOR PHARMA

Fraud Analytics - Health Insurance Claims

Healthcare fraud is a serious problem for payers and health insurance providers. It usually involves the filing of falsified or dishonest healthcare claims, such as incorrect billing for services, duplicate submission of a claim, misrepresentation of a service, or charging for more than the services actually provided.
These frauds may involve a network of people, including caregivers and beneficiaries, which can be investigated using network analysis to find the associated fraud influencers and their behaviors. In-depth multidimensional analysis is required to explore the provider, beneficiary, claim, diagnosis, and procedure level details.
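
A minimal sketch of the network view, assuming claims can be reduced to (provider, beneficiary) pairs: starting from one suspicious provider, a breadth-first search collects everyone linked to it through shared claims, and node degree gives a crude indicator of influence. The identifiers are hypothetical, and real investigations would use richer graph measures.

```python
from collections import defaultdict, deque


def build_graph(claims):
    """Build an undirected provider-beneficiary graph from claim pairs."""
    graph = defaultdict(set)
    for provider, beneficiary in claims:
        graph[provider].add(beneficiary)
        graph[beneficiary].add(provider)
    return graph


def connected_component(graph, start):
    """Return all nodes reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Made-up claims as (provider, beneficiary) pairs.
claims = [
    ("prov_1", "ben_a"),
    ("prov_1", "ben_b"),
    ("prov_2", "ben_b"),
    ("prov_3", "ben_c"),
]
graph = build_graph(claims)

# Everyone linked, directly or indirectly, to a provider under investigation.
linked = connected_component(graph, "prov_1")
# Degree (number of distinct counterparties) as a simple influence measure.
degree = {node: len(neighbors) for node, neighbors in graph.items()}
print(sorted(linked))  # ['ben_a', 'ben_b', 'prov_1', 'prov_2']
```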

FIG 4: MULTILEVEL ANALYSIS - HEALTHCARE FRAUDS

The entire journey map of the patients from their demographics to the claim filing can be analyzed using anomaly detection, sequential or time trajectory analysis, clustering, and network-based techniques.
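
As one example of the anomaly detection mentioned above, the sketch below flags unusually large or small claim amounts using a robust (median-based) z-score. The constant 0.6745 and the cutoff of 3.5 follow a commonly used convention for the modified z-score; the claim amounts are made up, and this is only one of many possible detectors.

```python
from statistics import median


def robust_outliers(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds `threshold`.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so a few extreme claims do not distort the baseline.
    """
    med = median(amounts)
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        return []  # no spread around the median, so nothing can be scored
    return [
        i for i, x in enumerate(amounts)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Made-up claim amounts for one provider.
amounts = [120, 135, 110, 128, 140, 2500, 125]
outliers = robust_outliers(amounts)
print(outliers)  # [5]
```

The same check can be run per provider, per diagnosis, or per procedure code to support the multilevel analysis.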

FIG 5: PATIENT JOURNEY - DEMOGRAPHICS TO CLAIM FILING

Risk Scoring 

Risk profiling of the population can be done based on high-, medium-, and low-risk geographical zones and industrial sectors. In the Covid-19 era, this could be useful for resource allocation and planning in various businesses.

Pharma companies can use this to plan distribution and set priorities for products, and for patient and care-provider analysis. Health insurance companies can use demographics such as age (a crucial factor, as reported mortality rates are high for those aged over 60), location, occupation (health workers, essential service providers, etc.), travel history, comorbidities or past disease profiles (people with cardiac disease or diabetes are at high risk), and other factors to assess the probability of future claims and the predicted amount to be paid for future claims due to the Covid-19 pandemic. Claims triage management can also be assisted, given its close link to and heavy dependence on the spread of the virus in the region over the coming months.
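
To make the scoring idea concrete, here is a minimal sketch that turns a few of the factors above into a claim probability, a risk band, and an expected claim amount. Everything numeric is an assumption for illustration: the logistic form, the WEIGHTS and INTERCEPT values, the band cutoffs, and the average claim amount are hypothetical and would normally be fitted to historical claims data.

```python
import math

# Hypothetical weights; in practice these would be learned from claims data.
WEIGHTS = {
    "age_over_60": 1.2,
    "high_risk_zone": 0.8,
    "essential_worker": 0.6,
    "comorbidity": 1.0,
}
INTERCEPT = -3.0


def claim_probability(person):
    """Logistic score: probability of a future claim given binary factors."""
    z = INTERCEPT + sum(w for key, w in WEIGHTS.items() if person.get(key))
    return 1 / (1 + math.exp(-z))


def risk_band(p, low=0.1, high=0.3):
    """Map a probability to a low / medium / high band (assumed cutoffs)."""
    if p >= high:
        return "high"
    if p >= low:
        return "medium"
    return "low"


def expected_claim_amount(p, average_claim):
    """Expected payout = probability of a claim x average claim amount."""
    return p * average_claim


# Made-up profiles.
young_remote = {}
senior = {"age_over_60": True}
senior_nurse = {"age_over_60": True, "high_risk_zone": True,
                "essential_worker": True, "comorbidity": True}

for profile in (young_remote, senior, senior_nurse):
    p = claim_probability(profile)
    print(risk_band(p), round(expected_claim_amount(p, 1000.0), 2))
```

Aggregating the expected amounts by region or sector gives a simple view of where future claim volume may concentrate.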

FIG 6: RISK SCORE PREDICTION FRAMEWORK