Data Science approach in healthcare

The HealthTech industry is immersed in so much data that institutions and doctors are overwhelmed. The question is: how do you generate actionable insights from all this data? How can doctors, who often feel more like data clerks than actual health care providers, optimize their workflow and improve accurate patient outcomes? And how can hospital systems extract value from data to improve operations and patient care? Star’s HealthTech Practice Lead and Expert Perry Simpson sat down with CTO Sergii Gorpynich to tackle these questions.

Star Data Science approach:

1. Data research or exploratory data analysis

Use a multi-disciplinary team to analyze data assets, business flows, data flows, as well as the client’s business model as a whole

2. Machine learning modeling

Create specific machine learning models focused on extracting meaningful insights from data

3. Converting models into software-driven solution

Converting ML models into reliable, scalable solutions with high usability and a great user interface

“Machine Learning, and Deep Learning AI toolsets can dramatically improve business processes within healthcare organizations. From patient diagnosis and prognosis to optimizing workflows and customer care, these technologies provide immense opportunity on par with the onset of personal computing.”

Sergii Gorpynich

CTO at Star

Sergii Gorpynich/CTO (Sergii): The healthcare system generates a ton of data same as any other industry, and everybody appreciates and understands that data is immensely valuable. But how do you extract and precisely define that value? These are open questions that concern executives.

We at Star have an approach for solving the problem of extracting and defining value, based on data. Our Data Science practice has a three-phase solution.

The first phase is what we call Data Research or Exploratory Data Analysis, and the idea is that we combine a multidisciplinary team of data scientists from Star or other organizations, as well as subject matter experts, typically from our clients. This team works together to analyze data assets, business flows and data flows, as well as the client’s business model as a whole. They chart all stakeholders and players within the business as an internal ecosystem: patients, care providers, and third parties, analyzing data and data sources.

The second task in this phase is really analyzing workflows and available data and trying to extract meaningful insights from that data. Through this process, we are looking for any correlation between different segments of the data, or hidden rules within that data. It’s an exploration of possible value.

Perry Simpson/Executive Director (Perry): So the idea is to come at it with a couple of hypotheses?

Sergii: Or maybe even more than a couple – maybe there will be a number of hypotheses about those meaningful insights – about the value which we are able to extract based on data. This exploratory data analysis process has to be really well structured and focused on specific outcomes. Our data science team does this daily within a variety of industries. Therefore, they are able to constantly leverage their deep expertise, experience, and learning across industries and client engagements.

Perry: So we can bring expertise, not just from healthcare, but from other domains?

Sergii: Absolutely. The data analysis models which you utilize within one industry are applicable to other industries as well. Healthcare is extremely interesting from this perspective, because it has large quantities of data. So that's the first phase – data research or exploratory data analysis.

the first phase – data research or exploratory data analysis

The second phase of the three-phase process is actual modeling, and here I talk about machine learning models. The terminology can be confusing: people talk about A.I. (artificial intelligence), Data Science, and Machine Learning. The general, or umbrella term which we use at Star is Data Science, which in our definition covers data collection, data processes, and data analysis, as well as Machine Learning.

In most cases, when we talk about modeling, we talk about Machine Learning modeling and why that is specifically important. It's important because your data is a dynamic asset, that changes over time. When you apply Machine Learning modeling, you give your software solution capabilities to automatically use incoming data and adapt the outcome accordingly.

The second solution modeling phase is machine learning modeling, so the goal of this phase is to use our exploratory data analysis results as an input to create specific models focused on extracting meaningful insights from exploratory data.

Perry: So we would recommend a particular kind of machine learning model?

Sergii: Exactly right, at this stage.

Perry: Natural Language Processing (NLP) for example, or some other machine learning.

Sergii: These could be classical machine learning models, maybe even a simple logistic regression, or the “new generation” Machine Learning, which is called Deep Learning, based on deep neural networks. We have capabilities within both of these domains at Star; we have knowledge of the models and we are able to apply whatever model is optimal in a specific data analysis situation. So the outcome of this second phase is well-defined, efficient models that provide meaningful business data insights.

Perry: When does that convert to Artificial Intelligence? Is there a combination of machine learning models that become A.I., or where do they apply in this case?

Sergii: You can think of A.I. as a subset of Machine Learning. So when we talk about ML models, ultimately we are also talking about artificial intelligence. Obviously, this is not a broad artificial intelligence. Rather, it’s a narrow artificial intelligence, focused on very specific goals, and extracting specific data insights. This is what makes A.I. extremely efficient and powerful nowadays.

The third phase is converting these models, comprised of bits and pieces of software, into a great solution that’s reliable, scalable and, most importantly, has a great user interface that makes it easily consumable. We start at this 'data lake', as we call it in data science, which is unstructured data with hidden insights, and we finish with this software-driven solution, which analyzes the data, adapts models towards incoming data, then provides important insights in different forms: these may be informational insights or some specific actions, that the software system implements based on incoming data. That's the approach we practice here at Star and which turns out to be effective and helpful within different industries, and healthcare specifically.

Perry: Some use cases, for example, might be a radiology lab with a bunch of X-ray images, and they want to produce a machine learning or A.I. algorithm that helps with more precise and faster diagnoses. In a very clear-cut use case, you could offload that workflow onto a machine, instead of sending it to a doctor or a radiologist I imagine that this presents a broad-based challenge – you need large data sets to tag with information that helps train the model. So where does that come in? How do you bridge the gap between raw data sets and training the model on tagging, before you really start to get a reliable algorithm that's doing what you want it to do?

Sergii: You touched on a very interesting aspect: this example of the automatic analysis of radiology images is a classic example of a machine learning application, specifically a deep learning application. There are many positive aspects of deep learning: it's extremely effective in some cases, but one of the challenges related to deep learning – in fact, the key challenge – is the availability of so-called training data. When you work with deep neural nets to make an effective instrument, you need a lot of reliable, labeled training data... We do know based on our own experience that getting labeled data is a big challenge across industries. It's oftentimes an even bigger challenge than defining an optimal neural net topology. There are different ways to solve it, but some situations require coding – implementing a dedicated tool chain for labeling the data to collect training data, and labeling the results. That is indeed the constraint of deep learning technology.

Perry: There is a concept called federated learning, where for example, a hospital, wants to optimize diagnosis, but here's a challenge with labelling or tagging. You could, in theory, use federated learning to leverage other data sets that have done some of this work, using their anonymized data to help train another dataset so you don't start from scratch.

Sergii: Yes, it's a well-known technique within data science machine learning, and it turns out to be quite effective, specifically within the healthcare space as this is a space of strict NDAs and HIPAA compliance. What federated learning allows you to do as a solution provider is to get the data from one entity and process that data so that you lose all the data’s individual characteristics. But at the same time, you keep the learning out of that data. Then you aggregate the data across many patients and keep this aggregation as a trained model. This model has all the knowledge related to that well of data, but it literally loses all the individual parameters of specific patients – so that's the way you train your model.

There’s also transfer learning, where you take this trained model, then feed into it another set of data to make it even more efficient. By so doing, you retain the overall integral learning based on that data, but you don’t use any personal patient data. That's just one technique you can use as a solution provider working with many organizations. All those organizations operate in silo-mode – meaning they are not able to share their data, but you can share the learnings from the data. That's a very interesting approach, and we are using that approach within this practice.

Perry: We could honor both HIPAA compliance and data privacy, and ensure that the patient's individual data is not exposed. Yet, we still leverage that learning “federated” for other purposes so that organizations don't have to start from scratch every time.

Sergii: And then all these organizations can leverage that combined learning and these efficient models.

Perry: There are a lot of possibilities for healthcare organizations to rapidly diagnose and improve their workflows, and we've got quite a toolset to help make that happen.

Sergii: Yes, exactly. The AI, Machine Learning, and Deep Learning toolset can dramatically improve business processes within healthcare organizations. From patient diagnosis and prognosis to optimizing workflows and customer care, these technologies provide immense opportunities on par with the onset of personal computing, so there are very exciting days ahead!