
Medication decision as a reinforcement learning problem

So if we were to imagine the perfect medical system, what would it be? We could think of a situation where someone, or a computer, had complete knowledge of the whole of human physiology, all diseases, all existing treatment options and which of them would be optimal. Obviously this is impossible. So another, perhaps feasible, approach would be to imagine having, at any point in time, permanent and unbiased knowledge about a vast number of patients who are very similar to a new patient presenting in front of you.
If at the same time you could know which treatments all these patients received and what their outcomes were, this would probably allow you to reduce the uncertainty around which medical decision is optimal for your new patient. I'd like to picture this as a sort of cloud of patients, similar to your new patient, in front of you. Imagine if you knew exactly what treatment each of these patients received at any point in time and what their outcome was; then, when a new patient arrived, this would give you an idea of the optimal decision for this new case. So indeed, we can formulate medical decision-making as a reinforcement learning problem.
This is adapted from a figure in the textbook by Sutton and Barto, Reinforcement Learning: An Introduction, which was recently updated. In this case, the physician follows a policy, classically called π (the Greek letter pi). The physician acts on the environment, which is the patient, in the particular case I will be describing by giving drugs to the patient. The patient reacts by switching states, getting better or sometimes getting worse, and if the new state is more desirable than the previous state, the physician receives a reward and the action that led to this beneficial transition is reinforced.
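To make this loop concrete, here is a minimal Python sketch of the agent-environment interaction. Everything in it is a hypothetical assumption for illustration only: the three health states, the two actions ("drug" and "no_drug"), the transition probabilities and the +1/-1 reward rule. The physician's policy π is a simple rule mapping states to actions, and the patient responds by switching state at random according to those probabilities. The sketch shows only the interaction and the reward signal, not the learning step that reinforces actions.

```python
import random

# Hypothetical toy patient model (illustrative assumption, not clinical data).
STATES = ["worse", "stable", "better"]   # ordered from least to most desirable

# Hypothetical transition probabilities: P[state][action] -> {next_state: prob}.
P = {
    "worse":  {"no_drug": {"worse": 0.8, "stable": 0.2, "better": 0.0},
               "drug":    {"worse": 0.3, "stable": 0.6, "better": 0.1}},
    "stable": {"no_drug": {"worse": 0.2, "stable": 0.7, "better": 0.1},
               "drug":    {"worse": 0.1, "stable": 0.5, "better": 0.4}},
    "better": {"no_drug": {"worse": 0.0, "stable": 0.1, "better": 0.9},
               "drug":    {"worse": 0.1, "stable": 0.2, "better": 0.7}},
}

def reward(state, next_state):
    """+1 if the new state is more desirable, -1 if less desirable, 0 otherwise."""
    diff = STATES.index(next_state) - STATES.index(state)
    return (diff > 0) - (diff < 0)

def step(state, action, rng):
    """Environment: the patient reacts to the action by switching state."""
    probs = P[state][action]
    next_state = rng.choices(list(probs), weights=list(probs.values()))[0]
    return next_state, reward(state, next_state)

def physician_policy(state):
    """Hypothetical deterministic policy pi: give the drug unless the patient is better."""
    return "no_drug" if state == "better" else "drug"

def run_episode(policy, start="worse", horizon=10, seed=0):
    """Roll out the physician-patient loop and record (s, a, r, s') tuples."""
    rng = random.Random(seed)
    state, trajectory = start, []
    for _ in range(horizon):
        action = policy(state)                    # physician acts following pi
        next_state, r = step(state, action, rng)  # patient transitions, reward emitted
        trajectory.append((state, action, r, next_state))
        state = next_state
    return trajectory
```

Calling `run_episode(physician_policy, horizon=10, seed=1)` returns ten (state, action, reward, next state) tuples; a learning algorithm would use exactly this kind of record to decide which actions to reinforce.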
If we had such a framework, we could ask two questions. First, how good is the physicians' policy? Can we quantify this? And second, can we identify an optimal policy? In other words, can we optimize medical decisions using reinforcement learning? This is a very exciting field of research. A very classical framework for implementing reinforcement learning, and indeed the simplest possible one, is the Markov decision process (MDP). It is used time and again, though not that often in healthcare, for reasons we'll be discussing. But it is a powerful framework for modeling sequential decisions.
The outcomes of decisions are also stochastic, meaning partially random, and if modeled properly, this MDP approach can lead to finding optimal solutions. Several elements need to be defined in order to run an MDP. You need to define states; you need to define actions; you need to model or compute the dynamics of the system in terms of transitions between the different states, which is what we call the transition matrix. And finally, you need to define the reward: you need to associate rewards and penalties with some states.
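The MDP elements listed above (states, actions, transition matrix, rewards) can be written down directly, and the two questions raised earlier map onto two standard computations: iterative policy evaluation quantifies how good a given policy is, and value iteration finds an optimal policy. The sketch below assumes a hypothetical three-state, two-action model with made-up transition probabilities, rewards attached to states, and a discount factor gamma = 0.9; none of these numbers come from real patients, and the "always give the drug" policy used for comparison is likewise hypothetical.

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP (all numbers are illustrative assumptions).
states = ["worse", "stable", "better"]
actions = ["no_drug", "drug"]

# Transition matrix T[a, s, s'] = P(s' | s, a); each row sums to 1.
T = np.array([
    [[0.8, 0.2, 0.0],   # no_drug, from worse
     [0.2, 0.7, 0.1],   # no_drug, from stable
     [0.0, 0.1, 0.9]],  # no_drug, from better
    [[0.3, 0.6, 0.1],   # drug, from worse
     [0.1, 0.5, 0.4],   # drug, from stable
     [0.1, 0.2, 0.7]],  # drug, from better
])

# Rewards attached to states: a penalty for "worse", a reward for "better".
R = np.array([-1.0, 0.0, 1.0])
gamma = 0.9  # discount factor

def q_values(V):
    """Q[a, s] = sum over s' of T[a, s, s'] * (R[s'] + gamma * V[s'])."""
    return T @ (R + gamma * V)

def evaluate_policy(policy, tol=1e-10):
    """Iterative policy evaluation; policy is an array of action indices, one per state."""
    V = np.zeros(len(states))
    while True:
        V_new = q_values(V)[policy, np.arange(len(states))]
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def value_iteration(tol=1e-10):
    """Value iteration; returns optimal state values and a greedy optimal policy."""
    V = np.zeros(len(states))
    while True:
        Q = q_values(V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
```

For example, `evaluate_policy(np.array([1, 1, 1]))` gives the value of the hypothetical "always give the drug" policy, and `value_iteration()` returns a policy whose value is at least as high in every state. In practice, the hard part is estimating T and R from clinical data, which this sketch does not attempt.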
If we consider the current state of the art in reinforcement learning, a few groundbreaking papers have been published over the years, and here is probably one of the most famous ones, published in Nature in 2015, where authors from the DeepMind group were able to program algorithms to play Atari games "pixel to action", meaning that they used raw image data as the input to an algorithm based on deep reinforcement learning. There has been lots of progress, and very fast progress: more recently, they demonstrated that one of their algorithms was able to beat one of the world's best players at the game of Go.
Another of their algorithms was applied to chess, and now we're seeing algorithms that are very competent at playing real-time first-person shooters, like the game Doom shown here in the middle. On the right, I'm showing you another very exciting subtype of reinforcement learning, called hierarchical reinforcement learning, where sub-agents optimize a subtask and overarching agents control those sub-agents, making it possible to play very complex games and achieve very complex tasks, such as the strategy game StarCraft II shown here. But it is surprising to see that these groundbreaking algorithms have not yet trickled into the field of healthcare.
There are certainly some very good reasons for this. So we can ask ourselves: why is medicine harder than playing video games? First, I would argue that medicine is a high-risk environment, meaning that we cannot afford to deploy a bad policy that would harm people. The second aspect is that we have a limited amount of training data, which highlights the importance of open data and of making data publicly available.
Another important aspect is that the environment is not fully specified in medicine. If we go back to our Nature paper, you see in the four screens at the top that the environment is fully specified: at any point in time, all the information needed to make the best decision is displayed on the screen. Healthcare is very different, because the information we have consists of vital signs, lab values, X-ray reports and so on. With the data we have in healthcare, it is like looking at physiology through a keyhole: the environment is not fully specified, and we don't have all the information needed to make the best decision.
Also, in healthcare it is of course impossible to learn by trial and error. On the right, I'm showing you an example from a classic computer science paper in which a virtual car was trained on the task of climbing a mountain. This is the mountain car problem, and you see that they used 20,000 trials to train the model, meaning that many of those trials ended in failure; but as long as it's all virtual, it doesn't matter. Of course, this is impossible in medicine, where we could never afford to do this with human lives.
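For contrast, here is what learning by trial and error looks like when a simulator is available: a minimal tabular Q-learning sketch. The environment is a hypothetical deterministic toy track, loosely inspired by the idea of a virtual car but not the actual mountain car dynamics: reaching the right end of the track gives a reward, and reaching the left end counts as a "crash" with a penalty. The learning rate, discount factor, exploration rate and number of episodes are arbitrary assumptions. The agent improves only by trying actions, including bad ones, and the sketch counts the crashes that occur during training.

```python
import random

# Hypothetical toy track (not the real mountain car): positions 0..5.
N_POS = 6
START, CRASH, GOAL = 2, 0, 5
ACTIONS = (-1, +1)  # move left, move right

def env_step(pos, action):
    """Deterministic simulator: returns (next_pos, reward, done)."""
    nxt = pos + action
    if nxt == GOAL:
        return nxt, 1.0, True
    if nxt == CRASH:
        return nxt, -1.0, True
    return nxt, 0.0, False

def q_learning(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; returns (Q, crash count)."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_POS)]
    crashes = 0
    for _ in range(episodes):
        pos, done, steps = START, False, 0
        while not done and steps < 100:
            # Trial and error: with probability epsilon, try a random action.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: Q[pos][i])
            nxt, r, done = env_step(pos, ACTIONS[a])
            # Temporal-difference target: no bootstrapping from terminal states.
            target = r if done else r + gamma * max(Q[nxt])
            Q[pos][a] += alpha * (target - Q[pos][a])
            crashes += nxt == CRASH
            pos, steps = nxt, steps + 1
    return Q, crashes
```

After training, the greedy action in every non-terminal position is "move right", but only after a number of crashes along the way. Every crash here is harmless because it happens in a simulator; the equivalent exploration on patients is exactly what cannot be done.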
This is related to my previous comment: there is no high-fidelity simulator that would allow us to test these strategies without harming humans. Another key difficulty in deploying reinforcement learning is evaluation. This is a very active, ongoing field of research, where the difficulty is to evaluate the value of an AI policy without deploying it, using only retrospective data. In reinforcement learning, this task is called off-policy evaluation. There are methods for this task, all of which have limitations, that generate estimates of the value of an AI policy with confidence intervals.
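Off-policy evaluation can be illustrated with two simple building blocks. The first is per-trajectory importance sampling: each recorded trajectory's return is reweighted by how much more (or less) likely the AI policy would have been to take the recorded actions than the physicians were. The second is a percentile-bootstrap lower confidence bound on the resulting estimate. This is a minimal sketch under strong assumptions: the behaviour (physician) policy probabilities are assumed known and non-zero for every recorded action, whereas in practice they must be estimated from data; the function names are hypothetical; and the bootstrap bound is a simple stand-in, not necessarily the specific high-confidence method used in the research described here.

```python
import numpy as np

def importance_sampling_ope(trajectories, pi_e, pi_b, gamma=1.0):
    """Per-trajectory importance sampling estimate of the value of pi_e.

    trajectories: list of episodes, each a list of (state, action, reward),
                  collected under the behaviour policy pi_b (e.g. physicians).
    pi_e, pi_b:   functions (state, action) -> probability of taking that action.
    Returns (ordinary IS estimate, weighted IS estimate, per-trajectory terms).
    """
    weights, returns = [], []
    for episode in trajectories:
        rho, G = 1.0, 0.0
        for t, (s, a, r) in enumerate(episode):
            rho *= pi_e(s, a) / pi_b(s, a)  # cumulative likelihood ratio
            G += gamma ** t * r             # discounted return
        weights.append(rho)
        returns.append(G)
    w, G = np.array(weights), np.array(returns)
    per_traj = w * G
    ordinary = per_traj.mean()
    weighted = per_traj.sum() / w.sum() if w.sum() > 0 else float("nan")
    return ordinary, weighted, per_traj

def bootstrap_lower_bound(per_traj, delta=0.05, n_boot=2000, seed=0):
    """Percentile-bootstrap (1 - delta) lower bound on the mean of per-trajectory terms."""
    x = np.asarray(per_traj, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))  # resample with replacement
    return np.quantile(x[idx].mean(axis=1), delta)

def is_acceptable(per_traj, baseline_value, delta=0.05):
    """Accept the AI policy only if its lower bound is at least the baseline value."""
    return bootstrap_lower_bound(per_traj, delta) >= baseline_value
```

As a toy usage example, if physicians chose between two actions with probability 0.5 each and the AI policy always picks action 1, then the two single-step trajectories `[("s", 1, 1.0)]` and `[("s", 0, 0.0)]` give an ordinary IS estimate of 1.0. The per-trajectory terms returned by the first function can then be passed to `is_acceptable` together with an estimate of the physicians' own value. The ordinary estimator is unbiased but can have very high variance when the two policies disagree often, which is one of the limitations mentioned above.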
The picture I'm showing you here is an example of the research I'm conducting on sepsis, where we evaluated the value of the physicians' policy when treating patients with sepsis in intensive care. In parallel, we also evaluated the value of an AI policy that was generated from the same data, meaning it is a retrospective analysis. Using a type of algorithm called high-confidence off-policy evaluation, we were able to generate confidence bounds that guarantee the safety of an AI policy at an accepted level of risk. I'm going to leave it here. This was a very quick overview of some of the exciting prospects of using reinforcement learning in healthcare.
It's very much an ongoing field of research, but on condition that doctors, data scientists and computer scientists work together, there is a chance that we will be able to address some of these challenges and ultimately improve the way we treat our patients. Thank you.

Dr. Komorowski gives a brief explanation and a few examples of reinforcement learning, and describes the challenges of applying it in medicine. Using examples of AI in video games, Dr. Komorowski discusses AI in medical decision-making.

This article is from the free online

Artificial Intelligence for Healthcare: Opportunities and Challenges

Created by
FutureLearn - Learning For Life
