Skip to 0 minutes and 15 seconds Alright, so that was step one. Step two is, of course, that now we need to improve performance. One easy solution here: instead of using those seven variables that were used in that study, why not just use all the information that we have? An insurance claims database has a lot of information in it. We have thousands of different drugs being recorded, thousands of different diagnoses and procedures, and other observations. Why not use all of that information? So, instead of a case-control design, we use a new-user cohort design, where you compare people that start one treatment to people that start another treatment.
Skip to 0 minutes and 57 seconds And that makes life a little bit easier, in that anything that happens before the start of the treatment is fair game to use as an adjustment, because the treatment cannot cause something that happens before you start the treatment. So you know it's not on the causal pathway. So we use all the information that we have prior to people starting treatment, and we do what's called a large-scale regularized regression. We actually just recently published a paper on this process, but just to give you an idea of what that looks like: we did a study where we compared two drugs, duloxetine and sertraline, which are both antidepressants.
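The large-scale regularized regression described above can be sketched as an L1-penalized logistic regression that predicts treatment assignment from many pre-treatment covariates. This is a minimal illustration on synthetic data, not the study's actual code (which used far more people and covariates); it assumes scikit-learn is available.

```python
# Sketch of a large-scale regularized propensity model (illustrative only):
# L1-penalized logistic regression predicting which of two treatments a
# person started, using pre-treatment covariates. Data here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 5000, 200  # the real study had ~165,000 people and ~60,000 covariates

# Sparse binary covariates, mimicking drug/diagnosis/procedure indicators.
X = rng.binomial(1, 0.1, size=(n, p)).astype(float)

# Treatment choice depends on a handful of covariates (unknown to the analyst).
logit = -1.0 + 1.5 * X[:, 0] - 1.0 * X[:, 1] + 0.8 * X[:, 2]
treatment = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# The L1 penalty shrinks most coefficients to exactly zero, which is what
# lets this approach scale to tens of thousands of covariates without
# hand-picking variables.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(X, treatment)
ps = model.predict_proba(X)[:, 1]  # propensity scores in (0, 1)
print("non-zero coefficients:", int(np.sum(model.coef_ != 0)))
```

The propensity score `ps` (the predicted probability of starting one treatment rather than the other) is what gets used in the next step to stratify the two populations.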
Skip to 1 minute and 41 seconds And we collected all the data before treatment initiation, so we ended up with almost 60,000 variables that go into this model. We fitted what's called a propensity score model, and then we used that to stratify the populations and basically make them more similar. What I'm showing here on the x-axis is the standardized difference of means. It basically tells you how different the two groups, the users of duloxetine and the users of sertraline, were before any adjustment. Every one of these dots is one of those 60,000 variables, and you see that the two groups were actually very different. That would have been a very strong cause of confounding if you compared people directly…
Skip to 2 minutes and 30 seconds I think the sertraline users were already much sicker than the duloxetine users before they started taking the drug. So if you just compared them without adjusting for that, you would get a lot of wrong answers. What you also see is that after we do the adjustment,
Skip to 2 minutes and 47 seconds this imbalance goes away. The usual rule of thumb is 0.1: anything below that we consider to be balanced. So we see that after adjustment, all of these 60,000 variables are balanced. You could have done the hard work of picking seven variables yourself, and those would have been included in the 60,000, so it's better to just let the computer figure it out, I would say. That's why we call it artificial intelligence. Just to give an idea of the scale: we had 90,000 people in one group and 75,000 in the other group. So this is already getting to be a large computational problem.
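The balance metric on the x-axis of the plot, the standardized difference of means (SMD), is simple to compute. A minimal sketch on synthetic data, not the study's code, showing the 0.1 rule of thumb in action:

```python
# Standardized difference of means (SMD): the difference in group means
# divided by the pooled standard deviation. |SMD| < 0.1 is the usual
# rule of thumb for calling a covariate "balanced".
import numpy as np

def smd(x_t, x_c):
    """SMD between a treated group x_t and a comparator group x_c."""
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)
    return (x_t.mean() - x_c.mean()) / pooled_sd

rng = np.random.default_rng(1)
# Hypothetical covariate before adjustment: treated group shifted by 0.5 SD.
imbalanced_t = rng.normal(0.5, 1, 10000)
imbalanced_c = rng.normal(0.0, 1, 10000)
# The same covariate after a successful adjustment: no shift.
balanced_t = rng.normal(0.0, 1, 10000)
balanced_c = rng.normal(0.0, 1, 10000)

print(f"before adjustment: SMD ≈ {smd(imbalanced_t, imbalanced_c):.2f}")  # ≈ 0.5, above 0.1
print(f"after adjustment:  SMD ≈ {smd(balanced_t, balanced_c):.2f}")      # ≈ 0.0, below 0.1
```

In the study, this statistic is computed once per covariate, so the diagnostic plot contains one dot for each of the ~60,000 variables, before and after stratifying on the propensity score.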
Skip to 3 minutes and 29 seconds We probably could already have used that supercomputer on a chip here. Of course, we need to measure that performance, and so here's a plot similar to what I showed earlier, showing these negative controls, in this case negative control outcomes: outcomes that we believe are not caused by either drug. And we see that this is actually a much better picture. We see that for most of these negative controls, where the true hazard ratio is equal to one, we actually get estimates that are close to that. And if you look at the area where the calibrated p-value is smaller than 0.05, it's almost the same as where the uncalibrated one is smaller than 0.05.
Skip to 4 minutes and 14 seconds So we're already doing very well, and getting much less bias in our estimates. I'll actually skip over this, but basically, we managed to not only use negative controls, but from those also derive positive controls, where the true hazard ratio is greater than one, and we were able to show that we get good estimates in those ranges as well. We were able to use that to calibrate a p-value and a confidence interval, which we wrote about in this paper. But I'll skip over that. The point here is that for an individual study, we believe we were actually able to improve performance, in terms of getting the right answer, quite a bit.
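The calibration idea can be sketched as follows: fit an "empirical null" distribution to the negative-control estimates, then test new estimates against that null instead of against the usual theoretical null. This is a deliberately simplified illustration with made-up numbers; the published method also accounts for each estimate's standard error.

```python
# Simplified sketch of empirical p-value calibration using negative
# controls. All numbers are hypothetical; the real method additionally
# models the per-estimate standard errors.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical negative-control log hazard ratios: the true HR is 1
# (log HR = 0), but residual bias shifts the estimates away from zero.
neg_control_log_hr = rng.normal(loc=0.1, scale=0.15, size=50)

# Fit a normal empirical null to the negative-control estimates.
null_mean = neg_control_log_hr.mean()
null_sd = neg_control_log_hr.std(ddof=1)

def calibrated_p(log_hr):
    """Two-sided p-value against the empirical null (bias-aware)."""
    z = (log_hr - null_mean) / null_sd
    return 2 * stats.norm.sf(abs(z))

# A hypothetical study estimate of HR = 1.5, judged against the
# systematic error actually observed in the negative controls.
new_estimate = np.log(1.5)
print(f"calibrated p = {calibrated_p(new_estimate):.3f}")
```

An estimate sitting right at the center of the empirical null gets a calibrated p-value near 1, and estimates far outside the spread of the negative controls get small calibrated p-values, so significance now reflects both random and systematic error.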
Skip to 4 minutes and 59 seconds Also, in terms of computation, just to point out: we started off with just needing to do one study. Now we actually have to do it 200 times, for all of these controls, and all of those using those 60,000 variables. So we're getting more and more in need of that fancy computer chip.
Redesign Method to Improve the Performance
Dr. Martijn Schuemie explains that step 2 for 21st Century development is to improve performance, and describes the new model used for the experiments.
A new method is to use all the information available in the study design. Instead of a case-control design, he explains, the researchers use a new-user cohort design, where you compare people who start one treatment to people who start another treatment.
Click here if you want to learn more about the research: Evaluating large-scale propensity score performance through real-world and synthetic data experiments.