Dr. Maddux: Artificial intelligence and machine learning are being widely adopted in health care. Staffing challenges, coupled with the rising cost of care around the world, put cost pressure on medical care providers and create ever-increasing demands on physicians and nurses. By adopting artificial intelligence and machine learning algorithms in care delivery, workflows may be adjusted to improve efficiency and give clinicians more time to be with patients.
Dr. Isaac Kohane, chair of Harvard Medical School's Department of Biomedical Informatics, joins us to discuss opportunities created by AI and machine learning in health care. Welcome, Zak.
Dr. Kohane: Thank you for having me on. I'm glad to be able to have this conversation.
Dr. Maddux: Tell us about the development of AI in health care. We've heard so much recently about the impact of generative AI. Give us a little bit of the timeline of how AI became as important as it seems to be today, and as it is going to be in the future.
Dr. Kohane: It may or may not surprise you that as far back as the 1950s, there were doctors who were already writing about how AI would be a change maker for medicine. And one of our most famous endocrinologists, who discovered the syndrome of inappropriate ADH and was at Tufts, wrote an article in the New England Journal of Medicine in 1970 called "Medicine and the Computer".
In it, he outlined a lot of the opportunities that we're now beginning to see. The difference between then and now is interesting to ponder, because it really relates to why there's so much excitement now. Well, number one, we had no data. The data was not online, and so everything had to be hand-fed to the computer, which was unrealistic in any clinical setting.
Second, updating the computer on the practice of medicine involved writing a bunch of rules. That was true, let's say, in the 1980s, which was another big heyday of artificial intelligence in medicine, one I'm familiar with because that's when I did my Ph.D. in computer science around my clinical training. And there was no way that human beings could keep all the rules up to date to reflect the state of medicine.
Just think about how much trouble we have keeping textbooks up to date, and those are nowhere near as detailed. Think of something like UpToDate, which is much more coarse-grained. So what happened is that now we not only have clinical data online, but we also have, for better or worse, whatever human beings have said at some point in some publication or textbook about medicine in a computer-readable form on the web. And that became available along with the development of really high-performance computers.
So, if you're wondering why NVIDIA stock is so high these days: NVIDIA makes graphics processing units. Graphics processing units were originally developed to allow sweaty teenagers to kill aliens at very high frames per second on 4K screens. The way they do it is by having thousands of processors working in parallel. It turns out that those same parallel processors are able to do the calculations that are the basis of neural network computation.
So having a lot of data about our patients online, having a lot of data about medicine, or anything that human beings say about medicine, and then having these GPUs that can actually run neural networks, allowed us to calculate models that, for example, predict the next word in a sentence using essentially one big equation with a trillion parameters. All that became feasible, feasible in ways that shocked all of us in the last two years.
Dr. Maddux: It's been fascinating to watch the evolution of the discussion, where I think the technology has been way ahead of society's ability to absorb what it might mean and what the opportunity and risk are. So, as we see in many other areas of our society, the extremes get talked about more than the useful middle.
Dr. Kohane: I agree. And what's fascinating about these large language models is this. By the way, they really came into their own with a paper that was published by Google in 2017, and, ironically, it's Microsoft, through its investment in OpenAI, that actually benefited from that. Now Google's playing catch-up. But what OpenAI did that's really interesting, and that is a direct challenge, a good challenge, for medicine, is that they released ChatGPT not to doctors, not to lawyers, but to the whole world.
Whether we like it or not, patients are now getting access to the same models as our clinicians. So even though these machines, these large language models, are error-prone and can also make things up, nonetheless, in the absence of sufficient conversations with doctors, the absence of primary care, and the absence of enough time even with tertiary care, patients are increasingly using these models in ways that we could not imagine.
And at the same time, doctors are using them in ways that we could not imagine. For example, these models, as offered by OpenAI, are not HIPAA compliant, and yet doctors are currently pasting patient summaries into the text field in the web browser and saying, "Please write the authorization request letter." And it does it.
Now, there are HIPAA-compliant versions of this available through Microsoft, through their Azure cloud. But the point I'm making is that doctors, clinicians broadly, and patients are in such need of this kind of support that they are running roughshod over a lot of niceties that we thought would be barriers.
Dr. Maddux: I see that physicians have great interest in using this, and yet they are also skeptical when the generative AI models seem to understand things that maybe they haven't been able to put together themselves. Speak to that for a moment, and also a little bit about this concept of hallucinations.
Dr. Kohane: In the end, these models do something incredibly simple. If I ask you what the next word is in "getting a blood pressure is…", "useless" would be unlikely. "Useful" is likely. "Is best done every day" is also possibly likely. And somehow just calculating that allows these programs to have conversations like, I'm called down to the emergency room for a child with nonpalpable testes and a small phallus.
What's the differential diagnosis? And it gets it right, going all the way through to the diagnostic procedures, including both imaging and genetic testing. Somehow it does all that. But it turns out that there is a fine line, which these programs do not quite understand, between hypothesizing something that's reasonable and imagining something that is completely unreasonable, and stating it so confidently that you are very likely to say, "Oh, sure." This is exactly how a lawyer recently got flagged: he submitted a filing that was fully generated by ChatGPT, but he did not verify the case law it cited, which had good-sounding case names.
They were all made up. Likewise, you can find papers referenced by GPT with good-sounding titles in journals that you know, but they're just made up. Now, in fairness, some of the more blatant hallucinations can now be checked by looking them up on the web, and the various data companies are doing that actively.
So they're actually having it look up these references. But when you think about it, thinking about a counterfactual, "What if I did not do this?", is an act of imagination, and the distinction between imagining and really going on a trip and hallucinating is not an easy one for computers to make.
And so what's interesting is that when you tune them to be less likely to hallucinate, they're also less likely to generate intelligent hypotheses. It's a double-edged sword, but something that you have to use.
Dr. Maddux: There'll be lots of conversations in our medical communities about at what point you trust these devices and for what things you trust them. And it seems like there will be a lot of skepticism, and yet a lot of amazement at how exceptional the models can be at times in discerning things from the data presented that we might struggle to discern ourselves.
Dr. Kohane: That's right. But I hope I don't come across as overly cynical if I relate to you the following distinction between the way medicine is practiced and the way we think it should be practiced. I'm reminded of a study done by David Bates at Brigham and Women's, maybe 20 years ago, where in the P&T committee there were endless discussions about the use of ondansetron, an expensive antiemetic drug for chemotherapy.
It actually made a difference whether you gave it every 4 hours or every 6 hours and at what dose, and different doses carried different costs to the hospital. There were endless discussions. So when they created an order entry system at the Brigham, they decided to have all those dosage regimens available in a pop-up menu. What they found is that, despite all those discussions, the default dose shown when you opened the screen was the one used in 95% of cases.
And so my fear is that, despite all the concerns we might have about the accuracy of AI, a harried, busy doctor is going to use it to generate a clinic note or a note to the patient, not look at it too closely, and then miss the errors just because of the pressure.
So what we say in a discussion with each other, when we know the mic is live and our clinical acumen is on display, is very different from the way we practice medicine. And I don't know how you feel about the following model, but the same issue is present in some way with self-driving cars.
If you have a Tesla, not only does it require that you keep your hands on the wheel to make sure that you're paying attention, it actually has cameras that look at your eyes. I have a Tesla, and let's say I pick up my phone and do something; the Tesla will see that I'm looking at the phone and it will switch off the autopilot.
Not only that, it will say, "That's the first time. Do that four more times and I'm switching it off forever." So the question is, are we going to have that kind of oversight for doctors, to make sure that they don't just default to following the AI?
Dr. Maddux: I think the productivity pressure to do more and more reduces the editing checks that we need to make. That is a concern I know many people have. What do you think the regulatory environment is going to be, both domestically here in the US and internationally? In our company we deal not only with HIPAA but also with GDPR quite a bit, as we have large-scale operations in other parts of the world. I'm curious how regulators are going to catch up.
Dr. Kohane: That is, I think, perhaps the most important question, because we tend to have a myopia in our health care system, thinking that ours is the only regulatory regime. But I think there are actually four regulatory regimes that we need to think about. Obviously ours. But there's the European one, which is quite distinct. There's China. And then there's the poor and developing world, like parts of Africa, which in some sense are leapfrogging certain technological stages that we've had to go through, going straight to the phone, where in fact you can use those supports without a lot of that infrastructure.
And so it's very clear that there will be different tradeoffs for those societies because of their differing visions of what role government should play in taking care of this. How that plays out in our particular society is, I think, still being negotiated. I think that just as electronic health record companies were actually in favor of regulation after they made it to market, you'll see that a lot of the current big players in large language models might want regulatory capture.
In other words, they might want to invite regulation now to create a bigger barrier to entry for others, let's say from the open-source community. At the same time, speaking as both a patient and a doctor, we do want to have some measures of quality in what we have. Say we use erythropoietin or another drug to increase red blood cell count.
We want to know how effective it is and how dangerous it is. We want to know the same things about these programs. But here's the rub: human beings are fairly standardized. There is still variation, and we can talk about different ethnic groups and individuals from different continents of origin, but they're still mostly human beings. It turns out that what's very different is the practice of medicine, far more different than patients are.
And so the FDA already knows it has a problem: if you train an AI model on one health care system, it may not transfer well to a different part of the United States. That problem, called dataset shift, makes it very hard to apply the same evaluation approach we use for drugs. And here's where I will put in an advertisement, and I apologize, but it's because I believe this problem is so significant that I agreed to become the editor in chief of a new journal, NEJM AI.
It's a spinoff of the New England Journal of Medicine, focused exactly on this: evaluating AI products in ways that are methodologically sound and reproducible. I think it's going to be absolutely central to our success. And by the way, I do hope that nephrologists will be running some trials that we can publish.
Dr. Maddux: Our European and Asian clinics run their anemia management on a machine learning model that we developed, which we call the Anemia Control Model. In the US, we developed a series of patient avatars on which we ran virtual clinical trials to optimize an algorithm. We're now testing the two against each other to see whether the European, Asian, and Latin American model does a better job than the algorithm we produced in the US, which is our biggest market.
Dr. Kohane: I would love to see that manuscript.
Dr. Maddux: We spend a lot of time obsessing over ESA use, hemoglobin levels, and the combination of iron management and ESAs.
Dr. Kohane: I think that's very important. Another reason we should be worried about and focused on this is that what will be paid for and what will be covered will have concrete poured over it by these algorithms. And just as the eGFR was a challenge, this will be a much bigger challenge. So I think we really have to stay awake here and, to go back to the Tesla metaphor, make sure that we're driving the car so that our patients are best served.
Dr. Maddux: One of the things that we just developed is called the Apollo database. It holds all of our clinical data generated in our clinics since 2018, down to treatment-by-treatment data for kidney disease patients, across the entire world where we operate, in 50 different countries where we deliver services.
This is a truly anonymized database that is both HIPAA and GDPR compliant. We will be using it to try to answer questions about different population groups that we can't answer today because we hadn't previously harmonized all the datasets. I think these are the kinds of things where a language model would be really helpful.
Dr. Kohane: It would be helpful. I think the best use for large language models, a slam dunk, is annotating these large databases: reading through the records in whatever language (by the way, they're just as capable in French as they are in Italian or in English), annotating them, and saying, tell me about this patient at different time points.
Instead of having to hire a bunch of medically savvy people to do this annotation, and not even being able to keep up, I think it's a really good use. And by the way, that's an incredibly valuable database. I'm sure you have already thought about this, but for running trials and having baselines for trials, it's absolutely essential to have that.
Dr. Maddux: The teams have worked hard for the last four years getting through all the privacy, legal, and other machinery that we need to, because we operate in so many countries under different rules and have had to make sure we can accommodate each of them. It's now time to launch, and then we will start doing quarterly updates so that it will move from ending at the end of last year to being as current as we can get it with our data.
Right now, its opening population is a little over 530,000 patients. It's a sizable database, and I think it will be quite valuable because it goes down to treatment-by-treatment blood pressures, blood flow rates, and other diagnostic parameters, as well as all the lab data and nursing data.
Dr. Kohane: There was a company called Flatiron Health that was trying to do the analogous thing for cancer. At the time, they didn't have large language models, so they had to employ thousands of people to carefully curate all the narrative text. I still think you need some careful curation, but I think the economics of this are going to be much more feasible with these large language models.
Dr. Maddux: We've shared the book that you, Peter Lee, and Carey Goldberg wrote together to try to set a baseline of where we are as a society in our maturity in looking at AI in medicine, especially generative AI.
Any final thoughts on how organizations should look at this going forward, the speed of adoption, and the kinds of questions we need to raise so that we're thoughtful in our approach?
Dr. Kohane: I think there's an opportunity to get ahead in a way that's not too risky, though there's always a risk of getting too far ahead of where the world is and ending up spending resources wastefully. But since patients are getting access to these models as well as doctors, I do think there's a genuine opportunity to combine the data that patients collect at home and outside the clinic, such as diaries and various measurements, and integrate it with what they're getting from the formal health care system, to get much more holistic trajectories of patients over time, in ways that will allow us to be even more responsive to patients than we currently are.
You've seen many companies that focused on measuring blood pressure at home, looking at step counts, or perhaps looking at dietary efforts. Increasingly, you have parts of your health record in Apple Health, extracted from hospital systems. All this is creating the substrate for a rich but messy composite of the patient's clinical state across time, and using these models to summarize it for both doctors and patients will, I think, be very, very useful and forward looking. And, to be crass, it will create a much better customer relationship, in which patients are much more involved.
These models allow you to ask questions about your care and get genuinely thoughtful answers, based on your entire history, without necessarily having a human being in the loop. I do think that in terms of continuity of care and a comprehensive understanding of a patient's history, this flipping of the clinic, taking all the data not just from a hospital but from the entirety of their care, is no longer a pipe dream.
Dr. Maddux: Health equity is such a significant topic in our conversations in medicine today, and I'm interested in your perspective on whether you think these generative AI models will reduce the divide affecting marginalized groups or expand it. Do you have any forward-looking thoughts on that?
Dr. Kohane: There are two ways that these large language models could create further inequity. One, obviously, is that to the extent the data they're drawn from reflects bias, we've got a problem. And indeed, since human beings are biased, both in good ways and bad ways, that will be captured by these models. On the other hand, I think that if we're overly sensitive, overly careful to avoid this, then we end up disadvantaging the groups that might most benefit.
If we say these large language models should only be used for white, wealthy people and that we have to wait for their use in disadvantaged communities, that would really accelerate inequities. That's why I'm proud to say that I'm working with colleagues at Emory and Morehouse School of Medicine to evaluate these models: to see to what extent bias is in fact reflected in the decision making, for example in classifying heart disease on presentation in emergency rooms, and to what degree they could actually be used today.
So on the one hand, we don't want to further instantiate bias. On the other hand, we don't want to limit possible good payoff from an exciting application of this technology.
Dr. Maddux: The internet shifted the tectonic plates of our society fundamentally, and I expect that generative AI is the next big tectonic shift we're going to see in society over the next 20 to 25 years. It's been so nice having you share your thoughts and perspectives, as someone who has been so thoughtful in this field, not only in medicine but in bioinformatics overall.
Dr. Kohane: Well, it's been really great.
Dr. Maddux: I've been speaking with Dr. Isaac Kohane and we've been talking about the impact of AI in healthcare and medicine. Thanks so much for being here.
Dr. Kohane: Thank you.