I’d imagine that the variance of interpretation from (most) AI on images would be significantly smaller than the variance of interpretation from humans. I think that this is where AI can be a huge benefit for medical imaging reviews, but I think the “accuracy” of the interpretation is going to be limited to what it’s allowed to learn from.
I wonder whether that human variance gets any smaller if we restrict the humans to specialists in the field (e.g., a hepatologist reviewing liver scans) . . .
Someone posted an article noting that newer(?) doctors had a tendency to take the AI at face value. I’d argue that this is a training issue rather than an AI issue. You see the same thing with people believing the numbers a calculator spits out or what the media reports. In those cases, people who work with these sorts of things fairly often get training on checking the results for reasonableness rather than just taking them as fact. Most (all?) tools we use have flaws of one sort or another. Part of becoming proficient at using them is learning their strengths and weaknesses.
As an example, most doctors take blood pressure readings. If the reading is higher or lower than normal, it indicates that there may be issues. We’ve since learned that going to the doctor raises people’s blood pressure, so now doctors ask people to measure it at home if possible. BP also varies throughout the day and based on the position you’re in. Doctors are still learning how much this affects things, given there’s active research on it today.
If you’re using AI on mammograms to detect breast cancer, you may figure out it’s good for these types, but not so good for those types. Using AI to help with diagnosis isn’t that different from doctors having to take different approaches to screening for breast cancer based on the type of breast tissue a woman has, or sometimes based on the size of her breasts.
Aside from grade 10 for a job shadowing thing, it hasn’t really been in my career plans. The pay is nice, but the training sucks, and the job kind of sucks too. A former co-worker of mine is married to a doctor and the more she tells me about his training, the happier I am that I never pursued the med school path.
I did do the entry exam for optometry, and had scores that were good enough to get in and the connections to help me get in, but I didn’t end up applying as it just wasn’t going to be a job I enjoyed in the long run.
I’ve got a friend who’s a pharmaceutical epidemiologist. I likely could have done that. Her pay is significantly more than I make, which I’m a bit jealous of; on the other hand, she works vastly more hours than I do, so I suspect my hourly rate might be higher than hers. Seeing that, I’m fine with not having her job, and the jealousy also goes away.
I find the tech and the computation around MRIs fascinating. I sort of think becoming an MRI operator could be an interesting job. On the other hand, it would be disturbing to have to maintain a neutral appearance with someone after spotting tumours spreading throughout their body, knowing that they’re going to die soon and that I can’t tell them because that’s the doctor’s job. So I’m probably unlikely to switch to that.
I mostly like my job. I don’t have to roll into work until noon most days; I get a bunch of vacation, a lot of independence, and a defined benefit pension plan that’s indexed to inflation.
Why would she get a second scan? Just upload the image.
Speaking of, one of the more interesting things about AI is that it will be used to audit everything we’ve already decided, e.g., x-rays, criminal evidence, legal documents, financial documents, court transcripts, scientific studies, audio records, translations, books, essays, etc.
As it gets good at identifying errors, it will be able to do so with all our records.
It will be interesting to see. The government just took a billion or two out of our pension plan because it’s currently overfunded (more than 25% above the target funding level). I don’t really get the logic of doing that; I’d think the smarter move would be to slow their contributions for a bit and let things even out.
Isn’t all AI trained to please in some way? If you’re having it scan x-rays, some feedback loop is needed to further train it on successes and failures. It seems that could diverge from reality just like an LLM.
A model trained to recognize cancer in scans will be given factual responses to predict. Either the scan should have been diagnosed as potential cancer, or it shouldn’t have. Or perhaps the scan was of cancer, or it wasn’t. In any case, the idea of a fact is “baked” into the training of the model.
The LLMs just predict which words come next. If the words “1+1=” are usually followed by “3”, then “1+1=3” is the right prediction as far as the LLM is concerned. Facts have nothing to do with it. Or rather, the facts that matter are what people wrote, not whether those words were factually correct.
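To make that concrete, here’s a minimal sketch of the idea using a toy frequency-based “predict the most common continuation” model. This isn’t how a real LLM is implemented, and the corpus here is made up, but it shows why the prediction tracks what people wrote rather than what’s true.

```python
from collections import Counter

# Toy "training data": what people wrote, not what is true.
corpus = ["1+1= 3", "1+1= 3", "1+1= 2"]

# Count which token most often follows the prompt "1+1=".
continuations = Counter(text.split()[1] for text in corpus
                        if text.startswith("1+1="))

# The "prediction" is simply the most frequent continuation in the data.
print(continuations.most_common(1)[0][0])  # prints "3", because that's what was written most often
```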
You can do supervised training, which is what I’ve been doing with my machine learning approaches.
For training my machine learning type models I tell the model: here is a set of 900 spectra with both an age and a species associated with each one. Randomly pick 750 of these spectra as the fit data set. Use these to develop a relationship between spectra and age and between spectra and species (two separate problems I’m trying to solve with the same set of spectra). Once you’ve done that, see how good a job you do at predicting ages or species for the other 150 spectra.
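Something like this in Python with scikit-learn would do it; the file names, array shapes, and the random-forest models here are just placeholders for illustration, not necessarily what I actually use.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

# Hypothetical data: 900 spectra, each with a known age and species.
# spectra: (900, n_wavelengths) array; ages: (900,); species: (900,) labels.
spectra = np.load("spectra.npy")   # placeholder file names
ages = np.load("ages.npy")
species = np.load("species.npy")

# Randomly hold out 150 spectra for testing; fit on the other 750.
(X_fit, X_test,
 age_fit, age_test,
 sp_fit, sp_test) = train_test_split(spectra, ages, species,
                                     test_size=150, random_state=0)

# Two separate problems on the same spectra: predict age, predict species.
age_model = RandomForestRegressor().fit(X_fit, age_fit)
species_model = RandomForestClassifier().fit(X_fit, sp_fit)

# See how well the held-out 150 spectra are predicted.
print("age R^2:", age_model.score(X_test, age_test))
print("species accuracy:", species_model.score(X_test, sp_test))
```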
For your x-ray scenario, I think what I’d do is build a data set of a few thousand to many thousands of femur x-rays with known diagnoses (e.g., broken, not broken). I’d split the data set into a test set and a fit set and let it loose on them using the same protocol as I used above.
I meant literally trained to please: LLMs are incentivized to produce results that people like, which is one reason it’s difficult to get them to say no.
Anyway-- like magillam said, ideally you aren't using a test result to train the AI-- you are using the final outcome to train the AI. I assume making sure that your training data is complete and valid is a perennial problem, though.
“It turns out,” Ng said, “that when we collect data from Stanford Hospital, then we train and test on data from the same hospital, indeed, we can publish papers showing [the algorithms] are comparable to human radiologists in spotting certain conditions.”
But, he said, “It turns out [that when] you take that same model, that same AI system, to an older hospital down the street, with an older machine, and the technician uses a slightly different imaging protocol, that data drifts to cause the performance of AI system to degrade significantly. In contrast, any human radiologist can walk down the street to the older hospital and do just fine.
“So even though at a moment in time, on a specific data set, we can show this works, the clinical reality is that these models still need a lot of work to reach production.”