Many of the decisions you make in the modern world, and nearly every single action you take, are essentially recorded and, at some point, fed through a model.
A great example is your smartphone. From the moment you interact with it in the morning, every subsequent interaction is catalogued. Your location, for instance, is continuously piped through to a big model that basically tries to decide, “How can we use this data best?”
A common use of this data, your location, might be deciding the best way for you to get to work on Google Maps, or it could be used to make sure that the ads you’re seeing are targeted to where you are currently located. In both cases, your data is being used as an input to a massive model, which is too big to write down.
This huge model is actually just a sequence of what we call “function approximations” – a sequence of ways of computing and moving through a data space. If you try to look at that object statistically, as a whole, you can’t do it. It’s too difficult – it’s simply too big. So, as statisticians, we use methods that don’t require us to write down a physical model. We create “implicit” models.
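One standard way statisticians handle an implicit model is simulation-based inference, for example approximate Bayesian computation (ABC): rather than writing the model’s likelihood down, you run the model forward many times and keep the parameter values whose simulated output looks like the observed data. The sketch below is a minimal, hypothetical Python illustration of that idea (the toy simulator, flat prior and tolerance are all invented for the example), not the specific methods discussed in this article.

```python
# Minimal sketch of approximate Bayesian computation (ABC) rejection sampling.
# The simulator here is a stand-in for a complex model we can run forward
# but whose likelihood we cannot write down.
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta, n=100):
    """Run the model forward: generate a synthetic dataset for a given theta."""
    return rng.normal(loc=theta, scale=1.0, size=n)

def summary(data):
    """Reduce a dataset to a low-dimensional summary statistic."""
    return data.mean()

# "Observed" data (generated here with a known theta so we can check the answer).
observed = simulator(theta=2.0)
obs_summary = summary(observed)

accepted = []
epsilon = 0.05                          # tolerance on the summary mismatch
for _ in range(50_000):
    theta = rng.uniform(-10, 10)        # candidate drawn from a flat prior
    if abs(summary(simulator(theta)) - obs_summary) < epsilon:
        accepted.append(theta)          # keep draws whose simulations resemble the data

print(f"Accepted {len(accepted)} draws; approximate posterior mean: {np.mean(accepted):.2f}")
```

The accepted values form an approximate picture of which parameter settings are consistent with the data, without the model ever being written down as a formula.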
My research is all about measuring the accuracy of these implicit models. As you can imagine, if I have this big structure working in the background, you would be pretty concerned if the things that are getting spat out of it are nonsensical. For instance, when Google Maps started off around 15 years ago, you had people driving into rivers because the models that were being used to deliver directions in real time weren’t accurate enough.
Think of self-driving cars. The model they operate on is actually relatively simple – a sequence of decisions through time. But when you take the whole path over which the car is driving, that becomes a very complicated sequence of decisions that quickly become too numerous to enumerate.
How, then, do we analyse the output or the decisions that the model is making? The way we do that is through “Bayesian probability.”
Any model is a sequence of inputs and outputs. The inputs could be features of the data we’re trying to analyse, and things that we are uncertain about because they are essentially random or unobservable, which we call “unknowns”. Our goal is to see how those things that we’re uncertain about impact the output of the model.
This is where we employ the language of probability. And this is where Bayes comes in. In conventional statistical analysis, decisions are made conditional on the value of these “unknowns”. But what we really care about is how these unknowns impact our output.
Hence, what we would really like to get at is the reverse: how our decisions are influenced by these unknowns. It’s the inverse of the forward way we normally think about probability.
This is inverse probability.
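In standard notation (with θ standing for the unknowns and y for the data we actually observe), this reversal is Bayes’ theorem: the “forward” probability of the data given the unknowns is turned into the “inverse” probability of the unknowns given the data.

```latex
% Bayes' theorem: the inverse probability of the unknowns \theta given data y,
% built from the forward probability of the data given the unknowns.
\[
  p(\theta \mid y) \;=\; \frac{p(y \mid \theta)\, p(\theta)}{p(y)},
  \qquad
  p(y) \;=\; \int p(y \mid \theta)\, p(\theta)\, d\theta .
\]
```

The integral in the denominator is, in general, the hard part – which is exactly the obstacle described below.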
The Reverend Thomas Bayes (1702-1761) was a very interesting chap who died relatively young, and even before properly publishing his initial results that really sparked most of this thinking about inverse probability (his results were posthumously published by his friend and patron Richard Price).
The idea of inverse probability was also arrived at by Pierre-Simon Laplace at about the same time, completely independently – two similar ideas coming from very different approaches. But neither of them could actually work out how to calculate the probabilities they wanted, because those probabilities generally depend on integrals that we didn’t know how to calculate until computers came along in the 1950s.
How do we think about the uncertainty of things like future decisions or other unknowns? In machine learning and statistics, we generally can’t measure things exactly (there’s always an element of randomness), and so we can only ever measure an approximation. To quantify the uncertainty of that approximation, in a meaningful probabilistic sense, is what I find fascinating, and what I spend a lot of time thinking about.
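As a purely illustrative sketch of what quantifying uncertainty in a probabilistic sense can look like (the model, prior and numbers here are invented for the example, and are not drawn from the research described in this article), here is a toy Bayesian update in Python that turns noisy measurements into a point estimate together with a 95% credible interval:

```python
# A toy example of probabilistic uncertainty quantification: estimating an
# unknown mean from noisy measurements and reporting a 95% credible interval.
# All numbers are made up for illustration.
import numpy as np

rng = np.random.default_rng(1)

# Noisy measurements of a quantity we can only observe with error.
data = rng.normal(loc=3.0, scale=2.0, size=25)

# Conjugate normal-normal update: prior N(mu0, tau0^2), known noise sd sigma.
mu0, tau0, sigma = 0.0, 10.0, 2.0
n = len(data)
post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
post_mean = post_var * (mu0 / tau0**2 + data.sum() / sigma**2)

# The posterior is itself a probability distribution over the unknown mean,
# so "how uncertain is our approximation?" has a direct numerical answer.
lo, hi = post_mean + np.array([-1.96, 1.96]) * np.sqrt(post_var)
print(f"Estimate: {post_mean:.2f}, 95% credible interval: ({lo:.2f}, {hi:.2f})")
```

The interval is the probabilistic answer to the question of how far the approximation could plausibly be from the truth.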
The next big thing? What if we were able to obtain an interpretable machine-learning model that humans could understand?
The big models that we’re talking about are just too big for humans to be able to interpret – they’re too complex. But if we were able to come up with interpretable understandings of those models, we would see the genesis of whole new architectures for machine learning, and new ways for scientists to solve problems.
The way it works currently is that we train a big model, but we don’t really understand how it arrives at the answers it gives us. We have some “knobs and buttons” that we can tune to make it work, and we get some output. So we think: great, this works well. But we have no idea why. There’s no, or very little, theory for it. There’s no sense in which we can say with 100% certainty why it works, because we can’t interpret it in the way we can interpret simpler models.
But if we can find ways of making those models interpretable, we would see a revolutionary leap forward because we could use that interpretation to extrapolate other ways of working or thinking that would make sense.
It would be like inventing an interpretable version of AI. When we think of language models like ChatGPT, interpreting the output is easy. But interpreting how they get to a decision is almost impossible. They’re just too complex for us to know what the hell we’re getting. They work unbelievably well, but we don’t know why. From a real mathematical perspective, no one so far has been able to grok why they work.
Humans are really good at pattern recognition, and at manipulating existing patterns to fit their goals. The real difficulty is that, with current machine-learning technology, we often can’t recognise the pattern it is using. We know there’s a pattern there, but it’s at a level beyond our current comprehension.
If we can somehow find a way of comprehending these patterns, then we can start manoeuvring to build entirely new technologies, new AIs, new machine-learning models that would really help us move forward in terms of science.
As told to Graem Sims
Dr David Frazier is an ARC DECRA Fellow and Associate Professor in Econometrics and Business Statistics at Monash University. He was awarded the 2023 Moran Medal by the Australian Academy of Science for his outstanding research in statistics and computational modelling.