Mathematical Approaches to Image Processing with Carola Schönlieb
We ought to start with a little bit of your background. So what did you start researching and then what are you researching now?
Okay, so I started out my research in mathematics in Austria, in Vienna, where I actually didn't look at image processing or imaging at all. I started out with so-called partial differential equations, which are equations of a function and its derivatives. They can express change over time or space, and they are models for various natural phenomena in physics and biology. Lots of things are explained by these differential equations.
My first paper, again, had nothing to do with image processing. It was actually on the Cahn-Hilliard equation, which is an equation that describes phase separation in alloys, in metallic alloys, for instance. So when you have a mixture of two metals and cool it down below a certain temperature, the two components start to separate from each other, coarsen, and build these larger areas. There is an equation that models this kind of phenomenon, which is the Cahn-Hilliard equation.
Okay, and my first paper was on the stability analysis of a certain type of solution to the Cahn-Hilliard equation. Stability analysis meaning: if you perturb your initial condition a little bit, how much does your stationary solution differ? That is, when you let time evolve infinitely. A stationary state is a state where the system undergoes no change. So how much do these stationary states differ from each other when you just perturb the initial condition a little bit?
This is in the context of creating alloys or building structural alloys. Was there any particular purpose?
Well, the purpose is a lot with these differential equations to simulate certain phenomena. And so if you understand how stable these stationary states are, if you are at a stationary state and then you perturb the stationary state a little bit, does it go back to the same stationary state or does it go somewhere completely different? You kind of understand how these systems react to perturbations that are naturally occurring because we are in real life, and things happen.
Gosh, okay. Yeah, so it's more an understanding of the physical processes involved in, you know, mixtures of alloys, for instance. Was this at a Technical University where you would be, like, focusing on alloys, or was this a personal interest?
Actually, you know, a lot of applied mathematics on the continent (which is basically everywhere else here in Europe) works like this: what you're doing is inspired by applications, but eventually you end up with a mathematical problem. So the driving factor was, well, we were interested in analyzing this equation, and there were techniques coming up that were kind of cool. So it was just a kind of intellectual interest in this equation.
Okay, it was the driving factor for this particular paper. But then, while I was writing this paper, researchers at UCLA, in particular the group of Andrea Bertozzi, used this same equation to do image restoration.
Image restoration meaning you have a digital image and there are parts of this image which are damaged for some reason, or where you have objects which are occluding some other object of interest that you want to get rid of the occlusion or something like that. You have one part in the image that you somehow want to replace with something that is suggested by the surrounding area of this region.
So is this similar to like content-aware fill in Photoshop?
Exactly! Okay, but this predates the Photoshop development, I assume.
It actually does. And, I mean, the content-aware fill is actually very much based on some of the things that were initiated by people like Andrea Bertozzi. The technique Photoshop is using is different, but it's still based on research in mathematics. In fact, yeah, it's a differential equation.
Maybe, if you wish, there is a more convenient way to put it: it is a different type of differential equation that is non-local, taking patches in images and kind of copying and pasting them into the region that you want to replace.
Yep. But anyway, she used the Cahn-Hilliard equation to do that, and that was a kind of eye-opening moment. And then I moved into image processing, still sticking to differential equations at the time and actually looking at image restoration. So, it is the Photoshop content-aware fill type problem.
And yeah, that was basically my PhD. My PhD was about image restoration. Okay. And during my postdoc, I then moved more and more into what are called inverse imaging problems, where what you are observing or measuring in the first place is not an image. When you take a photo, the digital image is an image, but there are certain applications, like in biomedical imaging, where what you're observing is not an image directly but some transform of this image, like in tomography, for instance.
Okay, think about CT, for instance, computed tomography. What the tomograph is measuring are projections of your three-dimensional object, which is whatever you have in your body, and from that you want to reconstruct the object.
All right, projections meaning, in the CT sense, that you send X-rays through the body. What you're measuring at the other end is the attenuation that they experience when they travel through the body, depending on which types of tissue they hit. You can model that by saying that what you're measuring is an integral along the line that the X-ray takes through your body, where you're integrating over the attenuation that it experiences.
Yeah, and so from that, and that is a very old problem, it goes back to Radon, it's called the Radon transform. What you're measuring is not an image but the Radon transform of your image, which is a set of line integrals over the image density that you want to reconstruct. Right? The contrast then comes from the density being different in different parts of your body, and then you can see organs in your body and things like that. Right?
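As a rough sketch of the formula she is describing, under standard parallel-beam assumptions (this notation is added here, not from the interview): if $f(x)$ is the attenuation at position $x$ in the body, the scanner records, for each line $L$, approximately $\log(I_0/I) = \int_L f(x)\,\mathrm{d}x$, where $I_0$ is the emitted and $I$ the detected X-ray intensity. Collecting these line integrals over all angles $\theta$ and offsets $s$ gives the Radon transform
$$(\mathcal{R}f)(\theta, s) = \int_{-\infty}^{\infty} f\big(s\,\omega_\theta + t\,\omega_\theta^{\perp}\big)\,\mathrm{d}t, \qquad \omega_\theta = (\cos\theta, \sin\theta),$$
and reconstruction means recovering $f$ from a noisy, incomplete sample of $\mathcal{R}f$.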
So, the likelihood of there being some amount of data missing that you need to fill in, recreate, or denoise is much higher than with an ordinary image.
Yeah, that's quite obvious because, well, first of all, we are in a finite-dimensional world. So, you know, you don't have all possible, infinitely many line integrals of your body measured. And even that would still be okay if you were measuring as many line integrals as correspond to the resolution of the image that you then want to compute from these line integrals.
But then, very often, it's not like that, because you want a very high-resolution image, because you want to look at all the details in the body, right? But you don't want to measure so many line integrals, because you don't want to radiate the patient so much. You don't want to send so many X-rays through the patient, so you have a lack of data. You don't have as much data as you want for the high-resolution image you want to reconstruct, and then there is noise, because these are measurements, right?
And there is always noise in measurements.
And so, were you doing denoising work as well at the same time?
It's integrated into the reconstruction approach. So, in the mathematical algorithm that reconstructs, or denoises, the three-dimensional image inside your body from these line measurements, the denoising is integrated into this reconstruction step.
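One standard way to write down this "denoising built into the reconstruction" idea, added here as a sketch rather than as her specific model: find the image $f$ that minimizes
$$\tfrac{1}{2}\,\|\mathcal{R}f - g\|^2 \;+\; \lambda\, J(f),$$
where $g$ is the noisy set of measured line integrals, $\mathcal{R}$ is the Radon transform, $J$ is a regularizer encoding what a plausible image looks like (for example, the total variation discussed below), and $\lambda$ balances fitting the data against smoothing away the noise.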
So going from these line integrals back to a reconstruction... gosh, okay.
And so, what I know about denoising is mostly through audio—like a Fourier transform and that kind of thing. So how are you doing it with an image? How are you denoising in the algorithm?
So, with images, it depends on what you think is important in an image. That will determine how you're going to denoise it. A very successful assumption that has been made for designing image denoising approaches, and still is, is that the most important information, the information that visually tells you what the image is showing and that also helps if you later want to quantify something in the image, is the edges in the image.
This is the most important thing—where are boundaries between different objects?
Okay, yeah. When you think about it, what really makes an impression on you of what an image shows are the colors and, in the end, the boundaries between these colors: where are the colors changing? These are the edges in the image.
Interesting. And to preserve those and not make them blurry is something that a lot of research in image denoising has gone into—so image denoising methods which can preserve edges in an image.
Fourier-type techniques are good; they can smooth out your noise by damping the high frequencies. But they will take away the high frequencies everywhere, right? Which means they will also take away the high frequencies that correspond to edges, where the image function is changing rapidly, yeah?
So you're looking at the edge, which shows up as a very high-frequency component of your image, but this is a component you would like to keep.
Yeah, okay. So you want to differentiate between the high frequency components in the image, which are just noise, and the high frequency components which correspond to these very characteristic features that you want to keep.
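To make that concrete, here is a minimal sketch (added for illustration, not code from the interview) of what a naive Fourier low-pass filter does to a noisy edge in one dimension: the noise goes away, but so does the sharpness of the step.

```python
import numpy as np

# A noisy 1D "edge" signal, denoised by a hard Fourier low-pass filter.
# The cutoff removes noise but also rounds off the step, illustrating
# why naive frequency-domain smoothing blurs edges.
rng = np.random.default_rng(0)
n = 256
signal = np.where(np.arange(n) < n // 2, 0.0, 1.0)   # a sharp edge
noisy = signal + 0.1 * rng.standard_normal(n)         # add noise

spectrum = np.fft.rfft(noisy)
cutoff = 10                                            # keep only the lowest frequencies
spectrum[cutoff:] = 0.0
lowpassed = np.fft.irfft(spectrum, n)

# The edge in `lowpassed` is smeared over many samples, while the
# noise away from the edge is mostly gone.
print("max slope, original:  ", np.max(np.abs(np.diff(signal))))
print("max slope, low-passed:", np.max(np.abs(np.diff(lowpassed))))
```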
There are various techniques, but one very successful one is total variation regularization, for instance, which is a technique that has been used a lot by people in image denoising. It models exactly this assumption: that you have sharp discontinuities.
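A minimal sketch of what such a total-variation (ROF-type) denoiser can look like, assuming a smoothed TV energy minimized by plain gradient descent; real implementations use more sophisticated algorithms (Chambolle-type or primal-dual schemes), so treat this as illustrative only:

```python
import numpy as np

def tv_denoise(noisy, lam=0.1, step=0.1, iters=200, eps=0.1):
    """Illustrative ROF-style total variation denoising:
    minimize 0.5*||u - noisy||^2 + lam * sum_x sqrt(|grad u(x)|^2 + eps^2)
    by plain gradient descent (eps smooths the TV term so it is differentiable)."""
    u = noisy.astype(float).copy()
    for _ in range(iters):
        # forward-difference image gradients, replicating the last row/column
        ux = np.diff(u, axis=1, append=u[:, -1:])
        uy = np.diff(u, axis=0, append=u[-1:, :])
        mag = np.sqrt(ux**2 + uy**2 + eps**2)
        px, py = ux / mag, uy / mag
        # backward-difference divergence of the normalized gradient field;
        # the TV term contributes -lam * div to the energy gradient
        div = (np.diff(px, axis=1, prepend=px[:, :1])
               + np.diff(py, axis=0, prepend=py[:1, :]))
        u -= step * ((u - noisy) - lam * div)
    return u
```

The point of the `eps` smoothing and the explicit gradient step is only to keep the example short and runnable; the qualitative behaviour (noise removed, discontinuities kept sharp) is what matters.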
Median filtering is maybe a simpler thing to understand or that people might have heard about, which is not exactly total variation denoising, but it's related.
Got you. So median filtering instead of Gaussian filtering?
Right. Because Gaussian filtering corresponds to your Fourier taking away high frequency stuff.
Okay, sure.
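The median-versus-Gaussian contrast just mentioned is easy to try directly; a small sketch (again added here, using SciPy's stock filters rather than anything from the interview):

```python
import numpy as np
from scipy import ndimage

# Gaussian filtering averages across edges and blurs them, while a median
# filter removes impulsive noise yet keeps a sharp step reasonably intact.
rng = np.random.default_rng(1)
img = np.zeros((64, 64))
img[:, 32:] = 1.0                                  # vertical edge
noisy = img + 0.2 * rng.standard_normal(img.shape)

gauss = ndimage.gaussian_filter(noisy, sigma=2)
med = ndimage.median_filter(noisy, size=5)

# compare how much of the step height survives across the edge
print("edge height, gaussian:", gauss[32, 34] - gauss[32, 29])
print("edge height, median:  ", med[32, 34] - med[32, 29])
```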
You know, it's so funny: when I was doing Photoshop at The Onion, we were always actually interested in blurring edges, because one of the most obvious things to spot in a Photoshop job is a sharp edge and a soft edge in the same photo. So, for instance, if I were to cut you out and then put you in front of the White House, and the photo has a slight blur, like the depth of field of, say, a 1.4 aperture, which creates a very, very shallow depth of field...
So there's a lot of blur, but if you're crisp, someone can immediately spot that you were dropped into the photo. So it was all about blurring the edges to trick someone into thinking it was all one photo.
Yeah, so in your context, these algorithms that will handle the edge sharpness, are they hand-coded or are you using machine learning to create them? How does that work?
So they are classically hand-coded, and this is maybe something that is now, you know, more and more being replaced by other things. For image denoising nowadays, I think the best approaches are actually coming from deep neural networks.
Okay, so you know, these hand-crafted methods get beaten more and more in terms of performance by some of these neural network approaches. They get beaten in certain scenarios, though: on the types of examples they have already seen, or similar types of images they have already seen, right?
If you present them with something completely different, right? If they have only been trained on photographs of animals or whatever, and then you present them with an image from a CT scanner, they will not be able to handle that.
So that is one of the areas where I think hand-crafted models still have a certain justification for existing, in a sense, because even with GPU programming and everything, there is still not enough computational power to train a machine to know everything, to learn everything about the world, right?
And so I think, while in certain scenarios, if you know what you want to apply your image denoising approach to... well, it's like the ImageNet thing from almost ten years ago. Exactly!
If you know that, then it's fine and that's good. But think about, for instance, one big thing in CT, let's say, or in different types of biomedical imaging, let's say MRI, magnetic resonance tomography: the type of image that you get, the resolution, the contrast, and everything very much depends on how you do the acquisition.
How many, let's say in the CT case, how many X-rays you have been shooting through the patient. But also, and that is actually connected to what I just said, the type of scanner you're using. Are you using GE, or Siemens, or Toshiba, or whatever?
They have different settings, and they have different ways of going from the measurements to an image. And so, you know, if you train an algorithm, for instance, and your neural network on one of these scanners, it doesn't mean that it works on the images of another scanner, really.
So they're producing entirely different data.
I thought they were just like basically the same tools inside with a different logo!
Well, this is the other interesting thing; it's not entirely different. You might not even spot visually what the difference is, but this is one of the things that people are starting, more and more, to do research on and to understand.
That even small perturbations, small differences that are consistent between the different scanners, might contribute to your algorithm failing.
You know, I don't know if you have seen these adversarial examples, where you make a little perturbation, and then all of a sudden it classifies the image as something completely different.
Right, so yeah, I think the really exciting opportunity that neural networks are now offering, in contrast to these handcrafted models, and for mathematicians in particular, is that they can go beyond just saying, here, I want an algorithm that preserves edges, right? Which is a very simplistic view of the world.
But on the other hand, there are lots of unknowns in these algorithms that I think mathematicians should be exploring, trying to bring over some of the analysis and some of the methodologies that help us understand why these handcrafted models work, because we can prove properties about the denoising abilities of these methods: how stable they are, for instance, to perturbations in the images.
We know how that works, we can prove things about them; we have error estimates and things like this.
To bring those over to neural networks, I think, would be very exciting. But for that, bringing some structure into these neural networks is also important. And when you think about these neural networks having a hundred million parameters that adapt themselves to the data, maybe in some cases it would be better not to have millions of parameters but to have an intelligent, structural way of reducing the search space.
Right, and as such, bring some structure into the problem, which helps you make statements about stability and things like that, and also statements about what the algorithm is actually doing.
Yeah, because that is another thing, right? When you look at these handcrafted models, you have started with a hypothesis, right? You have started with the hypothesis that edges are important in an image, right?
And then you come up with a mathematical algorithm that is doing exactly what you wanted it to do, right? You then have to make sure that it is actually doing what you want. If the result is bad, then either your code is bad or your model is bad. Maybe you have to change your model in a certain way, okay, but you understand why things are happening.
Yeah, but if you have millions of parameters, and you train this algorithm to do something and you get a parameterization with millions of different parameters, how are you ever going to interpret that?
There are ways; machine learning people are trying to interpret classification results, for instance. You have these salient features that you can detect in images: what was important for the classification to decide this or that?
Yeah, but it's still limited, and I think, yeah, there are lots of very, very cool opportunities.
And so, are you guys working on hand-stitching the two together at this point? Like, what's the status with the current research?
Yeah, so there are different people trying to do different things. I can first tell you what I've been doing over the last couple of years. What I've been doing is starting with these more handcrafted models, nothing to do yet with neural networks.
I started with the handcrafted models, and then, for certain parts of these models where I wasn't quite sure (are edges really the only thing I'm looking for, for instance?), I've tried to parameterize them in a certain way, okay?
But not with a million parameters, but maybe with ten parameters or something like this, and then learn these parameters from actual examples that I would like my handcrafted model to spit out.
This is what we call bilevel optimization, or parameter estimation. I mean, people have been doing this for a long time, but now I think the motivation comes more from, you know, a certain interpretation in terms of machine learning that is kind of exciting, so people are even more interested in it.
So this is one way, and the level of parameterization varies in this context. But the good thing is that you have a handcrafted model in the end that you still understand, right? And that you can still prove things about. You still have guarantees on your solution; you have guarantees that you don't get these adversarial errors, where you perturb it a little bit and get a completely different result. That is really something you don't want, right?
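As a toy illustration of the bilevel idea (added here, not her actual algorithm): the lower-level problem is the handcrafted denoiser itself, and the upper level chooses its handful of parameters so that the denoiser's output matches known ground-truth examples. Here a crude search over a single TV weight stands in for a proper bilevel solver, and `tv_denoise` refers to the illustrative function sketched earlier:

```python
import numpy as np

def learn_tv_weight(pairs, lams=np.linspace(0.01, 0.5, 20)):
    """Toy upper-level problem of a bilevel scheme: pick the TV weight `lam`
    so that the handcrafted denoiser's output best matches known clean images.
    `pairs` is a list of (clean, noisy) image arrays; `tv_denoise` is the
    illustrative denoiser defined above."""
    def upper_loss(lam):
        return sum(np.mean((tv_denoise(noisy, lam=lam) - clean) ** 2)
                   for clean, noisy in pairs)
    losses = [upper_loss(lam) for lam in lams]
    return lams[int(np.argmin(losses))]
```

In actual bilevel parameter learning the upper-level problem is solved with derivative-based optimization rather than a grid, but the structure (a few interpretable parameters fitted to examples, with the handcrafted model kept intact) is the same.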
The other thing, which is more blue-sky and actually goes a little bit against what I said before, is that we have started to use deep neural networks for problems in computed tomography, for instance.
And there, at the moment, we cannot prove a lot of things, but we can see some ways of how to combine these more handcrafted models with neural networks in the sense of what you feed them with.
For instance, the prior information you feed them with, the data: maybe not just the measurements, but also the information that the measurements are actually line integrals of the three-dimensional object that you want to reconstruct.
Yep, and doing this in a kind of iterative fashion, where you always go back to the fact that, actually, remember, neural network, these are line measurements that I am feeding you. Remember this!
And then you do another sweep through the neural network with it.
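A generic sketch of that iterative pattern, added here for illustration rather than as her group's exact method: alternate a physics step that re-imposes consistency with the measured line integrals (a gradient step on the data fidelity, using the forward operator and its adjoint) with a learned correction step. All names below are placeholders:

```python
def reconstruct(y, A, At, corrections, x0, step=0.1, iters=10):
    """Sketch of a data-consistent iterative scheme:
    y           -- measured data (e.g. noisy line integrals)
    A, At       -- forward operator (e.g. Radon transform) and its adjoint
    corrections -- list of trained networks / callables, one per sweep
    x0          -- initial image estimate
    """
    x = x0
    for k in range(iters):
        residual = A(x) - y             # compare with the actual measurements
        x = x - step * At(residual)     # physics / data-consistency step
        net = corrections[k % len(corrections)]
        x = net(x)                      # learned denoising / correction sweep
    return x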
But then how does that work in the context of building out a model around, say... I mean, I don't even know how many images are created or how many lines are measured in an MRI, but say you have 10,000 images and you want to create a combination of a hand-coded algorithm and a machine learning system.
How do you go about tagging all that stuff?
What do you mean exactly? How are you going to—
So what I understand you're saying is like, yep, giving it more data than just like the original source material, yes?
And so how do you do that at larger scale?
Ah, computationally, you mean?
Yeah, okay.
So computationally, we're doing this in a sequential manner. You can do it in different ways, but a sequential manner means you're not feeding it 10,000 images at the same time; you're doing it bit by bit and adapting your objective as you go.
Okay, another thing about computational performance is, of course, the optimization that underlies the training. And this is not just a problem that we have; it's true of neural networks in general: you do not necessarily need to solve your optimization problem, your training, exactly.
And maybe sometimes, or most of the time, you actually don't want to solve it exactly, because you only have a finite amount of training examples.
And so, when you think about what these neural networks are doing, they're trying to minimize the loss over the training examples that you have. But this loss is only an approximation of many, many, many more images that you want your neural network to work right for.
And so, very often, you do not want to solve that exactly; you don't want to minimize your loss exactly for this training set.
Gotcha, okay.
Okay, so, but let's say you have 10,000 of these images, where you know both the clean and the noisy image. If you were to fit this training set perfectly, if you were to perfectly minimize this loss function... you could think, and again, people don't really understand this, and I also don't really understand this.
But conceptually, the idea is what you actually want to minimize is not the loss just over the training set, but it's the loss over an infinite amount of images, which you then want to denoise.
Right?
Okay, but you don't have this infinite amount of images. So why would you want to very accurately minimize the loss over the finite amount of images that you do have?
Maybe you don't! Maybe you only want to do it approximately, such that you still have some freedom, right? Such that it could also be optimal for more images that you don't have.
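In symbols, the distinction being gestured at is roughly the following (standard notation, added here): training minimizes the empirical loss over the $n$ available pairs,
$$\hat{R}_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} \ell\big(f_\theta(x_i), y_i\big),$$
while what you actually care about is the expected loss over all images you would ever want to denoise,
$$R(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell\big(f_\theta(x), y\big)\big],$$
and driving $\hat{R}_n$ exactly to its minimum need not make $R$ small; stopping early, or minimizing only approximately, can generalize better.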
So, in other words, you could train it on the wrong thing, and it would only work for, you know, denoising photos of apple trees.
Exactly! Yeah!
And then you're in the same place that you were in.
Yeah, exactly.
Okay, so the idea is that if you only do it approximately, you might be able to generalize more. But all of this, I mean, there are some attempts to understand this, but I'm hand-waving here, because I can't really say anything mathematically about it.
But have you pushed your research into practical applications at this point? Like are you working with, you know, companies or student groups or anyone else?
So my collaborations are actually with people in academia, but from other disciplines. We have been collaborating a lot in recent years with people at the University Hospital in Cambridge, so with clinicians and medical physicists.
Different types of applications, you know, one of the things I said before is that I got more and more interested in these problems where you don't measure an image directly but only indirectly via these X-rays, for instance.
So developing algorithms that can get the most out of a very limited amount of data, the most out of it in terms of very high-resolution images, is something we have been doing. We have been collaborating a lot with people in magnetic resonance tomography, in particular at Addenbrooke's Hospital, which is the local Cambridge hospital here.
But also with people in chemical engineering. One of the driving factors for people in chemical engineering, for instance: there is a group here, the Magnetic Resonance Research Centre, where they look in particular at flow processes that are dynamic.
So they have these tubes filled with water, and then they pump certain things through, and they want to understand what the dynamics of this process are.
So now if you think about not just having a static three-dimensional object but having something that changes over time as well.
And now, thinking back to how many X-rays, well, not in magnetic resonance tomography, these aren't X-rays, but just going back: not sending through as many X-rays means you don't have a lot of data to reconstruct from.
Which, if you want to track something dynamically, also means you're not measuring a lot per time step. If you want high resolution over time, it means that per timestamp you can't acquire as much data; you just have, say, one second for reconstructing your organ inside the body at this particular timestamp.
And then the organ is moving again, and you need to go to the next timestamp, and so on. You have less data for reconstructing each timestamp than if you had a static object and ten seconds to acquire it instead of one second.
You can measure much more!
Right, right. And then you reconstruct just one image. But now maybe you want to reconstruct not just one image in ten seconds but ten images, because we want to see something evolving over time.
Right. So here also the challenges are along these lines of getting high resolution out of limited data.
Now a different thing, which is not so much connected to indirect measurements as these applications in magnetic resonance imaging are.
Gotcha.
We have collaborations with people in plant sciences, for instance. They are interested in monitoring forest health or forest composition, let's say, from airborne imaging data. So they fly, in my collaboration it's mostly flying rather than satellites, over forest regions, and then they acquire different types of imaging data.
They acquire photographs, okay? Aerial photographs, and hyperspectral or multispectral imaging data, which means you do not only have RGB; you cover a broader range of the light spectrum.
So also the invisible light. You don't have just three channels; you might have 200 channels.
Yeah, and hyperspectral imaging is interesting because the spectral component that you get from these measurements gives you an idea of what the material properties are of these trees.
So it tells you something about what it really is; the spectral component tells you something about the material that you are looking at.
So in other words, like the light spectrum of how they reflect light back—they have a different signature in the light spectrum, okay?
And so the intent would be to figure out, say, for instance, an invasive tree that was taking over an area; they could figure that out just by flying right over it.
Gotcha, okay.
And then the other thing... so those are two of them: aerial photographs and hyperspectral imaging.
And then the third thing that they often acquire are lidar measurements, yeah, where you do not just get kind of a planar picture of the trees but you actually get a 3D model of the trees.
Yeah, I was just watching a documentary about that, about searching for Mayan ruins with lidar. Flying over the Yucatán Peninsula or something, essentially saying it would take 20 years for an archaeologist to dig around in the dirt, or you just fly over, look for the hard stuff, and see what happens.
Yeah, yeah, very interesting.
And are people also looking into this in the context of, you know, for instance, denoising camera footage from anything, like security, on the one hand?
Yeah, I haven't done so much work in that myself, but there are, of course, you know, the—I mean, CCTV cameras are everywhere.
Yeah, I mean, it's like, it's kind of a terrifying output of figuring out this research, right? Like being tracked everywhere.
Like in the UK in particular, like I imagine people are looking to do this, right?
You know, it's quite funny, because when you think about these crime TV shows, CSI Miami or whatever, there are always these scenes: you're told you have a very pixelated image, and you press a magic button, and bam, you can zoom in, you can see everything!
So you think, oh, this is ridiculous, of course you can't do that. But maybe you can do it now! If you have all these machine learning methods, which have learned to look at just pixels and then know what the motif is, what a very probable match in terms of a high-resolution image is, maybe at some point you can do it.
But then you don't know—haha!—if you're right or wrong!
Right?
Just by chance, I was reading a New Yorker article from, I think, 2010 about this guy in Montreal allegedly finding 500-year-old fingerprints using different kinds of, like, spectral photography.
I don't want to give away the whole thing, but there was an ensuing lawsuit, actually, from him against The New Yorker, saying it was libel.
But basically, what happened was, he was accused of faking these fingerprints, of copying them from a real one and duplicating them onto the back, using proprietary methods to find them.
But you are interested in doing it, whether or not it's legit, like you want to do it.
You want to work! So, I mean, I'm going to tell people that it's fake!
Yeah!
What direction are you going with, with art?
So, it kind of... it didn't actually start in Cambridge.
Well, okay, let me say a bit more. During my PhD in Vienna, there was a collaboration that we had with conservators who were looking at particular wall frescoes, frescoes in an old apartment in the city centre of Vienna, which are called the Neidhart frescoes.
I'm not going to go more into detail, but they were in the process of restoration, these frescoes.
And so that was my first hands-on experience there. The idea was that, you know, it takes them a long time to physically restore these wall paintings, and once you have restored them, there is no way back, right? You need to decide what to do.
Yeah, because then it sticks! And so our idea was to help them by creating a virtual template of how the restoration could look if they do this or that.
Right?
Yeah, because the important part is that a fresco is actually part of the wall, chemically; it's not paint!
Exactly, yeah, exactly!
But even with paintings, you know, if you really physically, manually restore them, yeah, you've done it.
I mean, you can still maybe try to undo your treatment, but you're changing a historical piece, right? Part of the world's heritage!
I mean, this is, yeah, yeah.
Anyway, so coming here to Cambridge, I got to know people in the Fitzwilliam Museum, which is the museum here in Cambridge, and they're interested in illuminated manuscripts.
So a very good colleague of mine, who's the Keeper of Manuscripts at the Fitzwilliam Museum, got interested in this idea of virtual restoration, because illuminated manuscripts are so fragile that the culture is: you never physically restore them.
And they, you know, if they get damaged or altered over time, you leave it!
Wow, okay! You leave them like this!
And so there, the idea was couldn't we create a virtual restoration?
And you know, kind of exhibit the original manuscript and the virtual restoration next to each other.
And so last year, there was an exhibition in the Fitzwilliam Museum, which was called Color, and in this exhibition we had one piece—this was a page of an illuminated manuscript which had been altered over time, actually manually overpainted.
Okay?
And what we did was that we exhibited the manuscript and next to it the virtual restoration where we took off the overpaint.
And that has led to other things, but this is the idea: you don't physically change something, you do it virtually. Nothing is damaged; you just virtually create a digital copy of the manuscript and play around with it.
So you're not only going back in time, restoring it to its original vitality, its original colour, but you're actually going deeper into the layers, like when something has been painted over.
Yeah, yeah, go further in, yeah, with imaging!
And then you kind of apply everything you might already...
Yeah, wow!
So if someone's really excited about this kind of research, if they want to get into it, what would you point them to? Where should they get started?
Depends what their background is.
Okay, yeah. Say they have like, you know, they have a CS degree, they're interested in imaging, so they're like technical, but they haven't done anything in particular like in this field.
Okay, so I think, in particular when you think about the US, some of the cool things that came out of imaging and image processing in the last couple of years were from UCLA.
So if you look at some of the applied math faculty there and some of the online lecture material, or, you know, YouTube videos of some of their talks, I think that would be a good source to look at.
So, I mean, very classically, names are Stan Osher, Andrea Bertozzi, whom I mentioned, Malik, Stefano Soatto... there are lots of people. I can tell you a few more names afterward, but I think just looking for mathematical approaches to image processing would be the first thing I would do.
There are very good introductory books to look at that explain a bit of the basics. Great.
But yeah, I would first start reading a little bit in these more general, foundational books, and then, just from the citations, you immediately get to the more modern, recent research.
I think that would be a good way to catch up, maybe, and to apply it here.
Awesome, well, thank you so much!
Thanks for making time!
Yeah, thanks!