JiWei News: In this installment of the JiWei Interview series, iJiWei had the pleasure of interviewing Atul Ingle, Assistant Professor of Computer Science at Portland State University. He focuses on computational imaging, computer vision, and signal processing; his current research involves the co-design of imaging hardware and algorithms for single-photon image sensors, and he is one of the contributors to Image Sensors World. JiWei Interview posed a series of questions about multi-pixel CFAs, small pixels, HDR technology, SPAD cameras, and the outlook for image sensors, and received very illuminating answers.
Q: Can you briefly talk about CFA technology selection and sampling method selection?
A: I think it's partly a legacy issue, for sure. The Bayer RGGB pattern has become so popular and has been around for so long that most people who work on image sensors and downstream algorithms are familiar with how to deal with that type of data. If someone comes along with a completely new pattern, then you basically need to design a new low-level algorithm that takes that raw data and makes an image. That's just way too much work, and the gain you get in the end might not be that much in the overall grand scheme of things. Although there are some other patterns, especially for low-light imaging, where in addition to the RGB filters you also have one that just captures monochrome, like a white channel if you want to call it that.
So there may be some specific applications where non-standard, non-Bayer patterns make sense. But in general, because of the legacy issue, and because the gain in image quality may not be enough to justify the high cost of completely changing your low-level processing algorithms to deal with a new type of pattern, it may not make sense to use a non-RGGB pattern.
Q: Why have more and more mobile image sensors in recent years been using unique multi-pixel CFAs, such as Quad Bayer from Sony, Nonacell or Nonapixel from Samsung, and 4-cell from OmniVision, instead of the Bayer pattern?
A: That's a great question. If you look at the multi-cell CFAs from these manufacturers, they still look a lot like the Bayer pattern, because they still have the RGGB arrangement, roughly speaking. But what they are doing now is grouping some of the pixels together and putting the same color filter on the whole group of pixels.
Now, the reason is that for smartphones and mobile image sensors, there is a constant push toward making the pixel size really small. That's just because you want your phones to be small. You want the image sensor to be really small, but you still want a high-resolution image, right? The only way you can do that is to make the pixels smaller.
Now, if you make the pixel smaller, each pixel will collect less light. And if it collects less light, then you have to deal with noise. There's just no way around it. There is a physical limitation: if the pixel is smaller, there will be noise in your final image. The way you can deal with that noise is what's called binning, where you combine a bunch of pixels together. That's a way to boost the amount of signal you get back.
Now, if you were to do this process with the standard RGGB Bayer pattern, you would essentially be combining pixels that have different filters on them. I think that's why there is a push toward using these groups of pixels that share the same color filter; it makes more sense to combine them that way. In the end, it's just a trade-off where you're improving your image quality, reducing the amount of noise, by paying a small penalty in terms of spatial resolution.
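To make the binning idea concrete, here is a minimal NumPy sketch (the function name and toy data are mine, not from the interview) that collapses a Quad-Bayer-style mosaic, where each 2x2 group of pixels shares one color filter, into a half-resolution standard Bayer mosaic:

```python
import numpy as np

def quad_bayer_bin(raw):
    """Bin a Quad-Bayer mosaic (each 2x2 group shares one color filter)
    down to a standard Bayer mosaic at half resolution.

    Summing a 2x2 block mixes only one color, which is exactly why the
    grouped color filters make binning safe."""
    h, w = raw.shape
    assert h % 2 == 0 and w % 2 == 0
    # Sum each 2x2 same-color group: ~4x the signal, half the resolution.
    return (raw[0::2, 0::2] + raw[0::2, 1::2] +
            raw[1::2, 0::2] + raw[1::2, 1::2])

# Toy 4x4 Quad-Bayer frame: each 2x2 block is one color group.
frame = np.arange(16, dtype=np.float64).reshape(4, 4)
binned = quad_bayer_bin(frame)   # 2x2 binned Bayer mosaic
```

With the standard Bayer pattern the same summation would mix red, green, and blue samples, which is the incompatibility the answer above describes.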
Q: Can you briefly talk about the binning process for small pixels?
A: In terms of the final image quality, it's a multi-factor optimization problem. It becomes pretty complicated because so many other parameters play a role in the final image quality. But with these multi-pixel CFAs, another thing to remember is that you're doing the binning in the analog domain. I think that's the right way to do it. If you do the binning after digitization, then you've already paid the penalty, because the A-to-D converter has already added some noise, right?
So then you lose the gain that you could potentially get by combining the signal while still in the analog domain. I think that's where there may be some perceptible improvement: doing it in the analog domain, as opposed to doing it digitally later on.
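A back-of-the-envelope sketch of why analog binning helps, under a simple noise model of my choosing (shot-noise variance equals the signal in electrons, and the readout adds read noise once per A-to-D conversion):

```python
import math

def snr_db(signal, noise_var):
    """Signal-to-noise ratio in decibels."""
    return 20 * math.log10(signal / math.sqrt(noise_var))

s = 100.0        # photoelectrons per small pixel (shot-noise variance = s)
sigma_r = 5.0    # read noise in electrons, added at each readout

# Analog binning: charges of 4 pixels are summed BEFORE readout,
# so the read-noise penalty is paid only once.
snr_analog = snr_db(4 * s, 4 * s + sigma_r ** 2)

# Digital binning: each pixel is digitized (read noise paid 4x), then summed.
snr_digital = snr_db(4 * s, 4 * s + 4 * sigma_r ** 2)

assert snr_analog > snr_digital   # analog binning always wins here
```

The gap grows as read noise grows relative to the signal, which is exactly the low-light regime these sensors target.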
Q: So my next question is about the small pixels you have already mentioned. Why are image sensor manufacturers so keen on small-pixel sensors? For example, Sony developed the industry's first 0.8-micron pixel, Samsung developed industry-first 0.9-, 0.7-, and 0.64-micron pixels, and OmniVision developed the industry's first 0.61-micron pixel. What are the advantages of small pixels?
A: As I mentioned before, there is a constant push toward making the pixel size smaller for the smartphone market, because there is just no other way of increasing image sensor resolution while also making the image sensor smaller. You have to physically fit all of those pixels into the same, or even smaller, physical area while still allowing high resolution. These are conflicting requirements, and it's physically impossible to do it any other way: you have to make the pixels smaller. Let me take an example. Say you have a still camera or a video camera. The size of the image sensor is not as big a constraint, so you can have an image sensor that is maybe 2 centimeters by 1 centimeter. You've got a large enough area, a couple hundred square millimeters, where you can collect enough light even with a multi-megapixel sensor. But compare that with the size of the sensor you have to put in a smartphone: it's got maybe 10 square millimeters of area, a small fraction of the area you have on a video camera or a still camera.
But then you pay the penalty of collecting less light, and you have to deal with that one way or another. One big challenge with these small pixels is that because the pixels are getting so much smaller, the amount of charge they can collect is also much smaller, so the full-well capacity of these pixels is much smaller. That limits the native dynamic range you can get from these small pixels. But there's really no other way: if you want to maintain high resolution in a small area, those are challenges you have to deal with computationally in post-processing, or by playing some other tricks in hardware.
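The full-well-capacity penalty can be quantified with the usual definition of native dynamic range, DR = 20*log10(full well / read noise). A sketch with hypothetical pixel numbers (mine, for illustration only):

```python
import math

def dynamic_range_db(full_well_e, read_noise_e):
    """Native dynamic range: ratio of the largest signal a pixel can
    hold (full-well capacity) to the smallest detectable one (read
    noise), expressed in decibels."""
    return 20 * math.log10(full_well_e / read_noise_e)

# Hypothetical numbers for illustration:
large_pixel = dynamic_range_db(full_well_e=30000, read_noise_e=3)  # ~80 dB
small_pixel = dynamic_range_db(full_well_e=6000,  read_noise_e=3)  # ~66 dB
```

Shrinking the well by 5x at the same read noise costs roughly 14 dB of native dynamic range, which is the gap that HDR tricks in hardware or post-processing then have to win back.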
For example, the binning idea that we just talked about.
My expertise is not really in the low-level hardware details, so I can't go into too much depth there. But I can certainly see that if your pixels are so small, it becomes a challenge to fit the other processing circuitry that needs to go within each pixel. Although I feel like some of the recent advances in 3D stacking could help alleviate those problems, since you could fit that circuitry in a second layer or maybe even a third layer. But you're right: it limits how much you can do inside each pixel if the pixel itself is so small.
That's also not easy to do, because from what I have read, some companies have really special expertise in aligning these different wafers together down to micron-level precision, which is really important if your pixel sizes are so small. It's certainly true that this type of technology has also benefited pixels that are slightly larger.
Q: Alright, let's move on to the next section, about on-chip HDR features and the logic chip of the image sensor. Image sensors for mobile phones used to just take images, but today more and more of them support on-chip HDR features, such as staggered HDR or dual conversion gain HDR. Why do more and more image sensors support these features?
A: There is strong demand for better and better image quality on smartphones for consumer applications, right? One way you can do HDR imaging is the traditional way, where you capture a bunch of images and then apply some post-processing algorithm that runs on your smartphone, using the CPU and RAM of your phone.
But there are several downsides to this kind of traditional post-processing approach. One is latency: if you're doing it in post-processing, you're paying the extra penalty of transferring the image data over to either your phone's memory or some intermediate compute module that runs the HDR algorithm. The other downside is noise. It's always advantageous to run your algorithms as close to the image sensor, in the native data format, as possible, because all the subsequent steps you apply will introduce some noise and artifacts into that signal processing chain. The third downside of the post-processing approach is that you won't be able to deal with motion artifacts well. This also connects back to the latency issue: if something in the scene has moved while you're capturing these multiple images, if you're not running the HDR algorithm fast enough, or if your frames are captured after too much delay, then you have to deal with that motion in post-processing.
And that is often very difficult to do; you'll have to live with motion artifacts like ghosting in your final HDR image. That is increasingly unacceptable even to an average consumer these days, right? We have all gotten used to very good image quality on our phones. I think this is the main reason why many image sensor manufacturers are now pushing to implement these HDR features as close to the image sensor as possible. And it's not a very new trend: if you trace back those pixel designs that had a finite full-well capacity but captured the overflow charge in a bucket on the side, those have been around since the early 2000s.
More recently, there are rolling-shutter sensors that use interlacing schemes where different rows use different exposure settings while the data is being captured. The post-processing HDR algorithm is then able to generate a single high-dynamic-range image while accounting for the exposure variations between the different rows, or different groups of pixels, on the image sensor. In the end, it's about making sure there are no motion or ghosting artifacts, because all of that data was captured simultaneously with respect to the scene, right?
At the end of the day, all of these techniques are just making a tradeoff between the final spatial resolution you can get and the dynamic range.
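A toy sketch of the merge step such a staggered-HDR pipeline might perform, assuming two raw frames at a known exposure ratio (function name and numbers are illustrative, not any vendor's actual algorithm):

```python
import numpy as np

def merge_two_exposures(short, long, ratio, sat_level=4095):
    """Merge short- and long-exposure raw frames (e.g. from a
    staggered-HDR readout) into one linear radiance estimate.

    `ratio` = long exposure time / short exposure time.
    Where the long exposure clips, fall back to the scaled short one."""
    long_f = long.astype(np.float64)
    short_f = short.astype(np.float64) * ratio  # bring to a common scale
    saturated = long >= sat_level
    return np.where(saturated, short_f, long_f)

short = np.array([100, 400], dtype=np.int32)
long_ = np.array([800, 4095], dtype=np.int32)   # second pixel is clipped
hdr = merge_two_exposures(short, long_, ratio=8)
# first pixel keeps the long exposure (800); second uses 400 * 8 = 3200
```

Because the staggered rows are captured nearly simultaneously, this merge sees far less inter-frame motion than a traditional multi-shot burst would.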
Q: But other sensors require HDR processing outside the sensor. For example, Samsung's GN2 (S5KGN2), used in the Xiaomi 11 Ultra, requires an additional AP to process the staggered HDR images it has captured, and I believe that can improve power efficiency. So in the future, will computational photography processing be completed by the image sensor's on-chip ISP, or by independent APs, as with the GN2? And why?
A: For HDR, or in general for any computational photography algorithm, you can imagine implementing it in a few different ways. You could implement it purely as post-processing, where you capture all the image data and then deal with it later, maybe even as late as running it on the processing unit and RAM of your smartphone. Or you could go to the other extreme, where you start processing the data as soon as the light hits the sensor, doing some in-pixel processing, as they call it. These are the two extremes, and there is everything in between. Now, in general, I believe the closer you move your processing to the image sensor, the more advantages there are, because you can process things much faster, so there's lower latency, and it is generally more robust to noise, motion, and other image artifacts when you deal with that information in the native raw data domain, as close to the image sensor as possible. But there is a downside: you lose some flexibility. Let's say tomorrow I want to release some other finely tuned, optimized algorithm for HDR. It's way easier for me to do that if it's implemented later in the signal processing chain. It's much easier to send someone an update for their app than to have them do a firmware update that changes some low-level code in their ISP or AP. I think many computational photography algorithms, including HDR, have become so standard that most people would just consider them default settings on their smartphones.
For such algorithms, it does make sense to implement them on an on-chip ISP in a heavily optimized way. Once it's there, you don't really have to touch it or change it much. As for the final answer to your question, that dichotomy between ISPs and independent application processors, I think the devil is in the details, because there may be many application-specific requirements and many different tradeoffs to consider on a case-by-case basis. It may make sense to implement the algorithm on a dedicated ISP that does just one thing and does it really well, or it may make sense to keep a bit of flexibility and run it on an application processor, where HDR is just one of the many things it does.
Q: My next question is about the image sensor market. At Sony's business segment briefing in 2022, the CEO of Sony Semiconductor Solutions said that by 2024, still images from mobile phones are expected to exceed the image quality of interchangeable-lens cameras. Coincidentally, Marc Levoy, who led the team that developed the HDR+ technology for Google Pixel smartphones, has expressed a similar opinion, that mobile phones have the potential to replace DSLRs. What do you think of this point of view? And what are the promising development directions in mobile imaging that can achieve this goal, from the perspective of image sensors and of software such as computational photography?
A: Yeah, that's a very interesting topic, and I think in some ways they are being bold in making that statement. But then again, it's true to some extent, right? I'm just an amateur photographer, so if you consider my photography skills, that statement is already true: my smartphone takes much better photos than what I can take with my DSLR camera right now, because I'm just not a good photographer.
There are so many amateurs out there for whom it's already true, and they're mostly taking pictures in well-illuminated conditions with no really challenging scene conditions. If you look at a high-end smartphone today, it's already taking much better photos than a mid- or low-end DSLR. So to some extent that statement is already true. But if you want my honest opinion on that statement, I would make a more qualified statement instead of such a broad-brush one.
I would probably say something like: maybe 5 years from now, the image quality of a really high-end smartphone with good-quality optics and a good-quality image sensor will be better than, say, a low-end interchangeable-lens camera. I think that would be my toned-down version of what some of these companies are saying today. And the developments that will achieve that goal will come from both sides, both the hardware and the algorithms.
Again, this may be a slightly biased opinion, because I work in computational photography and computational imaging. But I think it's true, because we need to optimize both of these things together, in some sense, to keep improving the image quality we can get with these small image sensors.
Q: Can you tell us more about computational photography technology? We found that it's a very important technology on the smartphone side, I believe, among smartphone imaging technologies, because you don't need a large, expensive compact camera module, and you can improve image quality rapidly. Can you tell us more about the future of computational photography algorithms and some development directions?
A: I think one exciting direction for the future, from my perspective, is image sensors that capture individual photons of light, sensors that have extremely high sensitivity.
I would put them under the umbrella term of single-photon image sensors. How you actually implement them could be up for debate. I think SPADs are one way to do it, and they have shown great promise and very rapid development in the last 5 or 10 years. On the hardware side, the resolution of these SPAD-based image sensors is increasing quite rapidly. But SPADs are not the only way of doing single-photon imaging. There are other techniques: there are quanta image sensors, and there are even conventional CIS-based image sensors, like the small-pixel CIS we discussed earlier, that are extremely sensitive. They may not be single-photon sensitive, but they still have single-electron or even sub-electron read noise, and they are able to distinguish between just a handful of photons.
That kind of extreme sensitivity gives you a completely different way of capturing scene information, because it captures scene information at the finest granularity you can ever capture: there is no physically possible information you can capture beyond the photon. In some sense, these image sensors have the ability to capture scene information to the point where there is no additional information that can be captured about the scene.
In the end, you have to couple this with smart computation to actually extract that scene information from the raw data. I think that's where the tight coupling between the hardware, the raw data, and the downstream computation will play a huge role in terms of the computational photography and computational imaging advances we need.
In addition to single-photon imaging, there are also advances on the optics side, where there's a lot of work going on designing cameras that don't have lenses, so-called lensless imaging, or designing cameras with more unconventional optics, such as meta-optics. The image quality we get right now from these kinds of techniques is still not quite close to what we can get with a lens-based camera.
I won't really put a date or a year on when a lensless camera will be available in the market, but I guess it's not too far in the future. Once again, computation will play a huge role in image capture if we start looking at these lensless or meta-optics-based unconventional sensing methods. In fact, the design of the meta-lenses themselves will have to be done jointly with the image sensor and the image reconstruction algorithm. I think that's an exciting future direction, and there is a ton of work already happening there.
Q: I have read an article, maybe on the Image Sensors World blog, that said images from single-photon sensors can be fused with those from a conventional RGB sensor to improve dynamic range. Can you talk more about that? I find single-photon sensors a very promising direction. They are very sensitive, able to sense a single photon, so they will perform very well in low light, because they can count the number of photons. And it seems like they have a very large effective full-well capacity, because they don't need a physical well, so they can have a very high dynamic range. But nowadays, I think SPAD sensors are limited by their resolution, which is not very great. Do you have any comments on that?
A: That's a great question. As you mentioned, SPAD sensors, even though they have this single-photon sensitivity, may seem like they're only good for low light, but they're actually also really good at extremely high light levels. Because of the way they capture photons, they give you compression of dynamic range for free, in some sense, in the pixel itself. As you said, because there is no well being filled up, in theory there is no upper limit to how many photons you can count. In the end, you'll be limited by the speed of your readout, but it's a slowly approaching limit; in theory, you never really get there. So you get this extremely high dynamic range. But because SPAD sensors are so new compared to conventional CMOS image sensors, their spatial resolution, their fill factors, and their photon detection efficiencies are still much lower than those of CMOS image sensors.
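One way to see the "free" dynamic-range compression: a passive SPAD pixel records binary frames (did at least one photon arrive during this frame?), and the per-frame detection probability saturates softly as 1 - exp(-flux * tau). A minimal sketch of the standard maximum-likelihood inversion of that response, with toy numbers of my choosing:

```python
import math

def spad_flux_estimate(k, n_frames, tau):
    """Estimate photon flux from binary SPAD frames.

    Each of `n_frames` frames of duration `tau` records whether at
    least one photon was detected; `k` frames fired. The detection
    probability 1 - exp(-flux*tau) compresses bright scenes
    logarithmically, and the maximum-likelihood inverse is:"""
    p = k / n_frames
    assert p < 1.0, "every frame fired: flux is above the measurable range"
    return -math.log(1.0 - p) / tau

# Toy numbers: 1000 binary frames of 1 microsecond each.
flux = spad_flux_estimate(k=632, n_frames=1000, tau=1e-6)
# ~1e6 photons/s, since 1 - exp(-1) is about 0.632
```

The logarithmic inverse is why the response never hard-clips the way a finite well does; the limit is only approached asymptotically as every frame fires.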
There have been some recent news articles about companies releasing megapixel-resolution SPAD sensors, so that direction is looking quite promising. But we are still quite far away from having a 10- or 20-megapixel SPAD camera with, say, a 90% fill factor, which is something you can get with a CMOS camera.
Now, if you want high resolution and also extremely high dynamic range, I think it makes perfect sense to combine the best of both worlds: have the high-dynamic-range scene information come from a SPAD-based sensor and extract high spatial resolution from a CMOS image sensor. You can combine these two streams of information and get the best of both worlds, high dynamic range and high resolution.
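A deliberately simplified sketch of such a fusion, assuming the low-resolution SPAD radiance map has already been converted into the same linear units as the CMOS frame (real pipelines do far more careful alignment and blending; all names here are mine):

```python
import numpy as np

def fuse_spad_cmos(cmos, spad_lowres, scale, sat_level=4095):
    """Toy fusion: keep high-resolution CMOS pixels where they are
    valid, and fill clipped highlights from a nearest-neighbor
    upsampled low-resolution SPAD radiance map."""
    spad_up = np.kron(spad_lowres, np.ones((scale, scale)))
    return np.where(cmos >= sat_level, spad_up, cmos.astype(np.float64))

cmos = np.array([[100, 4095],     # one clipped highlight pixel
                 [200,  300]])
spad = np.array([[9000.0]])       # one SPAD pixel covers 2x2 CMOS pixels
out = fuse_spad_cmos(cmos, spad, scale=2)
# the clipped pixel becomes 9000; the rest keep their CMOS values
```

The CMOS stream contributes spatial detail, the SPAD stream contributes unclipped radiance, which is the division of labor described above.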
The other reason I think it's promising is that there are phones out there today that already have camera modules with multiple cameras on them, where one of them is a CMOS camera and another is a SPAD camera currently being used for 3D imaging in a LiDAR application. It goes to show that you can, in fact, have a multi-camera module that combines different types of image sensors, in this case a SPAD and a CMOS camera, in a single module. It's then up to the downstream computation to combine that information in a smart way to give you the information you want.
Q: Talking about SPAD cameras, as you have already said, what are the technical challenges of SPAD cameras, and what development directions are there for the future? I ask because we have already talked about resolution, and I believe you have also mentioned that SPAD cameras have a readout-time or readout-speed issue. Can you expand on that?
A: I don't think SPAD cameras have readout issues per se, but the raw data that a SPAD camera produces will essentially contain a spike for almost every photon that arrives at every pixel, right? The volume of raw data coming out of a SPAD camera will be way too big to download off the sensor and deal with in post-processing. So for SPAD cameras, I think it is absolutely essential that we come up with smart methods for doing some kind of in-pixel processing, where we extract some information as quickly as possible, so that we don't have to transfer all of this photon data off the image sensor, because that would require way too much bandwidth and also consume a lot of power.
But that is possible. You really don't need to send every single photon off to your ISP or AP; you can extract some low-level information inside each pixel itself, or maybe by processing groups of pixels, and then send that information off to the ISP or AP. And there are companies out there today that are already working on some really exciting real-world applications of purely passive imaging with SPAD cameras, even without fusing them with CMOS cameras. It does make sense to do that in some challenging scenarios.
For example, if you have extremely low light, then it absolutely makes sense to have single-photon sensitivity. And if you combine low light with high-speed motion, that's really where conventional CMOS cameras struggle to deliver good scene information, and where SPAD cameras really have an advantage.
Q: That point of view is very instructive. My next question is about the logic chips of stacked image sensors, which are becoming more and more powerful because their process nodes are more and more advanced. They used to be made at, I believe, 65 nm, and today, with TSMC fabricating logic wafers for some Sony image sensors, they have advanced to the 22 nm node. So as the logic chip in the stacked design becomes more and more powerful, will image sensors take on more and more functions beyond just imaging in the future? For example, the Sony IMX500 has an AI processor built in and can do some edge AI processing, such as face recognition.
A: I think this trend will continue to grow, and I feel pretty confident about that, because there are so many applications of image sensors where the image itself is not the end goal. It's just the starting point for some kind of smart processing algorithm that needs to extract meaningful, higher-level information from the images. Think of all the advances we are seeing in computer vision these days, the things we can do with learning-based, data-driven approaches that use deep neural networks on large image datasets. For such applications, the image is just the starting point, not the end goal. You can imagine so many applications. For example, take a robot trying to navigate through a building. It doesn't really matter if I show aesthetically pleasing, really beautiful-looking images to that robot; it just doesn't matter.
All that matters in the end is: did the robot avoid collisions, and did it get to its destination quickly and smoothly? Beautiful, nice-looking images are not needed for achieving that task. Then there are tons of industrial automation tasks where you need to run one very specific thing in a very controlled environment. Let's say you're doing object recognition for a very controlled set of objects: it is going to be just one out of these 10 or 100 objects. Again, in such cases, it totally makes sense to run that object recognition or detection algorithm right there on the image sensor. There is no need to download all these images and send them off to some downstream compute module. It makes perfect sense to run them on the image sensor. That way, you also avoid the extra latency and bandwidth cost of transferring these images to a host machine, and you can potentially run these tasks even faster, at a much higher frame rate.
Overall, I think this trend of integrating machine learning algorithms on the image sensor will keep expanding and will be much bigger than it is right now.
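Rough arithmetic illustrating the bandwidth argument, with hypothetical sensor and detection-payload numbers (all of my choosing):

```python
# Data rate if full raw frames must leave the sensor:
width, height, bits, fps = 4000, 3000, 10, 30
raw_mbps = width * height * bits * fps / 1e6     # megabits per second

# On-sensor detection: suppose only results leave the chip, say up to
# 100 detections x (4 box coordinates + class + score) x 16 bits each.
result_mbps = 100 * 6 * 16 * fps / 1e6
```

Under these assumptions the raw stream is 3600 Mbit/s versus well under 1 Mbit/s for detection results, roughly four orders of magnitude less data crossing the sensor interface.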
Q: What about hybrid bonding and other such new technologies in mobile image sensors? You have already mentioned 3D stacking from Sony, which relies on hybrid bonding, because the transistors on one layer output signals that must be transferred to the transistors on the next layer, so you need hybrid bonding for that. Are there any new directions there? I also remember that you mentioned meta-lenses. Can you tell us more details about meta-lenses and lensless cameras? I find them very promising for mobile phones, because they can reduce the size of the compact camera module. Nowadays, smartphone cameras have larger and larger sensors and require more and more lens elements; you can see a giant camera bump on the smartphone. So if we have meta-lens or lensless camera technologies, maybe we can use a large sensor and still fit it into the small form factor of a smartphone.
A: I think it's certainly a really promising future direction, but the image quality you get from meta-optics-based cameras today is still quite far from what a smartphone consumer would expect; a lens-based camera still gives you much higher image quality. And the image reconstruction algorithms for all these lensless imaging techniques can be more complicated than for conventional lens-based cameras, because you're not capturing the scene information through a lens anymore. The raw data itself mixes together the light signals coming from many different parts of the scene; the raw data does not look like the image at all. The onus is then on the downstream computational algorithm to unmix this information, if you will, and produce a final nice-looking image. So the computation can be much heavier in that case to produce the final image. But there could be other applications.
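A toy model of that unmixing: if the lensless measurement is approximated as a circular convolution of the scene with the mask's point-spread function, a Wiener filter can invert it. This is a textbook sketch under that assumption, not any specific camera's algorithm:

```python
import numpy as np

def wiener_deconv(measurement, psf, noise_reg=1e-3):
    """Recover a scene from a lensless measurement modeled as a
    circular convolution of the scene with the mask's PSF."""
    H = np.fft.fft2(psf)
    Y = np.fft.fft2(measurement)
    # Regularized inverse filter: conj(H) / (|H|^2 + reg)
    X = np.conj(H) * Y / (np.abs(H) ** 2 + noise_reg)
    return np.real(np.fft.ifft2(X))

rng = np.random.default_rng(0)
scene = np.zeros((32, 32))
scene[10, 12] = 1.0                    # a single point source
psf = rng.random((32, 32))             # diffuser-like random PSF
# Simulate the lensless capture: the point spreads over the whole sensor.
meas = np.real(np.fft.ifft2(np.fft.fft2(scene) * np.fft.fft2(psf)))
recovered = wiener_deconv(meas, psf)
```

The raw `meas` looks nothing like the scene, yet the deconvolution puts the point source back where it belongs, which is the "unmixing" role of computation described above.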
As I mentioned earlier, there are applications where the final image is not the main goal, and those could be interesting for lensless cameras. One thing that comes to mind is that there are cameras on smartphones today that are essentially just doing security-related things, like Face ID. Maybe a lensless camera would be okay for that kind of application, right? All you are looking for is some kind of unique signature for that person's face.
And then maybe you don't need to actually take a high-resolution, nice-looking image of that person. There's also a lot of work on privacy-preserving algorithms that use lensless methods: there is basically no way to invert that data and figure out who the person was, but you're still able to do some computer vision tasks on it. You can maybe figure out where the person is, or what other objects might be around them; you could do some object recognition or pose recognition; you could figure out if the person is waving their hands or clapping, some kind of action recognition application. So I think those could be interesting applications for meta-lens and lensless cameras.