Space Variant Imaging
Center for Perceptual Systems
The University of Texas at Austin
1. What is foveated imaging?
Foveated imaging refers to the creation and display of static or video imagery where the resolution varies across the image. The highest resolution region is called the foveation region. Typically, the viewer has dynamic control of the location of the foveation region. It is possible to have more than one foveation region.
2. What is the value of foveated imaging?
The primary value of foveated imaging is in image compression: high-resolution information is transmitted only in the regions of the image that the viewer selects as important. Foveated imaging exploits the fact that the resolution of the human visual system declines away from the direction of gaze. Because humans cannot perceive fine detail away from the direction of gaze, fine detail needs to be transmitted only near the direction of gaze.
Foveated imaging can also be used to highlight regions of an image. This can be useful because the viewer's gaze tends to be drawn to high resolution regions of the image.
Foveated imaging can also be used in vision research. For example, it can be used in combination with eye tracking to precisely control (in real time) the spatial information available across the retina of the eye. This makes foveated imaging a potentially valuable tool for analyzing the contributions of different retinal regions to task performance.
3. How does your foveated imaging software work?
It is a bit technical, but here is the basic idea of how the CPS foveated imaging software works. There is a foveation encoder and a foveation decoder.
Foveation Encoder: First, the image is encoded into a low-pass pyramid. That is, the original image is low-pass filtered (slightly blurred) and then down-sampled to create a second image with half the resolution of the original in each direction. This half-resolution image is then low-pass filtered and down-sampled to create a third image with one quarter the resolution of the original in each direction. This process is repeated until a low-pass pyramid of typically 5 or 6 images is obtained. Second, regions are selected from each image in the low-pass pyramid to create a foveation pyramid. Specifically, from the original image, a region is selected for the foveation region; from the half-resolution image, a region is selected for the first ring around the foveation region; from the quarter-resolution image, a region is selected for the second ring around the foveation region; and so on. This foveation pyramid is itself just a collection of small images, and it is the output of the encoder. The user has a great deal of control over how the resolution changes across the image, and can choose to receive the output either as separate images or packed into a single larger image. Typically, the total number of pixels in the foveation pyramid is far smaller than in the original image.
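The two encoder steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CPS implementation: it works on a grayscale image stored as a list of rows, uses a 2x2 box average as the combined low-pass filter and down-sampling step (the actual filter is not specified here), keeps a square window of the same size (in each level's own pixels) around the foveation point, and keeps the coarsest level whole. The names halve, lowpass_pyramid, crop and foveation_pyramid are hypothetical.

```python
# Minimal sketch of a foveation encoder. Assumptions (not from the CPS
# software): a 2x2 box average stands in for low-pass filter + down-sampling,
# each level keeps a same-size window (in its own pixels) around the
# foveation point, and the coarsest level is kept whole.
from typing import List, Tuple

Image = List[List[float]]


def halve(img: Image) -> Image:
    """Low-pass filter and down-sample by 2 in each direction (2x2 box average)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
             + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def lowpass_pyramid(img: Image, levels: int) -> List[Image]:
    """Level 0 is the original; level k has 1/2**k of its resolution per axis."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(halve(pyramid[-1]))
    return pyramid


def crop(img: Image, cx: int, cy: int, half: int) -> Image:
    """Square window of half-width `half` centred on (cx, cy), clipped to the image."""
    y0, y1 = max(0, cy - half), min(len(img), cy + half)
    x0, x1 = max(0, cx - half), min(len(img[0]), cx + half)
    return [row[x0:x1] for row in img[y0:y1]]


def foveation_pyramid(img: Image, fix: Tuple[int, int], levels: int, half: int) -> List[Image]:
    """One window per level around `fix` (given in full-resolution coordinates).

    A window of fixed size in level-k pixels spans 2**k times as many original
    pixels per axis, so each coarser level supplies the next ring outward.
    """
    fx, fy = fix
    pyramid = lowpass_pyramid(img, levels)
    windows = [crop(level, fx >> k, fy >> k, half) for k, level in enumerate(pyramid[:-1])]
    return windows + [pyramid[-1]]  # coarsest level covers the whole field of view


# Usage: a 256 x 256 test image foveated at its centre with 5 levels.
image = [[float((x * y) % 256) for x in range(256)] for y in range(256)]
fov = foveation_pyramid(image, (128, 128), levels=5, half=16)
total = sum(len(w) * len(w[0]) for w in fov)
print(total, "of", 256 * 256, "pixels kept")
```

Here the foveation pyramid keeps 4,352 of 65,536 pixels. Shrinking half makes the resolution fall off closer to the foveation point and further reduces the pixel count; with half=8 the same image keeps 1,280 pixels.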
Foveation Decoder: The foveation pyramid is decoded into a displayable image. Specifically, the low-pass images are up-sampled, interpolated and blended to create a smoothly foveated displayable image.
4. What is the real-time speed of your foveated imaging software?
The speed of encoding and decoding depends on the image size and the degree of foveation: the smaller the image and the greater the degree of foveation, the faster the encoding and decoding. For 352 x 288 24-bit color images, encoding runs at approximately 190 frames/sec and decoding at approximately 52 frames/sec on a 400 MHz PC with YUV-to-RGB hardware conversion. For 704 x 576 24-bit color images, encoding runs at approximately 46 frames/sec and decoding at approximately 25 frames/sec.
5. How much image compression can be obtained with foveated imaging?
There is no simple answer to this question. Foveated imaging is a form of lossy compression, so the amount of compression can be arbitrarily high, depending on how much loss of visual quality is acceptable. The faster the resolution falls off away from the direction of gaze, the greater the compression.
6. How does image compression obtained with foveated imaging compare to more conventional forms of compression such as JPEG or MPEG?
Foveated imaging does not compete with other forms of compression but layers on top of them. Our foveated imaging software is compatible with virtually all other forms of image compression. Recall that the output of the foveation encoder is a small collection of images (or a single combined image) whose total number of pixels is a small fraction of that in the original image. These output images can be passed on to any other image encoder/decoder. For MPEG/H.263 (and our own custom video compression software), we have demonstrated that foveation often increases the compression by a multiplicative factor. For example, if foveated imaging produces a compression factor of 3 and MPEG produces a compression factor of 100, then putting the two together produces a compression factor of approximately 300.
7. Is foveated imaging always beneficial when used in combination with other forms of image compression?
In most situations, foveated imaging provides a substantial multiplicative increase in compression. However, as might be expected, foveated imaging is most effective when it eliminates data bits that the other compression procedure would not eliminate. If the only fine detail in the original image is located in the foveation region, then foveation will not provide any additional compression (nor any loss of image quality). Similarly, if the only motion in a video sequence occurs in the foveation region, then foveation will add little compression when combined with a compression procedure that uses motion compensation (except for the first frame and other intra-coded frames, where motion compensation is not applied).
8. How does foveation affect the speed of subsequent image processing?
Foveation increases the speed of subsequent image processing. This is a simple consequence of the fact that the number of pixels in the foveation pyramid is generally a small fraction of that in the input image. In fact, because the foveation pyramid can be computed so quickly, it is generally faster to apply foveation encoding plus another image processing procedure than to apply the other procedure alone. For example, applying foveation encoding followed by software MPEG encoding is often several times faster than applying MPEG encoding alone. We have developed our own real-time software video coder (similar to MPEG), which runs at very usable frame rates for moderate-size images and a moderate degree of foveation.
9. Is it possible to obtain perceptually lossless compression with foveated imaging?
Perceptually lossless compression is possible with foveated imaging. In general, the higher the resolution of the original image, the greater the compression factor that foveated imaging can achieve while keeping the image perceptually lossless. A 1024 x 768 image can typically be compressed by a factor of 3-5 without visible loss when the foveation region is centered on the direction of gaze.
10. How can you keep the foveation region centered on the direction of gaze, since the viewer is bound to move his/her eyes around?
In general, the only way to guarantee perceptually lossless compression would be to dynamically measure the gaze direction of the eyes and shift the foveation region accordingly. Our foveation software is compatible with, and has been tested with, several different commercial eye tracking systems.
11. Aren't commercial eye tracking systems expensive and difficult to use?
Currently, most commercial eye tracking systems are rather expensive. However, this is not because of an inherent cost in labor or materials, but because of the currently small market. A larger market would drive the price down quickly. Commercial eye trackers have been getting easier to use. There are several desktop and helmet-mounted devices that are reliable and essentially invisible to the viewer.
12. Because the viewer's eye position is needed for the encoding, won't there be substantial time delays associated with transmitting the eye position?
For very fast or dedicated communication links, the time delay is small (half a frame on average), which is quite acceptable to the viewer. However, for some communication links, such as satellite links or long-distance internet links, the time delay could produce noticeable effects. Therefore, foveated imaging with eye tracking is most practical for applications with dedicated communication links or with entirely local communications, such as flight simulators and virtual reality systems.
13. Is eye tracking required for foveated imaging to be useful?
Some of the most promising applications of foveated imaging do not require eye tracking. Any pointing device, such as a mouse or a touch pad, can be used to control the foveation. Useful applications for mouse-controlled foveation include video teleconferencing, video surveillance, telenavigation, telemedicine, and image database retrieval. Whenever communication bandwidth is limited, mouse-controlled foveated imaging provides a simple method for the viewer to direct high resolution to regions of interest.
Consider being confronted with the choice between two nearly equivalent video communication systems. Suppose both systems send video with equal resolution, at the same frame rate, in a non-foveation mode. However, suppose the second system gives the viewer the option of switching to a foveation mode, in which spatial resolution can be dynamically increased in regions of interest without affecting frame rate, or, alternatively, the frame rate can be increased while maintaining the resolution in the regions of interest. Clearly there are many situations where this second system would be very valuable. Which system would you want if the cost were nearly identical?
14. Is it necessary for the viewer to control the foveation?
No. There are two other possibilities. The foveation could be controlled by the sender or by an automatic algorithm.
15. Without eye tracking, won't the viewer be able to see that the images have variable resolution?
In general, if the viewer is not looking at a foveation region, the reduced resolution due to the foveation will be visible. However, keep in mind that the gain to the viewer is increased resolution in the regions of interest (or an increased frame rate). Often the increased resolution (or frame rate) is much more important than the loss of resolution away from the regions of interest. Our experience is that mouse-controlled foveation is a smooth and natural operation for the viewer. Furthermore, time delays in transmitting the mouse coordinates to the sender are not a problem here, because the eye movements are uncoupled from the position of the foveation region.
16. Why not just window the image to a smaller size and let the user move the window around?
This has been done and certainly is simpler, but there are two problems:
First, without a full view, the viewer does not know where to put the window. The effect is much like trying to find a bird with a pair of binoculars. In other words, the user cannot easily find the regions of interest. Perceptual experiments have shown that viewers do not function efficiently with this type of display. Such displays are completely unusable for tasks such as remote navigation of a vehicle. It might be possible to first find a region of interest and then switch to a windowed image, but then other events that might be of interest are not available in the viewer's peripheral vision (e.g., another person or object entering the camera's field of view).
Second, simple windowing is inefficient: it does not match the resolution of the display to the resolution of the human visual system. Matching the display to the encoding properties of the eye is the most efficient way to allocate image data bits.
17. Since bandwidth is getting cheaper and more available, won't the need for foveated imaging disappear soon?
One can ask exactly the same question about any form of image compression, yet no one questions the general need for better image compression techniques. The reason is simply that users always want bigger, higher-quality images and higher frame rates, and bandwidth always costs something. Thus, better compression will always be a benefit. As the available bandwidth increases, user demand for higher resolution and/or higher frame rates will increase. In fact, foveated imaging becomes more and more useful (bigger compression ratios) as the image size and resolution increase. Right now, foveated imaging would be particularly useful for teleconferencing, surveillance, or telenavigation over point-to-point POTS, ISDN, or wireless links, as well as basic internet communications. It can easily increase compression by a factor of 3-5 for image sizes in the range of 320 x 240 to 640 x 480.
Our conclusion is that foveated imaging would be useful far into the future.
18. But what is the cost-benefit ratio of foveated imaging?
Foveated imaging can be very inexpensive both to use and to integrate into existing or emerging technology. Our foveated imaging software requires no special hardware and runs very quickly on standard PCs running Windows 95/NT. The encoder takes as input an image or a video stream and outputs an image or video stream. This output can easily be directed to any other image coding or processing software or hardware. For example, the output images can be directed to a hardware MPEG coder. After transmission, the MPEG stream is decoded and sent to the foveation decoder for display. Because the foveation coder/decoder is all software and is compatible with other forms of video compression, the cost-benefit ratio is quite favorable.
19. Suppose I already have a product that is usable within the available bandwidth. Why should I be interested in foveated imaging?
There are several possibilities. First, you might want to provide the user with the option of increasing the frame rate without sacrificing resolution in the regions of interest. For example, if you are running at 10 frames per second at a given bandwidth, foveation might increase your frame rate to 30 frames per second at the same bandwidth. Many users prefer foveated video sequences with higher frame rates over non-foveated video sequences with lower frame rates, and a video application could easily allow the user to switch between foveated and non-foveated modes at the user's discretion.
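The frame-rate example follows from a fixed bit budget. Writing B for the channel bandwidth, b for the bits per coded frame without foveation, and C_f for the additional compression factor contributed by foveation (symbols introduced here only for illustration), and assuming the coded bits per frame shrink in proportion to C_f, the achievable frame rate scales directly with C_f:

```latex
F = \frac{B}{b}, \qquad
F_{\text{fov}} = \frac{B}{b / C_f} = C_f \, F, \qquad
F = 10~\text{frames/s},\; C_f = 3 \;\Rightarrow\; F_{\text{fov}} = 30~\text{frames/s}
```

A factor of 3 sits at the low end of the 3-5 range cited for image sizes from 320 x 240 to 640 x 480.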
Second, you might want to increase the performance and options of the current product by allowing higher-resolution images to be viewed. For example, it might be desirable to step up to a higher-resolution camera. When run in the non-foveation mode, the camera images could be transmitted at the resolution of the original camera (i.e., the system would behave exactly as before). However, in the foveation mode, the viewer could have access to the high-resolution video information available from the high-resolution camera (without sacrificing frame rate).
Finally, you might want to enable some usage of your product at lower bandwidth. This might occur if there is a temporary need to switch to a backup communication link, or if a potential user could not afford the appropriate bandwidth link.
20. What are the advantages of your method of creating foveated images as compared with other methods?
A number of methods have been explored in the past. An early method of foveation involved increasing the size of the pixels away from the direction of gaze. A related method is to subsample the image away from the direction of gaze. Both of these methods suffer from a serious problem, namely aliasing, which produces shimmering and illusory motion in the low-resolution regions of the foveated image. Variable pixel size also produces visible blocking at the edges of the larger pixels. To eliminate aliasing and blocking effects, it is necessary to low-pass filter before sampling and to interpolate the samples appropriately when reconstructing the foveated image for display. Our method of foveation does just this.
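The aliasing problem can be seen in one dimension. The toy sketch below down-samples a stripe pattern at the finest frequency the full-resolution grid can carry; the 2-tap pair average is an assumed stand-in for a proper low-pass filter, and the function names are hypothetical. Plain subsampling turns the stripes into a uniform field whose value flips when the stripes shift by one pixel, which is the source of the shimmering; filtering first yields the correct mean gray regardless of position.

```python
def subsample(signal):
    """Keep every second sample with no pre-filtering (prone to aliasing)."""
    return signal[::2]


def lowpass_then_subsample(signal):
    """Average adjacent pairs (a 2-tap box low-pass), keeping one value per pair."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]


# Stripes at the highest frequency the full-resolution grid can carry.
stripes = [1.0 if i % 2 == 0 else -1.0 for i in range(16)]
moved = stripes[1:] + stripes[:1]  # the same stripes shifted by one pixel

print(subsample(stripes))               # all +1: a false uniform bright field
print(subsample(moved))                 # all -1: the field flips as the stripes move
print(lowpass_then_subsample(stripes))  # all 0: the correct mean gray
print(lowpass_then_subsample(moved))    # still all 0
```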
Many methods of foveation do not incorporate the actual fall off in resolution of the human visual system (as measured in perception experiments). As mentioned earlier, matching the display to the encoding properties of the eye is the most efficient way to allocate image data bits. Matching the foveation pyramid to the fall off in resolution of the human visual system is one of the options in our foveation software.
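One way to match the pyramid to the eye is to choose, at each eccentricity, the coarsest level whose sampling still exceeds the highest spatial frequency the eye can resolve there. The sketch below assumes a contrast-threshold model of the form CT(f, e) = CT0 exp(alpha f (e + e2) / e2), with illustrative values alpha = 0.106, e2 = 2.3 degrees and CT0 = 1/64 that are assumptions, not figures from this document; setting CT = 1 gives the cutoff frequency. The display resolution in pixels per degree of visual angle is also an assumed input, and the function names are hypothetical.

```python
import math

# Assumed parameters of the contrast-threshold model (illustrative values).
ALPHA = 0.106      # spatial-frequency decay constant
E2 = 2.3           # half-resolution eccentricity, in degrees
CT0 = 1.0 / 64.0   # minimum contrast threshold


def cutoff_frequency(ecc_deg: float) -> float:
    """Highest visible spatial frequency (cycles/deg) at eccentricity ecc_deg.

    Solves CT0 * exp(ALPHA * f * (e + E2) / E2) = 1 for f.
    """
    return E2 * math.log(1.0 / CT0) / (ALPHA * (ecc_deg + E2))


def pyramid_level(ecc_deg: float, pixels_per_degree: float, max_level: int) -> int:
    """Coarsest pyramid level whose Nyquist limit still covers the visual cutoff.

    Level k samples at pixels_per_degree / 2**k, so its Nyquist limit is
    pixels_per_degree / 2**(k + 1) cycles/deg.
    """
    ratio = pixels_per_degree / (2.0 * cutoff_frequency(ecc_deg))
    if ratio <= 1.0:
        return 0  # full resolution is needed (or the display is already too coarse)
    return min(int(math.floor(math.log2(ratio))), max_level)


# Usage: level chosen at several eccentricities for two display resolutions.
for ppd in (30.0, 60.0):
    print(ppd, [pyramid_level(e, ppd, max_level=5) for e in (0, 5, 10, 20, 40)])
```

Doubling the pixels per degree pushes each eccentricity to a coarser level, which is consistent with the earlier point that higher-resolution originals allow greater perceptually lossless compression.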
It is possible to match the fall off in resolution of the human visual system with slightly greater precision. Rather than compute a foveation pyramid, a different low-pass filter can be applied at each distance from the gaze point, and the sampling can be matched to the low-pass filter at each distance. However, this "continuous foveation" method (which has been considered in the past) provides minimal improvement in the foveated image quality. More importantly, it is computationally complex and intensive, and hence is not practical for real-time software and would be difficult to implement in real-time hardware. The foveation pyramid method is very simple and very fast, allowing a software implementation (for small to large image sizes) and a simple hardware implementation (for very large image sizes). Further, a software implementation is much more portable, compatible, and upgradeable than a hardware implementation. In other words, using the foveation pyramid method, foveated imaging can be incorporated into a new application very quickly for very little expense. Overall, the foveation pyramid method is the best method available.
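Part of the speed argument rests on a geometric series. Assuming each reduced level is produced with a fixed-size filter kernel, so that the work is proportional to the number of output pixels, building all reduced levels of a K-level low-pass pyramid from an N-pixel image filters fewer than N/3 output pixels in total:

```latex
\sum_{k=1}^{K-1} \frac{N}{4^{k}} \;<\; N \sum_{k=1}^{\infty} 4^{-k} \;=\; \frac{N}{3}
```

By contrast, a continuous method implemented by direct convolution with a kernel whose width grows with distance from the gaze point pays a per-pixel cost that grows with the kernel's area.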
Copyright (C) 2002-2018, Center for Perceptual Systems