Modern video projectors are remarkable devices that can display large imagery with a high resolution, brightness, and contrast. The latest high-end models even incorporate light sensors for controlling auto-focus and auto-iris objective lenses. Auto-iris lenses can greatly enhance the temporal contrast of projected images by adjusting the aperture opening to the average brightness of the displayed content. Their flexibility and low cost make projectors irreplaceable for many applications including professional presentations, home entertainment, scientific visualization, as well as museum and art installations. We envision future generations of these displays as fully integrated systems with cameras, dynamically adjustable apertures and intelligent control mechanisms.
With coded aperture projection, we present solutions for taking projectors to the next level. By placing static as well as dynamically coded masks at a projector’s aperture plane, we show how the depth-of-field of a projection can be greatly enhanced. This allows focussed imagery to be shown on complex screens with varying distances to the projector’s focal plane, such as projection domes in planetariums or cylindrical canvases in IMAX theaters. We demonstrate that static as well as adaptive dynamic apertures outperform previous methods of defocus compensation for objective lenses with static circular apertures. In addition, our dynamic apertures can perform the type of contrast enhancement employed by common auto-iris projection lenses, and also produce high-quality de-pixelated images. The latter is beneficial for rear-projection TV sets and other close-view displays. Several approaches have been proposed to increase the depth-of-field of conventional projectors. Multiple projector units with differently adjusted focal planes but overlapping image areas ([1]) can be applied to increase the depth-of-field of a projection, at the cost of an uneconomically complex system. Compensation of defocus using a single device is also possible by computing and projecting a compensation image that neutralizes the optical blur. As images can be digitally sharpened by convolving them with the inverse of a known blur function (called deconvolution), optical defocus of a video projector can be compensated in the same way [2, 5]. The blur function is defined by the aperture and is referred to as the point spread function (PSF). The PSF produced by a projector with a regular circular aperture is Gaussian. Because a Gaussian PSF acts as a low-pass filter, deconvolving with it sets clear limits on the recovery of fine image details.
This problem has been addressed ([8]) by re-formulating the computation as an optimization problem that constrains the solution to the actual dynamic range of the projector while minimizing local optical defocus. All of these approaches share two limitations: Firstly, they are far from being able to reach real-time performance, not even if the time necessary for measuring the local blur functions is disregarded. This prevents them from displaying dynamic content. Secondly, the amount of defocus that can be compensated through deconvolution is clearly limited when the PSF is Gaussian. Ringing artifacts will dominate if the blur becomes too large. In fact, only little defocus can be compensated efficiently with such techniques. Coded aperture imaging has been presented recently in the context of computational photography [4, 7], and has been applied previously in astronomy and medical imaging. In contrast to conventional apertures, coded apertures (i.e., apertures that encode a more complex binary or intensity pattern, rather than a simple round opening) in cameras enable post-exposure refocusing, reconstructing scene depth, or recording light fields. We introduce coded aperture projection, and show that if coded apertures are applied instead of simple circular ones, ringing artifacts resulting from deconvolution can be reduced and more image details can be recovered from optical defocus. Furthermore, our implementation uses the graphics hardware for computation and thus achieves interactive frame-rates of currently 8-16 fps at XGA resolution.
As explained above, optical defocus of a projected image can be mathematically described as a
convolution of the original image with a filter kernel that corresponds to the PSF of the aperture.
The scale of the kernel is directly proportional to the degree of defocus: $i_b = i \ast k_s$, where
$i$ is the displayed image, $k_s$ the aperture kernel at scale $s$, and $i_b$ the optically blurred
projection. Deconvolution will digitally sharpen an image and consequently compensate optical
defocus: $i' = i \ast k_s^{-1}$. Here $k_s^{-1}$ is the inverse aperture kernel. Convolution and
deconvolution can be modeled more easily in the frequency domain than in the spatial domain. There, a
convolution corresponds to a multiplication, $I_b = I \cdot K_s$, and deconvolution equals a division,
$I' = I / K_s$, where $I$, $K_s$, $I_b$, and $I'$ are the Fourier transforms of $i$, $k_s$, $i_b$,
and $i'$, respectively. In
general, this principle is also applied by related approaches, such as [5] and [2]. Once the
deconvolution has been computed in frequency domain, the result $I'$ is inverse Fourier
transformed to spatial domain and projected as compensation image. Low magnitudes in the
Fourier transform of the aperture kernels lead to divisions by small values in frequency
domain, and consequently to intensities in spatial domain that exceed the displayable
range of the projector. These intensities are clipped and therefore the corresponding
frequencies are not considered – which finally results in visible ringing artifacts. As already
mentioned, this is the main limitation of previous projector defocus compensation approaches,
since in frequency domain the Gaussian PSF of circular apertures is a low pass and does
contain a large fraction of low Fourier magnitudes. Applying only small kernel scales in
combination with wide aperture openings, on the one hand, will reduce the number of low
Fourier magnitudes (and consequently the ringing artifacts) – but will also lead to only
minor focus improvements. Using narrow aperture openings (up to pinhole size), on the
other hand, will naturally increase the focal depth, but will decrease the light throughput
significantly. To overcome this problem, we integrate a static coded aperture inside a
projector’s objective lens (cf. figure 1-left). The aperture is more broadband in frequency
domain, and its Fourier transform initially has fewer low magnitudes than that of a circular aperture.
Consequently, more frequencies are retained and more image details are reconstructed
with fewer ringing artifacts. A comparative example of defocus compensation with and
without a coded aperture is shown in figure 2. Increasing the depth-of-field with such a
static broadband aperture, however, comes at the cost of decreased light transmission,
which is one of the most crucial aspects of all projector-based display systems. Therefore,
we also present an approach for computing and displaying a dynamic aperture pattern,
based on the analysis of the projected image content and on limitations of human visual
perception. This analysis employs an intuitive model of the human visual system (HVS) and
allows us to determine and filter out spatial frequencies of the input image that cannot be
perceived by a human observer. An adaptive aperture can then be computed by maximizing
its light transmission while preserving the perceivable frequencies, rather than being
restricted to support a constant and broad frequency band. We will show that our adaptive
dynamic apertures produce better results than previous methods with the same or even an
increased amount of light transmission. The sensitivity variations of the HVS according to
spatial frequencies $(f_x, f_y)$ are well studied and mathematically defined by the contrast
sensitivity function (CSF) $S(f_x, f_y)$. Various definitions of this function appear in the
literature; we use the one described in [3]. The CSF depends on the viewing conditions
only, not on the actual content. The sensitivity is defined as the inverse of the contrast
required to produce a threshold response, $S = 1 / C_T$, with $C_T$ being the threshold contrast.
Using the definition of Michelson contrast, this is given as $C_T = \Delta L / L_{mean}$, where
$\Delta L$ is the necessary luminance difference given in $cd/m^2$ and $L_{mean}$ is the mean
image luminance. An absolute luminance threshold map can be computed as:
$$\Delta L(f_x, f_y) = \frac{L_{mean}}{S(f_x, f_y)} \qquad (1)$$
The threshold map is shown in figure 3. For computing our dynamic apertures, we wish to eliminate
all frequencies that do not contribute to perceivable image fidelity. The Fourier transform
magnitudes of an image converted to absolute luminance values, $|\hat{L}(f_x, f_y)|$, correspond to the
amount of spatial frequencies in the image. With this information, we can calculate a binary
importance mask for the image frequencies as:
$$m(f_x, f_y) = \begin{cases} 1 & \text{if } |\hat{L}(f_x, f_y)| \ge \Delta L(f_x, f_y) \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$
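The threshold map and the binary importance mask described above can be sketched in a few lines of NumPy. This is only an illustration under several assumptions that are not taken from the text: the Mannos-Sakrison fit stands in for the CSF of [3], `peak_sensitivity` and `pixels_per_degree` are hypothetical viewing parameters, and the Fourier magnitudes are divided by the number of pixels so that they can be compared with luminance values.

```python
import numpy as np

def mannos_sakrison_csf(f_cpd):
    """Normalized contrast sensitivity for a spatial frequency in cycles/degree.

    Stand-in for the CSF of [3]; chosen here only because it is compact.
    """
    f = np.asarray(f_cpd, dtype=float)
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

def importance_mask(luminance, pixels_per_degree, peak_sensitivity=200.0):
    """Binary mask of the frequencies whose magnitude reaches the visibility threshold.

    luminance         : 2D array of absolute luminance values (cd/m^2)
    pixels_per_degree : assumed viewing geometry (hypothetical parameter)
    peak_sensitivity  : assumed absolute sensitivity scale (hypothetical value)
    """
    h, w = luminance.shape
    # fftfreq yields cycles/pixel; convert to cycles/degree for the CSF.
    fy = np.fft.fftfreq(h)[:, None] * pixels_per_degree
    fx = np.fft.fftfreq(w)[None, :] * pixels_per_degree
    sensitivity = peak_sensitivity * mannos_sakrison_csf(np.hypot(fx, fy))

    # Threshold map: mean luminance divided by the sensitivity.
    threshold = luminance.mean() / np.maximum(sensitivity, 1e-12)

    # Magnitude spectrum, normalized by the pixel count (an assumed convention).
    magnitude = np.abs(np.fft.fft2(luminance)) / luminance.size
    mask = magnitude >= threshold
    mask[0, 0] = True  # always keep the mean (DC) term
    return mask
```

For a uniform image only the DC term survives, whereas a clearly visible grating additionally marks its two conjugate frequency bins as important.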
As illustrated in figure 3, filtering the Fourier transform of an image with the binary
importance mask $m$ allows us to remove spatial frequencies that do not modify the perceived
image content for specific viewing conditions that include a fixed adaptation luminance, viewer
position, and screen size. We now turn to the computation of the dynamic aperture itself. We
define the aperture as the sum of its individual pixels, $k = \sum_{x,y} a_{x,y} \, p_{x,y}$, where
$p_{x,y}$ is the pixel at $x$ and $y$ (with a total of $N$ pixels) and $a_{x,y}$ is its transmissivity.
The Fourier transform of the aperture is $K = \sum_{x,y} a_{x,y} \, \mathcal{F}\{p_{x,y}\}$. Our dynamic apertures should
support all important frequencies in the input image with a minimal variance of their
Fourier transform. In addition, they should maximize light throughput. The variance
of the aperture’s modulation transfer function (MTF) is a measure for how different
frequencies are attenuated. Minimizing it for all important frequencies ensures that they
are all supported. A similar criterion was employed in [6] for a one-dimensional binary
temporal mask. The minimization can be mathematically expressed as an optimization
problem:
$$\min_{\mathbf{a}} \; \left\| M F \mathbf{a} - \mathbf{1} \right\|_2^2 \qquad (3)$$
where $\mathbf{1}$ is a vector containing only 1s and $\mathbf{a}$ are the aperture pixel intensities. We do not
enforce the pixel intensities to be below 1 in this formulation, but simply scale the resulting values
so that the maximum is 1. This is equivalent to a scaling of the MTF and does not affect the
variance criterion. $M$ is a diagonal matrix containing the binary frequency importance mask values
described above. $F$ is a matrix with orthogonal basis functions in its columns which represent the
optical transfer function (OTF) of the individual aperture pixels $p_{x,y}$. This results in a linear
system of the form $M F \mathbf{a} = \mathbf{1}$. Solving this heavily over-determined system in a least-squared
error sense with the additional constraint to minimize $\|\mathbf{a}\|_2$ will minimize the variance
of the Fourier transform of the aperture for important frequencies. This formulation
also intrinsically maximizes the light transmittance of the resulting aperture, because a
small squared 2-norm of $\mathbf{a}$ ($\|\mathbf{a}\|_2^2$) also minimizes the variance of the normalized
pixel intensities in the spatial domain. The linear system can certainly be solved with
standard approaches, such as the conjugate gradient method for the normal equations
or non-negative least squares solutions. However, this would not allow sufficiently high
frame rates on commonly available computer hardware for standard image resolutions
and higher. Thus, we propose to solve the system using the pseudo-inverse
matrix. Computing solutions of linear problems using the pseudo-inverse minimizes the
least-squared error and the 2-norm of the resulting vector, thus solving the variance and the light
transmittance problem at the same time. Reformulating our problem results in
$\mathbf{a} = F^{+} M^{+} \mathbf{1}$, where $^{+}$ denotes the pseudo-inverse matrix. Since $M$ is a binary
diagonal matrix, $M^{+} = M$. $F$ comprises the set of orthogonal Fourier basis functions as its
columns, thus $F^{+} = F^{T}$. We need to employ the conjugate transpose $F^{*}$, because $F$ is complex,
hence:
$$\mathbf{a} = F^{*} M \mathbf{1} \qquad (4)$$
In this formulation $F^{*}$ can be easily pre-computed. During run-time we solve the system with a
matrix-vector multiplication. Since the solution $\mathbf{a}$ can contain negative values we clip these values
and scale the result so that the maximum value is 1.
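The run-time step described above (one matrix-vector product, clipping of negative values, scaling to a maximum of 1) can be sketched as follows. This is a minimal NumPy illustration, not the implementation used for the prototype: the basis matrix is assumed to contain the discrete Fourier transforms of single aperture pixels sampled on an `n x n` frequency grid, the importance mask is assumed to be given on that same grid, the aperture side length is assumed to be odd, and the names `fourier_basis` and `adaptive_aperture` are hypothetical.

```python
import numpy as np

def fourier_basis(n, aperture_size):
    """Basis matrix with n*n rows and aperture_size**2 columns.

    Column j is the DFT (on an n x n frequency grid) of a single aperture
    pixel at offset (x_j, y_j) from the aperture center. The columns are
    mutually orthogonal as long as n >= aperture_size.
    """
    half = aperture_size // 2
    offsets = np.arange(-half, half + 1)
    ys, xs = np.meshgrid(offsets, offsets, indexing="ij")
    freqs = np.fft.fftfreq(n)  # cycles per sample
    fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
    phase = (fx.reshape(-1, 1) * xs.reshape(1, -1)
             + fy.reshape(-1, 1) * ys.reshape(1, -1))
    return np.exp(-2j * np.pi * phase)

def adaptive_aperture(mask, basis_conj_t, aperture_size):
    """Aperture intensities from a binary importance mask (n x n booleans)."""
    m = mask.astype(float).reshape(-1)   # diagonal of the mask matrix as a vector
    a = np.real(basis_conj_t @ m)        # conjugate-transposed basis times mask
    a = np.clip(a, 0.0, None)            # negative transmissivities are clipped
    peak = a.max()
    if peak > 0.0:
        a = a / peak                     # scale so that the maximum is 1
    return a.reshape(aperture_size, aperture_size)

# Precompute once, reuse for every frame.
n, size = 16, 7
basis_conj_t = fourier_basis(n, size).conj().T
```

Two limiting cases give a quick sanity check of the sketch: if every frequency is marked as important, the result collapses to a single open pixel at the aperture center (a pinhole, which is maximally broadband), and if only the mean term matters, the aperture opens completely.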
For the static coded aperture, the selected near-optimal aperture code was printed on transparencies
(Kodak film did not withstand the heat) and inserted into the objective lens at the projector’s
aperture plane (cf. figure 1-left). We applied the near-optimal 7x7 broadband pattern that was found
in [7] using an optimization approach. A static coded aperture is inexpensive and easy to manufacture,
but limited, as previously explained. Adaptive coded apertures lead to a higher image quality,
but are slightly more complex. For implementing them, we integrated a programmable liquid crystal
array (LCA) into the projector’s aperture plane, as illustrated in figure 1-right. For achieving
interactive compensation rates, we have developed an optimized software algorithm: In
principle, each pixel (or a very small image region) of the original image would have to be
deconvolved individually depending on its defocus. Our method partitions the image into a
non-uniform grid, based on the actual distribution of the kernel scales and on the capabilities of
the graphics hardware being used. The entire partitioning operation can be carried out
off-line, since it is independent from the image content. The Fourier transformation and
its inverse require the largest amount of computation time for deconvolution. We apply
CUDA’s GPU implementation of the Fast Fourier Transformation (FFT). First, we measure
the efficiency of the entire deconvolution (including FFT, division and IFFT) for each
possible patch size directly on the GPU, and compute the average time that is required
to process one pixel in each case. In theory, this should be in the order of $O(\log n)$ for
patches with $n$ pixels. In practice, however, the overhead of hardware and software
specific implementations of the FFT/IFFT (e.g. through caching, etc.) can be significant.
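The measurement described above can be illustrated with a small benchmark. The sketch below runs on the CPU with NumPy rather than on the GPU with CUDA, so the absolute numbers are not comparable to those of the prototype; the function names, patch sizes and repeat count are assumptions made for illustration.

```python
import time
import numpy as np

def time_per_pixel(patch_size, repeats=20):
    """Average seconds per pixel for FFT, division and inverse FFT of one square patch."""
    rng = np.random.default_rng(0)
    patch = rng.random((patch_size, patch_size))
    # Constant non-zero divisor: only the timing matters here, not the result.
    kernel_ft = np.full((patch_size, patch_size), 2.0 + 0.0j)
    start = time.perf_counter()
    for _ in range(repeats):
        np.fft.ifft2(np.fft.fft2(patch) / kernel_ft)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * patch_size * patch_size)

def measure_efficiency(patch_sizes=(16, 32, 64, 128, 256)):
    """Lookup table: patch side length in pixels -> measured seconds per pixel."""
    return {size: time_per_pixel(size) for size in patch_sizes}
```

Since an FFT over $n$ pixels costs on the order of $n \log n$ operations, the per-pixel time would be expected to grow only logarithmically; deviations from that trend in the measured table reflect exactly the implementation overhead mentioned above.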
We start with a simple asymmetric quad-tree subdivision and continue until we reach a predefined lowest
level, i.e., the highest partitioning resolution. For each atom patch, we look up the measured
efficiency that corresponds to its size. When traversing the quad-tree bottom-up, we
successively merge patches in each level if this leads to more efficient results. We look up the
efficiency of each merge possibility based on their patch sizes and compare them with the
total efficiency of the subdivision achieved for the same area in the previous level. If one
merge possibility becomes more efficient than the previous subdivision, it will be used,
its total efficiency is computed and passed to the next quad-tree level for supporting
upcoming merge decisions. In contrast to the above image partitioning, which depends on the
defocus values rather than on the image content and is therefore carried out offline, the
following deconvolution steps are processed entirely on the GPU for each frame. Since the
perception of focussed details is obtained mainly from the image luminance, we apply
deconvolution to the luminance channel rather than to the RGB channels. In the next step,
the partitioning result that has been pre-computed is used for dividing the luminance
channel of the image into the desired patch structure. As mentioned earlier, CUDA’s
FFT implementation is applied to each patch in this array. We then divide all Fourier
transformed patches by Fourier transformed aperture kernels of different scales to perform the
deconvolution. Based on the partitioning results, each patch can have a different number of
scale levels while the individual scale values can locally vary. The final steps reverse the
initial steps. Each pixel’s final luminance value is selected only from the patch that was
deconvolved with the necessary aperture scale (i.e., the scale that corresponds to the pixel’s
amount of defocus). The patches are then blended in spatial domain (i.e., in image space)
and the new luminance values are recombined with the original chrominance values.
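The per-frame steps described above can be sketched for the simplest case of one kernel scale applied to the whole frame. The sketch omits the patch structure and the per-pixel scale selection, runs in NumPy instead of CUDA, and makes two assumptions that are not stated in the text: Rec. 601 weights are used to obtain the luminance, and a small `eps` guards the division against near-zero Fourier magnitudes.

```python
import numpy as np

LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])  # Rec. 601 (assumed color model)

def kernel_transform(kernel, shape):
    """Fourier transform of a small kernel, zero-padded and centered at the origin."""
    kh, kw = kernel.shape
    padded = np.zeros(shape)
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def compensate_frame(rgb, kernel, eps=1e-3):
    """Deconvolve the luminance of an RGB frame (floats in [0, 1]) with one kernel."""
    luma = rgb @ LUMA_WEIGHTS
    chroma = rgb - luma[..., None]             # kept unchanged

    k_ft = kernel_transform(kernel, luma.shape)
    k_safe = np.where(np.abs(k_ft) < eps, eps, k_ft)   # assumed guard, see lead-in
    sharp = np.real(np.fft.ifft2(np.fft.fft2(luma) / k_safe))

    # Recombine with the original chrominance and clip to the displayable range.
    return np.clip(sharp[..., None] + chroma, 0.0, 1.0)
```

The final clipping is where intensities outside the displayable range are lost, which is the source of the ringing artifacts discussed earlier; a kernel with fewer low Fourier magnitudes produces fewer such out-of-range values.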
The resultant compensation image is finally projected. Additional steps are required in
case a dynamic coded aperture is used. First of all, the LCA that is currently applied
in the prototype is limited to a bit depth of one, thus a simple binarization has to be
carried out. As previously described, the aperture can be computed with a simple matrix
multiplication. To reach interactive frame rates, the conjugate-transposed Fourier basis
matrix $F^{*}$ is precomputed and uploaded
to the graphics hardware memory. Using NVIDIA’s BLAS implementation for CUDA,
the matrix multiplication can be carried out directly on the graphics hardware, and we
benefit from the parallel SIMD processing of the GPU. For this, the current input image is
uploaded to the graphics hardware memory, then the Fourier transform is calculated and the
binary importance mask for the image is determined. The resulting importance mask is
multiplied with $F^{*}$, resulting in the aperture mask. Finally, the mask is binarized and
rendered to the LCA. The projected image is deconvolved as explained earlier, before being
displayed.
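The offline patch merging described earlier in this section can be sketched as a recursive cost comparison. The sketch simplifies the procedure considerably and rests on assumptions that are not given in the text: the subdivision is a regular (not asymmetric) quad-tree, only a merge of all four children is considered, and the cost of a patch is modeled as its pixel count times the measured per-pixel time times the number of distinct kernel scales it contains. The function name `partition_patches` is hypothetical.

```python
def partition_patches(scales, atom, time_per_pixel):
    """Choose between merging and subdividing, decided bottom-up.

    scales         : square 2D list (side a power of two) holding one quantized
                     kernel scale per atom patch
    atom           : side length of an atom patch in pixels
    time_per_pixel : dict, patch side length in pixels -> measured seconds per pixel
    Returns (total_cost, patches), each patch given as (row, col, side_in_atoms).
    """
    def solve(row, col, side):
        levels = {scales[r][c]
                  for r in range(row, row + side)
                  for c in range(col, col + side)}
        pixels = (side * atom) ** 2
        # Assumed cost model: one deconvolution per distinct scale in the patch.
        merged_cost = len(levels) * pixels * time_per_pixel[side * atom]
        if side == 1:
            return merged_cost, [(row, col, 1)]

        half = side // 2
        split_cost, split_patches = 0.0, []
        for dr in (0, half):
            for dc in (0, half):
                cost, patches = solve(row + dr, col + dc, half)
                split_cost += cost
                split_patches += patches

        # Merge only if it is at least as efficient as the best subdivision.
        if merged_cost <= split_cost:
            return merged_cost, [(row, col, side)]
        return split_cost, split_patches

    return solve(0, 0, len(scales))
```

With a uniform defocus map and per-pixel times that decrease for larger patches, everything merges into one patch; when every atom patch needs a different scale, merging multiplies the work and the atoms stay separate.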
With known parameters of the projector’s objective lens and adjusted aperture pattern, we measure the PSF’s scale in a deconvolved image that is projected. This can be done by finding the best match between the camera-captured projection and different simulated versions of the original image that is convolved with multiple scales of the PSF, as explained in [5]. This allows us to automatically measure the pixel-individual defocus on the screen and derive the corresponding kernel scales.
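The matching step described above can be sketched as a search over candidate kernel scales. For brevity the sketch estimates a single scale for the whole image instead of one per pixel, assumes that the camera image is already registered geometrically and radiometrically to the original, and scales the aperture pattern by integer factors only; all function names are hypothetical.

```python
import numpy as np

def scaled_kernel(pattern, scale):
    """Aperture pattern enlarged by an integer factor and normalized to sum 1."""
    kernel = np.kron(pattern, np.ones((scale, scale)))
    return kernel / kernel.sum()

def blur(image, kernel):
    """Circular convolution of an image with a small kernel via the FFT."""
    kh, kw = kernel.shape
    padded = np.zeros(image.shape)
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def estimate_scale(original, captured, pattern, scales):
    """Return the candidate scale whose simulated blur best matches the capture."""
    errors = {}
    for s in scales:
        simulated = blur(original, scaled_kernel(pattern, s))
        errors[s] = float(np.mean((simulated - captured) ** 2))
    best = min(errors, key=errors.get)
    return best, errors
```

Running the same comparison on local windows around each pixel, rather than on the full image, would yield the pixel-individual scales that the compensation needs.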
With the determined scales, we compute the $f$-numbers of an objective lens with a circular
aperture (and constant focal length) that would lead to the same depth-of-field or the same
light throughput as the corresponding coded aperture. To achieve the depth-of-field of the
adaptive coded aperture that is used for displaying the “lenna” image in figure 2, for
instance, a circular aperture has to be stopped down considerably further than is required
for achieving the same light throughput. In terms of light throughput, the gain is approximately
a factor of 4.3. The table in figure 4 shows that the depth-of-field versus
light throughput property of unscaled adaptive coded apertures is in almost all cases
significantly better than the application of broadband masks or a purely digital defocus
compensation. Therefore, coded apertures (and in particular adaptive coded apertures)
outperform static circular apertures. The corresponding input images are shown in figure 5.
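The relation between the two equivalent stops and the throughput gain follows from the fact that, at constant focal length, the light passed by a circular aperture is proportional to its area and therefore to the inverse square of the $f$-number. The following lines illustrate this arithmetic; the function names and the example stops are hypothetical and are not the values measured for the prototype.

```python
import math

def throughput_gain(f_number_same_dof, f_number_same_light):
    """Light gain of an aperture that matches the depth-of-field of the first
    circular stop while passing as much light as the second one."""
    return (f_number_same_dof / f_number_same_light) ** 2

def f_number_ratio_for_gain(gain):
    """Ratio between the two equivalent f-numbers that yields a given gain."""
    return math.sqrt(gain)

# Hypothetical example: depth-of-field of f/8 at the brightness of f/4.
example_gain = throughput_gain(8.0, 4.0)   # 4.0
```

Conversely, a gain of about 4.3 corresponds to equivalent $f$-numbers that differ by a factor of roughly 2.07, i.e., slightly more than two full stops.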
Our technique is also useful for planar screens that do not require a large depth-of-field: Defocussing the projector optically to make the pixel structure vanish, and applying deconvolution to recover the image details, leads to better image quality. This is known as projector de-pixelation, and can be applied for close-view displays with limited resolution, such as rear-projected TV sets. Our technique enhances projector de-pixelation significantly, as shown in figure 6 (left half). For video frames with significantly different brightness, our dynamic aperture can be scaled with respect to the mean image brightness to increase the temporal contrast, as is done by conventional auto-iris projection lenses (cf. figure 6, right half).
The frame-rate that we can currently achieve with our approach depends on the number of required defocus scales. Using the static coded aperture, frame-rates of 12-16 fps are possible. Due to a higher computational demand in case of the dynamic adaptive coded aperture, only a frame-rate of 8 fps is possible so far. These frame-rates are all measured using an NVIDIA GeForce 8800 Ultra graphics board. This is a clear limitation, but will improve with next generation graphics hardware, or with customized integrated image processors. The main limitations of our approach are currently imposed by the employed LCAs. The low transmittance (only 30% when completely transparent) of current LCAs, for instance, results in a tremendous loss of light. Therefore, we trade light throughput for depth-of-field. As spatial light modulators (SLMs), such as a high contrast continuously valued LCA with higher transmittance, or a reflective SLM, such as a DMD, become more widely available, we expect better results with these displays. Being able to use intensity masks will not only improve defocus compensation and de-pixelation, but will also allow the control of temporal contrast by scaling the transmittance intensity rather than the size of the aperture. This, however, requires higher contrast LCAs and film material. We also believe that high brightness at low power consumption and heat development will become feasible with light engines that apply upcoming LED technology.
[1] Oliver Bimber and Andreas Emmerling. Multifocal Projection: A Multiprojector Technique for Increasing Focal Depth. IEEE TVCG, 12(4):658–667, 2006.
[2] Michael S. Brown, Peng Song, and Tat-Jen Cham. Image Pre-Conditioning for Out-of-Focus Projector Blur. In Proc. IEEE CVPR, volume II, pages 1956–1963, 2006.
[3] S. Daly. The Visible Differences Predictor: An Algorithm for the Assessment of Image Fidelity. In A.B. Watson, editor, Digital Image and Human Vision, pages 179–206. Cambridge, MA: MIT Press, 1993.
[4] Anat Levin, Rob Fergus, Frédo Durand, and William T. Freeman. Image and depth from a conventional camera with a coded aperture. ACM Trans. Graph. (Siggraph), 26(3):70, 2007.
[5] Yuji Oyamada and Hideo Saito. Focal Pre-Correction of Projected Image for Deblurring Screen Image. In Proc. IEEE ProCams, 2007.
[6] Ramesh Raskar, Amit Agrawal, and Jack Tumblin. Coded exposure photography: motion deblurring using fluttered shutter. ACM Trans. Graph., 25(3):795–804, 2006.
[7] Ashok Veeraraghavan, Ramesh Raskar, Amit Agrawal, Ankit Mohan, and Jack Tumblin. Dappled photography: mask enhanced cameras for heterodyned light fields and coded aperture refocusing. ACM Trans. Graph. (Siggraph), 26(3):69, 2007.
[8] L. Zhang and S. K. Nayar. Projection Defocus Analysis for Scene Capture and Image Display. ACM Trans. Graph. (Siggraph), 25(3):907–915, 2006.