Converting stroked primitives to filled primitives

Nehab, D. 2020. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH 2020), On-line due to COVID-19, August 2020, 39(4):137, DOI: 10.1145/3386569.3392392


Abstract: Vector graphics formats offer support for both filled and stroked primitives. Filled primitives paint all points in the region bounded by a set of outlines. Stroked primitives paint all points covered by a line drawn over the outlines. Editors allow users to convert stroked primitives to the outlines of equivalent filled primitives for further editing. Likewise, renderers typically convert stroked primitives to equivalent filled primitives prior to rendering. This conversion problem is deceivingly difficult to solve. Surprisingly, it has received little to no attention in the literature. Existing implementations output too many segments, do not satisfy accuracy requirements, or fail under a variety of conditions, often spectacularly. In this paper, we present a solution to the stroke-to-fill conversion problem that addresses these issues. One of our key insights is to take into account the evolutes of input outlines, in addition to their offsets, in regions of high curvature. Furthermore, our approach strives to maintain continuity between the input and the set of painted points. Our implementation is available in open source.

The Core of Cartesi

Teixeira, A.; Nehab, D. 2018. Cartesi. Whitepaper


Abstract: Cartesi is a layer-2 platform for the development and deployment of scalable decentralized applications. Cartesi DApps are composed of both blockchain and off-chain components. Off-chain components run inside Cartesi Nodes that represent the interests of each DApp user. Cartesi Nodes provide DApp developers with reproducible Cartesi Machines, where large scale verifiable computations can be run. These verifiable computations are easily integrated into smart contracts by powerful primitives that provide strong conflict-resolution guarantees. More precisely, any dispute arising over the result of computations run inside Cartesi Machines can be fairly adjudicated at negligible cost on the blockchain. Cartesi Nodes also allow DApp developers to run native code. Native computations can leverage the node's full processing power, including any available GPUs. Whether performed natively by the node or inside Cartesi Machines, off-chain components run under a complete Linux operating system that provides the full ecosystem required by complex computations. Cartesi enables DApp developers to use all the programming languages, tools, libraries, software, and services they are already familiar with. By moving most of the complex logic of their DApps to portable off-chain components, developers are freed from the limitations and idiosyncrasies imposed by blockchains. In this way, Cartesi empowers developers to select the best run-time environment in which to host each part of their DApps.

The Replate

Chen, G.; Nehab, D.; Sander, P. V. 2018. Proceedings of the ACM on Computer Graphics and Interactive Techniques, 1(1):4, DOI: 10.1145/3203205


Abstract: Replates are a new way to experience replays of sporting events. We use computer vision tools, such as camera tracking and background subtraction, to align all frames in the original video with regard to each other. Using this alignment, we create a single plate as it would be seen from a wide-angle static camera. Key players and events are then reinserted into this arena using an interactive tool, recovering the big picture that is typically lost due to camera panning and zoom. Replates are played in continuous loop. We allow replate authors to include multiple instances of each player, offset by regular intervals, so viewers can quickly and repeatedly inspect their favorite moments. Authors can also choose from a variety of effects that help them construct a narrative of their favourite plays. The interactive system gives immediate feedback allowing authors to adjust the rendered replates interactively. Replates take a fraction of the size of the original input video, and produce an effect reminiscent of video game reenactments of famous sporting events. The footage, however, is real.

In-Depth Buffers

Han, S; Chen, G.; Nehab, D.; Sander, P. V. 2018. Proceedings of the ACM on Computer Graphics and Interactive Techniques, 1(1):2, DOI: 10.1145/3203194


Abstract: We present a new approach for rendering all triangles in a model in front-to-back order without the need for sorting at runtime. The method can be used for rendering order-dependent transparency effects, or to minimize overdraw, for example. The key distinguishing component in the approach is its negligible runtime cost and therefore the ease with which it can be incorporated into rendering engines. More specifically, given a viewpoint, the runtime simply selects a presorted triangle list, which we call in-depth buffers, to be rendered at full speed. These in-depth buffers are even optimized for post-transform vertex cache efficiency. The result is unmatched in front-to-back rendering performance. The difficulty is computing the smallest set of in-depth buffers required. This reduces to a graph problem that we prove to be NP-hard. Nevertheless, we have found an optimization heuristic that produces good results, particularly when visually imperceptible fragment ordering mistakes are allowed.

Real-Time Continuous Image Processing

Sacht, L.; Nehab, D.; Lima, R. S. 2018. International Journal of Image and Graphics, 18(3):19, DOI: 10.1142/S021946781850016X


Abstract: In this work, we propose a framework that performs a number of popular image-processing operations in the continuous domain. This is in contrast to the standard practice of defining them as operations over discrete sequences of sampled values. The guiding principle is that, in order to prevent aliasing, nonlinear image-processing operations should ideally be performed prior to prefiltering and sampling. This is of course impractical, as we may not have access to the continuous input. Even so, we show that it is best to apply image-processing operations over the continuous reconstruction of the input. This transformed continuous representation is then prefiltered and sampled to produce the output. The use of high-quality reconstruction strategies brings this alternative much closer to the ideal than directly operating over discrete values. We illustrate the advantages of our framework with several popular effects. In each case, we demonstrate the quality difference between continuous image-processing, their discrete counterparts and previous anti-aliasing alternatives. Finally, our GPU implementation shows that current graphics hardware has enough computational power to perform continuous image processing in real-time.

Rigorous bounds for polynomial Julia sets

de Figueiredo, L. H.; Nehab, D.; Stolfi, J.; Oliveira, J. B. 2016. Journal of Computational Dynamics, 3(2):113–137, DOI: 10.3934/jcd.2016006


Abstract: We present an algorithm for computing images of polynomial Julia sets that are reliable in the sense that they carry mathematical guarantees against sampling artifacts and rounding errors in floating-point arithmetic. We combine cell mapping based on interval arithmetic with label propagation in graphs to avoid function iteration and rounding errors. As a result, our algorithm avoids point sampling and can reliably classify entire rectangles in the complex plane as being on either side of the Julia set. The union of the rectangles that cannot be so classified is guaranteed to contain the Julia set. Our algorithm computes a refinable quadtree decomposition of the complex plane adapted to the Julia set which can be used for rendering and for approximating geometric properties such as the area of the filled Julia set and the fractal dimension of the Julia set.

Parallel recursive filtering of infinite input extensions

Nehab, D.; Maximo, A. 2016. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2016), Macao, December 2016, 35(6), DOI: 10.1145/2980179.2980222


Abstract: Filters with slowly decaying impulse responses have many uses in computer graphics. Recursive filters are often the fastest option for such cases. In this paper, we derive closed-form formulas for computing the exact initial feedbacks needed for recursive filtering infinite input extensions. We provide formulas for the constant-padding (e.g. clamp-to-edge), periodic (repeat) and even-periodic (mirror or reflect) extensions. These formulas were designed for easy integration into modern block-parallel recursive filtering algorithms. Our new modified algorithms are state-of-the-art, filtering images faster even than previous methods that ignore boundary conditions.

New controls for combining images in correspondence

Liao, J.; Nehab, D.; Hoppe, H.; Sander, P. V. 2016. IEEE Transactions on Visualization and Computer Graphics, 22(7):1875–1885, DOI: 10.1109/tvcg.2015.2476794


Abstract: When interpolating images, for instance in the context of morphing, there are myriad approaches for defining correspondence maps that align structurally similar elements. However, the actual interpolation usually involves simple functions for both geometric paths and color blending. In this paper we explore new types of controls for combining two images related by a correspondence map. Our insight is to apply recent edge-aware decomposition techniques, not just to the image content but to the map itself. Our framework establishes an intuitive low-dimensional parameter space for merging the shape and color from the two source images at both low and high frequencies. A gallery-based user interface enables interactive traversal of this rich space, to either define a morph path or synthesize new hybrid images. Extrapolation of the shape parameters achieves compelling effects. Finally we demonstrate an extension of the framework to videos.

Optimized quasi-interpolators for image reconstruction

Sacht, L.; Nehab, D. 2015. IEEE Transactions on Image Processing, September 2015, 24(12):5249–5259, DOI: 10.1109/TIP.2015.2478385


Abstract: We propose new quasi-interpolators for the continuous reconstruction of sampled images, combining a narrowly-supported piecewise-polynomial kernel and an efficient digital filter. In other words, our quasi-interpolators fit within the generalized sampling framework and are straightforward to use. We go against standard practice and optimize for approximation quality over the entire Nyquist range, rather than focusing exclusively on the asymptotic behavior as the sample spacing goes to zero. In contrast to previous work, we jointly optimize with respect to all degrees of freedom available in both the kernel and the digital filter. We consider linear, quadratic, and cubic schemes, offering different trade-offs between quality and computational cost. Experiments with compounded rotations and translations over a range of input images confirm that, due to the additional degrees of freedom and the more realistic objective function, our new quasi-interpolators perform better than the state-of-the-art, at a similar computational cost.

Massively-Parallel Vector Graphics

Ganacim, F.; Lima, R. S.; de Figueiredo, L. H.; Nehab, D. 2014. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2014), Shenzhen, December 2014, 33(6):229, DOI: 10.1145/2661229.2661274


Abstract: We present a massively parallel vector graphics rendering pipeline that is divided into two components. The preprocessing component builds a novel adaptive acceleration data structure, the shortcut tree. Tree construction is efficient and parallel at the segment level, enabling dynamic vector graphics. The tree allows efficient random access to the color of individual samples, so the graphics can be warped for special effects. The rendering component processes all samples and pixels in parallel. It was optimized for wide antialiasing filters and a large number of samples per pixel to generate sharp, noise-free images. Our sample scheduler allows pixels with overlapping antialiasing filters to share samples. It groups together samples that can be computed with the same vector operations using little memory or bandwidth. The pipeline is feature-rich, supporting multiple layers of filled paths, each defined by curved outlines (with linear, rational quadratic, and integral cubic Bézier segments), clipped against other paths, and painted with semi-transparent colors, gradients, or textures. We demonstrate renderings of complex vector graphics in state-of-the-art quality and performance. Finally, we provide full source-code for our implementation as well as the input data used in the paper.

Semi-automated video morphing

Liao, J.; Lima, R. S.; Nehab, D.; Hoppe, H.; Sander, P. V. 2014. Computer Graphics Forum (Proceedings of Eurographics Symposium on Rendering 2014), Lyon, June 2014, 33(4):51–60, DOI: 10.1111/cgf.12412


Abstract: We explore creating smooth transitions between videos of different scenes. As in traditional image morphing, good spatial correspondence is crucial to prevent ghosting, especially at silhouettes. Video morphing presents added challenges. Because motions are often unsynchronized, temporal alignment is also necessary. Applying morphing to individual frames leads to discontinuities, so temporal coherence must be considered. Our approach is to optimize a full spatiotemporal mapping between the two videos. We reduce tedious interactions by letting the optimization derive the fine-scale map given only sparse user-specified constraints. For robustness, the optimization objective examines structural similarity of the video content. We demonstrate the approach on a variety of videos, obtaining results using few explicit correspondences.

Automating image morphing using structural similarity on a halfway domain

Liao, J.; Lima, R. S.; Nehab, D.; Hoppe, H.; Sander, P. V.; Yu, J. 2014. ACM Transactions on Graphics (Presented at ACM SIGGRAPH 2014), Vancouver, August 2014, 33(5):168, DOI: 10.1145/2629494


Abstract: The main challenge in achieving good image morphs is to create a map that aligns corresponding image elements. Our aim is to help automate this often tedious task. We compute the map by optimizing the compatibility of corresponding warped image neighborhoods using an adaptation of structural similarity. The optimization is regularized by a thin-plate spline, and may be guided by a few user-drawn points. We parameterize the map over a halfway domain and show that this representation offers many benefits. The map is able to treat the image pair symmetrically, model simple occlusions continuously, span partially overlapping images, and define extrapolated correspondences. Moreover, it enables direct evaluation of the morph in a pixel shader without mesh rasterization. We improve the morphs by optimizing quadratic motion paths and by seamlessly extending content beyond the image boundaries. We parallelize the algorithm on a GPU to achieve a responsive interface and demonstrate challenging morphs obtained with little effort.

A Fresh Look at Generalized Sampling

Nehab, D.; Hoppe, H. 2014. Foundations and Trends in Computer Graphics and Vision, March 2014, 8(1):1–84, DOI: 10.1561/0600000053


Abstract: Discretization and reconstruction are fundamental operations in computer graphics, enabling the conversion between sampled and continuous representations. Major advances in signal-processing research have shown that these operations can often be performed more efficiently by decomposing a filter into two parts: a compactly supported continuous-domain function and a digital filter. This strategy of "generalized sampling" has appeared in a few graphics papers, but is largely unexplored in our community. This survey broadly summarizes the key aspects of the framework, and delves into specific applications in graphics. Using new notation, we concisely present and extend several key techniques. In addition, we demonstrate benefits for prefiltering in image downscaling and supersample-based rendering, and analyze the effect that generalized sampling has on the noise due to Monte Carlo estimation. We conclude with a qualitative and quantitative comparison of traditional and generalized filters.

Depth-Presorted Triangle Lists

Chen, G.; Sander, P. V.; Nehab, D.; Yang, L.; Hu, L. 2012. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2012), Singapore, November 2012, 31(6):160, DOI: 10.1145/2366145.2366179


Abstract: We present a novel approach for real-time rendering of static 3D models front-to-back or back-to-front relative to any viewpoint outside its bounding volume. The approach renders depth-sorted triangles using a single draw-call. At run-time, we replace the traditional sorting strategy of existing algorithms with a faster triangle selection strategy. The selection process operates on an extended sequence of triangles annotated by test planes, created by our off-line preprocessing stage. Based on these test planes, a simple run-time procedure uses the given viewpoint to select a subsequence of triangles for rasterization. Selected subsequences are statically presorted by depth and contain each input triangle exactly once. Our method runs on legacy hardware and renders depth-sorted static models significantly faster than previous approaches. We conclude demonstrating the real-time rendering of order-independent transparency effects.

Temporal Coherence Methods in Real-Time Rendering

Scherzer, D.; Yang, L.; Mattausch, O.; Nehab, D.; Sander, P. V.; Wimmer, M.; Eisemann, E. 2012. Computer Graphics Forum, December 2012, 31(8):2378-2408, DOI: 10.1111/j.1467-8659.2012.03075.x


Abstract: Nowadays, there is a strong trend towards rendering to higher-resolution displays and at high frame rates. This development aims at delivering more detail and better accuracy, but it also comes at a significant cost. Although graphics cards continue to evolve with an ever-increasing amount of computational power, the speed gain is easily counteracted by increasingly complex and sophisticated shading computations. For real-time applications, the direct consequence is that image resolution and temporal resolution are often the first candidates to bow to the performance constraints (e.g., although full HD is possible, PS3 and XBox often render at lower resolutions).
In order to achieve high-quality rendering at a lower cost, one can exploit temporal coherence (TC). The underlying observation is that a higher resolution and frame rate do not necessarily imply a much higher workload, but a larger amount of redundancy and a higher potential for amortizing rendering over several frames. In this survey, we investigate methods that make use of this principle and provide practical and theoretical advice on how to exploit temporal coherence for performance optimization. These methods not only allow incorporating more computationally intensive shading effects into many existing applications, but also offer exciting opportunities for extending high-end graphics applications to lower-spec consumer-level hardware. To this end, we first introduce the notion and main concepts of TC, including an overview of historical methods. We then describe a general approach, image-space reprojection, with several implementation algorithms that facilitate reusing shading information across adjacent frames. We also discuss data-reuse quality and performance related to reprojection techniques. Finally, in the second half of this survey, we demonstrate various applications that exploit TC in real- time rendering.

GPU-Efficient Recursive Filtering and Summed-Area Tables

Nehab, D.; Maximo, A.; Lima, R. S.; Hoppe, H. 2011. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2011), Hong Kong, December 2011, 30(6):176, DOI: 10.1145/2024156.2024210


Abstract: Image processing operations like blurring, inverse convolution, and summed-area tables are often computed efficiently as a sequence of 1D recursive filters. While much research has explored parallel recursive filtering, prior techniques do not consider optimization across the entire filter sequence. Typically, a separate filter (or often a causal-anticausal filter pair) is required in each dimension. Computing these filter passes independently results in significant traffic to global memory, a crucial bottleneck in GPU systems. We present a new algorithmic framework for parallel evaluation. It partitions the image into 2D blocks, with a small band of data buffered along each block perimeter. We show that these perimeter bands are sufficient to accumulate the effects of the successive filters. A remarkable result is that the image data is read only twice and written just once, independent of image size, and thus total memory bandwidth is reduced even compared to the traditional serial algorithm. We demonstrate significant speedups in GPU computation.

Image-Based Bidirectional Scene Reprojection

Yang, L.; Tse, Y.-C.; Sander, P. V.; Lawrence, J. D.; Nehab, D.; Hoppe, H.; Wilkins, C. L. 2011. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2011), Hong Kong, December 2011, 30(6):150, DOI: 10.1145/2070781.2024184


Abstract: We introduce a method for increasing the framerate of real-time rendering applications. Whereas many existing temporal upsampling strategies only reuse information from previous frames, our bidirectional technique reconstructs intermediate frames from a pair of consecutive rendered frames. This significantly improves the accuracy and efficiency of data reuse since very few pixels are mutually occluded in both frames. We present two versions of this basic algorithm. The first is appropriate for fill-bound scenes as it limits the number of expensive shading calculations, but involves rasterization of scene geometry at each intermediate frame. The second version, our more significant contribution, reduces both shading and geometry computations by performing reprojection using only image-based buffers. It warps and combines the adjacent rendered frames using an efficient iterative search on their stored scene depth and flow. Bidirectional reprojection introduces a small amount of lag. We perform a user study to investigate this lag, and find that its effect is minor. We demonstrate substantial performance improvements (3–4×) for a variety of applications, including vertex-bound and fill-bound scenes, multi-pass effects, and motion blur.

Beam casting implicit surfaces on the GPU with interval arithmetic

Ganacim, F.; de Figueiredo, L. H.; Nehab, D. 2011. In Proceedings of the Brazilian Symposium on Computer Graphics and Image Processing, pages 72–77, Maceió, Brazil, August 2011, DOI: 10.1109/SIBGRAPI.2011.5


Abstract: We present a GPU-based adaptive method for rendering implicit surfaces with beam casting. We use interval arithmetic to model the beams and to detect their intersections with the surface. We show how beams can be used to quickly discard large empty regions in the image, thus reducing the number of operations needed to render the surface. Better performance is achieved using an adaptive space subdivision process. We are able to beam-cast algebraic surfaces of degree up to 8, at better than real-time rates, with anti-aliasing.

A survey on temporal coherence methods in real-time rendering

Scherzer, D.; Yang, L.; Mattausch, O.; Nehab, D.; Sander, P. V.; Wimmer, M.; Eisemann, E. 2011. In Eurographics State of the Art Reports, Llandudno, UK, May 2011, DOI: 10.2312/EG2011/stars/101-126


Abstract: Nowadays, there is a strong trend towards rendering to higher-resolution displays and at high frame rates. This development aims at delivering more detail and better accuracy, but it also comes at a significant cost. Although graphics cards continue to evolve with an ever-increasing amount of computational power, the processing gain is counteracted to a high degree by increasingly complex and sophisticated pixel computations. For real-time applications, the direct consequence is that image resolution and temporal resolution are often the first candidates to bow to the performance constraints (e.g., although full HD is possible, PS3 and XBox often render at lower resolutions).
In order to achieve high-quality rendering at a lower cost, one can exploit temporal coherence (TC). The underlying observation is that a higher resolution and frame rate do not necessarily imply a much higher workload, but a larger amount of redundancy and a higher potential for amortizing rendering over several frames. In this state-of-the-art report, we investigate methods that make use of this principle and provide practical and theoretical advice on how to exploit temporal coherence for performance optimization. These methods not only allow incorporating more computationally intensive shading effects into many existing applications, but also offer exciting opportunities for extending high-end graphics applications to lower-spec consumer-level hardware. To this end, we first introduce the notion and main concepts of TC, including an overview of historical methods. We then describe a key data structure, the so-called reprojection cache, with several supporting algorithms that facilitate reusing shading information from previous frames, and finally illustrated its usefulness in various applications.

Fast Capacity Constrained Voronoi Tessellation

Li, H.; Nehab, D.; Wei, L.-Y.; Sander, P. V.; Fu, C.-W. 2010. In Symposium on Interactive 3D Graphics and Games, Washington, DC, February 2010, DOI: 10.1145/1730804.1730985


Abstract: Capacity constrained Voronoi tessellation (CCVT) addresses a crucial quality issue of Lloyd relaxation but at the expense of slower computation, which could hinder its potential wide adoption. We present a fast capacity constrained Voronoi tessellation algorithm which is orders of magnitude faster than the original method proposed by Balzer et al. [2009] (and 10× faster than a previous accelerated implementation of the same technique) while maintaining excellent distribution quality and scaling very well as the number of points increase.

Amortized Supersampling

Yang, L.; Nehab, D.; Sander, P. V.; Sitthi-amorn, P.; Lawrence, J. D.; Hoppe, H. 2009. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2009), Yokohama, Japan, December 2009, 28(5):135, DOI: 10.1145/1618452.1618481


Abstract: We present a real-time rendering scheme that reuses shading samples from earlier time frames to achieve practical antialiasing of procedural shaders. Using a reprojection strategy, we maintain several sets of shading estimates at subpixel precision, and incrementally update these such that for most pixels only one new shaded sample is evaluated per frame. The key difficulty is to prevent accumulated blurring during successive reprojections. We present a theoretical analysis of the blur introduced by reprojection methods. Based on this analysis, we introduce a nonuniform spatial filter, an adaptive recursive temporal filter, and a principled scheme for locally estimating the spatial blur. Our scheme is appropriate for antialiasing shading attributes that vary slowly over time. It works in a single rendering pass on commodity graphics hardware, and offers results that surpass 4x4 stratified supersampling in quality, at a fraction of the cost.

Random-access rendering of general vector graphics

Nehab, D.; Hoppe, H. 2008. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2008), Singapore, December 2008, 27(5):135, DOI: 10.1145/1409060.1409088
Previously Microsoft Research Technical Report MSR-TR-2007-95, July 2007


Abstract: We introduce a novel representation for random-access rendering of antialiased vector graphics on the GPU, along with efficient encoding and rendering algorithms. The representation supports a broad class of vector primitives, including multiple layers of semitransparent filled and stroked shapes, with quadratic outlines and color gradients. Our approach is to create a coarse lattice in which each cell contains a variable-length encoding of the graphics primitives it overlaps. These cell-specialized encodings are interpreted at runtime within a pixel shader. Advantages include localized memory access and the ability to map vector graphics onto arbitrary surfaces, or under arbitrary deformations. Most importantly, we perform both prefiltering and supersampling within a single pixel shader invocation, achieving inter-primitive antialiasing at no added memory bandwidth cost. We present an efficient encoding algorithm, and demonstrate high-quality realtime rendering of complex, real-world examples.

Efficient traversal of mesh edges using adjacency primitives

Sander, P. V.; Nehab, D.; Chlamtac, E.; Hoppe, H. 2008. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2008), Singapore, December 2008, 27(5):144, DOI: 10.1145/1409060.1409097


Abstract: Processing of mesh edges lies at the core of many advanced realtime rendering techniques, ranging from shadow and silhouette computations, to motion blur and fur rendering. We present a scheme for efficient traversal of mesh edges that builds on the adjacency primitives and programmable geometry shaders introduced in recent graphics hardware. Our scheme aims to minimize the number of primitives while maximizing SIMD parallelism. These objectives reduce to a set of discrete optimization problems on the dual graph of the mesh, and we develop practical solutions to these graph problems. In addition, we extend two existing vertex cache optimization algorithms to produce cache-efficient traversal orderings for adjacency primitives. We demonstrate significant runtime speedups for several practical real-time rendering algorithms.

Automated reprojection-based pixel shader optimization

Sitthi-amorn, P.; Lawrence, J. D.; Yang, L.; Sander, P. V.; Nehab, D.; Xi, J. 2008. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2008), Singapore, December 2008, 27(5):127, DOI: 10.1145/1409060.1409080


Abstract: We present a framework and supporting algorithms to automate the use of data reprojection as a general tool for optimizing procedural shaders. Although the general strategy of caching and reusing expensive intermediate shading calculations across consecutive frames has previously been shown to provide an effective trade-off between speed and accuracy, the critical choices of what to reuse and at what rate to refresh cached entries have been left to a designer. The fact that these decisions require a deep understanding of a procedure's semantic structure makes it challenging to select optimal candidates among possibly hundreds of alternatives. Our automated approach relies on parametric models of the way possible caching decisions affect the shader's performance and visual fidelity. These models are trained using a sample rendering session and drive an interactive profiler in which the user can explore the error/performance trade-offs associated with incorporating temporal reprojection. We evaluate the proposed models and selection algorithm with a prototype system used to optimize several complex shaders and compare our approach to current alternatives.

A System for High-Volume Acquisition and Matching of Fresco Fragments:
Reassembling Theran Wall Paintings

Brown, B. J.; Toler-Franklin, C.; Nehab, D.; Burns, M.; Dobkin, D.; Vlachopoulos, A.; Doumas, C.; Weyrich, T.; Rusinkiewicz, S. 2008. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH 2008), Los Angeles, California, August 2008, 27(3):84, DOI: 10.1145/1360612.1360683


Abstract: Although mature technologies exist for acquiring images, geometry, and normals of small objects, they remain cumbersome and time-consuming for non-experts to employ on a large scale. In an archaeological setting, a practical acquisition system for routine use on every artifact and fragment would open new possibilities for archiving, analysis, and dissemination. We present an inexpensive system for acquiring all three types of information, and associated metadata, for small objects such as fragments of wall paintings. The acquisition system requires minimal supervision, so that a single, non-expert user can scan at least 10 fragments per hour. To achieve this performance, we introduce new algorithms to robustly and automatically align range scans, register 2-D scans to 3-D geometry, and compute normals from 2-D scans. As an illustrative application, we present a novel 3-D matching algorithm that efficiently searches for matching fragments using the scanned geometry.

Dense 3D Reconstruction from Specularity Consistency

Nehab, D.; Weyrich, T.; Rusinkiewicz, S. 2008. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, Anchorage, Alaska, June 2008, DOI: 10.1109/CVPR.2008.4587683


Abstract: In this work, we consider the dense reconstruction of glossy objects. We propose the use of a specularity constraint, based on surface normal consistency, to define a matching cost function that can drive standard stereo reconstruction methods. We also present an aggregation method based on anisotropic diffusion that is particularly suitable for this matching cost function.
Following a theoretical discussion on the types of ambiguity that can arise from the proposed constraint, we present a controlled illumination setup that includes a stereo camera pair, and one LCD monitor used as a calibrated, variable-position light source. We then use the setup to evaluate the proposed method on real and synthetic data, and demonstrate its capacity to recover high-quality depth and orientation from specular objects.

An improved shading cache for modern GPUs

Sitthi-amorn, P.; Lawrence, J. D.; Yang, L.; Sander, P. V.; Nehab, D. 2008. In ACM SIGGRAPH/Eurographics Symposium on Graphics Hardware, pages 95–101, Sarajevo, Bosnia-Herzegovina, June 2008, DOI: 10.2312/EGGH/EGGH08/095-101


Abstract: Several recently proposed techniques based on the principle of data reprojection allow reusing shading information generated in one frame to accelerate the calculation of the shading in the following frame. This strategy can significantly reduce the average rendering cost for many important real-time effects at an acceptable level of approximation error. This paper analyzes the overhead associated with incorporating temporal data reprojection on modern GPUs. Based on this analysis, we propose an alternative algorithm to those previously described in the literature and measure its efficiency for multiple scenes and hardware platforms.

Accelerating Real-Time Shading with Reverse Reprojection Caching

Nehab, D.; Sander, P. V.; Lawrence, J. D.; Tatarchuk, N.; Isidoro, J. R. 2007. In ACM SIGGRAPH/Eurographics Symposium on Graphics Hardware, pages 25–35, San Diego, California, August 2007, DOI: 10.5555/1280094.1280098


Abstract: Evaluating pixel shaders consumes a growing share of the computational budget for real-time applications. How- ever, the significant temporal coherence in visible surface regions, lighting conditions, and camera location al- lows reusing computationally-intensive shading calculations between frames to achieve significant performance improvements at little degradation in visual quality. This paper investigates a caching scheme based on reverse reprojection which allows pixel shaders to store and reuse calculations performed at visible surface points. We provide guidelines to help programmers select appropriate values to cache and present several policies for keeping cached entries up-to-date. Our results confirm this approach offers substantial performance gains for many com- mon real-time effects, including precomputed global lighting effects, stereoscopic rendering, motion blur, depth of field, and shadow mapping.

Fast Triangle Reordering for Vertex Locality and Reduced Overdraw

Sander, P. V.; Nehab, D.; Barczak, J. 2007. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH 2007), San Diego, California, July 2007, 26(3):89, DOI: 10.1145/1276377.1276489


Abstract: We present novel algorithms that optimize the order in which triangles are rendered, to improve post-transform vertex cache efficiency as well as for view-independent overdraw reduction. The resulting triangle orders perform on par with previous methods, but are orders magnitude faster to compute.
The improvements in processing speed allow us to perform the optimization right after a model is loaded, when more information on the host hardware is available. This allows our vertex cache optimization to often outperform other methods. In fact, our algorithms can even be executed interactively, allowing for re-optimization in case of changes to geometry or topology, which happen often in CAD/CAM applications. We believe that most real-time rendering applications will immediately benefit from these new results.

Triangle Order Optimization for Graphics Hardware Computation Culling

Nehab, D.; Barczak, J.; Sander, P. V. 2006. In Symposium on Interactive 3D Graphics and Games, pages 207–211, Redwood City, California, March 2006, DOI: 10.1145/1111411.1111448
Released as ATI Tootle library
One of three papers invited for the I3D/UIST/Sandbox Reprise session in SIGGRAPH 2006


Abstract: We describe an automatic preprocessing algorithm that reorders triangles in a mesh so as to enable the graphics hardware to efficiently cull vertex and pixel processing at rendering time.
Our method starts by dividing the mesh into planar clusters which are subsequently sorted into a view-independent order which greatly reduces overdraw. The result is an increase in the opportunities for early Z-culling, reducing pixel processing time. The clusters are then optimized for mesh locality. This produces high rates of vertex cache hits, reducing vertex processing time.
We have found that our method brings the overdraw rates of a wide range of models close to that of front-to-back order, while preserving state of the art vertex cache performance. This results in higher frame-rates for pixel bound applications with no penalty to vertex-bound applications.

Improved Sub-pixel Stereo Correspondences through Symmetric Refinement

Nehab, D.; Rusinkiewicz, S.; Davis, J. E. 2005. In IEEE International Conference on Computer Vision, volume 1, pages 557–563, Beijin, China, October 2005, DOI: 10.1109/ICCV.2007.4409212


Abstract: Most dense stereo correspondence algorithms start by establishing discrete pixel matches and later refine these matches to sub-pixel precision. Traditional sub-pixel refinement methods attempt to determine the precise location of points, in the secondary image, that correspond to discrete positions in the reference image. We show that this strategy can lead to a systematic bias associated with the violation of the general symmetry of matching cost functions. This bias produces random or coherent noise in the final reconstruction, but can be avoided by refining both image coordinates simultaneously, in a symmetric way. We demonstrate that the symmetric sub-pixel refinement strategy results in more accurate correspondences by avoiding bias while preserving detail.

Efficiently Combining Positions and Normals for Precise 3D Geometry

Nehab, D.; Rusinkiewicz, S.; Davis, J. E.; Ramamoorthi, R. 2005. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH 2005), Los Angeles, California, July 2005, 24(3):536–543, DOI: 10.1145/1186822.1073226


Abstract: Range scanning, manual 3D editing, and other modeling approaches can provide information about the geometry of surfaces in the form of either 3D positions (e.g., triangle meshes or range images) or orientations (normal maps or bump maps). We present an algorithm that combines these two kinds of estimates to produce a new surface that approximates both. Our formulation is linear, allowing it to operate efficiently on complex meshes commonly used in graphics. It also treats high- and low-frequency components separately, allowing it to optimally combine outputs from data sources such as stereo triangulation and photometric stereo, which have different error-vs.-frequency characteristics. We demonstrate the ability of our technique to both recover high-frequency details and avoid low-frequency bias, producing surfaces that are more widely applicable than position or orientation data alone.

Spacetime Stereo: A Unifying Framework for Depth from Triangulation

Davis, J. E.; Nehab, D.; Ramamoorthi, R.; Rusinkiewicz, S. 2005. IEEE Transactions on Pattern Analysis and Machine Intelligence, February 2005, 27(2):296–302, DOI: 10.1109/TPAMI.2005.37


Abstract: Depth from triangulation has traditionally been investigated in a number of independent threads of research, with methods such as stereo, laser scanning, and coded structured light considered separately. In this paper, we propose a common framework called spacetime stereo that unifies and generalizes many of these previous methods. Viewing specific techniques as special cases of this general framework leads to insights regarding solutions to many of the traditional problems of individual techniques. Specifically, we discuss a number of possible applications such as improved recovery of static scenes under variable illumination, spacetime stereo for moving objects, structured light and laser scanning with multiple simultaneous stripes or patterns, and laser scanning of shiny objects. To suggest the practical utility of the framework, we use it to analyze two of these applications - recovery of static scenes under variable, but uncontrolled and unstructured illumination, and depth estimation in dynamic scenes. Based on our analysis, we show that methods derived from the spacetime stereo framework can be used to recover depth in situations in which existing methods perform poorly.

Stratified Point Sampling of 3D Models

Nehab, D.; Shilane, P. 2004. In Eurographics Symposium on Point-Based Graphics, pages 49–56, Zurich, Switzerland, February 2004, DOI: 10.2312/SPBG/SPBG04/049-056


Abstract: Point sampling is an important intermediate step for a variety of computer graphics applications, and specialized sampling strategies have been developed to satisfy the requirements of each problem. In this article, we present a technique to generate a stratified sampling of 3D models that is applicable across many domains. The algorithm voxelizes the model and selects one sample per voxel, restricted to the original model's surface. Parameters allow control of the uniformity of the sample placement and the minimum distance between samples. We demonstrate the effectiveness of this technique in selecting stroke locations for painterly rendering models and for producing sampled geometry used as input to shape descriptors.

Schemata Theory for the Real Coding and Arithmetical Operators

Nehab, D.; Pacheco, M. A. C. 2004. In ACM Symposium on Applied Computing, pages 1006–1012, Nicosia, Cyprus, March 2004, DOI: 10.1145/967900.968105


Abstract: The Schemata Theory analyzes the effect of the selection process, mutation and crossover over the number of individuals that belong to a given schema, within generations. This analysis considers, in it's original form, the binary coding and operators. In this article, we present an analogous study, focusing on the real number coding and arithmetical operators. Unfortunately, the conventional schema definition is tightly dependent on discrete alphabets. Therefore, following a generalization of the concept of schema, we present a particular definition that suits better the continuous domain. Using this new definition, we reach an expression similar to the Fundamental Theorem of Genetic Algorithms, valid for the real coding of chromosomes.

Multiscale Moment-Based Painterly Rendering

Nehab, D.; Velho, L. 2002. In Proceedings of the Brazilian Symposium on Computer Graphics and Image Processing, pages 244–251, Fortaleza, Brazil, October 2002


Abstract: In this paper we present a new method for painterly rendering of images. Our method extends the image-moment stroke placement algorithm in two ways: we employ a multiscale scheme for computing strokes and we provide a parametrized mechanism for controlling stroke distribution. In addition, we present a specialized image abstraction for the algorithm.

A linguagem de programação Sloth

Nehab, D.; Ierusalimschy, R. 2002. In Anais do VI Simpósio Brasileiro de Linguagens de Programação, pages 270–282, Rio de Janeiro, Brazil, June 2002


Abstract: Sloth is a simple, easy to learn, non-strict, purely functional programming language, designed to be a test bed for research on the compilation of functional programming languages. Its hybrid architecture innovates on the different programming languages chosen for its construction and resulted in the creation of a concise, self contained and portable implementation. This article introduces Sloth as a programming language, describes part of the research work currently taking advantage of the system and details its implementation, about to be completed.

Ray Path Categorization

Nehab, D.; Gattass, M. 2000. In Proceedings of the Brazilian Symposium on Computer Graphics and Image Processing, pages 227–234, Gramado, Brazil, October 2000, DOI: 10.1109/SIBGRA.2000.883917


Abstract: Edge detection and image segmentation algorithms usually operate on an image to extract geometrical information based on pixel colors. For ray-traced images, the presence of geometrical information on the scene from which the image was rendered allows for a completely different approach. We present an algorithm that divides rays into equivalence classes, or categories. The category information is generated during the rendering process and used to determine edges in the resulting image. Detected edges can later be used to help determine areas subject to aliasing. Little effort is needed to implement the described algorithms over an existing ray-tracer. Furthermore, the extra computational and memory requirements are modest.