Nehab, D.;
Teixeira, A.
2022. Cartesi. Whitepaper
Abstract:
Scalability problems in programmable blockchains have created a strong demand
for secure methods that move the bulk of computation outside the blockchain. One
of the preferred solutions to this problem involves off-chain computers that
compete interactively to prove to the limited blockchain that theirs is the
correct result of a given intensive computation. Each off-chain computer spends
effort linear in the cost of the computation, while the blockchain adjudicates
disputes with only logarithmic effort. However, this effort is multiplied by
the number of competitors, rendering disputes that involve a significant number
of parties impractical and susceptible to Sybil attacks. In this paper, we
propose a practical dispute-resolution algorithm by which a single honest
competitor can win disputes while spending effort linear in the cost of the
computation, but only logarithmic in the number of dishonest competitors. This
algorithm is a novel, stronger primitive for building permissionless fraud-proof
protocols that does not rely on complex economic incentives for enforcement.
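The scaling claimed above can be illustrated with a back-of-the-envelope cost model (a hypothetical sketch for intuition only, not the protocol defined in the paper): the honest party executes the disputed computation once, and settles disputes pairwise in an elimination bracket, where each match is an interactive bisection over the execution rather than a full re-execution.

    import math

    def honest_effort(num_adversaries, computation_steps):
        """Hypothetical effort model matching the abstract's asymptotics.

        Illustrative assumptions (not the paper's construction):
          * the honest party runs the disputed computation once (linear cost);
          * disputes are settled pairwise in a single-elimination bracket, so the
            honest party plays at most ceil(log2(n + 1)) matches;
          * each match is an interactive bisection over the execution, costing
            O(log2(computation_steps)) work instead of a re-execution.
        """
        matches = math.ceil(math.log2(num_adversaries + 1))
        return computation_steps + matches * math.log2(computation_steps)

    for n in (1, 1_000, 1_000_000):
        print(f"{n:>9} adversaries -> effort ~ {honest_effort(n, 2**40):.3e}")

Under these assumptions the total effort stays dominated by the single execution, with only a logarithmic surcharge per additional adversary.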
Links:
whitepaper [pdf] ·
arXiv [html] ·
bibtex [bib]
Nehab, D.
2020. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH 2020), On-line due to COVID-19, August 2020, 39(4):137, DOI: 10.1145/3386569.3392392
Abstract:
Vector graphics formats offer support for both filled and stroked primitives.
Filled primitives paint all points in the region bounded by a set of outlines.
Stroked primitives paint all points covered by a line drawn over the outlines.
Editors allow users to convert stroked primitives to the outlines of equivalent filled primitives for further editing.
Likewise, renderers typically convert stroked primitives to equivalent filled primitives prior to rendering.
This conversion problem is deceptively difficult to solve.
Surprisingly, it has received little to no attention in the literature.
Existing implementations output too many segments, do not satisfy accuracy requirements, or fail under a variety of conditions, often spectacularly.
In this paper, we present a solution to the stroke-to-fill conversion problem that addresses these issues.
One of our key insights is to take into account the evolutes of input outlines, in addition to their offsets, in regions of high curvature.
Furthermore, our approach strives to maintain continuity between the input and the set of painted points.
Our implementation is available in open source.
Links:
paper [pdf] ·
project page [html] ·
video [youtube] ·
bibtex [bib]
Teixeira, A.;
Nehab, D.
2018. Cartesi. Whitepaper
Abstract:
Cartesi is a layer-2 platform for the development and deployment of scalable decentralized applications.
Cartesi DApps are composed of both blockchain and off-chain components.
Off-chain components run inside Cartesi Nodes that represent the interests of each DApp user.
Cartesi Nodes provide DApp developers with reproducible Cartesi Machines, where large scale verifiable computations can be run.
These verifiable computations are easily integrated into smart contracts by powerful primitives that provide strong conflict-resolution guarantees.
More precisely, any dispute arising over the result of computations run inside Cartesi Machines can be fairly adjudicated at negligible cost on the blockchain.
Cartesi Nodes also allow DApp developers to run native code.
Native computations can leverage the node's full processing power, including any available GPUs.
Whether performed natively by the node or inside Cartesi Machines, off-chain components run under a complete Linux operating system that provides the full ecosystem required by complex computations.
Cartesi enables DApp developers to use all the programming languages, tools, libraries, software, and services they are already familiar with.
By moving most of the complex logic of their DApps to portable off-chain components, developers are freed from the limitations and idiosyncrasies imposed by blockchains.
In this way, Cartesi empowers developers to select the best run-time environment in which to host each part of their DApps.
Links:
whitepaper [pdf] ·
bibtex [bib]
Chen, G.;
Nehab, D.;
Sander, P. V.
2018. Proceedings of the ACM on Computer Graphics and Interactive Techniques, 1(1):4, DOI: 10.1145/3203205
Abstract:
Replates are a new way to experience replays of sporting events. We use computer vision tools, such as camera tracking and background subtraction, to align all frames in the original video with regard to each other. Using this alignment, we create a single plate as it would be seen from a wide-angle static camera. Key players and events are then reinserted into this arena using an interactive tool, recovering the big picture that is typically lost due to camera panning and zoom. Replates are played in a continuous loop. We allow replate authors to include multiple instances of each player, offset by regular intervals, so viewers can quickly and repeatedly inspect their favorite moments. Authors can also choose from a variety of effects that help them construct a narrative of their favorite plays. The interactive system gives immediate feedback, allowing authors to adjust the rendered replates interactively. Replates take a fraction of the size of the original input video, and produce an effect reminiscent of video game reenactments of famous sporting events. The footage, however, is real.
Links:
paper [pdf] ·
project page [html] ·
bibtex [bib]
Han, S.;
Chen, G.;
Nehab, D.;
Sander, P. V.
2018. Proceedings of the ACM on Computer Graphics and Interactive Techniques, 1(1):2, DOI: 10.1145/3203194
Abstract:
We present a new approach for rendering all triangles in a model in front-to-back order without the need for sorting at runtime. The method can be used for rendering order-dependent transparency effects, or to minimize overdraw, for example. The key distinguishing component in the approach is its negligible runtime cost and therefore the ease with which it can be incorporated into rendering engines. More specifically, given a viewpoint, the runtime simply selects a presorted triangle list, which we call in-depth buffers, to be rendered at full speed. These in-depth buffers are even optimized for post-transform vertex cache efficiency. The result is unmatched in front-to-back rendering performance. The difficulty is computing the smallest set of in-depth buffers required. This reduces to a graph problem that we prove to be NP-hard. Nevertheless, we have found an optimization heuristic that produces good results, particularly when visually imperceptible fragment ordering mistakes are allowed.
Links:
paper [pdf] ·
bibtex [bib]
Sacht, L.;
Nehab, D.;
Lima, R. S.
2018. International Journal of Image and Graphics, 18(3):19, DOI: 10.1142/S021946781850016X
Abstract:
In this work, we propose a framework that performs a number
of popular image-processing operations in the continuous
domain. This is in contrast to the standard practice of
defining them as operations over discrete sequences of
sampled values. The guiding principle is that, in order to
prevent aliasing, nonlinear image-processing operations
should ideally be performed prior to prefiltering and
sampling. This is of course impractical, as we may not have
access to the continuous input. Even so, we show that it is
best to apply image-processing operations over the
continuous reconstruction of the input. This transformed
continuous representation is then prefiltered and sampled to
produce the output. The use of high-quality reconstruction
strategies brings this alternative much closer to the ideal
than directly operating over discrete values. We illustrate
the advantages of our framework with several popular
effects. In each case, we demonstrate the quality difference
between continuous image-processing operations, their discrete
counterparts, and previous anti-aliasing alternatives.
Finally, our GPU implementation shows that current graphics
hardware has enough computational power to perform
continuous image processing in real-time.
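A 1D sketch of this pipeline (reconstruct, operate in the continuous domain, prefilter, resample), under simplifying assumptions not taken from the paper: cubic-spline reconstruction stands in for a high-quality kernel, dense supersampling approximates the continuous domain, and a box average approximates the prefilter.

    import numpy as np
    from scipy.interpolate import CubicSpline

    def continuous_op(samples, op, rate=16):
        """Apply a nonlinear op 'in the continuous domain' (approximated here by
        dense supersampling), then prefilter and resample at the original rate."""
        n = len(samples)
        recon = CubicSpline(np.arange(n), samples)        # continuous reconstruction
        xs = np.linspace(0, n - 1, (n - 1) * rate + 1)    # dense evaluation grid
        dense = op(recon(xs))                             # operate before sampling
        # crude box prefilter + decimation back to the original sample positions
        return np.array([dense[max(0, i * rate - rate // 2): i * rate + rate // 2 + 1].mean()
                         for i in range(n)])

    signal = np.sin(np.linspace(0, 6 * np.pi, 32))
    naive = np.abs(signal)                    # standard practice: operate on the samples
    careful = continuous_op(signal, np.abs)   # operate on the reconstruction instead
    print(np.round(naive - careful, 3))       # largest differences appear near the kinks of abs()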
Links:
paper [pdf] ·
code [git] ·
bibtex [bib]
de Figueiredo, L. H.;
Nehab, D.;
Stolfi, J.;
Oliveira, J. B.
2016. Journal of Computational Dynamics, 3(2):113–137, DOI: 10.3934/jcd.2016006
Abstract:
We present an algorithm for computing images of polynomial
Julia sets that are reliable in the sense that they carry
mathematical guarantees against sampling artifacts and
rounding errors in floating-point arithmetic. We combine
cell mapping based on interval arithmetic with label
propagation in graphs to avoid function iteration and
rounding errors. As a result, our algorithm avoids point
sampling and can reliably classify entire rectangles in the
complex plane as being on either side of the Julia set. The
union of the rectangles that cannot be so classified is
guaranteed to contain the Julia set. Our algorithm computes
a refinable quadtree decomposition of the complex plane
adapted to the Julia set which can be used for rendering and
for approximating geometric properties such as the area of
the filled Julia set and the fractal dimension of the Julia
set.
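The paper's cell classification avoids direct iteration (via cell mapping and label propagation); as a much simpler illustration of how interval arithmetic can nonetheless certify entire rectangles, the sketch below proves that every point of a cell escapes under z -> z^2 + c, i.e. that the cell lies outside the filled Julia set. Rounding control is omitted.

    import itertools

    def imul(a, b):
        """Interval product [a] * [b], intervals given as (lo, hi) pairs."""
        p = [x * y for x, y in itertools.product(a, b)]
        return (min(p), max(p))

    def iadd(a, b):
        return (a[0] + b[0], a[1] + b[1])

    def isub(a, b):
        return (a[0] - b[1], a[1] - b[0])

    def cell_certainly_escapes(zr, zi, c, max_iter=50):
        """True if every point of the rectangle zr x zi escapes under z -> z^2 + c."""
        cr, ci = (c.real, c.real), (c.imag, c.imag)
        for _ in range(max_iter):
            # enclosure of z^2 + c, with (zr + i*zi)^2 = zr^2 - zi^2 + 2*zr*zi*i
            zr, zi = (iadd(isub(imul(zr, zr), imul(zi, zi)), cr),
                      iadd(imul(imul((2.0, 2.0), zr), zi), ci))
            mod2 = iadd(imul(zr, zr), imul(zi, zi))
            if mod2[0] > 4.0:     # even the smallest modulus in the enclosure escaped
                return True
            if mod2[1] > 1e30:    # enclosure blew up without a verdict; give up
                return False
        return False

    c = complex(-0.8, 0.156)
    print(cell_certainly_escapes((1.50, 1.55), (1.50, 1.55), c))    # True: provably outside
    print(cell_certainly_escapes((-0.05, 0.05), (-0.05, 0.05), c))  # False: undecided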
Links:
paper [pdf] ·
bibtex [bib]
Nehab, D.;
Maximo, A.
2016. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2016), Macao, December 2016, 35(6), DOI: 10.1145/2980179.2980222
Abstract:
Filters with slowly decaying impulse responses have many
uses in computer graphics. Recursive filters are often the
fastest option for such cases. In this paper, we derive
closed-form formulas for computing the exact initial
feedbacks needed for recursive filtering infinite input
extensions. We provide formulas for the constant-padding
(e.g. clamp-to-edge), periodic (repeat) and even-periodic
(mirror or reflect) extensions. These formulas were
designed for easy integration into modern block-parallel
recursive filtering algorithms. Our new modified algorithms
are state-of-the-art, filtering images faster even than
previous methods that ignore boundary conditions.
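The paper derives such feedbacks for several extensions and filter orders; as a minimal sketch of the idea for the simplest case only (first-order causal filter, clamp-to-edge extension), an infinite run of x[0] to the left drives the recursion y[n] = x[n] + a*y[n-1] to its fixed point, so the exact initial feedback is y[-1] = x[0] / (1 - a).

    import numpy as np

    def causal_filter(x, a, y_prev=0.0):
        """First-order causal recursive filter y[n] = x[n] + a*y[n-1]."""
        y = np.empty(len(x))
        for n, xn in enumerate(x):
            y_prev = xn + a * y_prev
            y[n] = y_prev
        return y

    x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
    a = 0.5

    # Closed-form feedback for the clamp-to-edge (constant-padding) extension.
    exact = causal_filter(x, a, y_prev=x[0] / (1.0 - a))

    # Brute force for comparison: actually prepend a long run of x[0].
    pad = 10_000
    brute = causal_filter(np.concatenate([np.full(pad, x[0]), x]), a)[pad:]

    print(np.max(np.abs(exact - brute)))   # ~0, up to floating-point round-off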
Links:
paper [pdf] ·
presentation [pptx] ·
code [git] ·
bibtex [bib]
Liao, J.;
Nehab, D.;
Hoppe, H.;
Sander, P. V.
2016. IEEE Transactions on Visualization and Computer Graphics, 22(7):1875–1885, DOI: 10.1109/tvcg.2015.2476794
Abstract:
When interpolating images, for instance in the context of morphing,
there are myriad approaches for
defining correspondence maps that align structurally similar
elements. However, the actual interpolation usually
involves simple functions for both geometric paths and color
blending. In this paper we explore new types of controls
for combining two images related by a correspondence map.
Our insight is to apply recent edge-aware decomposition
techniques, not just to the image content but to the map
itself. Our framework establishes an intuitive
low-dimensional parameter space for merging the shape and
color from the two source images at both low and high
frequencies. A gallery-based user interface enables
interactive traversal of this rich space, to either define a
morph path or synthesize new hybrid images. Extrapolation
of the shape parameters achieves compelling effects.
Finally we demonstrate an extension of the framework to
videos.
Links:
paper [pdf] ·
bibtex [bib]
Sacht, L.;
Nehab, D.
2015. IEEE Transactions on Image Processing, September 2015, 24(12):5249–5259, DOI: 10.1109/TIP.2015.2478385
Abstract:
We propose new quasi-interpolators for the continuous
reconstruction of sampled images, combining a
narrowly-supported piecewise-polynomial kernel and
an efficient digital filter.
In other words, our quasi-interpolators fit within the
generalized sampling framework and are straightforward to
use. We go against standard practice and optimize for
approximation quality over the entire Nyquist range, rather
than focusing exclusively on the asymptotic behavior as the
sample spacing goes to zero. In contrast to previous work,
we jointly optimize with respect to all degrees of freedom
available in both the kernel and the digital filter. We
consider linear, quadratic, and cubic schemes, offering
different trade-offs between quality and computational cost.
Experiments with compounded rotations and translations over
a range of input images confirm that, due to the additional
degrees of freedom and the more realistic objective
function, our new quasi-interpolators perform
better than the state-of-the-art, at a similar
computational cost.
Links:
paper [pdf] ·
data [zip] ·
bibtex [bib]
Ganacim, F.;
Lima, R. S.;
de Figueiredo, L. H.;
Nehab, D.
2014. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2014), Shenzhen, December 2014, 33(6):229, DOI: 10.1145/2661229.2661274
Abstract:
We present a massively parallel vector graphics rendering
pipeline that is divided into two components. The
preprocessing component builds a novel adaptive acceleration
data structure, the shortcut tree. Tree construction
is efficient and parallel at the segment level, enabling
dynamic vector graphics. The tree allows efficient random
access to the color of individual samples, so the graphics
can be warped for special effects. The rendering component
processes all samples and pixels in parallel. It was
optimized for wide antialiasing filters and a large number
of samples per pixel to generate sharp, noise-free images.
Our sample scheduler allows pixels with overlapping
antialiasing filters to share samples. It groups together
samples that can be computed with the same vector operations
using little memory or bandwidth. The pipeline is
feature-rich,
supporting multiple layers of filled paths, each defined by
curved outlines (with linear, rational quadratic, and
integral cubic Bézier segments), clipped against other
paths, and painted with semi-transparent colors, gradients,
or textures. We demonstrate renderings of complex vector
graphics in state-of-the-art quality and performance.
Finally, we provide full source-code for our implementation
as well as the input data used in the paper.
Links:
paper [pdf] ·
presentation [pptx] ·
project page [html] ·
bibtex [bib]
Liao, J.;
Lima, R. S.;
Nehab, D.;
Hoppe, H.;
Sander, P. V.
2014. Computer Graphics Forum (Proceedings of Eurographics Symposium on Rendering 2014), Lyon, June 2014, 33(4):51–60, DOI: 10.1111/cgf.12412
Abstract:
We explore creating smooth transitions between videos of
different scenes. As in traditional image morphing, good
spatial correspondence is crucial to prevent ghosting,
especially at silhouettes. Video morphing presents added
challenges. Because motions are often unsynchronized,
temporal alignment is also necessary. Applying morphing to
individual frames leads to discontinuities, so temporal
coherence must be considered. Our approach is to optimize a
full spatiotemporal mapping between the two videos. We
reduce tedious interactions by letting the optimization
derive the fine-scale map given only sparse user-specified
constraints. For robustness, the optimization objective
examines structural similarity of the video content. We
demonstrate the approach on a variety of videos, obtaining
results using few explicit correspondences.
Links:
paper [pdf] ·
bibtex [bib]
Liao, J.;
Lima, R. S.;
Nehab, D.;
Hoppe, H.;
Sander, P. V.;
Yu, J.
2014. ACM Transactions on Graphics (Presented at ACM SIGGRAPH 2014), Vancouver, August 2014, 33(5):168, DOI: 10.1145/2629494
Abstract:
The main challenge in achieving good image morphs is to
create a map that aligns corresponding image elements. Our
aim is to help automate this often tedious task. We compute
the map by optimizing the compatibility of corresponding
warped image neighborhoods using an adaptation of structural
similarity. The optimization is regularized by a thin-plate
spline, and may be guided by a few user-drawn points. We
parameterize the map over a halfway domain and show that
this representation offers many benefits. The map is able to
treat the image pair symmetrically, model simple occlusions
continuously, span partially overlapping images, and define
extrapolated correspondences. Moreover, it enables direct
evaluation of the morph in a pixel shader without mesh
rasterization. We improve the morphs by optimizing quadratic
motion paths and by seamlessly extending content beyond the
image boundaries. We parallelize the algorithm on a GPU to
achieve a responsive interface and demonstrate challenging
morphs obtained with little effort.
Links:
paper [pdf] ·
bibtex [bib]
Nehab, D.;
Hoppe, H.
2014. Foundations and Trends in Computer Graphics and Vision, March 2014, 8(1):1–84, DOI: 10.1561/0600000053
Abstract:
Discretization and reconstruction are fundamental
operations in computer graphics, enabling the conversion
between sampled and continuous representations. Major
advances in signal-processing research have shown that these
operations can often be performed more efficiently by
decomposing a filter into two parts: a compactly supported
continuous-domain function and a digital filter. This
strategy of "generalized sampling" has appeared in a few
graphics papers, but is largely unexplored in our community.
This survey broadly summarizes the key aspects of the
framework, and delves into specific applications in
graphics. Using new notation, we concisely present and
extend several key techniques. In addition, we demonstrate
benefits for prefiltering in image downscaling and
supersample-based rendering, and analyze the effect that
generalized sampling has on the noise due to Monte Carlo
estimation. We conclude with a qualitative and quantitative
comparison of traditional and generalized filters.
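A small sketch of the decomposition in question, using SciPy's cubic B-spline machinery as a stand-in (the survey's filters and notation are more general): the digital-filter part is the spline prefilter, the compactly supported part is B-spline reconstruction, and together they interpolate the original samples exactly.

    import numpy as np
    from scipy import ndimage

    samples = np.random.default_rng(0).standard_normal((32, 32))

    # Digital filter: transforms the samples into B-spline coefficients.
    coeffs = ndimage.spline_filter(samples, order=3, mode='mirror')

    # Compactly supported kernel: cubic B-spline reconstruction from those coefficients.
    grid = np.meshgrid(np.arange(32), np.arange(32), indexing='ij')
    recon = ndimage.map_coordinates(coeffs, grid, order=3, prefilter=False, mode='mirror')
    print(np.allclose(recon, samples))      # True: kernel + digital filter interpolates

    # Skipping the digital filter yields mere smoothing, not interpolation.
    blurred = ndimage.map_coordinates(samples, grid, order=3, prefilter=False, mode='mirror')
    print(np.allclose(blurred, samples))    # False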
Links:
paper [pdf] ·
bibtex [bib]
Chen, G.;
Sander, P. V.;
Nehab, D.;
Yang, L.;
Hu, L.
2012. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2012), Singapore, November 2012, 31(6):160, DOI: 10.1145/2366145.2366179
Abstract:
We present a novel approach for real-time rendering of
static 3D models front-to-back or back-to-front relative to
any viewpoint outside its bounding volume. The approach
renders depth-sorted triangles using a single draw-call. At
run-time, we replace the traditional sorting strategy of
existing algorithms with a faster triangle selection
strategy. The selection process operates on an extended
sequence of triangles annotated by test planes, created by
our off-line preprocessing stage. Based on these test
planes, a simple run-time procedure uses the given viewpoint
to select a subsequence of triangles for rasterization.
Selected subsequences are statically presorted by depth and
contain each input triangle exactly once. Our method runs on
legacy hardware and renders depth-sorted static models
significantly faster than previous approaches. We conclude by
demonstrating the real-time rendering of order-independent
transparency effects.
Links:
paper [pdf] ·
bibtex [bib]
Scherzer, D.;
Yang, L.;
Mattausch, O.;
Nehab, D.;
Sander, P. V.;
Wimmer, M.;
Eisemann, E.
2012. Computer Graphics Forum, December 2012, 31(8):2378–2408, DOI: 10.1111/j.1467-8659.2012.03075.x
Abstract:
Nowadays, there is a strong trend towards rendering to
higher-resolution displays and at high frame rates. This
development aims at delivering more detail and better
accuracy, but it also comes at a significant cost. Although
graphics cards continue to evolve with an ever-increasing
amount of computational power, the speed gain is easily
counteracted by increasingly complex and sophisticated
shading computations. For real-time applications, the direct
consequence is that image resolution and temporal resolution
are often the first candidates to bow to the performance
constraints (e.g., although full HD is possible, PS3 and
Xbox often render at lower resolutions).
In order to achieve high-quality rendering at a lower cost,
one can exploit temporal coherence (TC). The underlying
observation is that a higher resolution and frame rate do
not necessarily imply a much higher workload, but a larger
amount of redundancy and a higher potential for amortizing
rendering over several frames. In this survey, we
investigate methods that make use of this principle and
provide practical and theoretical advice on how to exploit
temporal coherence for performance optimization. These
methods not only allow incorporating more computationally
intensive shading effects into many existing applications,
but also offer exciting opportunities for extending high-end
graphics applications to lower-spec consumer-level hardware.
To this end, we first introduce the notion and main concepts
of TC, including an overview of historical methods. We then
describe a general approach, image-space reprojection, with
several implementation algorithms that facilitate reusing
shading information across adjacent frames. We also discuss
data-reuse quality and performance related to reprojection
techniques. Finally, in the second half of this survey, we
demonstrate various applications that exploit TC in real-time rendering.
Links:
paper [pdf] ·
bibtex [bib]
Nehab, D.;
Maximo, A.;
Lima, R. S.;
Hoppe, H.
2011. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2011), Hong Kong, December 2011, 30(6):176, DOI: 10.1145/2024156.2024210
Abstract:
Image processing operations like blurring, inverse
convolution, and summed-area tables are often computed
efficiently as a sequence of 1D recursive filters. While
much research has explored parallel recursive filtering,
prior techniques do not consider optimization across the
entire filter sequence. Typically, a separate filter (or
often a causal-anticausal filter pair) is required in each
dimension. Computing these filter passes independently
results in significant traffic to global memory, a crucial
bottleneck in GPU systems. We present a new algorithmic
framework for parallel evaluation. It partitions the image
into 2D blocks, with a small band of data buffered along
each block perimeter. We show that these perimeter bands are
sufficient to accumulate the effects of the successive
filters. A remarkable result is that the image data is read
only twice and written just once, independent of image size,
and thus total memory bandwidth is reduced even compared to
the traditional serial algorithm. We demonstrate significant
speedups in GPU computation.
Links:
paper [pdf] ·
presentation [pptx] ·
project page [html] ·
bibtex [bib]
Yang, L.;
Tse, Y.-C.;
Sander, P. V.;
Lawrence, J. D.;
Nehab, D.;
Hoppe, H.;
Wilkins, C. L.
2011. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2011), Hong Kong, December 2011, 30(6):150, DOI: 10.1145/2070781.2024184
Abstract:
We introduce a method for increasing the framerate of
real-time rendering applications. Whereas many existing
temporal upsampling strategies only reuse information from
previous frames, our bidirectional technique reconstructs
intermediate frames from a pair of consecutive rendered
frames. This significantly improves the accuracy and
efficiency of data reuse since very few pixels are mutually
occluded in both frames. We present two versions of this
basic algorithm. The first is appropriate for fill-bound
scenes as it limits the number of expensive shading
calculations, but involves rasterization of scene geometry
at each intermediate frame. The second version, our more
significant contribution, reduces both shading and geometry
computations by performing reprojection using only
image-based buffers. It warps and combines the adjacent
rendered frames using an efficient iterative search on their
stored scene depth and flow. Bidirectional reprojection
introduces a small amount of lag. We perform a user study
to investigate this lag, and find that its effect is minor.
We demonstrate substantial performance improvements
(3–4×) for a variety of applications, including
vertex-bound and fill-bound scenes, multi-pass effects, and
motion blur.
Links:
paper [pdf] ·
presentation [pptx] ·
bibtex [bib]
Ganacim, F.;
de Figueiredo, L. H.;
Nehab, D.
2011. In Proceedings of the Brazilian Symposium on Computer Graphics and Image Processing, pages 72–77, Maceió, Brazil, August 2011, DOI: 10.1109/SIBGRAPI.2011.5
Abstract:
We present a GPU-based adaptive method for rendering
implicit surfaces with beam casting. We use interval
arithmetic to model the beams and to detect their
intersections with the surface. We show how beams can be
used to quickly discard large empty regions in the image,
thus reducing the number of operations needed to render the
surface. Better performance is achieved using an adaptive
space subdivision process. We are able to beam-cast
algebraic surfaces of degree up to 8, at better than
real-time rates, with anti-aliasing.
Links:
paper [pdf] ·
bibtex [bib]
Scherzer, D.;
Yang, L.;
Mattausch, O.;
Nehab, D.;
Sander, P. V.;
Wimmer, M.;
Eisemann, E.
2011. In Eurographics State of the Art Reports, Llandudno, UK, May 2011, DOI: 10.2312/EG2011/stars/101-126
Abstract:
Nowadays, there is a strong trend towards rendering to
higher-resolution displays and at high frame rates. This
development aims at delivering more detail and better
accuracy, but it also comes at a significant cost. Although
graphics cards continue to evolve with an ever-increasing
amount of computational power, the processing gain is
counteracted to a high degree by increasingly complex and
sophisticated pixel computations. For real-time
applications, the direct consequence is that image
resolution and temporal resolution are often the first
candidates to bow to the performance constraints (e.g.,
although full HD is possible, PS3 and Xbox often render at
lower resolutions).
In order to achieve high-quality rendering at a lower cost,
one can exploit temporal coherence (TC). The underlying
observation is that a higher resolution and frame rate do
not necessarily imply a much higher workload, but a larger
amount of redundancy and a higher potential for amortizing
rendering over several frames. In this state-of-the-art
report, we investigate methods that make use of this
principle and provide practical and theoretical advice on
how to exploit temporal coherence for performance
optimization. These methods not only allow incorporating
more computationally intensive shading effects into many
existing applications, but also offer exciting opportunities
for extending high-end graphics applications to lower-spec
consumer-level hardware. To this end, we first introduce the
notion and main concepts of TC, including an overview of
historical methods. We then describe a key data structure,
the so-called reprojection cache, with several supporting
algorithms that facilitate reusing shading information from
previous frames, and finally illustrate its usefulness in
various applications.
Links:
paper [pdf] ·
bibtex [bib]
Li, H.;
Nehab, D.;
Wei, L.-Y.;
Sander, P. V.;
Fu, C.-W.
2010. In Symposium on Interactive 3D Graphics and Games, Washington, DC, February 2010, DOI: 10.1145/1730804.1730985
Abstract:
Capacity constrained Voronoi tessellation (CCVT) addresses a
crucial quality issue of Lloyd relaxation but at the expense
of slower computation, which could hinder its potential
wide adoption. We present a fast capacity constrained
Voronoi tessellation algorithm which is orders of magnitude
faster than the original method proposed by Balzer et al.
[2009] (and 10× faster than a previous accelerated
implementation of the same technique) while maintaining
excellent distribution quality and scaling very well as the
number of points increases.
Links:
short paper [pdf] ·
technical report [pdf] ·
bibtex [bib]
Yang, L.;
Nehab, D.;
Sander, P. V.;
Sitthi-amorn, P.;
Lawrence, J. D.;
Hoppe, H.
2009. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2009), Yokohama, Japan, December 2009, 28(5):135, DOI: 10.1145/1618452.1618481
Abstract:
We present a real-time rendering scheme that reuses shading
samples from earlier time frames to achieve practical
antialiasing of procedural shaders. Using a reprojection
strategy, we maintain several sets of shading estimates at
subpixel precision, and incrementally update these such that
for most pixels only one new shaded sample is evaluated per
frame. The key difficulty is to prevent accumulated blurring
during successive reprojections. We present a theoretical
analysis of the blur introduced by reprojection methods.
Based on this analysis, we introduce a nonuniform spatial
filter, an adaptive recursive temporal filter, and a
principled scheme for locally estimating the spatial blur.
Our scheme is appropriate for antialiasing shading
attributes that vary slowly over time. It works in a single
rendering pass on commodity graphics hardware, and offers
results that surpass 4×4 stratified supersampling in
quality, at a fraction of the cost.
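A toy version of the accumulation at the heart of such schemes (a generic exponentially weighted recursive filter over jittered samples, not the paper's adaptive filter or blur analysis): a smaller blend weight tracks the supersampled value more tightly but responds more slowly to change, which is precisely the trade-off the paper's analysis addresses.

    import numpy as np

    rng = np.random.default_rng(1)

    def shade(u):
        """Some procedural signal evaluated at subpixel position u in [0, 1)."""
        return np.sin(40.0 * u) ** 2

    reference = shade(np.linspace(0.0, 1.0, 4096)).mean()    # heavily supersampled target

    for alpha in (0.5, 0.125):
        history, trace = 0.0, []
        for frame in range(1000):             # one new jittered sample per frame
            history = (1.0 - alpha) * history + alpha * shade(rng.random())
            trace.append(history)
        err = np.abs(np.array(trace[100:]) - reference)      # ignore start-up frames
        print(f"alpha={alpha:<6} mean |error| = {err.mean():.3f} (reference = {reference:.3f})")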
Links:
paper [pdf] ·
bibtex [bib]
Nehab, D.;
Hoppe, H.
2008. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2008), Singapore, December 2008, 27(5):135, DOI: 10.1145/1409060.1409088
Previously Microsoft Research Technical Report MSR-TR-2007-95, July 2007
Abstract:
We introduce a novel representation for random-access
rendering of antialiased vector graphics on the GPU,
along with efficient encoding and rendering algorithms. The
representation supports a broad class of vector primitives,
including multiple layers of semitransparent filled and
stroked shapes, with quadratic outlines and color gradients.
Our approach is to create a coarse lattice in which each
cell contains a variable-length encoding of the graphics
primitives it overlaps. These cell-specialized encodings are
interpreted at runtime within a pixel shader. Advantages
include localized memory access and the ability to map
vector graphics onto arbitrary surfaces, or under arbitrary
deformations. Most importantly, we perform both prefiltering
and supersampling within a single pixel shader invocation,
achieving inter-primitive antialiasing at no added memory
bandwidth cost. We present an efficient encoding algorithm,
and demonstrate high-quality realtime rendering of complex,
real-world examples.
Links:
paper [pdf] ·
presentation [pptx] ·
technical report [pdf] ·
demo [html] ·
bibtex [bib]
Sander, P. V.;
Nehab, D.;
Chlamtac, E.;
Hoppe, H.
2008. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2008), Singapore, December 2008, 27(5):144, DOI: 10.1145/1409060.1409097
Abstract:
Processing of mesh edges lies at the core of many advanced
realtime rendering techniques, ranging from shadow and
silhouette computations, to motion blur and fur rendering.
We present a scheme for efficient traversal of mesh edges
that builds on the adjacency primitives and programmable
geometry shaders introduced in recent graphics hardware. Our
scheme aims to minimize the number of primitives while
maximizing SIMD parallelism. These objectives reduce to a
set of discrete optimization problems on the dual graph of
the mesh, and we develop practical solutions to these graph
problems. In addition, we extend two existing vertex cache
optimization algorithms to produce cache-efficient traversal
orderings for adjacency primitives. We demonstrate
significant runtime speedups for several practical real-time
rendering algorithms.
Links:
paper [pdf] ·
bibtex [bib]
Sitthi-amorn, P.;
Lawrence, J. D.;
Yang, L.;
Sander, P. V.;
Nehab, D.;
Xi, J.
2008. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH Asia 2008), Singapore, December 2008, 27(5):127, DOI: 10.1145/1409060.1409080
Abstract:
We present a framework and supporting algorithms to automate
the use of data reprojection as a general tool for
optimizing procedural shaders. Although the general strategy
of caching and reusing expensive intermediate shading
calculations across consecutive frames has previously been
shown to provide an effective trade-off between speed and
accuracy, the critical choices of what to reuse and at what
rate to refresh cached entries have been left to a designer.
The fact that these decisions require a deep understanding
of a procedure's semantic structure makes it challenging to
select optimal candidates among possibly hundreds of
alternatives. Our automated approach relies on parametric
models of the way possible caching decisions affect the
shader's performance and visual fidelity. These models are
trained using a sample rendering session and drive an
interactive profiler in which the user can explore the
error/performance trade-offs associated with incorporating
temporal reprojection. We evaluate the proposed models and
selection algorithm with a prototype system used to optimize
several complex shaders and compare our approach to current
alternatives.
Links:
paper [pdf] ·
bibtex [bib]
Brown, B. J.;
Toler-Franklin, C.;
Nehab, D.;
Burns, M.;
Dobkin, D.;
Vlachopoulos, A.;
Doumas, C.;
Weyrich, T.;
Rusinkiewicz, S.
2008. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH 2008), Los Angeles, California, August 2008, 27(3):84, DOI: 10.1145/1360612.1360683
Abstract:
Although mature technologies exist for acquiring images,
geometry, and normals of small objects, they remain
cumbersome and time-consuming for non-experts to employ on a
large scale. In an archaeological setting, a practical
acquisition system for routine use on every artifact and
fragment would open new possibilities for archiving,
analysis, and dissemination. We present an inexpensive
system for acquiring all three types of information, and
associated metadata, for small objects such as fragments of
wall paintings. The acquisition system requires minimal
supervision, so that a single, non-expert user can scan at
least 10 fragments per hour. To achieve this performance, we
introduce new algorithms to robustly and automatically align
range scans, register 2-D scans to 3-D geometry, and compute
normals from 2-D scans. As an illustrative application, we
present a novel 3-D matching algorithm that efficiently
searches for matching fragments using the scanned geometry.
Links:
paper [pdf] ·
bibtex [bib]
Nehab, D.;
Weyrich, T.;
Rusinkiewicz, S.
2008. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, Anchorage, Alaska, June 2008, DOI: 10.1109/CVPR.2008.4587683
Abstract:
In this work, we consider the dense reconstruction of glossy
objects. We propose the use of a specularity constraint,
based on surface normal consistency, to define a matching
cost function that can drive standard stereo reconstruction
methods. We also present an aggregation method based on
anisotropic diffusion that is particularly suitable for this
matching cost function.
Following a theoretical discussion on the types of ambiguity
that can arise from the proposed constraint, we present a
controlled illumination setup that includes a stereo camera
pair, and one LCD monitor used as a calibrated,
variable-position light source. We then use the setup to
evaluate the proposed method on real and synthetic data, and
demonstrate its capacity to recover high-quality depth and
orientation from specular objects.
Links:
paper [pdf] ·
bibtex [bib]
Sitthi-amorn, P.;
Lawrence, J. D.;
Yang, L.;
Sander, P. V.;
Nehab, D.
2008. In ACM SIGGRAPH/Eurographics Symposium on Graphics Hardware, pages 95–101, Sarajevo, Bosnia-Herzegovina, June 2008, DOI: 10.2312/EGGH/EGGH08/095-101
Abstract:
Several recently proposed techniques based on the principle
of data reprojection allow reusing shading information
generated in one frame to accelerate the calculation of the
shading in the following frame. This strategy can
significantly reduce the average rendering cost for many
important real-time effects at an acceptable level of
approximation error. This paper analyzes the overhead
associated with incorporating temporal data reprojection on
modern GPUs. Based on this analysis, we propose an
alternative algorithm to those previously described in the
literature and measure its efficiency for multiple scenes
and hardware platforms.
Links:
paper [pdf] ·
bibtex [bib]
Nehab, D.;
Sander, P. V.;
Lawrence, J. D.;
Tatarchuk, N.;
Isidoro, J. R.
2007. In ACM SIGGRAPH/Eurographics Symposium on Graphics Hardware, pages 25–35, San Diego, California, August 2007, DOI: 10.5555/1280094.1280098
Abstract:
Evaluating pixel shaders consumes a growing share of the
computational budget for real-time applications. However,
the significant temporal coherence in visible surface
regions, lighting conditions, and camera location allows
reusing computationally-intensive shading calculations
between frames to achieve significant performance
improvements at little degradation in visual quality. This
paper investigates a caching scheme based on reverse
reprojection which allows pixel shaders to store and reuse
calculations performed at visible surface points. We provide
guidelines to help programmers select appropriate values to
cache and present several policies for keeping cached
entries up-to-date. Our results confirm this approach offers
substantial performance gains for many common real-time
effects, including precomputed global lighting effects,
stereoscopic rendering, motion blur, depth of field, and
shadow mapping.
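A schematic version of the per-pixel cache test described above, with hypothetical names and a NumPy stand-in for the shader (the paper's policies for refreshing stale entries are not modeled): reproject the currently visible surface point into the previous frame and reuse its cached shading only if it was visible there.

    import numpy as np

    def lookup_cache(p_world, prev_view_proj, prev_depth, prev_color, depth_eps=1e-3):
        """Return cached shading for a visible surface point, or None on a cache miss.

        p_world        -- (3,) world-space position seen through the current pixel
        prev_view_proj -- (4, 4) previous frame's view-projection matrix
        prev_depth     -- (H, W) previous frame's depth buffer (NDC z)
        prev_color     -- (H, W, 3) previous frame's shaded color buffer
        """
        h, w = prev_depth.shape
        clip = prev_view_proj @ np.append(p_world, 1.0)
        if clip[3] <= 0.0:
            return None                          # behind the previous camera
        ndc = clip[:3] / clip[3]
        if np.any(np.abs(ndc[:2]) > 1.0):
            return None                          # outside the previous viewport
        # nearest-texel lookup (a real implementation would filter the cache)
        col = int(round((ndc[0] * 0.5 + 0.5) * (w - 1)))
        row = int(round((ndc[1] * 0.5 + 0.5) * (h - 1)))
        if abs(prev_depth[row, col] - ndc[2]) > depth_eps:
            return None                          # the point was occluded last frame
        return prev_color[row, col]              # cache hit: reuse last frame's shading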
Links:
paper [pdf] ·
presentation [ppt] ·
siggraph'06 sketch [pdf] ·
technical report [pdf] ·
bibtex [bib]
Sander, P. V.;
Nehab, D.;
Barczak, J.
2007. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH 2007), San Diego, California, July 2007, 26(3):89, DOI: 10.1145/1276377.1276489
Abstract:
We present novel algorithms that optimize the order in which
triangles are rendered, to improve post-transform vertex
cache efficiency as well as for view-independent overdraw
reduction. The resulting triangle orders perform on par
with previous methods, but are orders of magnitude faster to
compute.
The improvements in processing speed allow us to perform the
optimization right after a model is loaded, when more
information on the host hardware is available. This allows
our vertex cache optimization to often outperform other
methods. In fact, our algorithms can even be executed
interactively, allowing for re-optimization in case of
changes to geometry or topology, which happen often in
CAD/CAM applications. We believe that most real-time
rendering applications will immediately benefit from these
new results.
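A small utility in the spirit of the metric such methods optimize (a sketch only; the post-transform cache is idealized here as a FIFO): the average cache miss ratio (ACMR) counts vertex-shader invocations per triangle for a given triangle order.

    from collections import deque

    def acmr(triangles, cache_size=16):
        """Average cache miss ratio of a triangle order under a FIFO vertex cache."""
        cache, misses = deque(maxlen=cache_size), 0
        for tri in triangles:
            for v in tri:
                if v not in cache:
                    cache.append(v)     # the oldest entry is evicted automatically
                    misses += 1
        return misses / len(triangles)

    # The same four triangles in strip order and in a scattered order.
    strip     = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
    scattered = [(0, 1, 2), (3, 4, 5), (1, 2, 3), (2, 3, 4)]
    print(acmr(strip, 4), acmr(scattered, 4))   # 1.5 vs. 2.5: strip order is cache-friendlier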
Links:
paper [pdf] ·
presentation [ppt] ·
bibtex [bib]
Nehab, D.;
Barczak, J.;
Sander, P. V.
2006. In Symposium on Interactive 3D Graphics and Games, pages 207–211, Redwood City, California, March 2006, DOI: 10.1145/1111411.1111448
Released as ATI Tootle library
One of three papers invited for the I3D/UIST/Sandbox Reprise session at SIGGRAPH 2006
Abstract:
We describe an automatic preprocessing algorithm that
reorders triangles in a mesh so as to enable the graphics
hardware to efficiently cull vertex and pixel processing at
rendering time.
Our method starts by dividing the mesh into planar clusters
which are subsequently sorted into a view-independent order
which greatly reduces overdraw. The result is an increase in
the opportunities for early Z-culling, reducing pixel
processing time. The clusters are then optimized for mesh
locality. This produces high rates of vertex cache hits,
reducing vertex processing time.
We have found that our method brings the overdraw rates of a
wide range of models close to those of front-to-back order,
while preserving state-of-the-art vertex cache performance.
This results in higher frame-rates for pixel bound
applications with no penalty to vertex-bound applications.
Links:
paper [pdf] ·
code [git] ·
siggraph'06 reprise [pdf] ·
bibtex [bib]
Nehab, D.;
Rusinkiewicz, S.;
Davis, J. E.
2005. In IEEE International Conference on Computer Vision, volume 1, pages 557–563, Beijing, China, October 2005, DOI: 10.1109/ICCV.2007.4409212
Abstract:
Most dense stereo correspondence algorithms start by
establishing discrete pixel matches and later refine these
matches to sub-pixel precision. Traditional sub-pixel
refinement methods attempt to determine the precise location
of points, in the secondary image, that correspond to
discrete positions in the reference image. We show that this
strategy can lead to a systematic bias associated with the
violation of the general symmetry of matching cost
functions. This bias produces random or coherent noise in
the final reconstruction, but can be avoided by refining
both image coordinates simultaneously, in a symmetric way.
We demonstrate that the symmetric sub-pixel refinement
strategy results in more accurate correspondences by
avoiding bias while preserving detail.
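For reference, the conventional (asymmetric) refinement the paper argues against fits a parabola to the matching cost around the best integer disparity and moves the match to its minimum; the contribution above is to refine the coordinates of both images symmetrically instead. A sketch of the conventional step only (standard textbook formula, not the paper's method):

    def subpixel_disparity(cost, d):
        """Parabolic refinement of an integer disparity d, given matching costs
        cost[d-1], cost[d], cost[d+1] with cost[d] the discrete minimum."""
        c_m, c_0, c_p = cost[d - 1], cost[d], cost[d + 1]
        denom = c_m - 2.0 * c_0 + c_p
        if denom == 0.0:                    # flat cost: keep the integer match
            return float(d)
        return d + 0.5 * (c_m - c_p) / denom

    # Costs sampled from a parabola with true minimum at 4.3 are recovered exactly.
    costs = [(d - 4.3) ** 2 for d in range(10)]
    print(subpixel_disparity(costs, 4))     # ~4.3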
Links:
paper [pdf] ·
bibtex [bib]
Nehab, D.;
Rusinkiewicz, S.;
Davis, J. E.;
Ramamoorthi, R.
2005. ACM Transactions on Graphics (Proceedings of the ACM SIGGRAPH 2005), Los Angeles, California, July 2005, 24(3):536–543, DOI: 10.1145/1186822.1073226
Abstract:
Range scanning, manual 3D editing, and other modeling
approaches can provide information about the geometry of
surfaces in the form of either 3D positions (e.g., triangle
meshes or range images) or orientations (normal maps or bump
maps). We present an algorithm that combines these two
kinds of estimates to produce a new surface that
approximates both. Our formulation is linear, allowing it
to operate efficiently on complex meshes commonly used in
graphics. It also treats high- and low-frequency components
separately, allowing it to optimally combine outputs from
data sources such as stereo triangulation and photometric
stereo, which have different error-vs.-frequency
characteristics. We demonstrate the ability of our
technique to both recover high-frequency details and avoid
low-frequency bias, producing surfaces that are more widely
applicable than position or orientation data alone.
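A 1D analogue of this combination (an illustrative least-squares sketch, not the paper's mesh formulation): noisy positions pin down the low frequencies while measured slopes, playing the role of normals, pin down the fine detail, and both enter a single linear system.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200
    t = np.linspace(0.0, 1.0, n)
    dt = t[1] - t[0]
    truth = np.sin(6 * t) + 0.05 * np.sin(90 * t)       # smooth shape plus fine detail

    positions = truth + 0.05 * rng.standard_normal(n)   # noisy, but unbiased at low frequencies
    slopes = np.diff(truth) / dt                         # stand-in for normals: detail-accurate

    # One linear least-squares problem:  min  lam*||y - positions||^2 + ||D*y - slopes||^2
    lam = 0.1
    D = (np.eye(n, k=1) - np.eye(n))[:-1] / dt           # forward-difference operator
    A = np.vstack([np.sqrt(lam) * np.eye(n), D])
    b = np.concatenate([np.sqrt(lam) * positions, slopes])
    y = np.linalg.lstsq(A, b, rcond=None)[0]

    print(np.abs(positions - truth).mean())              # error of the positions alone
    print(np.abs(y - truth).mean())                      # combined estimate: markedly smaller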
Links:
paper [pdf] ·
presentation [ppt] ·
code [html] ·
bibtex [bib]
Davis, J. E.;
Nehab, D.;
Ramamoorthi, R.;
Rusinkiewicz, S.
2005. IEEE Transactions on Pattern Analysis and Machine Intelligence, February 2005, 27(2):296–302, DOI: 10.1109/TPAMI.2005.37
Abstract:
Depth from triangulation has traditionally been investigated
in a number of independent threads of research, with methods
such as stereo, laser scanning, and coded structured light
considered separately. In this paper, we propose a common
framework called spacetime stereo that unifies and
generalizes many of these previous methods. Viewing specific
techniques as special cases of this general framework leads
to insights regarding solutions to many of the traditional
problems of individual techniques. Specifically, we discuss
a number of possible applications such as improved recovery
of static scenes under variable illumination, spacetime
stereo for moving objects, structured light and laser
scanning with multiple simultaneous stripes or patterns, and
laser scanning of shiny objects. To suggest the practical
utility of the framework, we use it to analyze two of these
applications: recovery of static scenes under variable, but
uncontrolled and unstructured illumination, and depth
estimation in dynamic scenes. Based on our analysis, we show
that methods derived from the spacetime stereo framework can
be used to recover depth in situations in which existing
methods perform poorly.
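A minimal sketch of the unifying idea (illustrative only; the paper analyzes which window shapes suit which scenario): the matching cost is evaluated over a spatiotemporal window, and purely spatial matching is recovered as the special case of a one-frame window.

    import numpy as np

    def spacetime_cost(left, right, row, col, disparity, win_s=2, win_t=1):
        """Sum-of-squared-differences cost over a spatiotemporal window.

        left, right -- rectified video volumes of shape (frames, height, width)
        win_s       -- spatial half-window; win_t -- temporal half-window
        (win_t = 0 recovers conventional frame-by-frame spatial matching)
        """
        f = left.shape[0] // 2                            # center frame
        a = left[f - win_t:f + win_t + 1,
                 row - win_s:row + win_s + 1,
                 col - win_s:col + win_s + 1]
        b = right[f - win_t:f + win_t + 1,
                  row - win_s:row + win_s + 1,
                  col - disparity - win_s:col - disparity + win_s + 1]
        return float(np.sum((a - b) ** 2))

    # Tiny synthetic example: the right video is the left one shifted by 3 pixels.
    left = np.random.default_rng(3).random((5, 32, 32))
    right = np.roll(left, -3, axis=2)
    costs = [spacetime_cost(left, right, 16, 16, d) for d in range(6)]
    print(int(np.argmin(costs)))   # 3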
Links:
paper [pdf] ·
technical report [pdf] ·
bibtex [bib]
Nehab, D.;
Shilane, P.
2004. In Eurographics Symposium on Point-Based Graphics, pages 49–56, Zurich, Switzerland, February 2004, DOI: 10.2312/SPBG/SPBG04/049-056
Abstract:
Point sampling is an important intermediate step for a
variety of computer graphics applications, and specialized
sampling strategies have been developed to satisfy the
requirements of each problem. In this article, we present a
technique to generate a stratified sampling of 3D models
that is applicable across many domains. The algorithm
voxelizes the model and selects one sample per voxel,
restricted to the original model's surface. Parameters
allow control of the uniformity of the sample placement and
the minimum distance between samples. We demonstrate the
effectiveness of this technique in selecting stroke
locations for painterly rendering models and for producing
sampled geometry used as input to shape descriptors.
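A rough sketch of the selection step on a point cloud (for illustration; the paper samples the model's surface directly and exposes uniformity and minimum-distance controls): bucket candidate points by voxel and keep one representative per occupied voxel, here the candidate closest to its voxel center.

    import numpy as np

    def voxel_stratified(points, voxel_size):
        """Keep at most one point per voxel: the one nearest the voxel center."""
        keys = np.floor(points / voxel_size).astype(np.int64)
        centers = (keys + 0.5) * voxel_size
        dist = np.linalg.norm(points - centers, axis=1)
        best = {}
        for i, key in enumerate(map(tuple, keys)):
            if key not in best or dist[i] < dist[best[key]]:
                best[key] = i
        return points[sorted(best.values())]

    dense = np.random.default_rng(4).random((10_000, 3))   # stand-in for surface samples
    sparse = voxel_stratified(dense, voxel_size=0.1)
    print(len(sparse))   # at most 1000 (a 10 x 10 x 10 grid of occupied voxels)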
Links:
paper [pdf] ·
code [cpp] ·
bibtex [bib]
Nehab, D.;
Pacheco, M. A. C.
2004. In ACM Symposium on Applied Computing, pages 1006–1012, Nicosia, Cyprus, March 2004, DOI: 10.1145/967900.968105
Abstract:
The Schemata Theory analyzes how selection, mutation, and
crossover affect the number of individuals that belong to a
given schema across generations. In its original form, this
analysis considers binary coding and binary operators. In this
article, we present an analogous study, focusing on
real-number coding and arithmetical operators.
Unfortunately, the conventional schema definition is
tightly dependent on discrete alphabets. Therefore,
following a generalization of the concept of schema, we
present a particular definition that better suits the
continuous domain. Using this new definition, we
reach an expression similar to the Fundamental
Theorem of Genetic Algorithms, valid for the real
coding of chromosomes.
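For context, the binary-coded result being generalized is Holland's Schema Theorem, which in its usual approximate form states, for a schema H of order o(H) and defining length delta(H) over strings of length l:

    \operatorname{E}[\,m(H,\,t+1)\,] \;\ge\; m(H,t)\,\frac{f(H)}{\bar{f}}\left[1 - p_c\,\frac{\delta(H)}{l-1} - o(H)\,p_m\right]

where m(H,t) counts instances of H in generation t, f(H) is their average fitness, \bar{f} is the population's average fitness, and p_c and p_m are the crossover and mutation probabilities.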
Links:
paper [pdf] ·
bibtex [bib]
Nehab, D.;
Velho, L.
2002. In Proceedings of the Brazilian Symposium on Computer Graphics and Image Processing, pages 244–251, Fortaleza, Brazil, October 2002
Abstract:
In this paper we present a new method for painterly
rendering of images. Our method extends the image-moment
stroke placement algorithm in two ways: we employ a
multiscale scheme for computing strokes and we provide a
parametrized mechanism for controlling stroke distribution.
In addition, we present a specialized image abstraction for
the algorithm.
Links:
paper [pdf] ·
presentation [ppt] ·
bibtex [bib]
Nehab, D.;
Ierusalimschy, R.
2002. In Anais do VI Simpósio Brasileiro de Linguagens de Programação, pages 270–282, Rio de Janeiro, Brazil, June 2002
Abstract:
Sloth is a simple, easy to learn, non-strict, purely
functional programming language, designed to be a test bed
for research on the compilation of functional programming
languages. Its hybrid architecture innovates in the choice of
programming languages used in its construction, resulting in a
concise, self-contained, and portable implementation. This
article introduces Sloth
as a programming language, describes part of the research
work currently taking advantage of the system, and details
its implementation, which is about to be completed.
Links:
paper [pdf] ·
bibtex [bib]
Nehab, D.;
Gattass, M.
2000. In Proceedings of the Brazilian Symposium on Computer Graphics and Image Processing, pages 227–234, Gramado, Brazil, October 2000, DOI: 10.1109/SIBGRA.2000.883917
Abstract:
Edge detection and image segmentation algorithms usually
operate on an image to extract geometrical information based
on pixel colors. For ray-traced images, the presence of
geometrical information on the scene from which the image
was rendered allows for a completely different approach. We
present an algorithm that divides rays into equivalence
classes, or categories. The category information is
generated during the rendering process and used to determine
edges in the resulting image. Detected edges can later be
used to help determine areas subject to aliasing. Little
effort is needed to implement the described algorithms over
an existing ray-tracer. Furthermore, the extra computational
and memory requirements are modest.
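A sketch of the final image-space step (assuming, for illustration, that the renderer has already produced a per-pixel category identifier, e.g. a hash of the sequence of surfaces each ray hit): a pixel is marked as an edge pixel whenever any 4-connected neighbor carries a different category.

    import numpy as np

    def category_edges(categories):
        """Mark pixels whose category differs from that of any 4-connected neighbor."""
        edges = np.zeros(categories.shape, dtype=bool)
        edges[:-1, :] |= categories[:-1, :] != categories[1:, :]
        edges[1:, :]  |= categories[1:, :]  != categories[:-1, :]
        edges[:, :-1] |= categories[:, :-1] != categories[:, 1:]
        edges[:, 1:]  |= categories[:, 1:]  != categories[:, :-1]
        return edges

    # Toy example: two ray categories meeting along a vertical boundary.
    cats = np.zeros((6, 8), dtype=np.int64)
    cats[:, 4:] = 1
    print(category_edges(cats).astype(int))   # columns 3 and 4 are flagged as edges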
Links:
paper [pdf] ·
presentation [ppt] ·
bibtex [bib]