A Realtime Anti-Aliased Soft-Shadow Casting Renderer


Tsuneo Ikedo
Computer and Information Sciences, Dept. of Digital Media Science
Hosei University
Kajino-cho, Koganei-City
Tokyo 184-8584; Japan
voice: [+81](42)387-4554;



Abstract: A renderer for realtime soft shadow casting based on a two-pass z-buffer (shadow mapping) has been developed and embedded within an ASIC. Functions for erosive and penumbra effects, along with a filtering method for shadow polygons to generate soft shadows and shadows from transparent objects, are newly defined. The renderer consists purely of hardware modules, including multiple shadow buffers, bi-directional IIR filters, and intensity modulation circuits, and works together with textured and bump-mapped light-reflection shaders. It produces a shadowed pixel every 0.8 ns (about 1.2 billion pixels per second) using a fully unidirectional pipeline architecture.


The visual and spatial perceptions of an artificial world are driven by two fundamental and inseparable functions: light-reflected shading and shadow casting. Radiosity and distributed ray tracing are considered the best schemes to produce subtle and vital scenes with a light-reflection model. However, the computational costs inherent in these algorithms preclude their application to walk-through virtual reality (VR) systems, which require realtime performance. Polygon-interpolative rendering has predominated in realtime visualization systems up to the present. However, it has not yet been fully implemented in hardware combined with true three-dimensional light-reflected shading and shadow casting, because the algorithms require global illumination modeling and normal-vector computation that resist pipeline processing. Against this background, hardware implementation of shadow casting in particular has been limited, despite the fact that several algorithms have been proposed, e.g., scan line [1], shadow polygon and clipping [2], shadow volume [3], and z-buffer [4]. Current implementations are based mainly on pure software or on combinations of software and specific hardware using stencil or accumulation buffers [5]. We classify accumulation or stencil buffer technologies using multiple shadow volumes [6] as a hybrid approach, because a heavy workload remains in the geometry accelerator, which runs in software. Casting of anti-aliased soft shadows with transparent objects is still performed by software, e.g., cone tracing [7], distributed ray tracing [8], radiosity [9], and backward ray tracing [10]. The principal challenges in implementing these algorithms in hardware lie in the following functions:

  1. Global illumination between viewing objects, light-sources, and occluders;

  2. Multiple coordinate transformations and shadow depth-comparisons at every interpolation on polygon interiors; and

  3. Umbra/penumbra and anti-aliased projections including shadows from transparent objects.

    Meanwhile, to support the functionality of recently proposed gaming chips [11], the renderer must process more than 0.1 billion polygons per second, namely a billion pixels per second. To maintain sufficient performance while light-reflected shading, soft shadow casting, and full-scene antialiasing run together, the renderer must generate a shaded pixel in less than a single clock cycle of the chip. This cannot be achieved by hybrid methods using multiple shadow volumes, whose performance depends on the number of polygons and the number of shadow gradation levels. According to the LSI (large-scale integration) roadmap for the next half decade [12], the only successful approach to obtain such performance may be complete pipeline processing in pure hardware. Our goal is to render soft shadows at one pixel per clock tick, and we develop five new techniques to satisfy the above conditions. These are:

    1. Simultaneous interpolations of eye-point and light-source coordinates on the second pass with definitions of a shadow polygon flag and polygon identifier for eliminating error-projection of shadows;

    2. Definition of light-erosive and penumbra functions for soft shadowing;

    3. Bi-directional IIR (infinite impulse response) filtering for antialiasing penumbra effects;

    4. Specific buffers and light-reflected shading for shadows cast from transparent objects; and

    5. Pipeline processing to output shadow pixels within a single clock cycle.

    This paper proposes a practical implementation technology for realtime soft shadow casting.

    Shadow Casting Renderer

    Our system embodies various processors, e.g., Phong, Cook/Torrance, and anisotropic reflection shaders, bump mapping, texture mapping, environment mapping, and a pixel cache combined with a multidimensional arrayed frame buffer [13]. The shadow casting renderer is one of these pure hardware modules. Figure 1 shows the circuit diagram for light-reflection and shadow casting, including modules for texture mapping, bump-mapped rough surface shading, environment mapping and soft shadow casting. The shadow casting and light-reflection modules share an outline interpolator and span processor. The visible pixel is stored into the frame buffer after merging the intensities from the bump-mapped shader, environment mapping and shadow casting modules.

    Figure 1. Internal Block Diagram of Light-Reflection Shader

    Hardware Problems in Shadow Casting Algorithms:

    Two-pass Z buffer (Shadow Mapping) Algorithm

    A two-pass z-buffer (shadow mapping) algorithm is the most popular scheme for shadow casting suitable for both hardware and software implementations, as it can render shadows with only a small workload imposed on a geometry accelerator. However, this algorithm has various problems for practical realtime rendering, as described below.

    1. Coordinates Transformation

      A shadow mapping algorithm defines an object in a light-source coordinates (LSCs) system (L1x, L1y, L1z) and renders shadow polygons with regular interpolative procedures and hidden surface removal on the first pass. The distance (depth) L1z of a (shadow) polygon from the light-source position is stored into a specified memory, called the shadow buffer, addressed by (L1x, L1y).

      At the second pass, the polygon vertices are defined in an eye-point coordinates (EPCs) system: Vx, Vy, Vz, Vw. In this pass, interpolated coordinates (Vxi, Vyi, Vzi, Vwi) at point i on a polygon interior must be transformed to the specified point (L2xi, L2yi, L2zi) in LSCs simultaneously in order to read the depth-value L1zi of the shadow buffer and to compare L2zi with L1zi. If the value L2zi - L1zi is negative (L2zi is nearer to the light-source), the interpolated point is regarded as being lightened; otherwise, it is shadowed. The coordinates transformation from EPCs to LSCs at point i has the following relation:

      where D, N1, R, P and N2 are the 4 x 4 matrices of workstation (viewport), coordinates normalization of the first pass, rotation to light-source, perspective projection, and coordinates normalization of the second pass, respectively. Realtime rendering may not be realized if the above transformation is applied for every interpolation of polygon interior. A projective texture [14] is a candidate to avoid the transformation. However, this method needs to identify occluding and shadow receiving objects or to combine shadow volume methods. Moreover, this method imposes a heavy workload for copying shadow textures to texture memories when the light-source or occluding or receiving object moves every frame cycle [15].
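The two passes described above can be sketched in software as follows. This is only a minimal illustration, assuming integer shadow-buffer addresses and coordinates that are already expressed in LSCs; the function names `first_pass` and `is_lit` are hypothetical and are not part of the hardware described here.

```python
import math

def first_pass(shadow_samples, size):
    """Pass 1: keep, per (L1x, L1y) cell, the depth L1z nearest to the light."""
    shadow_buffer = [[math.inf] * size for _ in range(size)]
    for l1x, l1y, l1z in shadow_samples:
        if l1z < shadow_buffer[l1y][l1x]:
            shadow_buffer[l1y][l1x] = l1z
    return shadow_buffer

def is_lit(shadow_buffer, l2x, l2y, l2z):
    """Pass 2: the point is lit when L2z - L1z is negative, otherwise shadowed."""
    l1z = shadow_buffer[l2y][l2x]
    return (l2z - l1z) < 0

# One occluder sample at cell (1, 1), at depth 2.0 from the light.
buf = first_pass([(1, 1, 2.0)], size=4)
print(is_lit(buf, 1, 1, 5.0))  # receiver point behind the occluder
print(is_lit(buf, 0, 0, 5.0))  # nothing stored between this cell and the light
```

The sketch takes (L2x, L2y, L2z) as given; obtaining them for every interpolated point is exactly the per-pixel cost that the transformation above would impose.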

    2. Shadow buffer resolution

      The shadow mapping scheme manifests a serious error during depth comparison on the first pass. The z-buffer stores a single depth-value per pixel, at the nearest position to the light-source. If the polygon surface is placed nearly parallel to the Lz axis, almost the entire interior of a surface is behind the polygon-edge. Only depth values along polygon-edges are stored. On the second pass, if this polygon is placed nearly perpendicular to the Vz axis, almost all depth-values of interpolated interiors are regarded as if they were located behind the stored depths, and the renderer flags those points as shadowed pixels in spite of the lightened surface. This problem is called "self-shadow aliasing." P1 shows an instance of this problem, manifested as dark striped patterns on the checkerboard and stains on the statue. A slight increase of shadow buffer resolutions cannot solve this problem.

      P1. Self-Shadow Aliasing

    3. Pipeline processing

      Interpolative rendering limits the data definable on polygon vertices, e.g., homogeneous coordinates, light-source incidence, surface normal, view direction and transparency. Shadow casting needs the light-source shape, the geometric relations among light-source, occluder, and shadow receivers, and environmental light-diffusive conditions, in addition to the fields mentioned above. Global illumination on the basis of this data, at a billion pixels per second, may be impossible, so realtime soft shadows must be generated by an empirical approach that can be implemented as pipeline processing.

      To address these problems, we developed algorithms and architectures described in the following sections.

      New Definitions and Procedures for Anti-aliased Soft-Shadow Casting

      Definition of Polygon

      1. Polygon Definition

        Each polygon vertex is defined with 24 variables, shown in Table 1, which allow shadow casting from transparent objects and textured bump-mapped rough surface shading. On the first pass, each polygon vertex is defined with homogeneous coordinates L1x, L1y, L1z and L1w, while light-source coordinates (L2x, L2y, L2z) and eye-point coordinates (Vx, Vy, Vz, Vw) are associated on the second pass. These coordinates define geometry and shadow polygons. The surface normal (Nh, Nv), light-source incidence (Lh, Lv), bump-up vector Nu, and viewing direction (Eh, Ev) determine the light-reflective intensity on the bump-mapped rough surface. Coordinates u and v address texture and bump mapping patterns. All variables are interpolated along the outlines of a polygon and then, in the span processor, between the two edges crossed by a horizontal axis. Simultaneous definitions of light-source and eye-point coordinates eliminate the transformation of equation (1). The span processor in our system renders pixels on 4 scan-lines simultaneously. In total, 144 interpolators (48 for the right/left outline interpolator and 96 for the span processor) run together to perform bump-mapped rough surface shading, texture mapping, shadow casting, and color blending. Figure 2 shows the overall architecture for shadow casting and light-reflected shading processors.
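The interpolator counts and the pixel rate quoted in the abstract can be checked with a short calculation. The sketch below assumes one interpolator per vertex variable per outline or scan-line, and one pixel per scan-line per clock at the 300 MHz operating frequency quoted for the filter; both are readings of the text, not stated design facts.

```python
VARIABLES_PER_VERTEX = 24   # Table 1
OUTLINES = 2                # right and left polygon outlines
SCAN_LINES = 4              # scan-lines rendered simultaneously
CLOCK_HZ = 300e6            # operating frequency quoted for the IIR filter (assumed to apply here)

outline_interpolators = VARIABLES_PER_VERTEX * OUTLINES
span_interpolators = VARIABLES_PER_VERTEX * SCAN_LINES
total_interpolators = outline_interpolators + span_interpolators

# Assumption: each scan-line emits one pixel per clock cycle.
pixels_per_second = CLOCK_HZ * SCAN_LINES
ns_per_pixel = 1e9 / pixels_per_second

print(outline_interpolators, span_interpolators, total_interpolators)
print(pixels_per_second, ns_per_pixel)
```

Under these assumptions the counts come out as 48, 96 and 144, and the pixel period is about 0.83 ns, which agrees with the figures given in the text.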

        Table 1. Vertex variables

        Item                   Variables                  Remarks
        Coordinates in EPCs    Vx, Vy, Vz, Vw             homogeneous coordinates
        Texture Coordinates    u, v
        Surface Normals        Nh, Nv                     polar coordinates
        Light-source Angles    Lh0, Lv0 to Lh3, Lv3       polar coordinates
        Viewing Direction      Eh, Ev                     polar coordinates
        Light Absorption       β                          transparent object
        Bump-map Up-vector     Nu                         per polygon
        Coordinates in LSCs    L1x, L1y, L1z, L1w         first pass
                               L2x, L2y, L2z              second pass

        Figure 2. Light-Reflection and Shadow Casting Architecture

        In Figure 2, LSCs from the span processor are reversed to render the shadow polygons with parallel projection on the first pass. On the second pass, LSCs address the shadow polygon buffer, reading the shadow polygons into the shadow intensity modulator in order to determine shadow intensity. The output from the shadow casting renderer scales the intensity of textured, bump-mapped, and shaded pixels.

      2. Shadow-flag and Polygon ID

        A shadow shade-flag and polygon identifier (ID) are newly defined in this system. The shade-flag and ID are stored in an ID buffer with 8 and 24 significant bits, respectively. The shade-flag comprises a two-bit (edge and interior) field for showing the interior of a shadow polygon and a 6-bit field for shadow gradation (the initial value is 3F[hex]). The ID is a code uniquely identifying every object. Depth comparison using a shadow buffer manifests the self-shadow aliasing described in the previous section, caused by the limited buffer capacity (resolution). Various methods have been proposed to avoid self-shadow aliasing, introducing such ideas as a bias factor [16] and an intermediate surface [17]. However, these still limit where objects can be located. Our system attaches a polygon ID to the shadow polygon on the first pass. If the ID in the shadow buffer coincides with the ID of the shadow receiver on the second pass, the receiver is regarded as a lightened surface. This scheme cannot cast a shadow on the rear part of an object, away from the light-source, because polygons in the same object share a common ID. However, our system runs together with the bump-mapped rough surface shader, as shown in Figure 2, which provides a self-shading effect on the rear part of the object using the light-source direction and surface normal. We also solved the self-shadowing of a torus shape using surface normals (described in a later section).
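The ID buffer entry and the ID test can be sketched as below. The text gives only the field widths (2-bit edge/interior flag, 6-bit gradation, 24-bit ID); the bit positions, the packing into one 32-bit word, and all names here are assumptions made for illustration.

```python
EDGE_BIT = 0x80         # assumed position of the edge flag
INTERIOR_BIT = 0x40     # assumed position of the interior flag
GRADATION_MASK = 0x3F   # 6-bit shadow gradation, initial value 3F[hex]
ID_MASK = 0xFFFFFF      # 24-bit polygon ID

def pack_entry(edge, interior, gradation, polygon_id):
    """Pack an 8-bit shade-flag and a 24-bit ID into one 32-bit word."""
    if not 0 <= gradation <= GRADATION_MASK:
        raise ValueError("gradation must fit in 6 bits")
    if not 0 <= polygon_id <= ID_MASK:
        raise ValueError("polygon ID must fit in 24 bits")
    flag = (EDGE_BIT if edge else 0) | (INTERIOR_BIT if interior else 0) | gradation
    return (flag << 24) | polygon_id

def unpack_entry(word):
    """Return (edge, interior, gradation, polygon_id) from a packed word."""
    flag = word >> 24
    return bool(flag & EDGE_BIT), bool(flag & INTERIOR_BIT), flag & GRADATION_MASK, word & ID_MASK

def receiver_is_lit(receiver_id, l2z, stored_word, l1z):
    """Second-pass test: a matching ID is treated as lit, otherwise compare depths."""
    occluder_id = unpack_entry(stored_word)[3]
    if occluder_id == receiver_id:
        return True
    return (l2z - l1z) < 0

entry = pack_entry(edge=False, interior=True, gradation=0x3F, polygon_id=7)
print(receiver_is_lit(7, 5.0, entry, 4.9))  # same object, slightly behind its own stored depth
print(receiver_is_lit(3, 5.0, entry, 4.9))  # different object behind the occluder
```

The first call shows the point of the ID: a surface that falls just behind its own stored depth is no longer flagged as shadowed.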

      Soft-Shadow Processing

      A shadow in the real world has regions of umbra and penumbra, corresponding to the light-source shape, ambient light, and distances between light-sources, occluders, and shadow receivers. Despite several proposals for soft-shadow rendering, we cannot find technologies suitable for hardware implementation with a single pipeline procedure. Conventional technologies for soft shadow casting are based on accumulation buffering with multiple shadow volumes. Although these can produce a blurring effect with theoretical nicety, the shadow polygons must be fully rendered tens of times to reduce banding artifacts. For example, rendering 50 times generates only 50 levels of gradation, while the performance decreases to 1/50. The performance of this method further depends on the number of polygons. Realtime rendering of a billion pixels per second must produce the blurred image within a single frame cycle no matter what complex processing (e.g., shadows from transparent occluders) is included.

      The following eight sub-procedures are applied to obtain soft shadows in our system. All of these sub-processes are implemented in pure hardware with a single clock cycle pipeline.

      On the first pass:

      1. Storing shadow polygons, including depth values (the nearest and second-nearest objects), shadow shade-flags, and polygon IDs with parallel projection in LSCs;

      2. Storing shadow image and light-absorption coefficients of transparent objects; and

      3. Filtering the shade-flag (interior and gradation) and storing the filtered values in ID buffer after all objects are rendered on the first pass.

        On the second pass:

      4. Rendering polygons with light-reflected shading in EPCs and LSCs simultaneously;

      5. Detecting a shadowed pixel by comparison of depths and polygon IDs between first and second passes;

      6. Blending colors of the receiver and the shadow image using light-absorption coefficients in case of the shadow of a transparent object (if the receiver is lightened, the pass skips to sub-process (5));

      7. Modulating intensity of receiver using light-erosion and penumbra functions defined with distances of objects (receiver, occluder, and light-source) and filtered shade flag; and

      8. Blending shadowed pixels from sub-process (7) with destination pixels of the visible image buffer.

      Filtering for Anti-alias and Soft Shadow

      Filtering of shadow polygons has two purposes: antialiasing of the shadow edge and blurring for penumbra effects. An indirect illumination effect (a soft shadow edge) may be obtained by filtering the shadow around its edge. However, a wide-ranged blur cannot be obtained by general anti-aliasing techniques (e.g., percentage closer filtering [16]) that apply filtering after the first pass. The projective texture using tri- or bi-linear filtering also cannot produce a wide-ranged blur. Meanwhile, direct filtering of L1z before the second pass cannot succeed in general, since filtering depth-values in particular produces values unrelated to the actual geometrical positions. We implement the following new procedures to fix this error:

      1. Filtering depth and shadow gradation flag. If the sampling point is placed outside the shadow polygon, the edge flag is reset to 0; otherwise, it is set to 1; and

      2. Determining the filtered values at the sampling point according to the following schemes:

          1. If the polygon ID is uniquely included in a sampling area, depth and shade gradation are filtered in the sampling area and only the depth L1zs at interior flag-on in the sampling area are averaged; and

          2. If multiple IDs are included in the sampling area, only L1zs of the same polygon ID which includes the nearest depth-value to the light-source are selected and averaged, while shade gradations are filtered resetting other polygon IDs to 0. The filtered value is put on a sampling point with that polygon ID and interior flag (but not with an edge flag).

    Figure 3 shows an example of shade filtering of 5 x 5 sampling area according to the above rules. The filtering is carried out only with the depth-values of selected points attached to the same polygon ID. Other values are not used. Thus, the value of L1zs which are grouped nearest to a light-source around a polygon edge are not influenced by values of other polygon IDs or any values where the flag is off. In Figure 3, the map at the top left shows the shadow buffer storing Qp, depth, and a polygon ID. There are five polygon ID groups in the example, each with its own Qp and depth values. Shown in the top right is filtering using the values of ID0. One of the depth-values in ID0 includes the nearest value from the light-source. Depth filtering is carried out using only Z0- Z5, while shade filtering is done with all Qps after resetting Qps of other IDs to 0.

    The maps at the bottom left and right show similar examples. To apply this method, the shadow polygon located nearest to a light-source spreads its area and penumbra grading in light-source coordinates (perpendicular to light-source direction). The blurring effect around the edge of shadow polygon is thus obtained by the filtering.

    Figure 3. Selective Area Sampling
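The selection rule illustrated in Figure 3 can be sketched for a single sampling area as follows. The tuple layout, the function name `selective_sample`, and the use of a plain average for the shade value are assumptions; the hardware applies its own filter kernel.

```python
def selective_sample(window):
    """Filter one sampling area.

    window: list of (depth, qp, polygon_id, interior) tuples, one per sample.
    Returns (filtered_depth, filtered_qp, polygon_id), or None when no sample
    lies inside a shadow polygon.
    """
    inside = [s for s in window if s[3]]
    if not inside:
        return None
    # The ID that owns the depth nearest to the light-source wins the area.
    chosen_id = min(inside, key=lambda s: s[0])[2]
    # Depth: average only samples of that ID whose interior flag is on.
    depths = [d for d, _, pid, _ in inside if pid == chosen_id]
    filtered_depth = sum(depths) / len(depths)
    # Shade: Qp of every other ID (and of flag-off samples) counts as 0.
    qp_sum = sum(qp for _, qp, pid, interior in window if interior and pid == chosen_id)
    filtered_qp = qp_sum / len(window)
    return filtered_depth, filtered_qp, chosen_id

area = [
    (2.0, 1.0, 0, True),
    (4.0, 1.0, 0, True),
    (9.0, 1.0, 1, True),    # farther polygon: excluded from the depth average
    (0.0, 0.0, 0, False),   # outside any shadow polygon: ignored
]
print(selective_sample(area))
```

Because depths of other IDs never enter the average, the filtered depth stays on the surface of the selected polygon while the shade value still falls off toward the edge.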

    Several methods are available for blurring: (1) FIR (finite impulse response) filtering, (2) mapping a blur-pattern onto a shadow polygon, and (3) IIR (infinite impulse response) filtering. The first method must apply a filter with a 5 x 5 or 7 x 7 sampling area more than eight times in order to spread the blur sufficiently. Scanning the shadow buffer more than eight times carries a high computational cost and results in performance similar to that of multiple shadow volumes. The second method maps a two-dimensional blur-pattern (e.g., 5 x 5) onto every interpolated point of a shadow polygon (which may need at least one filtering pass after mapping). It is also difficult to get a smooth gradation with this method. The IIR filter can produce a wide spread of gradation, but uni-directional IIR filtering shifts the blurred region in a specific direction. We implemented an IIR filter that scans the shadow polygon buffer in diagonal directions from two or four corners of the frame. The shadow polygons are filtered at 3.3 ns per sampling point (at an operating frequency of 300 MHz). Figure 4 shows distribution curves of a shadow polygon resulting from 7 x 7 filtering in diagonal directions. Figures 4a, b and c show the hard edge (without filtering), bi-directional (diagonal), and crossed bi-directional IIR filtering, respectively. The horizontal and vertical axes of the top figures show numbers of pixels and intensity levels, respectively. After all areas in the ID buffer are filtered, the second pass is started. Rendering on the second pass is carried out by interpolating the coordinates in EPCs and LSCs simultaneously with light-reflected shading and texture mapping. The cost of filtering is constant and negligible in an embedded frame buffer system. This is the most outstanding feature of our system, compared to multiple shadow volumes using an accumulation buffer. However, although bi-directional IIR filtering produces a wide-range blur, the penumbra effect is not available without special processing, as described in the next section.

    Figure 4. Bi-Directional 7 x7 IIR Filtering
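The difference between uni-directional and bi-directional IIR filtering can be seen in a one-dimensional sketch. This is not the diagonal two-dimensional scan used in the hardware: the first-order recursion, the coefficient `a`, and the averaging of the two scan directions are all assumptions chosen to keep the example short.

```python
def iir_pass(signal, a):
    """First-order recursive filter y[i] = a*x[i] + (1 - a)*y[i-1], scanned left to right."""
    out = []
    prev = signal[0]
    for x in signal:
        prev = a * x + (1.0 - a) * prev
        out.append(prev)
    return out

def bidirectional_iir(signal, a):
    """Average a left-to-right scan with a right-to-left scan of the same signal."""
    forward = iir_pass(signal, a)
    backward = iir_pass(signal[::-1], a)[::-1]
    return [(f + b) / 2.0 for f, b in zip(forward, backward)]

def centroid(values):
    """Intensity-weighted mean position, used to detect a shift of the blur."""
    return sum(i * v for i, v in enumerate(values)) / sum(values)

# Hard-edged shadow profile: 8 lit samples, 8 shadowed samples, 8 lit samples.
edge = [0.0] * 8 + [1.0] * 8 + [0.0] * 8
print(centroid(edge))
print(centroid(iir_pass(edge, 0.3)))           # blur dragged along the scan direction
print(centroid(bidirectional_iir(edge, 0.3)))  # blur stays centred on the shadow
```

A single scan moves the centre of the blurred shadow along the scan direction, while the combined scans leave it in place, which is the behaviour the bi-directional filter is meant to provide.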

    Umbra and Penumbra Estimations

    Erosive effect

    A penumbra area is generated corresponding to light-source properties and geometric convolution among shadow receivers, occluders, and light-sources. We define two components to produce penumbra and umbra effects: light-erosion and penumbra-shade. The light-erosion D is an ad hoc function of ΔLz/L1z, given as:

    where ΔLz and L1z are the distances between occluder and receiver, and between occluder and light-source, respectively, and k0, k1, and k2 are coefficients determined by ambient light. Equation (2) yields the curves shown in Figure 4. The coefficients (k0, k1, k2) are set at (3/2, 1, 1), (7/6, 3, 3), (21/20, 20, 20), and (101/100, 50, 50) for the (a), (b), (c) and (d) curves in Figure 4.

    Figure 4. Light-Erosive Function

    At a point far from an occluder, darkness is merged with the light of receiver. P2 shows light-erosive images with perspective projection, changing the coefficients k0 and k1 of equation (2).

    P2. Erosive Effect

    Equation (2) has a problem if the depth value stored is that of the nearest surface of the nearest object to the light-source and the occluder is thick in the light-ray direction. ΔLz must use the distance from the rear (not front) surface of the occluder to the receiver in order to calculate the erosive intensity correctly. Our system therefore stores the furthest depth of the nearest object, collating the polygon IDs. Further problems occur when multiple occluders overlap in a light-ray direction. P3a shows an instance of these problems. In picture P3a, the apple and calyx have different IDs, so that the shadow of the calyx is cast on the surface of the apple. If a single shadow buffer is implemented and the light-source incidence is given as shown in P3, the calyx is stored in the shadow buffer overlapping the apple. As a result, the erosion effect on the receiver is calculated using the distance from the calyx, not from the rear surface of the apple. Because the distance between the calyx and the receiver is greater than the distance from the rear of the apple, the shadow of the calyx is rendered lighter than that of the apple body, as shown in P3a. Multiple shadow buffering solves this problem. P3b shows images using two shadow buffers.

    Another problem is that the calyx shadow may be cast on the rear surface of the apple body. This is because the rear part of the apple becomes the projected surface for the calyx in view coordinates. Shadow casting onto the rear surface can be avoided using a surface normal flag (single bit). This flag is set when the depths of a polygon are located far from the destination pixel on the first pass. Our system computes a diffuse reflection angle at the shader in parallel. On the second pass, the normal (cosine between light-source incidence and surface normal) flag is checked and the shadow is eliminated if the IDs of receiver and shadow polygons coincide or the normal is negative (or zero) in LSCs.

    P3a. Single Depth Buffering

    P3b. Double Depth Buffering

    Penumbra Effect

    Penumbra-shade softens the intensity around a shadow edge, depending on the light-source type and the distances between receiver, occluder and light-source. Our system generates umbra and penumbra by a shade-value Qp. This value ranges over 0 ≤ Qp ≤ 1 in 64 quanta using IIR filtering, and is multiplied by the shaded intensity given by the bump-mapped rough surface shader in order to make a fine shadow grading. This is equivalent to the effect of accumulation buffering 64 times. However, blurring Qp in two spatial dimensions cannot give an appropriate soft-shadow effect because the filter spreads the blurred area uniformly. In nature, the shadow edge hardens at positions nearer the occluder. The penumbra area (half-width h) corresponds to ΔLz / L1z and the light-source angle and type, given by the following equation [18]:

    where cosis the cosine of light-source incidence and r is the radius of the light-source (assumed to be spherical). Cos is obtained from the rough surface shader. The spreading of blur corresponding to equation (3) is carried out by modifying Qp, using a specific function f(Qp, h) as shown in Figure 4. The uppermost graph shows hard-edged curves of Qp changing abruptly between 0 and 1 at the boundary of a shadow polygon edge.

    Figure 4. f(Qp, h) Function and Inverse Transformation of Blurred Polygon
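As an illustration of how the penumbra half-width depends on ΔLz / L1z, r and cos θ, the sketch below uses the standard similar-triangles estimate for a spherical light-source. It is an assumption standing in for equation (3), which is not reproduced in this text; in particular the division by cos θ (stretching on an obliquely lit receiver) is an assumed form, and the function name is hypothetical.

```python
def penumbra_half_width(delta_lz, l1z, radius, cos_theta=1.0):
    """Similar-triangles estimate of the penumbra half-width h.

    delta_lz : distance between occluder and receiver
    l1z      : distance between occluder and light-source
    radius   : radius r of the (spherical) light-source
    cos_theta: cosine of the light-source incidence on the receiver (assumed form)
    """
    if l1z <= 0.0 or cos_theta <= 0.0:
        raise ValueError("l1z and cos_theta must be positive")
    return radius * delta_lz / (l1z * cos_theta)

# The edge is hard where the receiver touches the occluder and softens with distance.
for delta in (0.0, 0.5, 1.0, 2.0):
    print(delta, penumbra_half_width(delta, l1z=2.0, radius=0.5))
```

Whatever the exact form of equation (3), the qualitative behaviour is the one the text relies on: h is zero at the occluder and grows with ΔLz / L1z and with the light-source radius.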

    In Figure 4, the horizontal and vertical axes show input variable Qp and modified penumbra (scale of intensity), respectively. The penumbra half-width h is a parameter to choose the curve. Smaller h chooses a curve that changes Qp more abruptly between 0 and 1. At a shadow near the occluder, Qp changes abruptly and the boundary between lit and umbra areas is displayed clearly. A function yielding such an effect is defined by the following equation:

    where La is the normalized h and k0 to k3 are grading coefficients (k1 is generally set in the range of 15 to 100). Storing this curve in RAM (random access memory) tables enables control of gradation, corresponding to the predefined light-source type, depth-distance, and environmental conditions. The lower figures in Figure 4 show the inverse transformation using equation (3), in which the blurred area converges to the original shadow polygon. Penumbra gradation generated by IIR filtering independently of the light-source type could be regarded as a fake technique. However, our system determines the penumbra area empirically using f(Qp, h), related to the light-source size and the distances between light-source, occluder, and receiver.

    The above scheme is appropriate only if a single occluder is in front of the receiver, because Qp is generated only for the shadow polygon nearest to the light-source. Multiple shadow polygons being overlapped manifests a problem that the correct Qp at a sampling point cannot be detected. To cast a shadow more accurately, our system provides two shadow buffers to store depths of the nearest and second-nearest occluders separately. On the second pass, these depths test localization of the receiver relative to the occluder. If more than two occluders overlap in front of the receiver, the penumbra shade derived from Qp of the nearest occluder to the light-source must be ignored to use the Qp of the second-nearest. The relations of receiver and occluder are shown in Table 2.

    Table 2. Relations of Object-Localization

    ΔLaz  ΔLbz  Location of Receiver
    0     0     located in front of occluders
    0     1
    1     0     located between two occluders (case 1)
    1     1     located behind both occluders (case 2)

    ΔLaz is the comparison result with the nearest shadow polygon to the light-source, and ΔLbz is the comparison with the second-nearest. In case 1, when the receiver is located between two occluders, the occluder nearest to the light-source casts the shadow onto the receiver. Light-erosion and penumbra-shade of the nearest shadow polygon are used. In case 2, when the receiver is located behind both shadow polygons, light-erosion and penumbra-shade of the second-nearest polygon are used. Exterior or interior of a shadow polygon at an interpolated point on a receiver is determined by testing the flag and ID, which are put on the shadow polygon on the first pass.
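The selection in Table 2 can be sketched as a small decision function. The names and the returned labels are hypothetical, and the sign convention (a receiver is behind a shadow polygon when L2z - L1z is not negative) is taken from the depth test described earlier.

```python
def select_occluder(l2z, nearest_z, second_z):
    """Choose which shadow polygon supplies light-erosion and penumbra-shade.

    l2z       : receiver depth in LSCs on the second pass
    nearest_z : depth stored in the first shadow buffer (nearest occluder)
    second_z  : depth stored in the second shadow buffer (second-nearest occluder)
    """
    behind_a = (l2z - nearest_z) >= 0   # comparison result with the nearest polygon
    behind_b = (l2z - second_z) >= 0    # comparison result with the second-nearest polygon
    if not behind_a and not behind_b:
        return "none"        # receiver in front of both occluders: lit
    if behind_a and not behind_b:
        return "nearest"     # case 1: receiver between the two occluders
    if behind_a and behind_b:
        return "second"      # case 2: receiver behind both occluders
    return "undefined"       # the (0, 1) row has no entry in Table 2

print(select_occluder(1.0, nearest_z=2.0, second_z=4.0))
print(select_occluder(3.0, nearest_z=2.0, second_z=4.0))
print(select_occluder(5.0, nearest_z=2.0, second_z=4.0))
```

The (0, 1) combination is kept as a separate return value only because the table lists the row without assigning it a location.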

    A further problem with this scheme occurs on the overlapped area of the penumbra region of the nearest occluder and umbra region of the second-nearest occluder from a light-source. This causes erosion by penumbra gradation on the umbra area. P4 shows an instance of this problem.

    P4. Penumbra Error

    Figure 5 shows a top view of the locations of three planar objects, placed perpendicular to the light-source direction. In such a case, the shadow polygons projected onto the R1 + R2 areas by occluder 1 and onto the R3 + R4 areas by occluder 2 are stored in the first shadow buffer, while R2 of occluder 2 is stored in the second buffer. Due to the filtering based upon the described rule, the blurred area R3 of occluder 1 eliminates the depth values of occluder 2. As a result, the blur intensity of occluder 1 is projected onto area R3 of the receiver, despite the fact that the shadow of occluder 2 should be cast. This causes an unnatural effect. Moving the eliminated depths at the sampling point to the second shadow buffer solves this problem.

    As described above, we generate the following data every interpolation (in a pipeline procedure) on the second pass by pure hardware:

    1. Filtered value Qp to determine penumbra-area and intensity,

    2. Shadow shade-flag,

    3. Source and destination polygon IDs,

    4. Depths of nearest and second-nearest shadow objects,

    5. Distance between occluder, receiver and light-source,

    6. Image color I_shade and transparent coefficient α of receiver,

    7. Shadow image S and light-absorption coefficient β, and

    8. Light-source type and environmental light-diffusion coefficients (defined in advance).

    P4a- c show the penumbra effects with coefficients of k0 = and k1 =. P5a- c show these sample images combined with erosion and penumbra effects.

    Figure 5. Relation of Occluders and Shadow Blur Area

    P4a. Penumbra Effect (La: )   P4b. Penumbra Effect (La: )  P4c. Penumbra Effect (La: )

    P5a. Soft shadow (La: )  P5b. Soft shadow (La: )  P5c. Soft shadow (La: )

    P6a. Image sample of antialiasing effect

      P6b. Image sample of penumbra effect

    P6c. Image sample III


    P6d. Image sample IV

    Shadow from Transparent Object

    Our system handles shadows from transparent objects using additional buffers for shadow images and absorption coefficients, as shown in Table 1. This function uses the same procedure for the first and second passes, combining texture mapping, shading, and color blending. The shadow image of a transparent object with texture and shading is stored in a specific (shadow image) buffer. The effect of refraction is not considered in this system. The procedure on the first pass differs from the regular shading on the second pass in that the texture coordinates are replaced with u / L1w and v / L1w and the transparency value α is replaced with the light absorption value β. The absorption value is related to the surface normal and light-source incidence, so it can be calculated using cos θ as described below. The polygon vertex is defined with 13 variables on the first pass (Vw and L1w, and α and β, can share interpolators). The intensity Ip of the bump-mapped shadow image in parallel illumination is given by the following equations:

    where Nh, Nv, Bh, Bv, Lh and Lv are the horizontal and vertical angles of surface, bump and light-source normals, respectively and Nu and Id are bump-up vector and diffuse coefficient, respectively.

    Absorption β is also interpolated across the interior of the polygon. Thus, if a rendered object on the second pass has intensity Ip and transparency property α, then the target intensity I_image of the visible pixel merged with a background object is given as:

    The shadow mapping scheme has problems with shadows from transparent occluders: shadows are projected properly if the receiver is located just behind a transparent occluder, but not if an opaque occluder is located between the transparent occluder and the receiver. In the latter case, although a solid (dark) shadow should be cast, the transparent-object image is projected onto the receiver because the shadow buffer stores the nearest depth from the light-source on the first pass. In order to detect the position of the receiver relative to the transparent and opaque occluders, two or more shadow buffers are needed to store separately the depth-values of the transparent (L1z-transp) and opaque (L1z-opaq) occluders nearest to the light-source. On the second pass, at every interpolation, the source depth L2z is compared with L1z-opaq and L1z-transp simultaneously, selecting either the solid or the transparent shadow. P7a and b show shadows of transparent objects. These operations are not complex in hardware but generally carry a large hardware cost.
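The double comparison against the opaque and transparent depth buffers can be sketched as follows. The function name, the returned labels, and the use of infinity for "no occluder stored" are assumptions for illustration only.

```python
import math

def select_shadow(l2z, l1z_opaq, l1z_transp):
    """Pick the shadow type for a receiver point at depth l2z in LSCs.

    l1z_opaq   : nearest opaque occluder depth (math.inf if none is stored)
    l1z_transp : nearest transparent occluder depth (math.inf if none is stored)
    """
    behind_opaque = (l2z - l1z_opaq) >= 0
    behind_transp = (l2z - l1z_transp) >= 0
    if behind_opaque:
        return "solid"        # an opaque occluder blocks the light entirely
    if behind_transp:
        return "transparent"  # project the stored shadow image using absorption β
    return "lit"

print(select_shadow(5.0, math.inf, 2.0))  # receiver just behind a transparent occluder
print(select_shadow(5.0, 3.0, 2.0))       # opaque occluder between transparent occluder and receiver
print(select_shadow(1.0, 3.0, 2.0))       # receiver in front of both occluders
```

Both comparisons are independent of each other, which is why they can be evaluated simultaneously at every interpolated point.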


    P7a. Shadow of Transparent Occluder P7b. Shadow of Transparent and Opaque Occluders

    P7b shows that a transparent occluder (dog picture) casts its shadow onto both the twisted object and the plane. The twisted opaque object also casts its shadow onto both itself and the plane.

    Shadow of Special Shape

    Shadow mapping using polygon IDs, together with self-shadowing by the light-reflection shader, addresses self-shadow aliasing on convex shapes. However, a special operation is needed for concave objects, such as a torus, in which part of the object casts a shadow onto its own surface. Our system stores the depth of a shadow polygon at the rearmost part of the same polygon ID and identifies the polygon as a lit surface. Thus, because they share the same ID, a shadow cannot be cast on visible parts that face the rear surface of a concave object. Two additional data are used to solve this problem: the surface normal and the distance between surfaces of negative normals in LSCs. If the normals of shadow polygons are the same and the two polygons are separated by an appropriate distance, the two surfaces are stored separately in two shadow buffers on the first pass. If the normals of the receiver and the occluder (shadow polygon) are directly opposite but the two have the same ID on the second pass, the receiver is regarded as a shadowed surface. P8 shows the shadow of a torus mapped with a bump pattern. The front part of the ring casts a shadow onto both the inside of the ring and the checkerboard. The practical implementation uses the cosine of the angle between the light-source incidence and the surface normal as a normal flag on the second pass, instead of the normal in LSCs.

    P8. Shadow of Torus Shape

    Multiple Light-Source Shadows

    Shadow casting under multiple light-sources can be implemented by multiple buffering of shadow polygons. To obtain the same performance with multiple light-sources as with a single light-source, n sets of shadow and image buffers are needed for n light-sources. A practical implementation of this approach may be limited to a few light-sources because of the increased hardware cost. For multiple light-sources, accumulation buffering using a single set of buffers may be more practical, though performance falls to 1/n. Even combined with the accumulation of shadow intensity for the penumbra effect of multiple light-sources, rendering time is reduced 1/n x (m-1) in our system (m is the number of samples per area light-source). P7 shows the shadow image of two light-sources using double buffering.

    P7. Shadows by Two Light-Sources

    Hardware Implementation

    Our shadow casting renderer is one of the graphics modules embedded within a single chip, as shown in Figure 1. It consists of the following sub-modules of unique hardware:

    1. Filter for depth and shadow region;

    2. Shadow polygon, ID, and shadow image buffers; and,

    3. Shadow intensity modulator.

    A soft shadow with light-reflected shading, transparent objects, and light-diffusion effects must be defined by variables shown in Table 3 on polygon vertices. The polygon outline interpolator of Figure 2 receives these variables and the span processor interpolates all polygon interiors. Light-source coordinates L2x, L2y, and L2z on the second pass are defined with 32-bit fields each.

    Filter for shadow polygon flag

    Depth value, polygon ID, shadow image, and the absorption coefficient in the 1K x 1K ID buffer are filtered and blurred according to the rules described in the previous section. Single-step filtering is sufficient to obtain an anti-aliased effect around a shadow polygon. However, sufficient blurring for a soft shadow requires bidirectional or quad-directional IIR filtering. The light-source type and environmental (light-diffuse) conditions determine the number of filtering iterations. A filter module for the shadow polygon flag is shown in Figure 6.
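    As an illustration of bidirectional IIR filtering, the following sketch blurs one scanline of shadow flags with a forward sweep followed by a backward sweep. It is a minimal sketch under stated assumptions: the first-order recurrence, the coefficient a, and the function name are hypothetical choices for illustration, not the filter rules of the actual circuit, and the polygon-ID mask of Figure 6 is omitted.

```python
def bidirectional_iir(flags, a=0.5, passes=1):
    """Blur one scanline of shadow flags (0 = lit, 1 = shadowed).

    Each pass runs a first-order recursive filter left-to-right and then
    right-to-left, so the blur spreads to both sides of a shadow edge.
    'a' is an assumed smoothing coefficient (0 < a <= 1); 'passes' models
    the number of filtering iterations.
    """
    out = [float(f) for f in flags]
    if not out:
        return out
    for _ in range(passes):
        # Forward sweep: y[i] = a * x[i] + (1 - a) * y[i - 1]
        prev = out[0]
        for i in range(len(out)):
            prev = a * out[i] + (1.0 - a) * prev
            out[i] = prev
        # Backward sweep over the forward result, same recurrence mirrored.
        prev = out[-1]
        for i in range(len(out) - 1, -1, -1):
            prev = a * out[i] + (1.0 - a) * prev
            out[i] = prev
    return out
```

    Applied to a hard edge such as [0, 0, 0, 0, 1, 1, 1, 1], the result rises gradually across the edge instead of jumping from 0 to 1; raising passes widens the transition, which corresponds to the dependence of blur on the number of filtering iterations noted above.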

    Table 3. Vertex variables for two-pass algorithm

                                Opaque Occluder        Transparent Occluder
    First pass:
      Coordinates in LSCs       L1x, L1y, L1z, L1w     L1x, L1y, L1z, L1w
      Coordinates in Texture                           u, v
      Surface Normals           Nh, Nv                 Nh, Nv
      Light-source Angles       Lh, Lv                 Lh, Lv
      Bump-map Up-vector
    Second pass:
      Coordinates in EPCs       Vx, Vy, Vz, Vw         Vx, Vy, Vz, Vw
      Coordinates in Texture    u, v                   u, v
      Surface Normals           Nh, Nv                 Nh, Nv
      Light-source Angles       Lh, Lv                 Lh, Lv
      Bump-map Up-vector
      Coordinates in LSCs       L2x, L2y, L2z          L2x, L2y, L2z

    Figure 6. Filter for Shadow-flag

    An input line-buffer consisting of seven sets of 64 x 32 bit DRAMs and double 8 x 32 shift registers is configured for average filtering. This system reads 64 shadow-flags (maximum 8 bits per pixel) from the frame buffer every 3ns, loading them into the line buffer. An output line-buffer consists of double 64 x 32 bit registers and multiplexers. Buffered data are written into the frame buffer in 32-pixel slices every 3ns. The mask circuit in Figure 6 selects flags belonging to the same polygon ID nearest to the light-source.

    The filter has 64 quanta to handle multiple-stage filtering. The computation cost for sampling is about 1.5ns per pixel (double buffering), including transmission between the filter and the shadow polygon buffer. Owing to the full hardware implementation, one filtering pass takes about a quarter of a frame cycle. Thus, four filtering passes, for example, can be completed within a single frame cycle (1/60 sec.). This cost is negligible compared to the first or second pass. Table 4 shows the frame buffer size and stored data for soft-shadow casting. Table 4 gives the physical size of the shadow polygon buffer as 1K x 1K, but it can be addressed virtually (logically) across a wide range. All data in Table 4 must be double buffered to display animation in realtime.

    Bump-mapped rough surface shading circuit

    Shadows from transparent objects are generated using light-reflection and texture mapping. Our system implements a bump-mapped rough surface shader, as shown in Figure 7. The transparent occluder is rendered through the shader on the first pass, and its shadow image is stored in a specific buffer called the "shadow image buffer." Further details of this processor are described in [19] and [20]. Only diffuse reflection is used for the shadow image on the first pass, while self-shading on the second pass involves diffuse and specular reflections on the basis of surface normals, view direction, and light-source incidence.

    Figure 7. Bump-mapped Rough Surface Shader

    Intensity modulator

    Figure 8 shows a shadow intensity modulator based on equation (5). The arithmetic of the equation is performed within a single clock cycle per pixel if the span processor in Figure 2 runs without a wait-cycle. The circuit of Figure 8 consists of a complete pipeline structure of three stages (pipeline-registers are not shown in the figure). Two RAM-blocks implement the erosive and penumbra functions. The RAM that outputs light-erosion (1-D), where D is given by equation (1), consists of a 128 x 8-bit table, while the RAM for the penumbra value f(Qp, h) consists of a 256 x 16-bit memory, an 8-bit multiplier, and a 16-bit adder. Input variables Qp and h are allocated 7 bits (6 bits for Qp and a single bit for edge) and 8 bits, respectively. Qp x h is quantized to 64 levels in 512 functions. In this system, four sets of intensity modulators are deployed, corresponding to the four sets of shading processors.

    Table 4. Frame buffer allocation for single light-source

    Stored data               Buffer size
    Image (visible object)    2K x 1K x 24-bit           double buffers
    Blending value            2K x 1K x 8-bit            double buffers
    Image Depth               2K x 1K x 32-bit           double buffers
    Polygon ID                3 x (1K x 1K x 24-bit)     double buffers
    Shadow polygon            3 x (1K x 1K x 24-bit)     double buffers
    Penumbra value            3 x (1K x 1K x 8-bit)      double buffers
    Shadow image              1K x 1K x 24-bit           double buffers
    Light absorption          1K x 1K x 8-bit            double buffers
    Bump Normal               0.5 K x 0.5 K x 10-bit
    Texture Pattern           0.5 K x 0.5 K x 24-bit
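    The buffer sizes of Table 4 can be summed to estimate the total memory needed for a single light-source. The sketch below is a rough budget under several assumptions that the table itself does not state: K is taken as 1024, each listed size is taken as one buffer that double buffering duplicates, and the eight "double buffers" entries are taken to apply to the first eight rows. The names BUFFERS and total_bits are hypothetical.

```python
K = 1024  # assumed meaning of "K" in Table 4

# (stored data, width, height, bits per pixel, planes, double buffered?)
BUFFERS = [
    ("Image (visible object)", 2 * K, K, 24, 1, True),
    ("Blending value",         2 * K, K, 8,  1, True),
    ("Image Depth",            2 * K, K, 32, 1, True),
    ("Polygon ID",             K,     K, 24, 3, True),
    ("Shadow polygon",         K,     K, 24, 3, True),
    ("Penumbra value",         K,     K, 8,  3, True),
    ("Shadow image",           K,     K, 24, 1, True),
    ("Light absorption",       K,     K, 8,  1, True),
    ("Bump Normal",            K // 2, K // 2, 10, 1, False),
    ("Texture Pattern",        K // 2, K // 2, 24, 1, False),
]


def total_bits(buffers):
    """Sum the storage of all buffers, doubling the double-buffered ones."""
    total = 0
    for _name, width, height, bpp, planes, double in buffers:
        bits = width * height * bpp * planes
        total += 2 * bits if double else bits
    return total


if __name__ == "__main__":
    bits = total_bits(BUFFERS)
    print(f"{bits / 2**20:.1f} Mbit of {2**30 / 2**20:.0f} Mbit")
```

    Under these assumptions the total is about 664.5 Mbit, which fits within a 1Gbit embedded memory.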

    Figure 8. Shadow Intensity Modulator


    Performance and Circuit Simulation

    In general, system performance is dominated by the workload balance between two factors: middle-grained (geometry accelerator) and fine-grained (graphics renderer) processing. The best performance in fine-grained processing is one pixel per clock cycle, because no technology generates pixels faster than one per clock cycle. A pixel per single clock cycle means that performance varies with neither the combination of functions nor the polygon size. Our design concept is based upon this specification, and the renderer produces a pixel per single clock cycle (1.2 billion pixels per second with four-way parallel processing at a chip operating frequency of 300MHz) if a polygon consists of more than 20 pixels. (Loading polygon data takes 7 clock cycles per vertex at the outline interpolator over a 64-bit data bus.) Thus, the performance of the shadow casting renderer is constant and depends simply upon frequency. Another factor determining performance is the bandwidth between the renderer and the frame buffer. According to the STARC roadmap [12], 256Mbits and 1Gbits will be embedded by 2001 and 2005, respectively. A 1Gbit memory cell enables implementation of the full buffers shown in Table 4. The embedded memory-cell array and pixel cache [21] for the frame buffer configuration reduce the bottleneck. The performance and simulation for related modules are shown in Table 5.
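    The throughput figures quoted above follow directly from the clock frequency and the degree of parallelism. The short sketch below reproduces that arithmetic; the function names are hypothetical, and the calculation assumes the ideal case of no wait-cycles, in which every processor emits one pixel per clock.

```python
def pixel_rate(clock_hz, pixels_per_clock):
    """Peak pixels per second for a pipeline with no wait-cycles."""
    return clock_hz * pixels_per_clock


def ns_per_pixel(clock_hz, pixels_per_clock):
    """Average time per pixel in nanoseconds at the peak rate."""
    return 1e9 / pixel_rate(clock_hz, pixels_per_clock)


if __name__ == "__main__":
    clock_hz = 300_000_000  # 300MHz chip operating frequency
    parallel = 4            # four parallel shading processors
    print(pixel_rate(clock_hz, parallel))              # pixels per second
    print(round(ns_per_pixel(clock_hz, parallel), 2))  # ns per pixel
```

    At 300MHz with four pixels per clock, this gives 1.2 billion pixels per second, or roughly 0.8ns per pixel, matching a 3.3ns clock cycle shared by four pixels.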

    Complete Pipeline Structure with a Pixel per Single Clock Cycle

    It might seem that an architecture based upon the described algorithm would be too complex to implement in hardware, owing to the multiple conditions and empirical functions. However, as shown in Figure 8, the proposed algorithm can be implemented as a complete pipeline-flow structure with no feedback loop, and the hardware is simple because it uses RAM tables. Algorithmic complexity in software does not always equal complexity in hardware. Figure 8 consists of seven pipeline-stages.

    Table 5. Development Environment, Circuit Scale, and Performance

    Development Environments
      Cadence IC4.43QSR3 (SUN Solaris 7)
      Verilog-XL 2.7
      Toshiba TC260 Standard Cell Library and Specific User Macros

    Module                 Performance (single clock cycle = 3.3ns)    Pipelines       Circuit Scale (K-gate)
    Outline processor      right/left edges/single clock cycle         14 pipelines
    Span processor         4 points/single clock cycle                 8 pipelines
    Shading processor      4 pixels/single clock cycle                 35 pipelines
    Intensity Modulator    4 pixels/single clock cycle                 7 pipelines
    Pixel Cache            6.6ns/64 pixels block transfer
    IIR Filter             2.5ns/sampling point


    Although some chip suppliers often present Phong shading, bump mapping, and soft shadow casting as embedded functions within an ASIC, a product that enables bare performance using such integrated functions is still not available. The lack of complete pipeline and parallel processing causes performance to vary. HDL (Hardware Description Language) or compiled cell-design could implement the above functions easily in software, using large-scale LSI. However, the direct implementation of well-known empirical or physically based models incurs an enormous hardware scale. (Phong shading, for example, requires 72 floating point processors just to normalize the surface, light-source, and view vectors within a single clock cycle.) To achieve a performance of a billion polygons per second with the above functions, complete pipeline processing (a hardware-specific algorithm) is indispensable. Anti-aliased soft shadow casting with a fully hardware architecture, as described in this paper, may be the first research with this capability. Our system simply transmits polygon vertices once (in the case of a double buffering system) or twice, and does not involve any additional processing in the geometry accelerator. To this end, we implement the specific filter and intensity modulator, which perform specific computations on behalf of the accelerator.

    We identified various problems in the shadow mapping algorithm and developed new techniques to solve them. Some visual restrictions remain. These include: (1) undefinable soft-shadow casting through multiple overlapped objects, and (2) a restriction on the number of light-sources (depending on hardware or computation cost). Estimating how frequently such situations are encountered is critical in practical VR applications. These problems might be solvable with large-scale hardware because the perception of LoD (level of detail) is saturated at any rate.

    Our system structures the graphics functions in a modular configuration, as shown in Figure 1, and each module can further extend the parallelism of rendering (simultaneous interpolation of polygon interiors). Middle-grained processing is carried out by a graphics accelerator consisting of multiple PEs [11][13]. In a frame buffer embedded structure, in particular, the bandwidth problem of the frame buffer does not appear, so the degree of parallelism is theoretically unlimited, e.g., 8- or 16-span processing. Performance of shadow casting and bump-mapped rough surface shading will increase by a factor of about 20% every 4 span-processors, reaching a rate of 0.1ns per pixel or a billion polygons per second within a few years.


    I thank Kazuhiro Watanabe (Matsushita Electric Industrial) and Eisaku Obuchi (NEC) for their work on image rendering and verification of hardware algorithms.


    1. P. Robertson, "Spatial Transformations for Rapid Scan-Line Surface Shadowing," IEEE CG&A, Vol.9, No.2, March 1989, pp.30-38

    2. P. Bergeron, "A General Version of Crow's Shadow Volumes," IEEE CG&A, Vol.6, No.9, Sept. 1986, pp.17-28

    3. F. Crow, "Shadow Algorithms for Computer Graphics," Computer Graphics, Vol.11, No.3, Aug. 1977, pp.242-248

    4. L. Williams, "Casting Curved Shadows on Curved Surfaces," Computer Graphics, Vol.12, No.3, Aug. 1978, pp.270-274

    5. P. Haeberli and K. Akeley, "The Accumulation Buffer: Hardware Support for High-Quality Rendering," Computer Graphics, Vol.24, No.4, Aug. 1990, pp.309-318

    6. T. Heidmann, "Real shadows, real time," Iris Universe, No.18, SGI Inc. 1991, pp.23-31

    7. J. Amanatides, "Ray Tracing with Cones," Computer Graphics, Vol.18, No.3, July 1984, pp.129-135

    8. R. Cook, T. Porter, and L. Carpenter, "Distributed Ray Tracing," Computer Graphics, Vol.18, No.3, July 1984, pp.137-145

    9. T. Whitted, "An Improved Illumination Model for Shaded Display," CACM, Vol.23, No.6, June 1980, pp.343-349

    10. J. Arvo, "Backward Ray Tracing," Tutorial Notes on the Developments in Ray Tracing, SIGGRAPH86, Aug. 1986

    11. http://www.computer.org/computer/articles/xbox.htm

    12. STARC Road Map 2000, www.starc.or.jp/roadmap00

    13. T. Ikedo and J. Ma, "Truga001: A Scalable Graphics Processor," IEEE CG&A, Vol.18, No.2, March/April 1998, pp.59-79

    14. H. Nguyen, "Casting Shadows on Volume," Game Developer, Vol.6, No.3, March 1999, pp.44-53

    15. T. Moller and E. Haines, "Shadow" in Chapter 6.6, Realtime Rendering, A. K. Peters, Massachusetts, 1999, pp.167-183

    16. W. Reeves, D. Salesin, and R. Cook, "Rendering Antialiased Shadows with Depth Maps," Computer Graphics, Vol.21, No.40, 1987, pp.283-291

    17. A. Woo, "Shadow Depth Map Revisited," David Kirk, ed., Graphics Gems III, AP Professional, Boston, 1992, pp.338-342

    18. D. F. Rogers, "Shadow," Chapter 5.11, Procedural Elements for Computer Graphics, second ed., McGraw-Hill, 1998

    19. T. Ikedo and W. Martens, "Multimedia Processor Architecture," In Proc. of IEEE Multimedia System'98, Int. Conf. on Multimedia Computing and Systems, Austin Texas, June 1998, pp.316-325

    20. http://www.parims.org/