An Information-Theoretic Approach to Shape Complexity

Jaume Rigau, Miquel Feixas, and Mateu Sbert
Institut d'Informàtica i Aplicacions, Spain
, , and

Contents

Abstracts
1. Introduction
2. Fundamentals
2.1. Global Lines
2.2. Complexity and Mutual Information
3. Inner Shape Complexity
3.1. Complexity Measure
3.2. Monte Carlo Computation of MI
3.3. Inner 3D-shape Complexity Results
3.4. Inner 2D-shape Complexity Results
4. Outer Shape Complexity
5. Conclusions and Future Work
Acknowledgements
References

Shape complexity has recently received attention from different fields, such as computer vision and psychology.
In this paper, integral geometry and information theory tools are applied to quantify the shape complexity from two different perspectives: from the inside of the object, we evaluate its degree of structure or correlation between its surfaces (inner complexity), and from the outside, we compute its degree of interaction with the circumscribing sphere (outer complexity).
Our shape complexity measures are based on the following two facts: uniformly distributed global lines crossing an object define a continuous information channel and the continuous mutual information of this channel is independent of the object discretisation and invariant to translations, rotations, and changes of scale.
The measures introduced in this paper can be potentially used as shape descriptors for object recognition, image retrieval, object localisation, tumour analysis, and protein docking, among others.

Keywords: shape complexity, mutual information, integral geometry, object recognition, image retrieval.

1. Introduction

In the last years, the shape complexity has been analysed from different areas, such as computer vision [13] and psychology [7]. The benefits of a shape complexity theory (with its corresponding measures) would range from object classification for further database retrieval to improvements in cognitive science. The approach taken so far has been to consider the information (measured by Shannon entropy [4] contained in the curvature of an object. Either an a priori distribution for the curvature is given and the extrema of the function are shown to contain more information, or the entropy of histogram of rotated angles (or of angle excess) [13] at the meshed contour or surface of the object is taken as the measure of the complexity of an object. The first approach depends on the discretisation of the curve or surface, and the second is not suitable to compute the whole complexity.

In this paper, we propose two shape complexity measures of an object which are independent of the discretisation and appropriate to compute the partial or global complexity of any object. Between the information theoretic measures, one that fulfils this requirement is the continuous mutual information (MI), which measures the information shared between two probability distributions.

Shape complexity will be analysed from two different perspectives. First, from the inside of the object, its degree of structure (interdependence between its parts) is evaluated. We consider the information shared by the interior contour (or surface) from the object with itself. A differential of contour (or surface) will be related to another differential of inner contour (or surface) by the uniformly distributed global lines [15] that join them, this is, make them directly visible. Second, from the outside of the object, the degree of interaction between the object and its circumscribing sphere ("environment") is calculated. These complexity measures could be used as shape descriptors in fields such as object recognition and classification.

This paper is organised as follows. In Section 2 we review the generation of uniformly distributed global lines and also the MI definition. In Section 3 we present a complexity measure which quantifies the degree of (internal) structure of an object. In Section 4, a different approach is introduced in order to evaluate the external complexity of an object. And finally, in Section 5, we present our conclusions.

2. Fundamentals

In this section, the two basic tools used in this paper are reviewed: generation of uniformly distributed global lines and mutual information definition.

2.1. Global Lines

From integral geometry [14], a uniform density of lines that is homogeneous and isotropic (invariant under translations and rotations) is defined. An easy way to sample this line density is to take random pairs of points on the surface of a sphere enclosing an object [16]. These lines are called global lines [15]. Other ways of generating lines are described in [2]. Interestingly, this uniform density generation does not have a counterpart in 2D. That is, taking pairs of points uniformly distributed on a circumference does not provide a uniform density within the circumference. In 2D, a way to get a random chord consists in choosing uniformly at random a direction on the circle and then uniformly at random a point on the corresponding radius: the chord is the line segment whose endpoints are located on the circle and perpendicular to the radius [16,3].

The density of uniform lines crossing two differential areas dx, dy, centred at inner points x and y, is given by dG = F(x,y)dxdy, where F(x,y) is the point-to-point form factor and equal to cosθ_xcosθ_y/(πr_xy²) for mutually visible points, or zero otherwise, where θ_x and θ_y are the angles which the normals at x and y form with the segment joining them, and r_xy is the distance between x and y (Fig. 1).

Fig. 1.Geometry of one global line that generates five random chords in a 3D-shape: , and .

Global lines intersect an object forming random chords, that can be used to measure the visibility within a body (Fig. 1). Visibility directions must be homogeneous and isotropic over the body, a quality fulfilled by the uniform line density. Thus, it can be considered that the form factor measures this visibility. In Feixas et al. [5], a scene visibility complexity is defined using this property.

Interestingly, the global line density is related to the curvature function through an integral relation [14]. For an object K,

where κ_i is the curvature (2D) or Gauss curvature (3D) at the ith intersection point (out of n) of a line G and the object K, and for a planar object C=2 and for a 3D-object C=π, being c total curvature.

2.2. Complexity and Mutual Information

The study of complexity has multiple directions and objectives, and also many fields of application (automata, computer science, physics, biology, etc.) [1], which reflect the great activity in this area. But, what is complexity? According to W. Li, "the meaning of this quantity should be very close to certain measures of difficulty concerning the object or the system in question: the difficulty in constructing an object, the difficulty in describing a system, the difficulty in reaching a goal, the difficulty in performing a task, and so on" [10]. Many definitions of complexity, corresponding to the different ways of quantifying these difficulties, can be found. In the two last decades, diverse complexity measures, as for instance the mutual information, have been proposed to quantify the degree of structure, dependence, or correlation of a system [9,10,6].

The most basic measures of information theory are the Shannon entropy which quantifies the information content of a random variable, and the mutual information, which expresses the shared information in a communication channel.

Given two discrete random variables, X and Y, with values in the sets X={x₁,...,x_n} and Y={y₁,...,y_m}, respectively, the mutual information [4] between X and Y is defined as

where n=|X|, m=|Y|, p_i = Pr[X = x_i] and q_j = Pr[Y = y_j] are the marginal probabilities, and p_ij = Pr[X = x_i,Y = y_j] is the joint probability. I(X,Y) is a measure of the shared information or correlation between X and Y. If the logarithms are taken in base 2, mutual information and entropy are expressed in bits.

For continuous sources X and Y, the continuous MI is defined as

where values are taken from a continuous set S, p(x), and p(y) are the marginal probability functions associated with X and Y, and p(x,y) is the joint density function.

If we divide the range of the continuous random variable X into n bins of length Δ, and we consider its discretised version X^Δ (see [4]), the MI between two continuous random variables X and Y is the limit of the MI between their discretised versions. Thus, when the size of bins tends to zero, lim_Δ→0I(X^Δ,Y^Δ) = I^c(X,Y).

3. Inner Shape Complexity

In this section, the shape complexity of an object is analysed from its interior. In [5], continuous mutual information was introduced as a complexity measure in order to evaluate the difficulty of discretising a scene and to obtain a refinement criterion for hierarchical radiosity. In this paper, continuous mutual information is proposed to measure the shape complexity.

3.1. Complexity Measure

The approaches taken by Page [13] and Feldman [7] consider the information or entropy contained in the curvature of an object. In [13], the quantity measured depends on both the size and positioning of the (regular) discretisation of the curve or surface. A discretisation that misses some corner would give an incorrect measurement. Moreover, the entropy is a function that diverges with the number of discretisation bins. On the other hand, in [7], the curvature distribution is a generic distribution that is apt for a generic study of local complexity of a region of the curve but not to compute the whole complexity of the object.

We need a measure able to compute the shape complexity of an object independently of the discretisation. In the information theory field, continuous mutual information fulfils this requirement.

The basic idea of our approach is that a distribution of global lines crossing an object defines a continuous information channel X→Y, where X and Y represent the set of contour or surface points. Consequently, the continuous mutual information between the internal points of the object can be calculated.

To apply (3), we need to define joint p(x,y) and marginal p(x), p(y) distributions. We take as joint distribution p(x,y)=F(x,y)/L_K (or p(x,y)=F(x,y)/A_K), and marginal distributions result in p(x)=p(y)=1/L_K (or p(x)=p(y)=1/A_K), where L_K and A_K are respectively the contour length and the surface area. Remember from Section 2.1 that p(x,y) is the global line density between differential elements dx and dy.

Thus, mutual information I^c between two continuous random variables X and Y is given by

where ∂K represents the contour (2D) or the surface (3D) and μ(∂K) is the measure of the boundary of K (L_K and A_K for 2D and 3D objects respectively).

We can also easily study the complexity of a given region R in ∂K. It will be given by

Regions corresponding to extrema will have higher complexity than smooth regions, due to the angular dependency of F(x,y). And regions with minima (concave regions) will have in general higher complexity than regions with maxima (convex regions), as suggested in [7].

3.2. Monte Carlo Computation of MI

For 3D-objects, inner 3D-shape complexity is defined by

where S stands for the internal surfaces of the object.

This continuous MI can be efficiently computed with Monte Carlo integration by sampling global lines, as global lines crossing dx and dy are distributed according to F(x,y)dxdy. Thus, the density p(x,y)=F(x,y)/A_K, that will be used in the Monte Carlo integration of (6), is sampled by intersecting global lines with the object. We obtain

where N stands for the total number of segments of the global lines or the number of pairs of points considered, which is the total number of intersections divided by two (see Fig. 1). The term of the sumatory is the contribution of each chord to the complexity and we call it chord complexity.

Continuous MI is invariant to translations, rotations and a change of scale. As we have seen in Sec. 2.1, point-to-point form factor gives the density of uniformly distributed lines crossing differential areas with centre at these points, and by definition this density is invariant under translations and rotations. In addition, scale invariance is easily seen from formula (7), where a scaling of the distances is compensated by the corresponding scaling of the total area.

Observe that chord complexity is bigger for small chord lengths and for angles near to zero. Thus, regions corresponding to corners or narrow spaces will contribute more to MI (shape complexity). For the interior of an empty sphere, since any pair (x,y) fulfils F(x,y)=1/A_K, the result obtained is, as intuitively expected, I^c=0.

Note also, that from chord complexities we could obtain a shape complexity distribution of the object to be applied in object recognition, classification, clustering, and retrieval, analogous to [11,17,12].

3.3. Inner 3D-shape Complexity Results

In this section, we show the inner shape complexity (column MI of Table 1) of a set of 3D objects (Fig. 2). The Monte Carlo integral (7) has been solved by casting 10⁵ global lines. As we have seen, the sphere has the minimum complexity (I^c=0), and among the platonic solids, the minimum and maximum complexity correspond, respectively, to icosahedron and tetrahedron, since at the corners of a tetrahedron the dependence between the parts is greater than at the corners of an icosahedron. As expected, the polyhedra that are nearer to the sphere are less complex, i.e., they have less correlation. In fact, it can be considered that MI measures how "distant" is any object from the sphere. Observe also that the complexity of in-stellated hexahedron is bigger than the one of the out-stellated icosahedron because the hexahedron has folded faces that leave very narrow space between them.

Table 1. MI and dMI for the objects in Fig.2.

Fig. 2. Collection of test objects of Table 1: platonic solids and other geometrical shapes.

On the other hand, in Fig. 2 we have a collection of common objects with some modifications. Thus, similar shapes have MI values in a similar range (cylinders, cones, pencils, etc.). Observe that MI ranges from a minimal value for the Cylinder-I (Fig. 2.k), to a maximum value for the Glass-I (Fig. 2.n). The case of the Plate objects (Figs. 2.j and 2.o) is an apparent counter-example, as they are very simple objects but have high MI values. These values are explained from the fact that two very near surfaces contribute with high values to the computation of its MI. Also, Plate-II has a higher MI value that Plate-I because it is a proportionally thinner object. Note that the same kind of behaviour can be observed for both Pencil objects (Figs. 2.m and 2.r).

3.4. Inner 2D-shape Complexity Results

Similarly to (6), the continuous MI in flatland is defined by

where L is the set of segments that form the environment, L_K is the total length of the contour, and F(x,y)=cosθ_xcosθ_y/(2 r_xy)V(x,y) is the point-to-point form factor between x and y. As in (7), this integral can be solved by Monte Carlo integration and the computation can be done efficiently by casting uniformly distributed global lines upon segments [3]. Hence, continuous MI can be approximated by

where N is the total number of pairs of points considered, which is the total number of intersections divided by two.

The closed-form solution of the continuous MI integral for the circle and some regular polygons (hexagon, square, and equilateral triangle) is shown in Table 2. The Monte Carlo integral has also been solved by casting 10⁵ global lines.

Table 2. Exact values for a circle and three regular polygons compared with results obtained by Monte Carlo simulation (MC).

The circle requires special attention. Observe that its complexity is different from zero: I^c=log(π/e). Since a sphere has zero complexity (I^c=0), we could expect the same for a circle. But null complexity for the sphere is due to the fact that a global line can be generated by selecting two random points on its surface. However, in the case of a circle, selecting pairs of random points on its perimeter will not yield a uniform density [3]. In flatland, we can not imagine a scene with less complexity than a circle. In this sense, there is a significant difference between the 3D and 2D worlds.

We have also computed the complexity of a sequence representing the formation of a 12-pointed star and the Von Koch fractal. If we start with a polygon of 24 edges, with a complexity very similar to the one of a circle, and we continue closing the edges as shown in Fig. 3.i, the complexity increases noticeably, due to the growth of the interaction within the edges. In the Von Koch fractal (Fig. 3.ii), a similar thing happens: by increasing the number of corners, the correlation increases.

Fig 3. (i) values for a 24-sided regular polygon and three 12-pointed stars and (ii) for Von Koch fractals.

4. Outer Shape Complexity

In addition to the MI of an object, we introduce a secondary shape complexity measure given by the MI between the object and its minimum circumscribing sphere (Fig. 4). We call the MI of this "new" object (dual-object) dual-MI (dMI). This value can be seen as the increase in MI induced by the introduction of the object within an spherical environment. The choice of this environment is coherent with the fact that the MI of a sphere is zero.

Fig. 4. The dual-object is formed by the space between the object and its circum-scribing sphere.

The sphere of smallest radius that contains an object exists and is unique. Following Gärtner [8], we compute this in linear time with respect to the number of vertices of the object. The dMI, similarly to MI, is also computed from a set of global lines. Note that the same set of global lines can be used to compute both shape complexities. The chords that contribute to the dMI are the complementaries of the ones that are used for MI computation. Note that the sphere vs sphere chords do not contribute to the outer shape complexity, since the term within the logarithm in (7) is equal to 1. If the object is an sphere, a singularity is obtained.

From Table 1, we can analyse the behaviour of our secondary descriptor dMI. We can see that the bigger the \dmi\ the more the interaction between the object and the circumscribing sphere. For instance, we observe that, somehow against intuition, hexahedron and dodecahedron interchange their ordering with octahedron and icosahedron when considering dMI. This is so because the relative volume within the circumscribing sphere is higher for the hexahedron and dodecahedron than for the octahedron and icosahedron, respectively, and consequently the interaction within the dual object becomes stronger. Also, for two similar objects, the more complex is the contour of an object (concavities, rugosities) the higher the dMI value.

It can be seen that the dMI clearly discriminates between the Cone-I (dMI=3.164) and the Cone-II (dMI = 1.985). The same happens for Cylinder-I and II with dMI values of 5.337 and 3.508, respectively. Maybe, the most remarkable example are the objects Glass-I and Glass-II (Figs. 2.n and 2.s, respectively). With very similar MI value, the dMI clearly separates both objects.

5. Conclusions and Future Work

In this paper, integral geometry and information theory tools have been applied to quantify the shape complexity from two different perspectives. First, from the inside of the object, correlation between its surfaces (inner complexity) is considered. Second, from the outside of the object, its degree of interaction with the circumscribing sphere (outer complexity) is evaluated. Our measures are robust and can be efficiently computed using global lines crossing an object and assigning a complexity value to each chord. The measures are independent of the discretisation and invariant to a change of scale, and can be used as shape descriptors for different applications.

In our future work, we will study the application of these complexity measures as shape descriptors. We will also analyse the distribution of chord complexities applied to shape analysis and object recognition.

Acknowledgements

This project has been supported by TIN2004-07451-C03-01 (Spanish Government) and by the Game Tools FP6-004363 EU project.

References

[1] R. Badii and A. Politi. 'Complexity. Hierarchical Structures and Scaling in Physics'. Cambridge University Press, 1997.

[2] F. Castro and M.Sbert. 'Aplication of quasi-monte carlo sampling to the multi-path method for radiosity'. In 3rd International Conference on Monte Carlo and Quasi-Monte Carlo methods in Scientific Computing, June 1998.

[3] F. Cazals and M. Sbert. 'Some integral geometry tools to estimate the complexity of 3D scenes'. Research Report 3204, INRIA, July 1997.

[4] T.M. Cover and J.A. Thomas. 'Elements of Information Theory'. Wiley Series in Telecommunications, 1991.

[5] M. Feixas, E. del Acebo, P .Bekaert, and M. Sbert. 'An information theory framework for the analysis of scene complexity'. Computer Graphics Forum (Proceedings of Eurographics'99)}, 18(3):95-106, September 1999.

[6] D.P. Feldman and J.P. Crutchfield. 'Discovering noncritical organization: Statistical mechanical, information theoreticand computational views of patterns in one-dimensional spin systems'. Working Paper 98-04-026, Santa Fe Institute, Santa Fe (NM), USA, April 1998.

[7] J. Feldman and M. Singh. 'Information along contours and object boundaries'. Psychological Review, 2004.

[8] B. Gärtner. 'Fast and robust smallest enclosing balls'. In Proceedings of 7th Annual European Symposium on Algorithms (ESA), Lecture Notes in Computer Science, 1643:325-338, Springer-Verlag Vienna-New York, 1999.

[9] P. Grassberger. 'Toward a quantitative theory of self-generated complexity'. International Journal of Theoretical Physics, 25(9):907-938, 1986.

[10] W. Li. 'On the relationship between complexity and entropy for markov chains and regular languages'. Complex Systems, 5(4):381-399, 1991.

[11] A.B. Novikoff. 'Integral Geometry as a Tool in Pattern Perception'. Pages 347-368, Pergamon Press, Elmsford (NY), USA, 1962.

[12] R. Osada, T. Funkhouser, B. Chazelle, and D. Dobkin. 'Shape distributions'. ACM Transaction on Graphics, 21(4):807-832, October 2002.

[13] D.L. Page, A.F. Koschan, S.R. Sukumar, B. Roui-Abidi, and M.A. Abidi. 'Shape analysis algorithm based on information theory'. In IEEE International Conference on Image Processing (Proceedings of ICIP'03)}, 1:229-232, September 2003.

[14] L.A. Santaló. 'Integral Geometry and Geometric Probability'. Addison-Wesley, Reading (MA), USA, 1976.

[15] M. Sbert. 'An integral geometry based method for fast form-factor computation'. Computer Graphics Forum (Proceedings of Eurographics'93), 12(3):409-420, 1993.

[16] H. Solomon. 'Geometric Probability'. Volume 28 of CBMS-NFS Regional Conference Series in Applied Mathematics (SIAM), Philadelphia (PA), USA, 1978.

[17] E. Wong and J. Steppe. 'Invariant Recognition of Geometric Shapes'. Pages 535-546, Academic Press, New York (NY), USA, 1969.