Computer Graphics & Geometry
Limitations of the SIFT/SURF based methods in the classifications of fine art paintings
Z. Haladová, E. Šikudová
Comenius University, Bratislava, Slovakia
The area of image retrieval has come a long way from basic text-based image search to sophisticated content-based image retrieval systems. The problem of the semantic gap, however, means that it is still not possible to create a system which can correctly identify any object in an image. This paper deals with the limitations that occurred within our previous work, which was oriented on the classification of one sort of objects: paintings. In that work three different segmentation methods (Gauss gradient, Anisotropic Diffusion and Watershed) and two different classification methods (SIFT and SURF) were presented and tested on a dataset of 100 images of Rembrandt's paintings. In this paper an additional method is proposed and the problems with the segmentation methods are discussed. Besides the segmentation methods, the limitations of the SIFT and SURF classification are analyzed.
The approach proposed in this paper consists of a CBIR (content-based image retrieval) system which operates over a database of fine art paintings. According to the categorization proposed by Datta et al., it is a system operating over a domain-specific collection, queried by an image, with content-based processing of the query. The motivation for the creation of the system is the deficient retrieval possibilities in most art web galleries, where one can retrieve only by the name or the author of the painting. The specific domain of the retrieved objects shifts this work to the scientific area of the classification and digital preservation of art. The limitations presented within the paper are also closely connected to the domain of fine art paintings.
The paper is organized as follows. The Previous Work section reviews the work previously done in the field. The Datasets section describes the datasets used to test and evaluate the work. In the Segmentation section the four different segmentation methods are introduced. The Classification section deals with the classification and matching methods, and the Limitations section discusses the results and limitations of the proposed solutions.
For more than 25 years the computer graphics and vision community has been focusing on the problems of the restoration and classification of fine art paintings. In this area the most significant tasks are the categorization of paintings based on style, distinguishing paintings from real-scene photographs, the description of a painting's texture using brush strokes, and digital restoration.
However, when we focus on the task of database-based classification of fine art paintings, we have to consider two projects. The first is a student paper which shares the same assignment as this paper; the other is a part of the huge Google Goggles project. In the first paper a group of students from Stanford classified paintings from photographs they took in the Cantor Arts Center. The photographs were taken under constant lighting, without any distracting elements covering the painting. Under these conditions the segmentation of the painting from the photograph could be achieved with a simple thresholding method. For the classification they matched the color histograms of the photographs against a database of color histograms of the originals. Our approach, on the other hand, is not limited by the conditions under which the painting is photographed.
When considering the limitations of the segmentation and classification methods, it is necessary to describe the datasets for which the problems do not occur. For testing and verification of the segmentation and classification methods two different datasets were used. Both datasets consist of images of paintings created by Rembrandt Harmenszoon van Rijn (Figure 1). All of the paintings in the images are rectangular and were painted by Rembrandt himself in oil on canvas. The first dataset (the Originals) consists of 15 photographs of paintings obtained from Olga's Gallery, an internet gallery with over 10,000 works of art. These photographs contain the paintings without a frame or a wall; the photographs have a resolution of 600 times 600 dpi.
The second dataset (the Photographs) includes 100 photographs taken in museums or galleries by tourists with unspecified digital cameras. This dataset contains photographs from the collection of the author of this paper, from an initiative on her website and from the travel.webshots web portal. The photographs are in different resolutions and miscellaneous scales and were taken under varying lighting. In 8 images the painting is partly covered by the bodies of tourists. All of the photographs from both datasets were resized to a width of 600 px and converted to gray scale. The conversion to gray scale was done by eliminating the hue and saturation information while retaining the luminance from the HSV representation of the RGB values of the image.
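The gray-scale conversion can be sketched as follows. The paper only states that hue and saturation are dropped while luminance is retained; the particular luma weights below (ITU-R BT.601, as used by common rgb2gray implementations) are our assumption.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an RGB image (H x W x 3, values in [0, 1]) to gray scale
    by keeping only luminance. The BT.601 weights are an assumption."""
    weights = np.array([0.2989, 0.5870, 0.1140])
    return rgb @ weights

rgb = np.zeros((2, 2, 3))
rgb[0, 0] = [1.0, 1.0, 1.0]   # one white pixel, rest black
gray = to_grayscale(rgb)
```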
Figure1: Examples of the paintings from the Originals (upper right, lower left) and the Photographs (upper left, lower right) datasets.
In the segmentation phase the goal was to segment the rectangular painting and its frame from the input image (from the Photographs dataset). Four different techniques were used, all of them focused on the quadrilateral shape of the painting. The first one used the Gauss gradient method; in the improved methods the Anisotropic diffusion or the Kuwahara filter was applied, and the additional method is based on the watershed transformation. The results of all the segmentation methods are presented in the Conclusion section.
Gauss gradient method In the first method the image is processed using the Gauss gradient function, which computes the gradient using the first-order derivative of the Gaussian. It outputs the gradient images Gx and Gy of the input image using convolution with a 2-D Gaussian kernel. In the next phase the Gx and Gy gradient images are sent as an input to the Hough transform. Matlab's implementation of the Hough transform is used, since it makes it possible to extract the lines from the Hough peaks directly and to connect or trim them based on their length. The lines created in the previous step are then extended to the borders of the image, and lines with a large slope are filtered out. The lines are then divided into four groups, one each for the upper, lower, left and right edges. Consecutively, the painting is segmented as the smallest quadrilateral created from the lines. The Gauss gradient method is depicted in the figure.
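The Gauss gradient step can be sketched in Python with NumPy. The separable filtering below (Gaussian derivative along one axis, Gaussian smoothing along the other) is a standard way to realize a first-order derivative of Gaussian; the kernel radius of 3 sigma is our assumption.

```python
import numpy as np

def gaussian_derivative_kernels(sigma, radius=None):
    # 1-D Gaussian and its first derivative (radius = 3 sigma is assumed)
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    dg = -x / sigma**2 * g          # derivative of the Gaussian
    return g, dg

def gauss_gradient(image, sigma=1.0):
    """Return (Gx, Gy), the gradient images computed by convolution
    with first-order derivatives of a Gaussian."""
    g, dg = gaussian_derivative_kernels(sigma)
    conv = lambda a, k, axis: np.apply_along_axis(
        lambda m: np.convolve(m, k, mode='same'), axis, a)
    # separable filtering: derivative along one axis, smoothing along the other
    gx = conv(conv(image, dg, 1), g, 0)
    gy = conv(conv(image, dg, 0), g, 1)
    return gx, gy
```

The resulting Gx and Gy would then be thresholded and fed to a Hough line detector, as in the method above.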
Anisotropic diffusion method In this approach, histogram equalization is performed first. Then the image is processed using anisotropic diffusion, a technique which smooths the image but preserves the edges. The function is used with the following parameters: number of iterations = 10, kappa = 30, lambda = 0.25 and option = 1 (kappa controls conduction as a function of the gradient; lambda controls the speed of diffusion and is 0.25 for maximal stability; option = 1 selects the diffusion equation that favors high-contrast edges over low-contrast ones). The output image of the function is then convolved with the horizontal and vertical Sobel edge filters, resulting in two binary images Sx, Sy. The images Sx, Sy are processed in the same way as the Gx and Gy images in the first method, and the input image is also segmented in the same way. The Anisotropic diffusion method is presented in the figure 4.
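A minimal sketch of Perona-Malik anisotropic diffusion with the parameters quoted above follows; the wrap-around border handling via np.roll is a simplification of this sketch, not part of the original method.

```python
import numpy as np

def anisotropic_diffusion(img, niter=10, kappa=30.0, lam=0.25, option=1):
    """Perona-Malik diffusion. option=1 uses the exponential conduction
    function, which favors high-contrast edges; option=2 would favor
    wide flat regions."""
    img = img.astype(float).copy()
    for _ in range(niter):
        # finite differences to the four neighbors
        # (np.roll wraps at the border; adequate for this illustration)
        dN = np.roll(img, 1, axis=0) - img
        dS = np.roll(img, -1, axis=0) - img
        dE = np.roll(img, -1, axis=1) - img
        dW = np.roll(img, 1, axis=1) - img
        if option == 1:
            c = lambda d: np.exp(-(d / kappa) ** 2)
        else:
            c = lambda d: 1.0 / (1.0 + (d / kappa) ** 2)
        # diffuse: low-contrast differences are smoothed, strong edges kept
        img += lam * (c(dN) * dN + c(dS) * dS + c(dE) * dE + c(dW) * dW)
    return img
```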
Watershed method In the third approach the input image is first preprocessed to enhance the edges of the painting's frame; afterwards the watershed transform is applied. The preprocessing phase consists of 4 steps: 1. Create the top-hat and bottom-hat of the input image. 2. Create the image Im2 = (I + tophat) - bottomhat. 3. Create Im3 as the extended minima (regional minima of the H-minima transform) of Im2. 4. Create Im4 as the minima imposition on the complement of Im2 with the marker Im3. In the next step, clusters are created by the watershed transform applied to the Im4 image. In the last phase the final segmentation is obtained by growing the background from the corners in the clustered image. The Watershed method is presented in Figure 3.
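The contrast-enhancement step 2 above can be sketched with elementary gray-scale morphology. The 3x3 square structuring element is our assumption (the paper does not state one), and the full pipeline (extended minima, minima imposition, watershed) is omitted here.

```python
import numpy as np

def _shifts(img):
    # the 3x3 neighborhood of every pixel, with edge-replicated borders
    p = np.pad(img, 1, mode='edge')
    return [p[i:i + img.shape[0], j:j + img.shape[1]]
            for i in range(3) for j in range(3)]

def dilate(img):
    return np.max(_shifts(img), axis=0)   # grayscale dilation, 3x3 square

def erode(img):
    return np.min(_shifts(img), axis=0)   # grayscale erosion, 3x3 square

def enhance_contrast(img):
    """Step 2 of the preprocessing: Im2 = (I + tophat) - bottomhat,
    where tophat = I - opening(I) and bottomhat = closing(I) - I."""
    opening = dilate(erode(img))
    closing = erode(dilate(img))
    tophat = img - opening
    bottomhat = closing - img
    return (img + tophat) - bottomhat
```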
Kuwahara method In the new method the Kuwahara filter is applied to the image processed by histogram equalization. The Kuwahara filter preserves edges and can be implemented for different window sizes. In our case the filter was used with a square window with parameters J = K = 4L+1; the window was then divided into 4 regions of the size [(J+1)/2] * [(K+1)/2]. For each region the mean intensity and the dispersion are estimated, and the intensity of the central point is then computed as the mean intensity of the region with the smallest dispersion. The filtered image is then processed with the Sobel filter, and the painting is segmented similarly to the Anisotropic diffusion method. The development of the Kuwahara method is depicted in the Figure 2.
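The Kuwahara filter described above can be sketched directly from its definition (a plain double loop for clarity, not an optimized implementation):

```python
import numpy as np

def kuwahara(img, L=1):
    """Kuwahara filter with a (4L+1) x (4L+1) square window, split into
    four overlapping (2L+1) x (2L+1) quadrants. Each output pixel takes
    the mean of the quadrant with the smallest dispersion (variance),
    which is what preserves edges."""
    r = 2 * L                      # window radius
    q = 2 * L + 1                  # quadrant side, i.e. (J+1)/2
    p = np.pad(img.astype(float), r, mode='edge')
    H, W = img.shape
    out = np.empty((H, W), dtype=float)
    for y in range(H):
        for x in range(W):
            win = p[y:y + 2 * r + 1, x:x + 2 * r + 1]
            quads = [win[:q, :q], win[:q, -q:], win[-q:, :q], win[-q:, -q:]]
            variances = [qd.var() for qd in quads]
            out[y, x] = quads[int(np.argmin(variances))].mean()
    return out
```

Note how a sharp step edge survives the filter untouched: on either side of the edge there is always at least one uniform quadrant, and that quadrant wins the smallest-dispersion test.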
Figure 2: Segmentation of the painting using Kuwahara method. Upper Left: Image segmented with Kuwahara filter. Upper Right: Image filtered using vertical Sobel filter.
Lower Left: lines found with Hough transformation. Lower Right: Segmented painting
The process of the classification of the painting segmented from the input image is divided into two steps. First, a database of descriptor files of the Originals is created. Then the descriptor file of the painting is matched against the database to find the corresponding original. A descriptor file is an N by M matrix, where N is the number of interest points found in the image and M is the length of the descriptor (128 values for SIFT and 64 for SURF). Two different methods are used for producing the descriptor files: SIFT (Scale-Invariant Feature Transform), developed by D. Lowe, and SURF (Speeded Up Robust Features), developed by H. Bay et al.
The SIFT method consists of a detector and a descriptor of features invariant to translation, rotation, scale, and other imaging parameters.
In the first step of the method the interest points (IPs) are identified in the image by the detector. Then the descriptor for each IP is created. The detector localizes the IPs in the scale-space pyramid, which is created by consecutively scaling the image, filtering it with a Gaussian kernel and subtracting subsequent filtered images in each scale. IPs are chosen as the local extrema in a 3x3x3 neighborhood.
The descriptor for each IP is assembled from the orientation histograms of the 4x4 subregions of the IP neighborhood. In every sample point of a subregion the magnitude and the orientation of the gradient are computed and weighted by a Gaussian window. An orientation histogram with 8 directions is created from these values. The descriptor of an IP then consists of 8 values for each of the 16 subregions (128 values).
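The 4x4-subregion, 8-bin layout of the descriptor can be illustrated by the following simplified sketch. It shows only the histogram assembly; Lowe's Gaussian weighting window, trilinear interpolation and the final normalization and clipping steps are omitted.

```python
import numpy as np

def sift_like_descriptor(magnitudes, orientations):
    """Assemble a 128-value vector from a 16x16 patch of gradient
    magnitudes and orientations (radians): 4x4 subregions x 8 bins.
    A simplified sketch of the descriptor layout only."""
    desc = []
    for sy in range(4):
        for sx in range(4):
            m = magnitudes[4*sy:4*sy + 4, 4*sx:4*sx + 4].ravel()
            o = orientations[4*sy:4*sy + 4, 4*sx:4*sx + 4].ravel()
            # quantize each orientation into one of 8 directions
            bins = np.floor(o / (2 * np.pi) * 8).astype(int) % 8
            desc.append(np.bincount(bins, weights=m, minlength=8))
    return np.concatenate(desc)    # 16 subregions x 8 bins = 128 values
```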
SURF, like SIFT, includes a detector and a descriptor. SURF also operates in the scale space to identify the IPs, but unlike SIFT it convolves the original image with different scales of box filters (approximations of the Gaussian second-order partial derivatives in the y and xy directions). In order to localize the IPs in the scale space, a non-maximum suppression in a 3x3x3 neighborhood is applied.
The 64-value descriptor is created in a few steps. First, the dominant orientation of the IP is extracted from a circular neighborhood as the longest vector obtained by summing all the responses (the Haar-wavelet responses in the x and y directions, weighted by a Gaussian window) within a sliding orientation window covering an angle of π/3.
Then a square region around the IP is created and oriented along the dominant orientation. Lastly, the region is divided into 4x4 subregions. In every subregion 4 features are computed from 5x5 uniformly distributed sample points. These 4 features are Σdx, Σdy, Σ|dx|, Σ|dy|: the sums of the Haar-wavelet responses in the horizontal and vertical directions and the sums of their absolute values. The four features for all 16 subregions produce the 64 values for every IP.
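The final assembly of the 64 values can be sketched as follows, assuming the per-sample Haar responses dx, dy are already available on a 20x20 grid (4x4 subregions of 5x5 samples); the Gaussian weighting and the preceding orientation assignment are omitted.

```python
import numpy as np

def surf_like_descriptor(dx, dy):
    """Build the 64-value SURF-style vector: for each of the 4x4
    subregions compute (sum dx, sum dy, sum |dx|, sum |dy|)."""
    desc = []
    for sy in range(4):
        for sx in range(4):
            bx = dx[5*sy:5*sy + 5, 5*sx:5*sx + 5]
            by = dy[5*sy:5*sy + 5, 5*sx:5*sx + 5]
            desc += [bx.sum(), by.sum(), np.abs(bx).sum(), np.abs(by).sum()]
    return np.array(desc)          # 4 features x 16 subregions = 64 values
```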
Figure 3: Example of the incorrectly matched keypoints Figure 4: Example of correctly matched keypoints
In the matching phase of the classification, the binary representations of the descriptor files from the database are loaded and matched against the descriptor file of the segmented painting (DF1) using the nearest-neighbor technique. In both the SIFT and SURF approaches, the matching value with DF1 is computed for each descriptor file (DF2) from the database. For every row of DF1 (corresponding to the descriptor of one IP) the nearest neighbor and the second nearest neighbor from DF2 are found; the nearest neighbor is the row of DF2 with the smallest Euclidean distance from the DF1 row. The matching value is then the number of DF1 rows for which the distance to the nearest neighbor is smaller than distanceRatio times the distance to the second nearest neighbor. The distanceRatio was determined by the authors of SIFT and SURF as 0.6 and 0.7, respectively. The painting from the Originals dataset with the greatest matching value is selected as the painting best corresponding to the input image (an example of corresponding paintings with linked keypoints is presented in Figure 4).
It is possible that two non-corresponding paintings have a matching value greater than 0 (Figure 3 shows an example of incorrectly matched keypoints). This is caused by the fact that the features recognized by SIFT and SURF are not 100% distinctive, and additional inaccuracies are caused by blur and noise. In order to prevent the incorrect classification of paintings not present in the Originals database, a threshold for the minimal matching value is established. If the DF2 best corresponding to the DF1 of the input painting has a matching value smaller than the threshold, the input painting is considered not present in the Originals dataset.
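The matching and thresholding described above can be sketched as a brute-force ratio test over descriptor matrices (the default distanceRatio and threshold values below follow the text):

```python
import numpy as np

def matching_value(df1, df2, distance_ratio=0.6):
    """Count rows of df1 whose nearest neighbor in df2 passes the ratio
    test (nearest distance < distance_ratio * second-nearest distance).
    df2 must have at least two rows."""
    matches = 0
    for d in df1:
        dists = np.sort(np.linalg.norm(df2 - d, axis=1))
        if dists[0] < distance_ratio * dists[1]:
            matches += 1
    return matches

def classify(query, database, distance_ratio=0.6, threshold=8):
    """Return the index of the best-matching original, or None when the
    best matching value falls below the threshold (painting unknown)."""
    values = [matching_value(query, df2, distance_ratio) for df2 in database]
    best = int(np.argmax(values))
    return best if values[best] >= threshold else None
```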
In this paper the shortcomings and limitations of the proposed methods are presented. The limitations of the segmentation are evidently connected with the usage of the Hough transformation and the assumption of a quadrilateral painting. With these settings the algorithm is not capable of detecting paintings that are not bounded by a rectangle. This causes segmentation problems for paintings with an oval, round or irregular shape. On the other hand, the algorithm achieved 89% correctly segmented paintings with the Kuwahara and Anisotropic Diffusion methods on the dataset of rectangular paintings described in the Datasets section.
The problems connected with the classification are not so obvious. The core of these problems lies in the detection and definition of the interest points (IPs) in the SIFT/SURF-based methods. These methods describe an IP by its descriptor, which is created from the neighborhood of the IP. In the case of the classification of traditional western art works painted by hand, one can assume that two descriptors of IPs from different paintings will not be as close in the descriptor space as those of IPs from the same painting. This was confirmed by the 90% of correctly classified paintings. The authors believe that the distance between different IPs is caused by the uniqueness of the artist's brush strokes, materials, colors and other factors. On the other hand, this causes problems in the classification of some categories of modern paintings. A typical artist whose paintings are not classifiable by the SIFT/SURF methods is Piet Mondrian (Figure 5). Two different paintings by this author can have up to 22 matches (the threshold for corresponding paintings was set in the previous paper to 8 matches). A similar problem was also identified in the area of registration algorithms. The solution proposed by Kanazawa et al. imposes non-local constraints that should be approximately satisfied across the image.
Figure 5: Painting by Piet Mondrian.
 Gertjan J. Burghouts and Jan-Mark Geusebroek. Performance evaluation of local colour invariants. Comput. Vis. Image Underst., 113(1):48–62, 2009.
 M. Etezadi-Amoli, C. Chang, and M. Hewlett. A day at the museum, 2009. http://www.stanford.edu/class/ee368/Project07/reports/ee368group06.pdf.
 Florin Cutzu, Riad Hammoud, and Alex Leykin. Distinguishing paintings from photographs. Computer Vision and Image Understanding, 100(3):249–273, 2005.
 Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang. Image retrieval: Ideas, influences, and trends of the new age. ACM Comput. Surv., 40(2):1–60, 2008.
 G. Xiong. Gradient using first order derivative of Gaussian, 2009. http://www.mathworks.com/matlabcentral/fileexchange/8060-gradient-using-firstorder-derivative-of-gaussian.
 Shuqiang Jiang, Qingming Huang, Qixiang Ye, and Wen Gao. An effective method to detect and categorize digitized traditional Chinese paintings. Pattern Recognition Letters, 27(7):734–746, 2006.
 Thomas Edward Lombardi. The classification of style in fine-art painting. PhD thesis, New York, NY, USA, 2005. Advisers: Charles Tappert and Sung-Hyuk Cha.
 David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
 Jiri Militky. Image analysis and Matlab, 2009. http://centrum.tul.cz/centrum/itsapt/Summer2005/files/militky 6.pdf.
 Olga Mataev. Olga's Gallery, 2009. http://www.abcgallery.com/index.html.
 Pietro Perona and Jitendra Malik. Scale-space and edge detection using anisotropic diffusion. Technical report, Berkeley, CA, USA, 1988.
 P.V.C. Hough. Method and means for recognizing complex patterns. U.S. Patent 3,069,654, 1962.
 Robert Sablatnig, Paul Kammerer, and Ernestine Zolda. Hierarchical classification of paintings using face- and brush stroke models. In Proc. of 14th Int. Conf. on Pattern Recognition, pages 172–174, 1998.
 Yasushi Kanazawa and Kenichi Kanatani. Robust image matching preserving global consistency. 2004.
 Z. Haladova. Segmentation and classification of the fine art paintings. CESCG, TU Wien, 2010.
 M. Kuwahara. Processing of RI-angiocardiographic images. In Digital Processing of Biomedical Images, pages 187–203, 1976.