Towards Real-Time Design Drawing Recognition

 

 

Dr. ir. Henri Achten

Eindhoven University of Technology

Faculty of Architecture, Building and Planning

Design Systems

 



 

 


 

Abstract:

In this paper, we present a theoretical study on automated understanding of the design drawing. This can lead to design support through the natural interface of sketching. In earlier work, 24 plan-based conventions of depiction have been identified, such as grid, zone, axial system, contour, and element vocabulary. These are termed graphic units. Graphic units form a good basis for recognition of drawings as they combine shape with meaning. We present some of the theoretical questions that have to be resolved before an implementation can be made. The contribution of this paper is: (i) identification of the domain knowledge that is necessary for recognition; (ii) an outline of a combined strategy of multi-agent systems and online recognition; (iii) a functional structure for agents and their organisation to converge on sketch recognition.

Key words: Multi-agent system, decision tree, pattern recognition, sketch

 

 

1. Introduction

 

During the design process, roughly three classes of graphic representations are utilised: the diagram, the design drawing, and the sketch. The diagram is a clear, well-structured schematic representation of some state of affairs. It is typically used in the early phase of design, very often as an analytical tool. The design drawing is a well-drafted comprehensive drawing applying the techniques of plan, section, and façade. It is typically utilised after concept design, to communicate ideas with the patron or other participants in the design process. The sketch is a quickly produced rough outline of a design idea, typically created in the early phase of design. Within each class there are many differences due to production technique, time constraints, conventions of depiction and encoding, and personal style.

The diagram and sketch are very apt for early design. They enable fast production of numerous ideas, thereby allowing the architect to engage in an iterative process using internal and external memory, reflection, exploration, and test [17], [40]. In this work we are concerned with computational interpretation of such drawings. We are motivated from a design support and design research perspective. Much of the output in early design is produced by means of diagrams and sketches. Understanding these drawings supports early design assessment, can provide relevant knowledge, aid in information exchange, and facilitate transfer of information between design phases or applications.

In order to achieve understanding of graphic representations, we need a theory for computational interpretation of drawings. Such a theory is lacking in the design research field ([23], pp. 174-175; [41], p. 520; [11], p. 6) and in general recognition research ([37]; [28], p. 23). An additional complication is the general oversight of context [36] and annotations ([27], pp. 432-433), the lack of which leads to unrealistic assumptions about image recognition. As a consequence, we have to draw our foundations, evidence, and intuitions from many disciplines including design research, image and pattern recognition, handwriting recognition, artificial intelligence, and multi-agent systems. The contribution of this paper is: (i) identification of the domain knowledge that is necessary for recognition; (ii) an outline of a combined strategy of multi-agent systems and online recognition; (iii) a functional structure for agents and their organisation to converge on sketch recognition.

Design sketches have been the subject of much study. Although there is now general agreement that the sketching activity is structured [20], [35], there is no consensus on the terminology and framework for categorising this structure. This seems to be mainly caused by the great variety of research questions, design domains, and agendas pursued (compare for example [17], [32], [27], [34], [25], [5], [38]). Most authors employ their own categorisation to differentiate between kinds of drawing (e.g. [17], pp. 128-136, [41], and [27]).

Applied research on sketch recognition in the engineering areas has focused almost exclusively on the construction or identification of three-dimensional objects based on sketches or line drawings (e.g. [19], [10], [38], [31], [7]). With regard to formalised content of architectural sketch drawings, most work has been done on the plan representation. Koutamanis [21] outlines a computational analysis of space formation in architectural plans. Cha and Gero [8] formalise a number of shape configurations that may occur without consideration of design content. Do et al. [13] and Gross [18] deal with graphic shorthands for, for example, "table," "chair," and "house." Koutamanis and Mitossi [22] focus on local and global co-ordinating devices such as "door in wall" and the "proportion system" of Palladio. Leclercq [24] has implemented a system that takes sketch input and recognises grids, spaces, and functions. In earlier work, we have classified 24 conventions that are used in drawings such as "grid," "zone," "contour," and "axial system" [1].

1.1 Basic assumptions

Domain knowledge is important for drawing recognition as it informs what should be looked for. Based on the authors above, we can state that the kind of knowledge we need combines meaning with shape. Such meaning necessarily is based on agreement or convention within a domain: it is not intrinsic to a graphic representation. In our work, we have the following basic assumptions:

    There is nothing inherently ambiguous in graphic representations. Ambiguity through multiple interpretation is what the architect does with the graphic representation, not the graphic representation itself.

    We primarily consider sketches that are made with some care for clarity (and which can thus be verified by an outside observer). Sketches that are purposefully unclear fall outside our scope. Within all possible conventions of depiction, we only look at plan representations.

    Although designers habitually reinterpret sketches, they do so in an orderly fashion and in a limited way. Reinterpretations do not wildly diverge and are relatively close to each other. The question, therefore, is not why sketches are ambiguous, but rather which clues architects employ and which interpretations they allow.

    We limit our definition of computational interpretation of a sketch to instrumental meaning (what the graphic entities map to in the design task), not architectural theoretical meaning or other domains of discourse. Instrumental meaning is domain-specific and context dependent. We take the previously identified 24 graphic units as instrumental meaning in graphic representations.

1.2 Graphic units

The definition of a graphic unit is: "a specified set of graphic entities and their appearance that has a generally accepted meaning within the design community" ([1], p. 22). The appeal to "generally accepted meaning," even though it introduces methodological problems, is necessary because meaning is in many cases the only distinguishing factor between otherwise similar shapes (e.g. "is this set of straight lines a grid or a close packing of squares?" and "is this rectangle a table or a column?"). Graphic units can be divided into structuring and descriptive graphic units.

Structuring graphic units build up and organise the design. Their elements only indirectly map on elements in the built environment, and they are typically left out in final documentation drawings. They are measurement device, zone, schematic subdivision, modular field, grid, refinement grid, tartan grid, structural tartan grid, schematic axial system, axial system, proportion system, and circulation system.

Descriptive graphic units map to objects in the built environment: simple contour, contour, specified form, elaborated structural contour, complementary contours, function symbols, element vocabulary, structural element vocabulary, combinatorial element vocabulary, functional space, partitioning system, and circulation.

 

Graphic Unit                      Description
--------------------------------  ------------------------------------------------------
Simple contour                    Regular shape showing an outline.
Contour                           Any irregular shape showing an outline.
Measurement device                Measure for establishing (relative) dimensions.
Specified form                    Contour with specified dimensions.
Elaborated structural contour     Outline with structural detail.
Complementary contours            Composition of outlines.
Function symbols                  Textual indication of function.
Zone                              Area with specific use or function.
Schematic subdivision             Schematic depiction of principal subdivision.
Modular field                     Irregular subdivision of area along coordinating lines.
Refinement grid                   Grid with smaller module coordinated in other grid.
Schematic axial system            Schematic depiction of organisation of axes.
Axial system                      Organisation of axes applied to building design.
Grid                              System of modularly repeating coordinating lines.
Tartan grid                       Double grid based on two alternating modules.
Structural tartan grid            Tartan grid with structural elements.
Element vocabulary                Set of simple shapes depicting (interior) elements.
Structural element vocabulary     Set of simple shapes depicting structural elements.
Functional space                  Outline combined with function indicator.
Partitioning system               Schematic depiction of more detailed subdivision.
Proportion system                 Diagram showing how proportions are derived.
Combinatorial element vocabulary  Precise relationships between particular elements.
Circulation system                Principal layout of circulation.
Circulation                       Layout of circulation applied to building design.

  Table 1. Graphic units

Sketch recognition, therefore, translates in our view to the recognition of graphic units in a plan-based diagrammatic drawing or sketch that has been created with some care for clarity.

 

 

 

 

 

2. Graphic units

 

Image recognition research mainly focuses on photo- or video-like sources, or aims to mimic some functionality of the human visual system. Line art or line drawings are quite distinct from these kinds of sources. Handwriting recognition in particular has received a lot of attention, but in the area of drawings there has not been much work. Comparatively much effort has been given to automated document management, in particular in the transfer from paper-based maps to the electronic format of GIS and CAD [30]. Other work includes the reconstruction of telephone system manhole drawings [6], conversion of machine engineering line drawings [43], and symbol detection in architectural drawings [4]. In all cases the processed sources are the precise class of design documentation drawings.

Most approaches in image recognition are built on multiple levels of (similar or mixed) recognisers that incrementally process or reason about information delivered from lower levels. A recent trend is to incorporate statistical techniques [15], although there is concern about the extent to which this can link to domain knowledge [28]. Questions in this area are about the often manual tuning of the system, incorporation of domain knowledge, and the problem of low performance when variations in the drawings occur. It is to be expected that this latter problem is much larger when the input material is a sketch drawing. This calls for a strategy which can deal with much uncertainty in the recognition process.

We propose to combine two techniques to tackle the question of drawing recognition: (1) a multi-agent approach; and (2) online recognition. We discuss these two strategies in the context of a hypothetical drawing system that recognises graphic units. The hypothetical system has the following components: (a) drawing area; (b) drawing pen (similar to the technique reported in [9]); (c) module for tracking and segmenting strokes made by the pen; (d) multi-agent module for determining which graphic units are present in the drawing; (e) visual display for feedback from the d-module.

2.1 Multi-agent approach

It has become increasingly productive to recast the multiple classifier approach in terms of multi-agent systems. Although the classifiers remain the same ([29] and [16] point out the utility of aggregating various classifiers), the approach adds to the conceptual level a more autonomous role to each classifier; it acknowledges explicitly the limited capabilities per classifier; and it realises that classifiers (in the guise of agents) should communicate with other classifiers to settle ambiguities ([42], [14]). The parallelism inherent in multi-agent systems is another major motivation. In particular when multiple interpretations are possible, resolution critically depends on a weighed and balanced exchange of viewpoints. In sequential processing this may lead to long waiting times for a decision-module to gather all relevant evidence.

Earlier, we have established a multi-agent framework that forms the basis for building a drawing recognition system [3]. An agent in the framework has input, output, and an internal state and processes that are closed to the outside world. The input part senses the world environment and receives broadcast messages. The output part manipulates the world environment and broadcasts messages. Agents operate independently. It is possible to instantiate any number of agents of a given type. The multi-agent system is multithreaded, having all the agents run continuously at the same time. Because the order in which agents perform their actions cannot be predetermined, the design of the agents' behaviour has to anticipate various orders. We establish implicit control through the broadcasts. An agent reads the broadcasts and selects those messages that are relevant. The agent's implementation is basically as follows:

    Wait for a message (waiting state).

    If the message is not interesting, remain in waiting state.

    Do something with the message.

    Send messages.

    Interact with the environment (if the agent can manipulate).

    Return to waiting state.
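The steps above can be sketched as a small message loop. This is only an illustrative skeleton under stated assumptions: the class name `Agent`, the `STOP` sentinel, the topic strings, and the `broadcast` callback are hypothetical and do not come from the framework in [3].

```python
import queue
import threading

STOP = object()  # hypothetical sentinel used to end the loop in this sketch


class Agent:
    """Illustrative agent skeleton following the listed steps."""

    def __init__(self, name, interesting_topics, broadcast):
        self.name = name
        self.interesting_topics = set(interesting_topics)
        self.inbox = queue.Queue()   # receives broadcast messages
        self.broadcast = broadcast   # callable(topic, payload), assumed
        self.handled = []            # internal state, closed to the outside

    def process(self, topic, payload):
        """Do something with the message; return the messages to send."""
        self.handled.append((topic, payload))
        return [("seen", (self.name, topic))]

    def act_on_environment(self):
        """Placeholder: only agents that can manipulate would override this."""

    def run(self):
        while True:
            message = self.inbox.get()            # wait for a message
            if message is STOP:
                break
            topic, payload = message
            if topic not in self.interesting_topics:
                continue                          # not interesting: keep waiting
            for out_topic, out_payload in self.process(topic, payload):
                self.broadcast(out_topic, out_payload)   # send messages
            self.act_on_environment()             # interact with the environment
            # the loop then returns to the waiting state


def demo():
    sent = []
    agent = Agent("grid", {"stroke"}, lambda t, p: sent.append((t, p)))
    worker = threading.Thread(target=agent.run)
    worker.start()
    agent.inbox.put(("stroke", 1))
    agent.inbox.put(("text", "kitchen"))  # ignored by this agent
    agent.inbox.put(STOP)
    worker.join()
    return agent.handled, sent
```

Each agent owns its inbox, so several such loops can run in separate threads without a fixed order of execution, which is the situation the agents' behaviour has to anticipate.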

The functional behaviour of an agent Ai is described by its properties {Pi,Gi,Fi,Ci,Si}, where P is purpose, G is goal, F is a set of features, C is a set of criteria, and S is a set of segmentation measures. They are informally defined as follows.

The purpose Pi of an agent Ai is to recognise one particular graphic unit. Thus, there is a Grid-agent, Zone-agent, Circulation System-agent, Simple Contour-agent, etc.

The goal Gi of an agent Ai is to determine whether the graphic unit Pi is present in the current state of the drawing. Thus, the Grid-agent continuously checks for grids.

Each agent Ai employs a set of features Fx to decide if graphic unit Pi occurs. Thus, the Grid-agent looks for occurrences of parallel (F1) aligned (F2) straight lines (F3) drawn consecutively (F4) in the same direction (F5) of roughly the same length (F6) in two directions (F7) in which the parallel lines keep the same distance from each other (F8), and the lines in two directions overlap each other in an implied grid area (F9). Obviously, multiple agents can use the same features.
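A few of the Grid-agent features can be expressed as simple geometric predicates. The sketch below is an assumption-laden illustration: strokes are taken to be already approximated by non-degenerate straight segments, and the tolerance values are invented for the example rather than taken from empirical work.

```python
import math
from typing import NamedTuple


class Stroke(NamedTuple):
    """A stroke approximated by a straight segment, start point to end point."""
    x0: float
    y0: float
    x1: float
    y1: float


def orientation(s):
    """Undirected orientation of the segment in [0, pi)."""
    return math.atan2(s.y1 - s.y0, s.x1 - s.x0) % math.pi


def length(s):
    return math.hypot(s.x1 - s.x0, s.y1 - s.y0)


def is_parallel(a, b, tol=math.radians(10)):      # cf. F1, tolerance assumed
    d = abs(orientation(a) - orientation(b))
    return min(d, math.pi - d) <= tol


def same_direction(a, b):                         # cf. F5
    dot = (a.x1 - a.x0) * (b.x1 - b.x0) + (a.y1 - a.y0) * (b.y1 - b.y0)
    return dot > 0


def similar_length(a, b, rel_tol=0.25):           # cf. F6, tolerance assumed
    la, lb = length(a), length(b)
    return abs(la - lb) <= rel_tol * max(la, lb)


def gap(a, b):
    """Perpendicular distance from the midpoint of b to the line through a."""
    mx, my = (b.x0 + b.x1) / 2, (b.y0 + b.y1) / 2
    dx, dy = a.x1 - a.x0, a.y1 - a.y0
    return abs(dx * (my - a.y0) - dy * (mx - a.x0)) / math.hypot(dx, dy)


def evenly_spaced(strokes, rel_tol=0.25):         # cf. F8, tolerance assumed
    gaps = [gap(a, b) for a, b in zip(strokes, strokes[1:])]
    return len(gaps) >= 2 and max(gaps) - min(gaps) <= rel_tol * max(gaps)


rows = [Stroke(0, 0, 100, 0), Stroke(0, 10, 98, 10), Stroke(0, 20.5, 101, 20.5)]
```

Because the predicates are plain functions, several agents could share them, in line with the remark that multiple agents can use the same features.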

Each agent Ai has a set of criteria Cy which establish a relative measure RMi of the degree to which graphic unit Pi is detected. Thus, the Grid-agent wants to detect at least three consecutive lines that fulfil F1-F9 before it considers the possibility that a grid occurs. As the number of consecutive lines that fulfil F1-F9 increases, the relative measure also increases according to an S-shaped curve.
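One possible reading of the S-shaped curve is a logistic function of the number of qualifying lines. The minimum of three lines comes from the text; the logistic form itself, the midpoint, the steepness, and the claim threshold are assumptions chosen only to make the sketch concrete.

```python
import math

MIN_LINES = 3          # from the text: at least three consecutive lines
MIDPOINT = 6.0         # assumed: number of lines at which RM reaches 0.5
STEEPNESS = 1.0        # assumed: slope of the S-shaped curve
CLAIM_THRESHOLD = 0.9  # assumed: RM value at which the agent would claim


def relative_measure(n_lines):
    """Relative measure RM in [0, 1) for n consecutive qualifying lines."""
    if n_lines < MIN_LINES:
        return 0.0
    return 1.0 / (1.0 + math.exp(-STEEPNESS * (n_lines - MIDPOINT)))


def should_claim(n_lines):
    return relative_measure(n_lines) >= CLAIM_THRESHOLD
```

With these illustrative parameters the measure stays at zero for one or two lines, rises slowly at first, and saturates as more lines confirm the grid.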

Each agent Ai has a set of segmentation measures Sz to determine if it should continue tracking the current sequence of strokes for graphic unit Pi. This basically defines the activation window during which the agent considers the current sequence of strokes. Thus, the Grid-agent stops tracking a series of consecutive lines when curved lines occur, or when new lines no longer fall within the implied grid area without extending it, and so forth.
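The two stopping conditions named for the Grid-agent can be sketched as a predicate that closes the activation window. The `StrokeInfo` fields are hypothetical; how a real system would decide that a stroke is curved or extends the implied grid area is left open here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrokeInfo:
    """Hypothetical per-stroke flags, assumed to be computed elsewhere."""
    curved: bool = False
    inside_area: bool = True     # falls within the implied grid area
    extends_area: bool = False   # lies outside but extends the grid area


def breaks_sequence(stroke):
    """Segmentation measure: should the Grid-agent stop tracking here?"""
    if stroke.curved:
        return True
    return not stroke.inside_area and not stroke.extends_area


def activation_window(strokes):
    """Return the leading run of strokes the agent keeps tracking."""
    window = []
    for stroke in strokes:
        if breaks_sequence(stroke):
            break
        window.append(stroke)
    return window


sequence = [
    StrokeInfo(),
    StrokeInfo(inside_area=False, extends_area=True),
    StrokeInfo(curved=True),
    StrokeInfo(),
]
```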

2.2 Online recognition

Online recognition means that computer interpretation takes place while the designer is drawing. In particular in the handwriting recognition area, numerous researchers opt for online recognition of text ([44] contains many examples), motivated especially by the high efficiency of the stroke direction feature ([26], p. 2271). To the best of our knowledge, there is no online computer interpretation of sketches applied in architectural design, and only sporadically in engineering design ([12] and [33] are notable exceptions). This is an omission since the creation order of strokes and their direction information can only be derived from an online process. Especially in the case of diagrams and sketches, where the appearance of elements shows a high degree of variety, these are important clues to derive what is being drawn (e.g. features F4 and F5 of the Grid-agent example).

The c-module in the hypothetical sketch system creates a data stream of strokes which forms the input for the multi-agent system in the d-module. The agents continuously parse the stream in the manner described above; this is in fact their way of "seeing" the drawing. An agent Ai annotates the drawing by placing four kinds of markers in the stream: Ms, Mh, Me, and Mc. The start marker Ms designates the first stroke of a sequence that an agent is tracking. A new start marker can only be placed after the agent has placed an end marker or claim marker. The hypothesis marker Mh designates the end of a sequence while the agent is anticipating that in the near future it will be completed (in this manner, interfering non-fitting strokes can be ignored). The end marker Me designates the last stroke of a sequence that an agent is tracking. A new end marker can only be placed after the agent has placed a start marker. The claim marker Mc is placed once the agent reaches the threshold value and a graphic unit has been identified.

At any point in the stream, any agent can place its own marker; each marker is associated with the agent that has put it there. After parsing and annotating the data stream, the stream is stored for later reference.
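The marker rules can be made explicit with a small annotated-stream structure. This is a sketch only: the class and method names are hypothetical, and requiring an open sequence before a hypothesis or claim marker is an added assumption, since the text states ordering rules only for start and end markers.

```python
from enum import Enum


class Marker(Enum):
    START = "Ms"
    HYPOTHESIS = "Mh"
    END = "Me"
    CLAIM = "Mc"


class AnnotatedStream:
    """Stroke stream with per-agent markers (illustrative, not the c-module)."""

    def __init__(self):
        self.strokes = []    # strokes in creation order
        self.markers = []    # (stroke index, agent name, Marker)
        self._open = set()   # agents that currently track a sequence

    def append_stroke(self, stroke):
        self.strokes.append(stroke)
        return len(self.strokes) - 1

    def place(self, index, agent, marker):
        if not 0 <= index < len(self.strokes):
            raise IndexError("no such stroke in the stream")
        if marker is Marker.START:
            # A new start marker needs a preceding end or claim marker.
            if agent in self._open:
                raise ValueError("sequence already open for " + agent)
            self._open.add(agent)
        else:
            # Stated for END; assumed here for HYPOTHESIS and CLAIM as well.
            if agent not in self._open:
                raise ValueError("no open sequence for " + agent)
            if marker in (Marker.END, Marker.CLAIM):
                self._open.discard(agent)
        self.markers.append((index, agent, marker))


def try_place(stream, index, agent, marker):
    """Return True if the marker was accepted, False if a rule rejected it."""
    try:
        stream.place(index, agent, marker)
        return True
    except ValueError:
        return False
```

Every marker keeps the name of the agent that placed it, so several agents can annotate the same stroke independently and the stored stream can be inspected later.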

2.3 Resolving the decision process

Whenever an agent reaches its threshold value and places a claim, all other agents respond by polling their current relative measures RMi. In order to converge to a decision, we adopt a decision tree developed earlier for architects to decide which graphic unit is present [2]; see Figure 1.

In the decision tree, the path ABDGK17 is the series of questions that must be answered to determine the graphic unit. These are currently stated in natural language: A. Is it a graphic or symbol element; B. Is it a closed shape or a set of one or more lines; D. Is it a coordinating system or not; G. Is it a zone, grid, or proportion system; and K. Is it a modular field, grid, refinement grid, tartan grid, or structural tartan grid. The leaves 1-27 are the specific graphic units.

The decision tree essentially divides graphic units into groups: text (A7); multiple shapes on building level (ABCEI); multiple shapes on element level (ABCEJ); single shapes on building level (ABCF); area structuring devices (ABDG and ABDGK); and building structuring devices (ABDH, ABDHL, ABDHM, ABDHN). The groups are distinguished from each other through the nodes A-N. The decision in each node between one branch and the other is made on a specific set of features for that node. Therefore, each graphic unit is characterised by a unique decision cluster of features (the aggregation of the nodes leading to that graphic unit).

At any given point during the online recognition process, it is likely that multiple agents have a relative measure RMi which is higher than 0. In other words, features are likely not to activate one single agent (restricted to a single path in the decision tree), but multiple agents (distributed over the nodes of the whole tree). So we can view the distribution of the features over the 24 distinct paths as additional evidence for a ranking which can modify a preliminary ranking based on RMi only. Qualitatively speaking, the path which has the greatest collection of features presents the most likely candidate. We still have to determine a quantitative means of aggregating the features along a path and balancing this measure against RMi.
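Since the quantitative aggregation is still open, the following is only one conceivable scheme, not the method of this paper: the mean activation of the nodes along a path is blended with RMi through a weight. The weight, the normalisation by path length, and all numeric values are assumptions; the path ABDGK for grid and tartan grid follows from node K, and "single shape" stands in for the group on path ABCF.

```python
def path_support(path, node_activation):
    """Mean activation (0..1) of the decision nodes along a path."""
    return sum(node_activation.get(node, 0.0) for node in path) / len(path)


def rank(candidates, relative_measure, node_activation, weight=0.5):
    """Rank candidates by a weighted blend of RM and path support."""
    scores = {}
    for unit, path in candidates.items():
        rm = relative_measure.get(unit, 0.0)
        support = path_support(path, node_activation)
        scores[unit] = weight * rm + (1.0 - weight) * support
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


candidates = {"grid": "ABDGK", "tartan grid": "ABDGK", "single shape": "ABCF"}
rm_values = {"grid": 0.8, "tartan grid": 0.6, "single shape": 0.7}  # invented
activation = {"A": 1.0, "B": 1.0, "D": 1.0, "G": 1.0, "K": 1.0}     # invented
ranking = [unit for unit, _ in rank(candidates, rm_values, activation)]
```

In this invented example the single shape has a higher RM than the tartan grid but ranks below it, because fewer nodes on its path are supported. This illustrates how path evidence could modify a preliminary ranking based on RMi only.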


Fig. 1. Decision Cluster ABDGK For Graphic Unit Tartan Grid

 

 

3. Future work

We have outlined a multi-agent system for online recognition of graphic units in diagrammatic and sketch drawings. Empirical work is still needed to calibrate the stroke segmentation of the c-module. The utility of the decision tree is to separate the major groups, and show near the leaves how ambiguity occurs through alternative interpretations. The different depths of the decision tree give a relative indication which groups are easier to recognise than others. A major step we still have to take is the formalisation into features of the natural language questions in the nodes A-N. This also requires an additional revision of the decision tree into truly binary branches per node. Based on the resulting pool of features, we can determine which aggregation policy has the most potential.

A running implementation will provide feedback about the viability of the theoretical work. Supposing that we find some virtue in the current work, further theoretical questions then concern whether an agent should keep a record of its own performance when recognition has to take place; if it is desirable to have each agent segment the input by itself (and how to aggregate over feature activations that are built on different segmentations); and whether an agent should follow multiple tracks in the data stream.

 

 

 

References


[1] H.H. Achten, "Generic Representations," PhD diss., Eindhoven University of Technology, 1997.

[2] H.H. Achten, Design case retrieval by generic representations, "Artificial Intelligence in Design '00," ed. J.S. Gero, pp. 373-392, Kluwer Academic Publishers, Dordrecht, 2000.

[3] H.H. Achten and J. Jessurun, Learning from mah jong, "Digital Design: Research and Practice," ed. M.L. Chiu et al., pp. 115-124, Kluwer Academic Publishers, Dordrecht, 2003.

[4] C. Ah-Soon and K. Tombre, Architectural symbol recognition using a network of constraints, "Pattern Recognition Letters" 22, pp. 231-248, 2001.

[5] O. Akin and H. Moustapha, Strategic use of representation in architectural massing, "Design Studies" 25(1), pp. 31-50, 2004.

[6] J.F. Arias, Lai, C.P., Surya, S., Kasturi, R. and A. Chhabra, Interpretation of telephone system manhole drawings, "Pattern Recognition Letters" 16, pp. 355-369, 1995.

[7] O. Bimber, Encarnação, L.M. and A. Stork, A multi-layered architecture for sketch-based interaction within virtual environments, "Computers & Graphics" 24(6), pp. 851-867, 2000.

[8] M.Y. Cha and J.S. Gero, Shape pattern recognition, "Artificial Intelligence in Design '98," ed. Gero, J.S. and F. Sudweeks, pp. 169-187, Kluwer Academic Publishers, Dordrecht, 1998.

[9] N.Y.-W. Cheng, Stroke sequence in digital sketching, "Architecture in the Network Society," ed. Rüdiger, B., Tournay, B. and H. Ørbæk, pp. 387-393, The Royal Danish Academy of Fine Arts, Copenhagen, 2004.

[10] M.C. Cooper, Interpreting line drawings of curved objects with tangential edges and surfaces, "Image and Vision Computing" 15, pp. 263-276, 1997.

[11] D.W. Dahl, Chattopadhyay, A. and G.J. Gorn, The importance of visualization in concept design, "Design Studies" 22(1), pp. 5-26, 2001.

[12] C.G.C. van Dijk and A.A.C. Mayer, Sketch input for conceptual surface design, "Computers in Industry" 34(1), pp. 125-137, 1997.

[13] E. Do, Gross, M.D., Neiman, B. and G. Zimring, Intentions in and relations among design drawings, "Design Studies" 21(5), pp. 483-503, 2000.

[14] M. van Erp, Vuurpijl, L. and L. Schomaker, An overview and comparison of voting methods for pattern recognition, "Proceedings of IWFHR '02," ed. Williams, A.D., pp. 195-200, IEEE Computer Society, Los Alamitos, 2002.

[15] D. Forsyth, An empirical-statistical agenda for recognition, "Shape, Contour and Grouping in Computer Vision," ed. Forsyth, D.A., Mundy, J.L., di Gesù, V. and R. Cipolla, pp. 9-21, Springer Verlag, Berlin, 1999.

[16] G. Giacinto and F. Roli, An approach to the automatic design of multiple classifier systems, "Pattern Recognition Letters" 22, pp. 25-33, 2001.

[17] V. Goel, "Sketches of Thought," The MIT Press, Cambridge, 1995.

[18] M. Gross, The Electronic Cocktail Napkin - a computational environment for working with design diagrams, "Design Studies" 17(1), pp. 53-69, 1996.

[19] H. Jansen and F.-L. Krause, Interpretation of freehand drawings for mechanical design processes, "Computers & Graphics" 8(4), pp. 351-369, 1984.

[20] M. Kavakli, Scrivener, S.A.R. and L.J. Ball, Structure in idea sketching behaviour, "Design Studies" 19(4), pp. 485-517, 1998.

[21] A. Koutamanis, "Development of a Computerized Handbook of Architectural Plans," PhD diss., Delft University of Technology, 1990.

[22] A. Koutamanis and V. Mitossi, On representation, "Design Research in the Netherlands 2000," ed. H. Achten, B. de Vries, and J. Hennessey, pp. 105-118, Eindhoven University of Technology, Eindhoven, 2001.

[23] B. Lawson and S.H. Loke, Computers, words and pictures, "Design Studies" 18(2), pp. 171-183, 1997.

[24] P.P. Leclercq, Programming and assisted sketching, "Computer Aided Architectural Design Futures 2001," ed. B. de Vries, J.P. van Leeuwen and H.H. Achten, pp. 15-31, Kluwer Academic Publishers, Dordrecht, 2001.

[25] S. Lim, Qin, S.F., Prieto, P., Wright, D. and J. Shackleton, A study of sketching behaviour to support free-form surface modelling from on-line sketching, "Design Studies" 25(4), pp. 393-413, 2003.

[26] C.L. Liu, Nakashima, K., Sako, H. and H. Fujisawa, Handwritten digit recognition: benchmarking of state-of-the-art techniques, "Pattern Recognition" 36(10), pp. 2271-2285, 2003.

[27] A. McGown, Green, G. and P.A. Rodgers, Visible ideas: information patterns of conceptual sketch activity, "Design Studies" 19(4), pp. 431-453, 1998.

[28] J. Mundy, A formal-physical agenda for recognition, "Shape, Contour and Grouping in Computer Vision," ed. Forsyth, D.A., Mundy, J.L., di Gesù, V. and R. Cipolla, pp. 22-27, Springer Verlag, Berlin, 1999.

[29] G.S. Ng and H. Singh, Democracy in pattern classification: combinations of votes in various pattern classifiers, "Artificial Intelligence in Engineering" 12, pp. 189-204, 1998.

[30] J.M. Ogier, Mullot, R., Labiche, J. and Y. Lecourtier, Multilevel approach and distributed consistency for technical map interpretation: Application to cadastral maps, "Computer Vision and Image Understanding" 70(3), pp. 438-451, 1998.

[31] P. Parodi, Lancewicki, R., Vijh, A. and J.K. Tsotsos, Empirically-derived estimates of the complexity of labelling line drawings of polyhedral scenes, "Artificial Intelligence" 105, pp. 47-75, 1998.

[32] A.T. Purcell and J.S. Gero, Drawings and the design process, "Design Studies" 19(4), pp. 389-430, 1998.

[33] S.-F. Qin, Wright, D.K. and I.N. Jordanov, On-line segmentation of freehand sketches by knowledge-based nonlinear thresholding operations, "Pattern Recognition" 34(10), pp. 1885-1893, 2001.

[34] P.A. Rodgers, Green, G. and A. McGown, Using concept sketches to track design progress, "Design Studies" 21(5), pp. 451-464, 2000.

[35] S.A.R. Scrivener, Ball, L.J. and W. Tseng, Uncertainty and sketching behaviour, "Design Studies" 21(5), pp. 465-481, 2000.

[36] X.B. Song, Abu-Mostafa, Y., Sill, J., Kasdan, H. and M. Pavel, Robust image recognition by fusion of contextual information, "Information Fusion" 3(4), pp. 277-287, 2002.

[37] K. Tombre, Graphics recognition - general context and challenges, "Pattern Recognition Letters" 16(9), pp. 883-891, 1995.

[38] M. Tovey, Styling and design: intuition and analysis in industrial design, "Design Studies" 18(1), pp. 5-31, 1997.

[39] M. Tovey, Porter, S. and R. Newman, Sketching, concept development and automotive design, "Design Studies" 24(2), pp. 135-153, 2003.

[40] I.M. Verstijnen, "Sketches of Creative Discovery," PhD diss., Delft University of Technology, 1997.

[41] I.M. Verstijnen, Hennessey, J.M., Leeuwen, C. van, Hamel, R. and G. Goldschmidt, Sketching and creative discovery, "Design Studies" 19(4), pp. 519-546, 1998.

[42] L. Vuurpijl and L. Schomaker, Multiple-agent architectures for the classification of handwritten text, "Proceedings of IWFHR6," pp. 335-346, Taejon, Korea, 1998.

[43] L. Wenyin and D. Dori, A generic integrated line detection algorithm and its object-process specification, "Computer Vision and Image Understanding" 70(3), pp. 420-437, 1998.

[44] A.D. Williams, ed., "Proceedings Eighth International Workshop on Frontiers in Handwriting Recognition," IEEE Computer Society, Los Alamitos, 2002.