Computer Graphics & Geometry
Visual Querying for Molecular Dynamics
Olga Sourina and Yubo Wang
Nanyang Technological University, Singapore
Abstract
Nowadays, biologists deal with gigabytes of data produced by molecular dynamics simulations. To understand and interpret these results and to come up with new hypotheses, the user often needs visualization and querying tools that let him or her take an interactive part in visual data mining and querying of the spatio-temporal data. In this paper, we propose a novel visual query system for visualizing the results of molecular dynamics simulation and for visual querying. The system design and its implementation are described. We propose specific visual queries for visual data analysis in molecular dynamics. The proposed visual query system is unique in that it allows us to formulate spatio-temporal queries that cannot be implemented directly with any available spatio-temporal database system and/or molecular visualization program. This paper is the result of collaborative research and graduate study.
1. Introduction
The results of molecular dynamics simulation are spatio-temporal data describing the movement of atoms and molecules in a molecular system. The volume of such data is extremely large, so it is tedious to sort and process the data record by record. Usually, only part of the data needs to be studied, for example, the data within a certain region and/or time interval. However, existing molecular visualization and analysis systems do not provide such geometric queries. After studying the domain of molecular dynamics and existing molecular visualization and analysis systems, we developed novel techniques for visual data mining of time-dependent data using range queries of arbitrary shape. In work [1], a geometric query model for relational databases based on implicit functions [2] was proposed. Then, in work [3], a uniform geometric query model handling spatio-temporal data was proposed, together with its application to the analysis of computer simulations in molecular dynamics.
In this paper, we describe the design and implementation of a novel visual query system for molecular dynamics named the Molecular Dynamics Visual Query System (MDVQS). In our system, a function-based model of a geometric solid is used in a spatio-temporal predicate to query spatio-temporal data. Function-based spatio-temporal predicates were first introduced in work [4]. The proposed system MDVQS thus allows us to visualize spatio-temporal data and to pose time-dependent queries of complex shape on them visually. First, the molecular system is visualized for all time frames. Then, the user analyses the data visually and formulates geometric hypotheses that can be tested by posing and running spatio-temporal queries. We introduce three basic types of queries for data analysis in molecular dynamics. MDVQS was implemented with the 3D computer graphics system Visualization Toolkit (VTK) [5].
MDVQS is coupled with the gOpenMol system to provide the user with the full spectrum of its other analysis tools as well. gOpenMol [6] is a tool for the visualization and analysis of molecular structures, combined with several applications for data analysis.
The paper is organized as follows. In Section 2, the molecular visualization and analysis system gOpenMol is briefly reviewed. The spatio-temporal querying model used in our system is introduced in Section 3. In Section 4, the design and implementation of the visual query system MDVQS are described. Finally, Section 5 concludes the paper and discusses possible future work.
2. Molecular Visualization and Analysis Systems
Molecular visualization and analysis systems combine computer graphics, data mining, virtual reality, and even cognitive psychology to give biologists deep insight into complex structures, fine features, and obscure patterns in large-scale datasets. Several molecular visualization programs offer geometry analysis and visualization within existing molecular dynamics simulation packages. With the VMD [7] and gOpenMol [6] software systems, large biomolecular systems can be displayed, animated, and analyzed using 3D graphics and built-in scripting. However, structural geometry analysis could still be improved. In VMD [7], an extensive atom selection syntax is implemented, but range queries are limited to a spherical region, for example, "find atoms within 6 Å of (1, -2.3, 0)". gOpenMol is a tool for the visualization and analysis of molecular structures and their chemical properties, combined with several applications for the analysis and presentation of data from quantum mechanics, molecular dynamics, and other computational chemistry calculations. It has a graphical user interface (GUI) and an internal command-line interpreter based on Tcl, and it can display and analyze molecular structures and properties calculated with external programs. However, gOpenMol also lacks tools for the visual analysis of specific spatial regions. We therefore studied the molecular visualization system gOpenMol and developed additional functions that could bring new solutions to problems in molecular dynamics. In our system, gOpenMol can be used to browse dynamic molecular data stored in the "xmol" format. Figure 1(a-b) shows a molecular system visualized with gOpenMol at two time frames.
Fig. 1. Visualization of dynamic molecular data at different time points by gOpenMol: (a) time frame #1; (b) time frame #2
An example of a file in the "xmol" format is shown in Figure 2.
Fig. 2. Snapshot of an "xmol" file
This is a common data format used for input and output by various molecular dynamics software systems. The number in the first line gives the number of atoms (or molecules) in the snapshot. The second line specifies the time point of the snapshot. Each of the remaining lines starts with the chemical symbol of an atom or molecule, followed by its x, y, and z coordinates. A file contains many such snapshots, one for each point of the time sequence.
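The snapshot layout described above can be read with a short parser. The following is a minimal Python sketch, not part of MDVQS; the function name `parse_xmol` and the `Atom`/`Frame` types are hypothetical, and it assumes that the second line of each snapshot holds only the time point as a number and that fields are separated by whitespace.

```python
from typing import List, Tuple

# One atom record: (chemical symbol, x, y, z)
Atom = Tuple[str, float, float, float]
# One snapshot: (time point, atom records)
Frame = Tuple[float, List[Atom]]


def parse_xmol(text: str) -> List[Frame]:
    """Parse a multi-snapshot "xmol" text into a list of frames.

    Assumption: each snapshot is a count line, a line holding the time
    point as a bare number, then `count` lines of the form "Symbol x y z".
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]  # skip blank lines
    frames: List[Frame] = []
    i = 0
    while i < len(lines):
        count = int(lines[i].split()[0])       # line 1: number of atoms
        time = float(lines[i + 1].split()[0])  # line 2: snapshot time point
        atoms: List[Atom] = []
        for ln in lines[i + 2:i + 2 + count]:
            sym, x, y, z = ln.split()[:4]      # symbol followed by x, y, z
            atoms.append((sym, float(x), float(y), float(z)))
        frames.append((time, atoms))
        i += 2 + count                         # jump to the next snapshot
    return frames
```

Each frame is returned as a (time, atoms) pair, so later query steps can filter atoms per time point.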
3. Spatio-temporal Querying Model
We now introduce the formal mathematical specification of the spatio-temporal model used in our system. As mentioned in the Introduction, the model was first described in work [1] and was extended to handle time-dependent data in [3-4].
In this model, a geometric object can be a set of points
$$P = \{[x_1, x_2, \ldots, x_n, t]\} = \{[X, t]\}$$
where $X = (x_1, x_2, \ldots, x_n)$ is a point in the $n$-dimensional Euclidean space $E^n$ and $t$ is time.
Primitive solids are defined by implicit functions as $f(x_1, x_2, \ldots, x_n) \ge 0$ in the Euclidean space $E^n$. The implicit function can be defined analytically or by a procedure. Such functions define closed $n$-dimensional objects in $E^n$ under the following conditions:
$f(X) > 0$ for the points inside the object,
$f(X) = 0$ for the points on the object boundary,
$f(X) < 0$ for the points outside the object.
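As a worked illustration of these conditions (our example, not taken from the model description), a sphere of radius $r$ centered at $X_0 = (x_{0,1}, x_{0,2}, x_{0,3})$ can be written as

$$f(X) = r^2 - (x_1 - x_{0,1})^2 - (x_2 - x_{0,2})^2 - (x_3 - x_{0,3})^2$$

With $r = 2$ and $X_0 = (0, 0, 0)$, three test points fall into the three cases:

$$f(1, 1, 0) = 4 - 1 - 1 - 0 = 2 > 0 \quad \text{(inside)}$$
$$f(2, 0, 0) = 4 - 4 - 0 - 0 = 0 \quad \text{(on the boundary)}$$
$$f(3, 0, 0) = 4 - 9 - 0 - 0 = -5 < 0 \quad \text{(outside)}$$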
In the model, a query solid can have time-dependent parameters and/or coordinates. Thus, the geometric query model consists of the following geometric objects:
a time-dependent 3-dimensional geometric object formed by the points $P = \{[x_1, x_2, \ldots, x_n, t]\}$ (here $n = 3$), where $t$ is time;
time-dependent 3-dimensional primitive geometric objects used to construct a query solid by means of geometric operations.
Here, we give examples of an ellipsoid and a cone described by implicit functions.
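Ellipsoid:
$$G_1:\; f_1(X, t) = 1 - \left(\frac{x_1 - x_{0,1}[t]}{a_1[t]}\right)^2 - \left(\frac{x_2 - x_{0,2}[t]}{a_2[t]}\right)^2 - \left(\frac{x_3 - x_{0,3}[t]}{a_3[t]}\right)^2 \ge 0$$
where $x_{0,1}, x_{0,2}, x_{0,3} \in \mathbb{R}$ are the coordinates of the ellipsoid center and $a_1, a_2, a_3 \in \mathbb{R}$ are its semi-axes, all of which may depend on time $t$.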
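To make the time dependence concrete, the ellipsoid can be evaluated as follows. This Python sketch is illustrative only: the helper `ellipsoid` and the example motion (a center drifting along $x_1$ at 0.5 units per time step) are hypothetical, and each parameter $x_{0,i}[t]$ and $a_i[t]$ is modelled as a function of $t$.

```python
from typing import Callable, Sequence

# A time-dependent scalar parameter, e.g. x_{0,1}[t] or a_1[t]
Param = Callable[[float], float]


def ellipsoid(center: Sequence[Param], axes: Sequence[Param]):
    """Return f1(X, t) for an ellipsoid whose center and semi-axes depend on t."""
    def f1(X: Sequence[float], t: float) -> float:
        value = 1.0
        for x, c, a in zip(X, center, axes):
            value -= ((x - c(t)) / a(t)) ** 2  # subtract one squared term per axis
        return value
    return f1


# Hypothetical example: center moves along x1 as 0.5 * t, fixed semi-axes (2, 1, 1)
f1 = ellipsoid(
    center=[lambda t: 0.5 * t, lambda t: 0.0, lambda t: 0.0],
    axes=[lambda t: 2.0, lambda t: 1.0, lambda t: 1.0],
)
print(f1((0.0, 0.0, 0.0), 0.0))  # 1.0: the origin is the center at t = 0
print(f1((0.0, 0.0, 0.0), 8.0))  # -3.0: center is at x1 = 4, origin is outside
```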
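Cone:
$$G_2:\; f_2(X, t) = \left(\frac{x_1 - x_{0,1}[t]}{a_1[t]}\right)^2 - \left(\frac{x_2 - x_{0,2}[t]}{a_2[t]}\right)^2 - \left(\frac{x_3 - x_{0,3}[t]}{a_3[t]}\right)^2 \ge 0$$
where $x_{0,1}, x_{0,2}, x_{0,3} \in \mathbb{R}$ are the coordinates of the cone apex and $a_1, a_2, a_3 \in \mathbb{R}$ are scaling coefficients, all of which may depend on time $t$.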
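A similar sketch for the cone follows, again with hypothetical names. Note that the set where the cone function is non-negative is an unbounded double cone around the $x_1$ axis through the apex; bounding it, for example by intersecting it with a half-space, is our assumption about practical use and is not stated in the model description. Time dependence is obtained by evaluating the apex and scaling coefficients at time $t$ before the call.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cone_f(X: Sequence[float], apex: Sequence[float], a: Sequence[float]) -> float:
    """u1^2 - u2^2 - u3^2 with u_i = (x_i - x0_i) / a_i; >= 0 inside the double cone."""
    u = [(x - x0) / ai for x, x0, ai in zip(X, apex, a)]
    return u[0] ** 2 - u[1] ** 2 - u[2] ** 2


def cone_params(t: float) -> Tuple[Vec3, Vec3]:
    """Hypothetical motion: apex slides along x2 as 0.1 * t; scaling stays fixed."""
    return (0.0, 0.1 * t, 0.0), (1.0, 1.0, 1.0)


apex, a = cone_params(10.0)              # apex at (0, 1, 0) when t = 10
print(cone_f((2.0, 1.0, 0.0), apex, a))  # 4.0: on the cone axis, inside
print(cone_f((0.0, 3.0, 0.0), apex, a))  # -4.0: beside the apex, outside
```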
Geometric operations are applied to primitive geometric objects to obtain complex geometric shapes at each time point. The set-theoretic operations are defined analytically in the form proposed by Ricci [8], where operations over implicit functions are considered. Affine transformations (translation, rotation, and scaling) are also used to increase the expressive power of the geometric model. The geometric operations include set-theoretic union, intersection, and difference, as well as orthographic projection; they are fully described in works [1, 3-4].
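The set-theoretic operations can be sketched with the min/max form commonly associated with Ricci [8], written here for the convention used in this paper ($f \ge 0$ inside): union is the maximum, intersection is the minimum, and the difference $G_1 \setminus G_2$ is $\min(f_1, -f_2)$. The Python below is a minimal sketch under that assumption; the `sphere` primitive and the example shell are hypothetical.

```python
from typing import Callable, Sequence

# f(X, t) >= 0 inside the solid, as in the model above
Implicit = Callable[[Sequence[float], float], float]


def union(f: Implicit, g: Implicit) -> Implicit:
    return lambda X, t: max(f(X, t), g(X, t))


def intersection(f: Implicit, g: Implicit) -> Implicit:
    return lambda X, t: min(f(X, t), g(X, t))


def difference(f: Implicit, g: Implicit) -> Implicit:
    # G1 minus G2 = G1 intersected with the complement of G2 (sign of g flipped)
    return lambda X, t: min(f(X, t), -g(X, t))


def sphere(cx: float, r: float) -> Implicit:
    """Hypothetical primitive: sphere centered at (cx, 0, 0) with radius r."""
    return lambda X, t: r * r - (X[0] - cx) ** 2 - X[1] ** 2 - X[2] ** 2


# A sphere of radius 2 with a radius-1 sphere removed from its middle
shell = difference(sphere(0.0, 2.0), sphere(0.0, 1.0))
print(shell((1.5, 0.0, 0.0), 0.0) > 0)  # True: between the two spheres
print(shell((0.0, 0.0, 0.0), 0.0) > 0)  # False: inside the removed core
```

Because each operation returns another function of (X, t), operations can be nested to build a query solid of any depth; the min/max form is simple but not smooth where its two arguments are equal.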
For query implementation, we apply the point/solid predicate introduced in work [4]. Let $P$ be a point in the Euclidean space $E^n$, $t$ be time, $G_1$ be a query solid described by an implicit function $f_1$ with time-dependent parameters and a location changing over time, $bG_1$ be the boundary of $G_1$, and $iG_1$ be the interior of $G_1$. Then a point/solid predicate is described with the implicit function representation of the geometric object $G_1$ by a 3-valued predicate, which distinguishes whether $P$ lies in $iG_1$, on $bG_1$, or outside $G_1$ at time $t$.
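Because the exact form of the predicate is given in [4], the Python below is only a sketch of a 3-valued point/solid test under our own assumptions: the names `Location` and `point_solid` are hypothetical, the three outcomes follow the sign conditions on $f$ stated earlier in this section, and a small tolerance `eps` stands in for the exact test $f = 0$, which floating-point arithmetic rarely satisfies.

```python
from enum import Enum
from typing import Callable, Sequence


class Location(Enum):
    INTERIOR = "iG"  # f(X, t) > 0
    BOUNDARY = "bG"  # f(X, t) = 0, within the tolerance eps
    OUTSIDE = "out"  # f(X, t) < 0


def point_solid(f: Callable[[Sequence[float], float], float],
                X: Sequence[float], t: float, eps: float = 1e-9) -> Location:
    """Classify point X at time t against the query solid defined by f."""
    value = f(X, t)
    if abs(value) <= eps:
        return Location.BOUNDARY
    return Location.INTERIOR if value > 0 else Location.OUTSIDE


def moving_sphere(X: Sequence[float], t: float) -> float:
    """Hypothetical query solid: unit sphere whose center moves along x1 as t."""
    return 1.0 - (X[0] - t) ** 2 - X[1] ** 2 - X[2] ** 2


print(point_solid(moving_sphere, (2.0, 0.0, 0.0), 0.0))  # Location.OUTSIDE
print(point_solid(moving_sphere, (2.0, 0.0, 0.0), 2.0))  # Location.INTERIOR
```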
Based on our study of problems in molecular dynamics and on this geometric model for visual mining and querying of spatio-temporal data, we propose three types of queries that can be easily implemented with MDVQS.
4. Design and Implementation of MDVQS
Based on the spatio-temporal query model, we developed the visual query system MDVQS for visual mining and querying of the results of numerical simulation in molecular dynamics. We designed MDVQS accordingly. The system should be able to do the following:
visualize time-dependent molecular data presented in the "xmol" file format,
add adjustable query objects,
perform visual queries,
visualize the query results,
output the query results in a format compatible with the gOpenMol system for further analysis,
be easily integrated with other molecular visualization and analysis programs.
Fig. 3. Proposed design for MDVQS coupling with gOpenMol
Figure 3 shows the proposed system design. Molecular system data can be imported from external files in the "xmol" format. The results of spatio-temporal querying can be visualized and analyzed in MDVQS or exported to gOpenMol for further analysis. MDVQS was implemented with VTK [5], and its graphical user interface was implemented with the Win32 API. The interface consists of two separate windows: a "displaying window" for the molecular system and query objects, and a "control window" for operation control. The "control window" contains all the necessary operations and shows the processing status of the current operation. The "displaying window" displays the input data structures, the query object, and the operation results. The user loads molecular data into the system by choosing the "file" option in the menu, as shown in Figure 4(a). MDVQS automatically extracts basic information about the molecular system, such as atomic radii and potential bonds, and builds a model of the molecular structure. The molecular system is displayed as 3D objects in the "displaying window". Any of the geometric query shapes (cone, cylinder, cuboid, tube, and ellipsoid) can be chosen from the menu option "shapes", as shown in Figure 4(b). Query shapes are always rendered as transparent. The displayed objects can also be zoomed, translated, and rotated with the mouse. Queries on the displayed molecular system with the chosen query shapes are performed through the menu option "operation". The query results are visualized and can be exported to an "xmol" file (Figure 5) for further analysis by other molecular visualization and analysis software.
There are different types of molecular system representation, e.g. stick, ball-stick, CPK, and licorice [6]; each of them is designed to show a particular aspect of the molecular system structure.
In our system, the ball-stick representation was implemented to provide a comprehensive view of the molecular system structure. In the ball-stick view mode, an atom is represented by a colored ball with a specific radius, as shown in Figure 6(a). Different atoms in the molecular system are represented by balls of different colors, with radii related to the atoms' covalent radii. A bond is represented by a thin cylinder connecting two atoms. All bonds in MDVQS have the same color and cylinder radius.
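MDVQS extracts potential bonds automatically, but the rule it uses is not stated. A common heuristic is to connect two atoms when their distance is below the sum of their covalent radii plus a tolerance. The Python sketch below implements that heuristic; the radius table (approximate covalent radii in Å for a few elements), the 0.4 Å tolerance, and the function name `potential_bonds` are our assumptions.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Atom = Tuple[str, float, float, float]  # (symbol, x, y, z)

# Approximate covalent radii in angstroms (illustrative subset, assumed values)
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def potential_bonds(atoms: Sequence[Atom], tolerance: float = 0.4) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) whose distance is below r_i + r_j + tolerance."""
    bonds = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        limit = COVALENT_RADIUS[a[0]] + COVALENT_RADIUS[b[0]] + tolerance
        if math.dist(a[1:], b[1:]) < limit:  # Euclidean distance of x, y, z
            bonds.append((i, j))
    return bonds


# Hypothetical water molecule: two O-H bonds, no H-H bond
water = [("O", 0.0, 0.0, 0.0), ("H", 0.96, 0.0, 0.0), ("H", -0.24, 0.93, 0.0)]
print(potential_bonds(water))  # [(0, 1), (0, 2)]
```

The pairwise loop is quadratic in the number of atoms; for large systems a spatial grid or cell list would likely be needed.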
Fig. 4. The GUI: (a) menu for data loading, exporting query results, viewing, and system reset; (b) menu of query objects
Fig. 5. Input and output "xmol" files for Query Type 1
Fig. 6. Query Type 1: (a) before the query; (b) query result
Fig. 7. Geometric querying with a cuboid: (a) adding the query shape; (b) the query result; (c) view of the query result at time point t1; (d) view of the query result at time point t2
We now introduce the basic query functions of MDVQS and describe how each query is performed.
Query Type 1. Find and display trajectories of atoms by atom name or by its exact location (x, y, z).
The user can pick specific atoms of interest by name or by location, and only the selected atoms are displayed as their locations change over time. The number of atoms that can be chosen is limited because of the large amount of data the system may have to process; our tests were done with 500 files of 40-100 MB each.
First, MDVQS displays the whole molecular system from the file in the "displaying window", as shown in Figure 6(a). Then, the program searches through the whole list of atoms in the file, starting from the first atom. For example, if an atom C matches the atom name given by the user, the program saves its time, atom name, and coordinates in the output "xmol" file, as shown in Figure 5. As the result of the query, only the matched atoms are displayed in the "displaying window", as shown in Figure 6(b).
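Query Type 1 can be sketched as a filter over parsed snapshots followed by an "xmol" writer. In the Python below, `select_indices`, `trajectories`, and `to_xmol` are hypothetical names; we assume that atoms appear in the same order in every snapshot (so an index identifies one atom across time), that selection is made on one snapshot (for example, the first) with a small location tolerance `tol`, and that the output uses the count line, time line, and "Symbol x y z" lines described in Section 2.

```python
from typing import List, Optional, Sequence, Tuple

Atom = Tuple[str, float, float, float]  # (symbol, x, y, z)
Frame = Tuple[float, List[Atom]]        # (time point, atoms)


def select_indices(atoms: Sequence[Atom], name: Optional[str] = None,
                   location: Optional[Tuple[float, float, float]] = None,
                   tol: float = 1e-6) -> List[int]:
    """Indices of atoms in one snapshot that match `name` or lie at `location`."""
    return [i for i, (sym, *xyz) in enumerate(atoms)
            if (name is not None and sym == name)
            or (location is not None
                and all(abs(p - q) <= tol for p, q in zip(xyz, location)))]


def trajectories(frames: Sequence[Frame], indices: Sequence[int]) -> List[Frame]:
    """Follow the selected atoms through every snapshot (same atom order assumed)."""
    return [(time, [atoms[i] for i in indices]) for time, atoms in frames]


def to_xmol(frames: Sequence[Frame]) -> str:
    """Write snapshots as: atom count, time point, then 'Symbol x y z' lines."""
    lines: List[str] = []
    for time, atoms in frames:
        lines.append(str(len(atoms)))
        lines.append(f"{time:g}")
        lines.extend(f"{s} {x:.4f} {y:.4f} {z:.4f}" for s, x, y, z in atoms)
    return "\n".join(lines) + "\n"
```

For example, selecting the name "O" on the first snapshot and passing the resulting indices to `trajectories` yields one-atom snapshots that `to_xmol` writes back in the same layout as the input.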
Query Type 2. Find and display snapshots of atoms over time in the selected region where the region is the final query solid constructed as a result of operations over primitive solids.
The region can have various shapes, such as a cuboid, ellipsoid, cylinder, tube, sphere, or cone. The result of union, intersection, and/or subtraction operations over the primitive solids can also serve as a query solid. The user loads the molecular data file and then selects a query shape from the menu of the "control window", for example, a cuboid, as shown in Figure 7(a). The query object is visualized on the screen as well. The user can translate, shift, and zoom the query object and the molecular system independently. The system computes the implicit function parameters of the query object. After the intersection operation is selected, the system applies the spatio-temporal predicate to the coordinates of each atom of the molecular system and writes the query result to an output file. The query result is visualized as shown in Figure 7(b). The user can step through all time frames of the query result and adjust its view by translation, shift, and zooming, as shown in Figure 7(c-d).
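The region test in Query Type 2 amounts to evaluating the query solid's implicit function for every atom at every time point. The Python sketch below uses an axis-aligned cuboid, $f(X) = \min_i (h_i - |x_i - c_i|)$ with center $c$ and half-sizes $h$, as a hypothetical example primitive, and keeps atoms with $f \ge 0$, that is, inside or on the boundary; whether MDVQS includes boundary points is not stated, so that choice is our assumption. A composite solid built with the set-theoretic operations of Section 3 could be passed as `f` in the same way.

```python
from typing import Callable, List, Sequence, Tuple

Atom = Tuple[str, float, float, float]  # (symbol, x, y, z)
Frame = Tuple[float, List[Atom]]        # (time point, atoms)
Implicit = Callable[[Sequence[float], float], float]  # f(X, t) >= 0 inside


def cuboid(center: Sequence[float], half: Sequence[float]) -> Implicit:
    """Axis-aligned box: f = min_i (h_i - |x_i - c_i|), static over time."""
    return lambda X, t: min(h - abs(x - c) for x, c, h in zip(X, center, half))


def region_query(frames: Sequence[Frame], f: Implicit) -> List[Frame]:
    """Keep, at every time point, the atoms with f(X, t) >= 0."""
    return [(time, [a for a in atoms if f(a[1:], time) >= 0.0])
            for time, atoms in frames]
```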
Figures 8(a-b) and 9(a-b) show the results of visual querying with a cone, cylinder, ellipsoid, and tube, respectively.
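The implicit definitions of the cylinder and tube are not given in the paper. One plausible construction, sketched in Python below with hypothetical names, bounds an infinite cylinder along $x_1$ by two end planes (an intersection) and builds the tube as an outer cylinder minus an inner one, using min-based intersection and $\min(f, -g)$ for difference. The resulting values mix squared and linear terms, so only their sign is meaningful.

```python
from typing import Callable, Sequence

Implicit = Callable[[Sequence[float], float], float]  # f(X, t) >= 0 inside


def cylinder(y0: float, z0: float, r: float, x_min: float, x_max: float) -> Implicit:
    """Finite cylinder along x1: infinite cylinder intersected with a slab."""
    def f(X: Sequence[float], t: float) -> float:
        side = r * r - (X[1] - y0) ** 2 - (X[2] - z0) ** 2  # >= 0 within radius r
        slab = min(X[0] - x_min, x_max - X[0])               # >= 0 between end planes
        return min(side, slab)                               # intersection
    return f


def tube(y0: float, z0: float, r_out: float, r_in: float,
         x_min: float, x_max: float) -> Implicit:
    """Tube = outer cylinder minus inner cylinder, i.e. min(f_out, -f_in)."""
    outer = cylinder(y0, z0, r_out, x_min, x_max)
    inner = cylinder(y0, z0, r_in, x_min, x_max)
    return lambda X, t: min(outer(X, t), -inner(X, t))
```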
Fig. 8. Visual querying with (a) a cone and (b) a cylinder
Fig. 9. Visual querying with (a) an ellipsoid and (b) a tube
Fig. 10. An example of Query Type 3 with the time interval set from 2 to 6
Query Type 3. Find trajectories of atoms for a specified time interval [t1, t2].
In the "control window", the user selects the time query type and enters the time interval, as shown in Figure 10. Only the molecular data between the starting time t1 and the ending time t2 are visualized as the result of the query. The query results are always saved in the "xmol" format and can also be visualized and analyzed with gOpenMol.
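Query Type 3 reduces to selecting snapshots by their time point. A minimal Python sketch follows; the function name `time_interval_query` is hypothetical, and treating $[t_1, t_2]$ as a closed interval is our assumption. With snapshots at time points 1 to 8 and the interval from 2 to 6 of Figure 10, it keeps the snapshots at 2, 3, 4, 5, and 6.

```python
from typing import List, Sequence, Tuple

Atom = Tuple[str, float, float, float]  # (symbol, x, y, z)
Frame = Tuple[float, List[Atom]]        # (time point, atoms)


def time_interval_query(frames: Sequence[Frame], t1: float, t2: float) -> List[Frame]:
    """Keep only the snapshots whose time point lies in the closed interval [t1, t2]."""
    if t1 > t2:
        raise ValueError("t1 must not exceed t2")
    return [(time, atoms) for time, atoms in frames if t1 <= time <= t2]
```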
5. Conclusions and Future Work
In this paper, we described the visual query system MDVQS for visual data analysis in molecular dynamics. The system allows the user to visualize the results of molecular dynamics simulation, to pose spatio-temporal queries on time-dependent data visually, and to visualize both the query results and the querying process. MDVQS was implemented with VTK and the Win32 API on the Microsoft Windows platform. We integrated visual mining with querying of time-dependent data in one GUI and introduced three basic types of queries for visual data analysis in molecular dynamics. With MDVQS, the user can analyze spatial relationships within molecular data changing over time, come up with new biological hypotheses, and test their validity. In the future, we are going to implement queries with arbitrary shapes changing over time as well.
[1] O. Sourina and S. H. Boey, Geometric Query Types for Data Retrieval in Relational Databases, Data & Knowledge Engineering, Elsevier Science B.V., Vol. 27(2), pp. 207-229, 1998.
[2] J. Bloomenthal, An Introduction to Implicit Surfaces, Morgan Kaufmann, San Francisco, 1997.
[3] O. Sourina and N. Korolev, Geometric Querying of Time-Dependent Data for Data Mining in Molecular Dynamics, In Proc. of Cyberworlds 2004, Tokyo, Nov. 2004, pp. 351-355.
[4] O. Sourina and N. Korolev, Visual Mining and Spatio-Temporal Querying in Molecular Dynamics, Journal of Computational and Theoretical Nanoscience, Special Issue on Computational Intelligence for Molecular Biology and Bioinformatics, American Scientific Publishers, Vol. 2(4), 2005.
[5] W. Schroeder, K. Martin, and B. Lorensen, The Visualization Toolkit: An Object-Oriented Approach to 3D Graphics, 3rd Edition, 2000.
[6] D. L. Bergman, A. Laaksonen, and L. Laaksonen, Visualization of Solvation Structures in Liquid Mixtures, J. Mol. Graph. Model., Vol. 15, pp. 301-306, 1997.
[7] W. Humphrey, A. Dalke, and K. Schulten, VMD - Visual Molecular Dynamics, J. Mol. Graph., Vol. 14, pp. 33-38, 1996.
[8] A. Ricci, A Constructive Geometry for Computer Graphics, The Computer Journal, Vol. 16(2), pp. 157-160, 1973.