American-Russian teleworking on solar-terrestrial data representation: SPIDR 2 project

Computer Graphics & Geometry

Interactive intelligent space physics data mining and visualization via Internet

M. Zhizhin, A. Burtsev, A. Gvishiani
University of Kaiserslautern, Department of Computer Science

Center of Geophysical Data Studies,
Institute of Physics of the Earth RAS,
Moscow, Russia

E. Kihn, H. Kroel
University of Kaiserslautern, Department of Computer Science

National Geophysical Data Center,
Boulder CO, USA

Contents

Abstract
1. Project components, structure and implementation
2. Data contents
3. System architecture
4. Data visualization in SPIDR II
5. Fuzzy search engine for STP data mining
6. Data dissemination and mirroring
References

Abstract

The Space Physics Interactive Data Resource (SPIDR) is a distributed network of synchronous databases and application servers designed to allow a modeling and prediction customer to intelligently access and manage historical space physics data for integration with virtual environment models and real-time space weather forecasts. Eliminating the network bottlenecks associated with transcontinental links, the distributed system architecture is a key factor for low latency in multimedia data visualization and fast data delivery.

SPIDR is a set of platform-independent middleware servers written entirely in Java and accessed via the World Wide Web. Each server resides on a parallel computer cluster and provides fuzzy-logic-based searching on a relational database of space weather parameters. The system is designed to allow the user to specify desired spatial, temporal, and parameter conditions in fuzzy linguistic and/or numeric terms and to receive a ranked list of the events in the historical archive that best match the desired conditions. Once events are discovered, the client can request dynamic temporal and spatial visualization using a set of communicating Java applets, browse the archive of Sun and Earth satellite images, and request delivery of the data formatted for inclusion in model runs. Each SPIDR server has a database management interface, which allows data updates to be performed either by a local user or by another SPIDR server on the Net. The servers communicate with each other for scheduled mirroring of the data and software.

Keywords: space physics, intelligent data mining, Internet scientific data visualization, fuzzy logic

1. Project components, structure and implementation

The main task of the Internet-based international project Space Physics Interactive Data Resource (SPIDR) is to develop a distributed network of database and application servers that implement interactive data mining technologies, dynamic visualization, metadata description, physical modeling, and rapid delivery of selected data [1,2].

Figure 1. A screenshot from a SPIDR-II title page at locations http://spidr.ngdc.noaa.gov, http://plato.wdcb.ru/spidr, http://spidr.ips.gov.au

The SPIDR project has two key development centers: the National Geophysical Data Center in Boulder, CO, USA, and the Center of Geophysical Data Studies in Moscow, Russia. The World Data Centers for Solar-Terrestrial Physics are the international bodies that take part in data exchange. These organizations also host SPIDR-II network nodes, which are accessible at the following URLs: http://wsg7.ngdc.noaa.gov/spidr and http://zeus.wdcb.ru/spidr. In July 1999 the Ionospheric Prediction Service (IPS) in Sydney, Australia, applied for SPIDR-II node status and installed the SPIDR server and database software at http://spidr.ips.gov.au/. The home page of a SPIDR-II node is shown in Fig. 1.

SPIDR-II has the following important features:

Open source: 100% Java servlets and applets, HTML and JavaScript forms;
Portability: Unix (Linux, Solaris) and Windows 9x/NT;
Scalability: full-featured version available for a notebook;
Unified data model: similar access to various "pluggable" relational databases;
Dynamic time-series and imagery data visualization;
AI-based data mining and forecast;
Web-based data-basket interface inspired by the e-commerce concept;
Automatic database and software synchronization.

At each network node a Relational Database Management System (RDBMS) is interfaced with server-side and client-side applications ("servlets" and "applets", respectively). SPIDR II software is written in Java, which gives platform independence. Fig. 2 shows how the elements of SPIDR-II interoperate. First, when a client connects to the SPIDR-II web server with a web browser, several Java applets are transferred to the client machine and run there. They support almost all subsequent operations and queries. When a user wants to query a database, the user selects the necessary parameters and the query form is sent to a Java servlet running on the server machine under a Java engine. The servlet in turn executes a JDBC query on a MySQL database and sends the result back to the client machine. The user can then manipulate the received data further with the Java applets running on the client machine.
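The servlet-to-database step described above can be sketched as follows. This is a minimal illustration, not SPIDR-II code: the class name `StationQuery`, the table names, and the columns `obs_time`, `value`, and `station` are all assumptions, since the text does not give the actual schema. It shows the usual JDBC pattern of a parameterized statement, with the table name checked against a whitelist because table names cannot be bound as parameters.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper that a query servlet could delegate to. */
final class StationQuery {
    // Assumed table names; the real SPIDR-II schema is not given in the text.
    private static final Set<String> TABLES =
            new HashSet<>(Arrays.asList("goes", "imf", "dst", "kpap"));

    /** Builds the parameterized SQL; the table name must be on the whitelist. */
    static String buildSql(String table) {
        if (!TABLES.contains(table)) {
            throw new IllegalArgumentException("unknown table: " + table);
        }
        return "SELECT obs_time, value FROM " + table
                + " WHERE station = ? AND obs_time BETWEEN ? AND ? ORDER BY obs_time";
    }

    /** Runs the query over an open JDBC connection and returns the values. */
    static List<Double> fetch(Connection conn, String table, String station,
                              Timestamp from, Timestamp to) throws SQLException {
        List<Double> values = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(buildSql(table))) {
            ps.setString(1, station);
            ps.setTimestamp(2, from);
            ps.setTimestamp(3, to);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    values.add(rs.getDouble("value"));
                }
            }
        }
        return values;
    }
}
```

A servlet would obtain the `Connection` from the JDBC driver of the installed database and pass the form fields to `fetch`.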

Figure 2. SPIDR-II client-side and server-side applications.

Fig. 3 presents a more detailed scheme of the activities available to users and to the administrator. It shows the three major activities of a client connection and the means by which they are carried out. Users are limited to a web browser; for the administrator, most functions are at the time of writing performed at the server's console. The three major activities available to users are data plotting, data downloading, and data modeling. For administrative purposes there are modules for adding and removing data, for controlling user activity, and for controlling data quality.

Figure 3. SPIDR-II user and administrator functions

SPIDR-II software modules (both server-side and client-side) have been tested under Red Hat Linux 6.0 and Windows 95/98/NT 4.0. The minimum hardware requirements are: Pentium 100 MHz, 24 MB RAM, 3 GB HDD, and the TCP/IP network protocol family with a connection to the Internet. The basic hardware configuration used throughout the network of SPIDR nodes is: dual Pentium II 300 MHz, 256 MB RAM, 30 GB soft-RAID disk array, 100 Mbps Fast Ethernet.

The so-called "pluggability" of the SPIDR databases means that additional databases can be added to the data already available and accessed through the same uniform interface. To add a database to SPIDR-II one has to write an access method that creates SPIDR data model objects (one day of observations per variable per station), then prepare an HTML data request form, and create visualization and delivery servlets aware of the new data type (units, log/linear scales, axis labels, etc.).
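One way to picture such a plug-in contract is sketched below. The names `DailyRecord`, `AccessMethod`, and `ConstantDstAccess` are hypothetical and the field layout is an assumption; the text only states that a data model object holds one day of observations per variable per station and that the servlets need to know units and scale type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** One day of observations for one variable at one station (assumed layout). */
final class DailyRecord {
    final String station;
    final String variable;
    final String day;       // for example "1997-01-15"
    final double[] values;  // samples in time order

    DailyRecord(String station, String variable, String day, double[] values) {
        this.station = station;
        this.variable = variable;
        this.day = day;
        this.values = values.clone();
    }
}

/** Contract that a new database plug-in would implement (hypothetical). */
interface AccessMethod {
    String units();      // used for axis labels
    boolean logScale();  // log or linear plotting scale
    List<DailyRecord> load(String station, String variable, List<String> days);
}

/** Toy plug-in that serves an hourly series filled with a constant value. */
final class ConstantDstAccess implements AccessMethod {
    public String units() { return "nT"; }
    public boolean logScale() { return false; }

    public List<DailyRecord> load(String station, String variable, List<String> days) {
        List<DailyRecord> out = new ArrayList<>();
        for (String day : days) {
            double[] hourly = new double[24];
            Arrays.fill(hourly, -5.0);
            out.add(new DailyRecord(station, variable, day, hourly));
        }
        return out;
    }
}
```

A real access method would read the daily values from the newly plugged-in database instead of generating them.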

2. Data contents

At the time of writing, the SPIDR-II environment had the following data sets plugged in:

SSN - Sun Spot Numbers, from 1817
Geomagnetic and solar indices - Kp, Ap, Cp, C9, DST, from 1932
IMF - Interplanetary Magnetic Field, from 1973
GOES - Geostationary Operational Environmental Satellites, from 1986
GEOM - Geomagnetic field variations
Ionospheric stations observation data - under test, 1997, 1998
X-ray events - under test
Solar daily images in the following bands: X-rays, H-alpha, radio band, magnetograms - from 1992

Data sets	Min sampling	Date interval (year,month)
GOES data
GOES-5	1 min, 5 min	1986,1-1987,4
GOES-6	1 min, 5 min	1986,1-1994,11
GOES-7	1 min, 5 min	1987,3-1996,8
GOES-8	1 min, 5 min	1995,3-1998,9
GOES-9	1 min, 5 min	1996,4-1998,8
GOES-10	1 min, 5 min	1998,7-1998,9
IMF	60 min	1973,1-1998,12
DST	60 min	1957,1-1997,12
KPAP	180 min	1932,1-1999,1
FLUX	daily	1947,2-1999,1
SSN	daily	1610,1-1999,5
IONO	60 min	1989,1-12 1997,1-12
GEOM	1 min, 60 min	1997,1-12
event_xrays		1975,9-1999,7

Table 2. SPIDR-II data contents summary

The SPIDR-II database combines several thematically linked sets of tables (relations) for different branches of solar-terrestrial physics. It also contains a few supplementary tables for system administration purposes.

3. System architecture

The SPIDR-II architecture consists of several major blocks and the links between them. The key elements for understanding it are the login module, the data basket, the fuzzy search engine, the plotting module, the data retrieval module, and some supplementary modules.

It is important to introduce the term "data basket" (see Fig. 4). This module is a special database, associated with a particular username, which contains the set of data selected for analysis; the selected data can be plotted, printed out, or downloaded. The contents of the basket are saved from session to session for each registered user, so that at the next login the user finds the data set selection constraints from the previous session ready to use.

Figure 4. SPIDR-II data access structure and modules

4. Data visualization in SPIDR II

Graphical representation of the unified data model in SPIDR II is based mainly on embedded applets. When the web server receives a request to plot data (coming from a thematic data request form or a data basket page on the client), a specialized SPIDR II servlet connects to the STP database (Fig. 5), retrieves the data needed for the plot, and stores it in a new STPDataSet object implementing the unified data model. The servlet then serializes the object into an FTP-accessible directory and sends back to the client a dynamically built HTML page with embedded links to the plotting applets (possibly several applet plots on one page), together with formatting parameters and the URLs of the serialized data sets to be plotted (Fig. 6).

Figure 5. SPIDR II data visualization scenario
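The serialize-and-reload step can be illustrated with standard Java object serialization. The sketch below is an assumption-laden stand-in: `StpDataSetSketch` and its fields are hypothetical, since the text names STPDataSet but does not describe its contents. Byte arrays are used in place of the file in the FTP-accessible directory so that the round trip is easy to follow.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Stand-in for STPDataSet; the fields are assumptions. */
final class StpDataSetSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    final String variable;
    final long startMillis;  // time of the first sample
    final long stepMillis;   // sampling interval
    final double[] values;

    StpDataSetSketch(String variable, long startMillis, long stepMillis, double[] values) {
        this.variable = variable;
        this.startMillis = startMillis;
        this.stepMillis = stepMillis;
        this.values = values.clone();
    }

    /** Server side: turn the data set into bytes ready to be written to a file. */
    byte[] toBytes() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(this);
        }
        return buf.toByteArray();
    }

    /** Client side: rebuild the data set from the downloaded bytes. */
    static StpDataSetSketch fromBytes(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (StpDataSetSketch) in.readObject();
        }
    }
}
```

In this scheme the plotting applet would read the bytes from the URL given in the HTML page and call `fromBytes` before drawing.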

If there is more than one applet on a page, the applets negotiate a common time scale for all the plots. When a user zooms into the time period of an interesting space weather event on one of the plots, all the other plots on the page are automatically rescaled to the same time window.
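A simple way to obtain this behaviour is a shared time-window object to which every plot subscribes. The sketch below is only an illustration of that idea; the class `TimeAxisSync` and its methods are hypothetical, and the choice of the union of all data spans as the initial common scale is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Shared time window that every plot on a page subscribes to (hypothetical). */
final class TimeAxisSync {
    /** Callback implemented by each plot. */
    interface Listener {
        void windowChanged(long fromMillis, long toMillis);
    }

    private final List<Listener> listeners = new ArrayList<>();
    private long from = Long.MAX_VALUE;
    private long to = Long.MIN_VALUE;

    /** A plot registers and reports the time span of its own data. */
    void register(Listener plot, long dataFrom, long dataTo) {
        listeners.add(plot);
        // Assumed rule: the common scale is the union of all spans seen so far.
        from = Math.min(from, dataFrom);
        to = Math.max(to, dataTo);
        broadcast();
    }

    /** Called by the plot in which the user zoomed; all plots follow. */
    void zoom(long newFrom, long newTo) {
        if (newFrom >= newTo) {
            throw new IllegalArgumentException("empty time window");
        }
        from = newFrom;
        to = newTo;
        broadcast();
    }

    private void broadcast() {
        for (Listener l : listeners) {
            l.windowChanged(from, to);
        }
    }
}
```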

Figure 6. Example of the applet window plotting GOES magnetic field data

Another visualization task in SPIDR II, displaying maps of geomagnetic and ionospheric stations, is also performed by a configurable Java applet. It receives from the server a list of station names and coordinates, the URL of a background map (possibly several layers), hidden parameters for the subsequent data request, and map formatting options. When a user clicks near a particular station, the applet sends the SPIDR II server a data request containing the parameters hidden on the map page followed by the selected station name and/or coordinates (Fig. 7).
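The click handling can be sketched as a nearest-station lookup followed by construction of the request. The names below (`StationPicker`, the `station` parameter, the example path) are hypothetical, and the sketch assumes that station positions have already been converted to pixel coordinates on the map.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper for the station map applet. */
final class StationPicker {
    /**
     * Returns the index of the station closest to the click,
     * or -1 if no station lies within maxDistPx pixels.
     */
    static int nearest(int clickX, int clickY, int[] xs, int[] ys, double maxDistPx) {
        int best = -1;
        double bestDist = maxDistPx;
        for (int i = 0; i < xs.length; i++) {
            double d = Math.hypot(xs[i] - clickX, ys[i] - clickY);
            if (d <= bestDist) {
                best = i;
                bestDist = d;
            }
        }
        return best;
    }

    /** Appends the chosen station to the hidden request parameters. */
    static String requestUrl(String baseUrl, String hiddenParams, String station) {
        try {
            return baseUrl + "?" + hiddenParams + "&station="
                    + URLEncoder.encode(station, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

For example, `StationPicker.requestUrl("/spidr/plot", "var=foF2", "Boulder CO")` yields `/spidr/plot?var=foF2&station=Boulder+CO`; both the path and the parameter names are made up for the illustration.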

Figure 7. Mapping applet used to select an ionospheric station for data visualization request

Visualization of solar images in SPIDR II may be performed either by periodically refreshing static HTML pages (HTTP push technique) or by a specialized animation applet. The HTTP push servlet is used for standard access to the solar imagery database, when the length of the image sequence is not known in advance (Fig. 8). For fast and smooth visualization of short-term space weather events found by the fuzzy search engine, the animation applet is used.

Figure 8. Visualization of solar images using push HTTP technique

5. Fuzzy search engine for STP data mining

Intelligent parallel data mining of the STP database thematic table sets is performed within the SPIDR II fuzzy search engine (Fig.9) [3]. The search conditions may be specified in a number of ways depending on the user's familiarity with the themes and the space weather event of interest. An expert user can specify exact thresholds and/or limitations that must be maintained on certain parameters. Conditions can also be specified via abstract natural language definitions for each parameter.

Figure 9. Fuzzy engine flow chart

Fuzzy set theory is an extension of classical set theory developed by L. Zadeh and others over the last twenty-five years. Unlike the crisp, unambiguous boundaries found in classical set theory, fuzzy sets have a transitional boundary (Fig.10a-b). That is, the transition from being in the set to not being in the set is gradual and is described by a smooth function called a membership function. Fuzzy sets "play an important role in human thinking, particularly in the domains of pattern recognition, communication of information, and abstraction" [4].


a) Crisp segment [5,8]	b) Fuzzy segment [5,8]

Figure 10. Examples of the crisp a) and fuzzy b) sets.

In addition to fuzzy set theory, a system of "fuzzy logic" has been developed which mirrors that of classical set logic. Operations such as "and", "or", and "not" can be combined to determine logical membership in a composed set or, in the case of fuzzy logical operations, a degree of membership in a set. Finally, these rules can be extended by a set of fuzzy "if-then" rules, or fuzzy conditional statements, which provide maps and relations between sets such as: if x is A then y is B. Thus armed with fuzzy logic tools it is possible to extend traditional computational and mathematical algorithms to more closely mirror the human thought process.
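The difference between crisp and fuzzy membership, and the basic fuzzy logical operations, can be sketched in a few lines. The operators below are the standard Zadeh choices (minimum, maximum, complement). The trapezoidal shape of the fuzzy segment and its `ramp` parameter are assumptions made for illustration, since the exact shape plotted in Fig. 10b is not given in the text.

```java
/** Crisp versus fuzzy membership and the standard Zadeh operators. */
final class FuzzyOps {
    /** Crisp segment [lo, hi]: membership is either 0 or 1. */
    static double crispSegment(double x, double lo, double hi) {
        return (x >= lo && x <= hi) ? 1.0 : 0.0;
    }

    /**
     * Fuzzy segment [lo, hi] (assumed trapezoid): 1 inside the segment,
     * falling linearly to 0 over a distance of ramp outside it.
     */
    static double fuzzySegment(double x, double lo, double hi, double ramp) {
        if (x >= lo && x <= hi) {
            return 1.0;
        }
        double dist = (x < lo) ? lo - x : x - hi;
        return Math.max(0.0, 1.0 - dist / ramp);
    }

    /** Fuzzy "and": the smaller of the two degrees of membership. */
    static double and(double a, double b) { return Math.min(a, b); }

    /** Fuzzy "or": the larger of the two degrees of membership. */
    static double or(double a, double b) { return Math.max(a, b); }

    /** Fuzzy "not": the complement of the degree of membership. */
    static double not(double a) { return 1.0 - a; }
}
```

With a ramp of 1, the value 4.5 is outside the crisp segment [5,8] (membership 0) but belongs to the fuzzy segment with degree 0.5.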

Five one-dimensional fuzzy membership functions (MF) from the linguistic term set {very small, small, medium, large, very large} are plotted in Fig. 11.

Figure 11. SPIDR II linguistic fuzzy term set

Fig. 12 shows four one-dimensional MFs from the numeric fuzzy term set {less than, about, in the range, greater than}.

Figure 12. SPIDR II numeric fuzzy term set

One-dimensional MFs are formed for each variable participating in the multidimensional search using the generalized bell function

$$\mu(x; a, b, c) = \frac{1}{1 + \left| \frac{x - c}{a} \right|^{2b}}$$

Here, x stands for the range-normalized scalar data variable, c for the center of the symmetrical "bell", and a for its half-width, while b/2a controls its slope. Parameters a, b, c are governed by the linguistic form of the query and by the results of the statistical analysis of the variable in the requested time interval.
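As a small illustration, the generalized bell function in its textbook form [5], 1 / (1 + |(x - c) / a|^(2b)), can be coded directly. The class name `BellMf` is hypothetical, and the sketch does not attempt to reproduce how SPIDR II derives a, b, and c from the query, which the text does not specify.

```java
/** Generalized bell membership function in its textbook form. */
final class BellMf {
    /**
     * Returns 1 / (1 + |(x - c) / a|^(2b)).
     *
     * @param x range-normalized data value
     * @param a half-width of the bell (membership is 0.5 at c - a and c + a)
     * @param b steepness; the slope at the half-width points is b / (2a) in magnitude
     * @param c center of the bell (membership is 1 there)
     */
    static double bell(double x, double a, double b, double c) {
        if (a <= 0.0 || b <= 0.0) {
            throw new IllegalArgumentException("a and b must be positive");
        }
        return 1.0 / (1.0 + Math.pow(Math.abs((x - c) / a), 2.0 * b));
    }
}
```

For instance, with a = 0.25, b = 2, and c = 0.5 the membership is 1 at x = 0.5, drops to 0.5 at x = 0.25 and x = 0.75, and is close to 0 far from the center.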

For example, values of the geomagnetic field disturbance index Kp can be specified as "storm", "disturbed", or "quiet". Thus, a user may specify the following fuzzy search request for a geomagnetic quiet day:

Fuzzy constraints: (VERY LOW "Kp index") AND ("DST index" ABOUT 0)

Time constraints: FROM (1/1/95) TO (12/31/97) DURING (24 hours).

The query form actually looks like a table with the parameters listed on the left and the relative values of these parameters along the X-axis (Fig. 13).

Figure 13. Fuzzy search query form

The result of such a request reported by the fuzzy search module is always a list of the "most likely" space weather event dates. The list is sorted by the match quality (Fig.14), i.e. by the values of the aggregated multidimensional fuzzy membership function. In the current version of the fuzzy engine, multidimensional database fuzzy search patterns are specified as logical AND aggregations of one-dimensional MFs.
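Producing the ranked list amounts to sorting candidate dates by their aggregated membership value. The sketch below assumes the match quality has already been computed for each date; the class `EventRanking` and the idea of cutting the list to the best n entries are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical ranking step of a fuzzy search report. */
final class EventRanking {
    /** Returns the n dates with the highest match quality, best first. */
    static List<String> top(String[] dates, double[] quality, int n) {
        if (dates.length != quality.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        Integer[] idx = new Integer[dates.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        // Arrays.sort on objects is stable, so equal scores keep input order.
        Arrays.sort(idx, (p, q) -> Double.compare(quality[q], quality[p]));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(n, idx.length); i++) {
            out.add(dates[idx[i]]);
        }
        return out;
    }
}
```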

Fuzzy AND aggregation of the one-dimensional MFs is performed with Yager's T-norm operator (we use q=5):

$$T_q(\mu_1, \mu_2) = 1 - \min\left\{ 1, \left[ (1 - \mu_1)^q + (1 - \mu_2)^q \right]^{1/q} \right\}$$

The resulting multidimensional MF is smoother than the simple minimum of the aggregated MFs, which is the limit case of Yager's T-norm for q → ∞ [5].
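Yager's T-norm in its textbook form [5], T_q(a, b) = 1 - min(1, ((1 - a)^q + (1 - b)^q)^(1/q)), is straightforward to code. The sketch below is not taken from SPIDR II; the class name and the pairwise folding over several memberships (valid because a T-norm is associative) are illustrative choices.

```java
/** Yager's T-norm for fuzzy AND aggregation (textbook form). */
final class YagerTNorm {
    /** T_q(a, b) = 1 - min(1, ((1 - a)^q + (1 - b)^q)^(1/q)), for q > 0. */
    static double and(double a, double b, double q) {
        if (q <= 0.0) {
            throw new IllegalArgumentException("q must be positive");
        }
        double s = Math.pow(1.0 - a, q) + Math.pow(1.0 - b, q);
        return 1.0 - Math.min(1.0, Math.pow(s, 1.0 / q));
    }

    /** Folds the operator over any number of one-dimensional memberships. */
    static double andAll(double[] memberships, double q) {
        double result = 1.0; // 1 is the neutral element of every T-norm
        for (double m : memberships) {
            result = and(result, m, q);
        }
        return result;
    }
}
```

The result never exceeds the smaller of the two memberships and approaches it as q grows; for q = 1 the operator reduces to max(0, a + b - 1).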

Optional temporal moving average smoothing (from 5 minutes to 1 day) of all variables can be performed prior to the aggregated MF calculation.
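A moving average of this kind can be sketched as follows. The text does not say whether the window is centered or trailing or how the ends of the series are handled, so the centered window with truncation at the edges used here is an assumption, as is the class name `Smoothing`.

```java
/** Moving-average smoothing of a regularly sampled series. */
final class Smoothing {
    /**
     * Centered moving average over an odd number of samples.
     * Near the ends of the series only the available samples are averaged.
     */
    static double[] movingAverage(double[] x, int window) {
        if (window < 1 || window % 2 == 0) {
            throw new IllegalArgumentException("window must be odd and positive");
        }
        int half = window / 2;
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            int lo = Math.max(0, i - half);
            int hi = Math.min(x.length - 1, i + half);
            double sum = 0.0;
            for (int j = lo; j <= hi; j++) {
                sum += x[j];
            }
            out[i] = sum / (hi - lo + 1);
        }
        return out;
    }
}
```

For 1-minute data, a 5-minute smoothing corresponds to `window = 5`, and one day of smoothing on hourly data to a window of about 24 samples (25 to keep it odd).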

Figure 14. Ranked list of space weather events reported by SPIDR II fuzzy search engine

Action buttons in the last two columns of the fuzzy search report table call SPIDR II time series (Fig. 15) and solar imagery (Fig. 16) visualization servlets for the corresponding event dates and parameters listed in the fuzzy request.

Figure 15. Time series plots for Kp and DST indices history of the first "geomagnetic quiet day" event listed in fuzzy search report

Figure 16. Sun X-rays and H-Alpha images for the first "geomagnetic quiet day" event listed in fuzzy search report

The ability to define abstract space weather events will also be supported using simple rule-based logic built into the user data basket. As mentioned above, a "geomagnetic quiet day" can be defined by setting the Kp index value to "low" and the DST index to "about 0". These tools give users a friendly and powerful means of discovering space weather conditions that meet their particular needs.

6. Data dissemination and mirroring

All nodes of the network are equal partners in data dissemination. Each node exercises full control over both loading and exporting of local data and decides which data sets are loaded locally and which nodes receive data. Automatic transfer from a local to a remote node is available if the data is not mirrored locally.

The mirroring technology is vital for synchronizing the data sets at all SPIDR locations. Having to deal with low-bandwidth and transient network connections, the developers had to create a simple and robust mechanism for replicating and synchronizing data at remotely located nodes. Two strategies are available for this purpose: passive and active. In the passive strategy, the local data loading routine sends e-mail with the compressed data and loading instructions to the subscribed participants. The active strategy relies on scheduled web grabbers that search for new data (imagery) and download it to the local node.
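The core decision of an active web grabber, namely which items to download, can be sketched as a set difference between a remote listing and the local holdings. The class `MirrorPlan` and the use of plain file names as identifiers are assumptions; the text does not describe how the grabbers detect new data.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical planning step of a scheduled web grabber. */
final class MirrorPlan {
    /** Names present on the remote node but absent locally, in sorted order. */
    static List<String> missing(Collection<String> remote, Collection<String> local) {
        TreeSet<String> todo = new TreeSet<>(remote);
        todo.removeAll(local);
        return new ArrayList<>(todo);
    }
}
```

A scheduler would call `missing` at each run and download only the returned items, which keeps traffic low on slow or transient links.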

References

[1] Kihn, E., et al. Solar Physics Interactive Data Resource: SPIDR II Project. American-Russian teleworking on solar-terrestrial data representation. Activity report for 1999. Moscow, GEOS, 2000, ISBN 5-89118-133-9, 34 pp.
[2] Kihn, E. and M. Zhizhin. Improved Kp Forecast Using Neuro-Fuzzy and Granular Computing Techniques. Invited paper at AGU Fall Meeting, San Francisco, California, December 13-17, 1999, URL: http://www.agu.org/meetings/fm99top.html.
[3] Kihn, E., M. Zhizhin, S. Lowe, A. Troussov, R. Englebretson, and R. Siquig. The weather scenario generator. Proceedings of the 1999 International Conference on Web-Based Modeling and Simulation, San Francisco, California, January 17-20, 1999, pp. 233-238.
[4] Zadeh, L. A. Fuzzy sets. Information and Control, vol. 8, 1965, pp. 338-353.
[5] Jang, J.-S. R., Sun, C.-T. and Mizutani, E. Neuro-fuzzy and soft computing: a computational approach to learning and machine intelligence. Prentice Hall, 1997, ISBN 0132610663.
