Mosaic, HDF and EOSDIS: Providing Access to Earth Sciences Data

Theodore Meyer, Information Architect, Earth Science Data and Information System Project, NASA GSFC

Ramachandran Suresh, Principal Scientist, Hughes STX

Douglas Ilg, Senior Programmer/Analyst, Hughes STX

Bruce Moxon, Manager of Advanced Technology Projects, Science Data Processing Segment, HAIS

Abstract

The National Aeronautics and Space Administration is developing the Earth Observing System Data and Information System (EOSDIS) to receive, process, and distribute Earth and environmental science data from the EOS series of sensors to be launched beginning in 1997 as a part of the Mission to Planet Earth program. This paper describes how Mosaic, HDF, and NCSA's Collage can support access and sharing of Earth science data by users of this system. Additionally, the paper presents the relationship of WWW and EOSDIS development.


"The U.S. Global Change Research Program requires massive quantities of highly diverse data and information to improve our understanding of global change processes." - The U.S. Global Change Data and Information Management Program Plan, 1992


Introduction

Mission to Planet Earth (MTPE) is a program to support the collection and distribution of reliable Earth and environmental science data to study global change. Beginning in 1997, a series of Earth science remote sensing satellites, known as the Earth Observing System (EOS), will be launched as NASA's centerpiece in support of the MTPE. The EOS Data and Information System (EOSDIS) will process the large amount of data generated by the sensors from these satellites. EOSDIS is a distributed system currently being developed by NASA, which will receive, compile, process, and distribute the massive amount of data collected by the EOS instruments and related Earth science programs. EOSDIS will be one of the largest data systems in the world and will pose technological challenges in the areas of data processing, access, and distribution. This system will need to accommodate a very large and diverse user community. Software tools like Mosaic and WWW servers can play a major role in providing additional access to information and allowing researchers to share science data during the EOS era.

Background

An operational prototype of EOSDIS called Version 0 (V0) was built using existing Earth science data at 9 Distributed Active Archive Centers (DAACs). This system, built by NASA and the DAACs, is currently operational and accessible to users. Some components of the EOSDIS V0 system use Mosaic to provide access to information describing data sets. The section of this paper titled "EOS Home Pages" lists sources of information that describe MTPE, EOS, and related projects more completely. Additionally, a variety of documents describe in full the requirements and architecture of EOSDIS, the locations and host institutions of the DAACs, and methods for accessing the current V0.

The Hierarchical Data Format (HDF) is a data format developed at NCSA to support the transport of scientific data between computing platforms. HDF was adopted as the standard data format for data distribution for the V0 system in 1991. Since then, a large number of Earth science data sets representing different data structures have been implemented in HDF and distributed to the users. Initially these data sets were distributed on CD-ROMs, tapes and on-line with various software tools.

Recently, an interface with HDF and Mosaic was developed with links to NCSA's Collage application. As a result, many of these data sets are available to a larger number of users on-line. The V0 Information Management System (IMS) Guide subsystem also uses the Mosaic library to provide links to the DAACs.

Mosaic can be used to access HDF data and determine the contents of HDF files. Through Mosaic and Collage most HDF data sets can be displayed and analyzed. A combination of HDF, Mosaic, and Collage provides a powerful toolkit for EOSDIS users to access and share Earth science data. Since both Mosaic and Collage work in a multi-platform environment (Workstations, Macs, and PCs), a large number of users will benefit. A detailed discussion of Mosaic, HDF and links to Collage is given in a separate section.

Based on the experience of using HDF for EOSDIS V0 and many other projects, HDF was adopted for use in later EOSDIS releases. It was recognized that HDF would need to be extended to fully meet the requirements of the complete system. The full release of EOSDIS will inherit some components of the V0 system. Mosaic and WWW servers are already being used to distribute project and system related information to scientists and general users.

Figure 1: Mosaic and NCSA Collage Access to HDF Data

Mosaic Compatibility with HDF and Collage

Beginning with version 2.0, NCSA X Mosaic has incorporated some rudimentary, but very useful, HDF capabilities. Mosaic recognizes files with the extension ".hdf" as HDF files and displays them using the "Scientific Data Brows-o-rama" viewer, which is currently built into the X Windows version of Mosaic. Currently, Mosaic supports HDF version 3.3 release 3 and all earlier versions.

The "Scientific Data Brows-o-rama" converts the structure of the HDF file to HTML on the fly and presents a logically formatted and easily readable representation of the file in the document view window. Brows-o-rama displays all file level and object level binary attributes and text annotations as well as producing in-line displays of raster images and palettes. Several samples of HDF files can be viewed from the Mosaic Demo Document available from Mosaic's "Help" menu.
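
The idea of recognizing a file by its extension and converting its structure to HTML on the fly can be illustrated with a minimal Python sketch. It is not Mosaic's code: the dictionary layout, the sample file name, the case-insensitive extension match, and the function names `is_hdf` and `to_html` are all assumptions made for illustration, and a real viewer would obtain the structure from the HDF library rather than from a literal.

```python
from html import escape
from pathlib import PurePosixPath

# Hypothetical in-memory summary of an HDF file. A real browser would
# build this by reading the file through the HDF library.
SAMPLE = {
    "name": "sst_199401.hdf",
    "annotations": ["Monthly mean sea surface temperature"],
    "objects": [
        {"kind": "Raster Image", "label": "browse", "dims": (360, 180)},
        {"kind": "Scientific Data Set", "label": "sst", "dims": (720, 360)},
    ],
}


def is_hdf(path):
    """Recognize an HDF file by its ".hdf" extension.

    Matching case-insensitively is an assumption of this sketch.
    """
    return PurePosixPath(path).suffix.lower() == ".hdf"


def to_html(summary):
    """Render the file summary as a small HTML fragment."""
    lines = ["<h1>%s</h1>" % escape(summary["name"])]
    for note in summary["annotations"]:
        lines.append("<p>%s</p>" % escape(note))
    lines.append("<ul>")
    for obj in summary["objects"]:
        dims = " x ".join(str(d) for d in obj["dims"])
        lines.append(
            "<li>%s: %s (%s)</li>"
            % (escape(obj["kind"]), escape(obj["label"]), dims)
        )
    lines.append("</ul>")
    return "\n".join(lines)
```

Calling `to_html(SAMPLE)` yields a heading for the file, one paragraph per annotation, and one list item per data object, which is roughly the kind of readable outline the viewer presents.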

In addition to its native capabilities for display of certain types of HDF data, Mosaic can also be linked to NCSA's Collage (version 1.3 or later is recommended), a collaborative data display and analysis application, through the Data Transfer Mechanism (DTM). In this way, Mosaic can be used as a data discovery and retrieval front-end to a powerful data visualization package. From the EOSDIS point of view, the most important benefit of this linkage is the ability to display and manipulate large arrays of science data stored in HDF Scientific Data Sets (SDS). The only major HDF data object that is not supported is the Vdata. It is also hoped that more complete support of SDS features can be incorporated into Collage.

Using a combination of Mosaic and Collage, a scientist can locate and retrieve an HDF file, browse through its contents and hierarchical structure, read the descriptive text fields and binary attributes in the file, and view and analyze raster images, palettes and scientific data sets. Currently, Collage does not fully support all the features of the SDS and neither application directly supports Vdatas. It is hoped that these features can be incorporated into future versions of Collage or Mosaic. Figure 1 shows an example of Mosaic and Collage accessing an HDF File.


Connecting Mosaic and Collage

To use Mosaic as a data discovery and retrieval front-end for Collage:

1) Start up Mosaic and Collage.

2) Initiate a collaborative session in Collage by choosing "Begin Session" from the "Collaborate" menu in the Collage window. Collage will prompt you for a port number to use for the session. This number can be any unused port number from 1025 to 65535. Collage will then pop up a text box notifying you that the collaborative session has been established.

3) Connect Mosaic into Collage's collaborative session by choosing "Open DTM Outport" from Mosaic's "File" menu and supplying the same port number as was used to start up the Collage session. Collage should now show "Mosaic" as a user in the session.

4) Open an HDF file using any standard Mosaic method. You should see hyperlinks in parentheses following the text describing each raster image, palette, and SDS in the file. The hyperlinks look like this: (To broadcast this data set over DTM, click here.)

5) Now simply click on a hyperlink to view the data using Collage.
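
Step 2 asks for an unused port number between 1025 and 65535. One way to find such a number before starting the session is sketched below in Python. The sketch assumes the session uses a TCP port; the helper names `port_is_free` and `find_free_port` are hypothetical, and a port reported as free could still be taken by another program before Collage opens it.

```python
import socket


def port_is_free(port, host=""):
    """Return True if a TCP socket can be bound to the port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Bind failed, so the port is in use or not available to us.
            return False
    return True


def find_free_port(start=1025, stop=65535):
    """Scan the range in order and return the first port that can be bound."""
    for port in range(start, stop + 1):
        if port_is_free(port):
            return port
    raise RuntimeError("no free port between %d and %d" % (start, stop))
```

The number returned by `find_free_port()` would then be typed into both the Collage prompt and Mosaic's "Open DTM Outport" dialog.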


EOS Home Pages

Mosaic is now widely used within the EOS project by the Earth science user community, the DAACs, and many related organizations to disseminate information on software, data, and EOS project news. This is evident from the many home pages related to EOS. Table 1 lists some of the current EOS home pages.

Table 1: List of Some EOS Related Servers Accessible Through the Earth Observing System Mosaic Information Server

Server/Document      Description                                                        
     Name

EOS Project          This service allows you to discover, retrieve and display          
Science Office       documents and data about the EOS from all over the internet.       
                     Issues of the newsletter, The Earth Observer, the Payload Panel    
                     Report and the EOS Reference Handbook are currently available.     

EOSDIS V0 IMS        This service links users to the DAACs and ADCs and, through
                     those links, provides access to data and information at each
                     DAAC.

ECS Data Handling    This is maintained by the Hughes EOSDIS Core System team which     
System               provides meeting notes, white papers, and a variety of documents   
                     related to the EOSDIS V1 system.                           

Oak Ridge National   Provides data on carbon dioxide research, biogeochemical           
Laboratory           dynamics, and cycle.     

MTPE - Mission to    This is an experimental service provided by Mission to Planet      
Planet Earth         Earth staff at NASA Headquarters in Washington, DC. The            
                     information provided by this web includes MTPE and NASA resources
                     along with Internet tools and resources useful to MTPE personnel.  

Volcanology Team,    This provides information about the project activities of the      
EOS                  NASA EOS Interdisciplinary Science (IDS) Investigation             
Interdisciplinary    Volcanology Team. This will provide information on Global          
Science              Assessment of Active Volcanism, Volcanic Hazards, and Volcanic     
                     Inputs to the Atmosphere from the Earth Observing System.          

Alaska SAR Facility  Provides data and information to support research activities to    
                     improve the knowledge and understanding of the effects of global   
                     change/warming on the polar regions and the role of the polar      
                     regions in global change/warming. Particularly it provides a       
                     variety of SAR data products.                                      

EROS Data Center     The Land Processes Distributed Active Archive Center (LP-DAAC)     
                     at the EDC provides data and information on Land related data      
                     sets such as Landsat MSS, TM, AVHRR and aircraft SAR data.

Goddard Space        The Goddard DAAC provides data and information related to
Flight Center        upper atmosphere, global biosphere, atmospheric dynamics and       
                     geophysics research.                                               

Jet Propulsion       The Physical Oceanography Distributed Active Archive Center        
Laboratory           (PO.DAAC) archives and distributes data relevant to the physical   
                     state of the oceans, in support of oceanographic and geophysical   
                     sciences research.                                                 

Langley Research     The Langley DAAC provides access and information in the areas of   
Center               radiation budget, clouds, aerosols, and tropospheric chemistry.    

Marshall Space       The George C. Marshall Space Flight Center (MSFC) DAAC provides    
Flight Center        data and information related to the hydrological cycle.

National Snow and    NSIDC provides data and information on snow and ice, their         
Ice Data Center      properties, characteristics and contexts, and of their             
                     significance for human activity.                                   

NOAA SAA             The Satellite Active Archive provides access to data and           
                     information from NOAA satellites.                                  

Global Change        The Global Change Master Directory is a comprehensive source of    
Master Directory     information about Earth science, environmental, climate, and       
                     global change data holdings available to the scientific            
                     community throughout the world.                                    

EOS Project Office   Provides EOS project related information and links to EOSDIS V0    
                     IMS and DAACs             

SeaWiFS              This page provides access into the background, status and          
                     documentation for NASA's upcoming global ocean color monitoring    
                     mission called SeaWiFS. 

TSDIS                This server provides information for the Tropical Rainfall
                     Measuring Mission (TRMM) Science Team Members (TSTMs) interested
                     in the current status of the TRMM Science Data and Information
                     System (TSDIS).    

The V0 IMS has a home page, which provides connections to DAACs and ADCs (Affiliated Data Centers). In addition to many EOS home pages, there are other Earth science related home pages from different EOS-related projects such as SeaWiFS, EOS Pathfinders and institutions like NOAA, USGS, and NCAR. These home pages provide a variety of information about Earth science data systems. The Earth Observing System Mosaic Information Server is the starting point for EOS Earth science related information. This home page provides links to many other servers of interest.

EOSDIS V0 IMS

EOSDIS V0 IMS is currently operational and is the result of a collaborative effort between representatives of the IMS team at GSFC, the DAACs, and ADCs. The IMS interface is available through both graphical and character user interfaces. The graphical interface to the IMS guide is a Mosaic based application for providing access to information, whereas the character interface is based on the CERN Line Browser. The IMS home page itself provides some links to other home pages that describe data holdings, platforms, sensors, data sets, and the data centers in EOSDIS V0. Figure 2 presents an example of the V0 guide function. Access to the IMS interface through WWW has been successfully tested and will soon be available from the EOSDIS V0 IMS home page.

Figure 2: Example of EOSDIS Version 0 Guide Page

EOSDIS Development and WWW

In this section, we first present a high level discussion of key EOSDIS architectural features. We then explore the suitability of both WWW-based and object-oriented approaches, and identify opportunities for additional WWW-based research.

The design and development of future releases of EOSDIS are under way. These releases will focus on the development of a robust, scalable information system whose primary mission is to support scientists and policy makers in understanding the extent, causes, and regional consequences of global climate change. The system functionality is centered around the needs of research teams to effectively and efficiently find, access, use, and share the results of their research, which include data, computer models, modeling results, and observations and analyses.

This community is, by nature, a distributed one, with widely varying scientific disciplines and approaches, and a diverse set of analysis tools. Yet this community has a number of common needs in accessing and managing a complex data repository. These include:

* Direct access to complex data objects and services

This includes a wide variety of sampling, subsetting, formatting, and model-based product generation algorithms, provided both by the DAACs, and by an extended provider network of scientists. Because of the potentially complex and dynamic distributed environment, these objects and services need to be provided in a location-transparent manner.

* Incremental, distributed search capabilities

The data holdings within the community tend to be discipline-specific at any one site, both in terms of their scope, and their content (e.g. spatial resolution). Interdisciplinary collaboration requires complex distributed search approaches to provide spatially and temporally collocated data products with adequate content, and at appropriate resolutions for interdisciplinary modeling. These searches are often carried out interactively in an incremental approach, requiring sessions to hold interim results. Attribute-based searches for services that meet a user's needs are also required.

* Flexible data management system

Scientists are continually augmenting, categorizing, and re-categorizing data collections. These collections and their services must be made available to the community at large (published and advertised), and must include detailed information about the manner in which they were produced (data lineage). The research nature of this work virtually assures that the attributes of these products will change over time. A flexible data management system that can dynamically adapt to changes in data product attributes (metadata) and can incorporate computational methods to instantiate attributes or generate new products is required.

* Interactive, collaborative environment

The development of many of these products will be accomplished through interdisciplinary collaboration among geographically distributed colleagues. An environment for easily sharing, exchanging, and collaboratively developing new data collections is required.

* Event-driven processing and distribution scenarios

Both research scenarios and quality control (QC) activities require event-driven data distribution and subsequent processing.

* Equitable resource distribution

In order to provide an equitable balance of resources across funded research teams, the system incorporates a resource-based accounting service. It employs dynamic service "pricing" as a means to naturally manage user demand.
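
The requirement above for incremental searches, with sessions that hold interim results, can be made concrete with a small Python sketch. It is illustrative only and is not the EOSDIS design: the `Granule` fields, the sample catalog entries, and the `SearchSession` class are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Granule:
    """Hypothetical catalog entry; the field names are illustrative."""
    name: str
    sensor: str
    year: int
    resolution_km: float


# Made-up sample holdings used only to exercise the session.
CATALOG = [
    Granule("g1", "AVHRR", 1988, 4.0),
    Granule("g2", "AVHRR", 1992, 1.1),
    Granule("g3", "TM", 1992, 0.03),
    Granule("g4", "MSS", 1985, 0.08),
]


@dataclass
class SearchSession:
    """Keeps every interim result so each query narrows the previous one."""
    catalog: list
    history: list = field(default_factory=list)

    def current(self):
        """Latest interim result, or the whole catalog before any query."""
        return self.history[-1] if self.history else list(self.catalog)

    def refine(self, predicate):
        """Filter the latest interim result and remember the outcome."""
        result = [g for g in self.current() if predicate(g)]
        self.history.append(result)
        return result

    def undo(self):
        """Discard the latest refinement and return to the one before it."""
        if self.history:
            self.history.pop()
        return self.current()
```

A user might first call `refine(lambda g: g.sensor == "AVHRR")`, then narrow further by year, and call `undo()` if the second step proves too restrictive. The point of the sketch is that the state lives in the session between requests.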

Secondary objectives for EOSDIS focus on more widespread availability of EOS data to support additional environmental and earth science research, education at all grade levels, and possible commercial applications. Requirements for this portion of the user community are perhaps less aggressive than those described above, aiming mainly at distribution of EOSDIS data holdings according to a more traditional "publishing" metaphor.

WWW-based Data Access in EOSDIS

These and other related requirements have led us to the design of a distributed objects- and services-based architecture for EOSDIS. The requirements include fairly rigorous data management and transaction semantics consistent with developments in distributed object technology (e.g., the Object Management Group's CORBA) and object-relational or object-oriented database management systems (ORDBMS or OODBMS).

Some of the required user functionality described above is addressed well by current WWW implementations and planned extensions. Other functionality may require a set of "interoperable" WWW- and object-based solutions, each of which provides its own benefits to a selected set of users. Some of the more interesting areas of interoperability under investigation include:

* Persistent, location independent object naming

Current implementations of URIs (e.g. the HTTP URL) seem to map directly to physical networking hierarchies (ip.address:port/path). In order to preserve location transparency, we will need a more persistent, location independent implementation of object naming. While this is supported by the more abstract URI specification, a mapping will need to be supported between EOSDIS and URL namespaces.

* Distributed objects and services

The object interoperability issue is further complicated by the rich object typing environment proposed for EOSDIS. This environment supports a distributed object paradigm, with some objects split across physical machine boundaries (e.g. aggregate collection objects will typically span multiple machines). This approach is critical for efficient implementation of potentially very large objects (e.g. a collection of 100 LANDSAT images comprising the result of a search might span 30 GB of storage). It is important that the system be able to effectively access only relevant portions of data objects, and to apply appropriate methods in an effective manner. The methods or services appropriate for a specific object might also be distributed across a number of sites. DAACs will provide a common set of methods to users through Data Server interfaces. Additional methods will be developed by users, and may be bound to object types and invoked in a distributed computing environment.

This approach seems at odds with the current URI specification:

scheme:path?query#fragment

in which the fragment specifier is retained by the client for local focus (analogous to our subsetting operation). In EOSDIS, fragment specification and extraction may need to be performed at the server side for network performance considerations. There may be some other semantic mappings of the path?query#fragment components within the URI (in particular, serverpath?service, with #fragment unused) that are appropriate as an interoperability approach.

* Object versioning, replication, and migration

Because data products will change over time, we require support for versioning of logical objects (retaining lineage information), and automatic user notification of coherence issues (i.e. notification that there is a newer version of the product that they are working with). A coherent object replication scheme supports efficient access and management; transparent migration between levels of the storage hierarchy, and among multiple tertiary storage technologies is also required.

* Incremental and distributed search

Distributed, multi-site search appears to be implementable through appropriate CGI services in current Web implementations (through appropriate interpretation of the URI query). In this model, HTTP servers act as gateways into a more sophisticated data management system. However, the basic stateless model of HTTP seems to preclude more complex, incremental search as is typically found in the database arena. This is one area where we expect to see differences in the types of services delivered to end users through WWW- and EOSDIS-based interfaces.

* Security, product pricing and accounting, and resource management

This is an area that seems to be getting increased attention in the WWW community, as electronic commerce applications are explored. In addition to secure HTTP transactions, we are interested in extending core resource management functionality to a point where we can guarantee sufficient resource availability for the primary user community. These will likely need to be enforced at the HTTP / EOSDIS gateway boundary, and may have some implications for both HTTP (e.g. being able to carry encrypted access tokens), and for additional HTTPD functionality.
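
The fragment issue raised under "Distributed objects and services" can be illustrated with Python's standard `urllib.parse` module. The sketch below first shows what remains of a URI once the client keeps the fragment for itself, and then one possible remapping in which the fragment is moved into the query so that a server could perform the subsetting. The host `daac.example`, the path, and the `subset` parameter name are hypothetical; this is one conceivable mapping, not a proposed EOSDIS convention.

```python
from urllib.parse import parse_qs, quote, urlsplit, urlunsplit


def what_the_server_sees(uri):
    """Drop the fragment, which the client retains for local use."""
    parts = urlsplit(uri)
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, parts.query, "")
    )


def to_server_side_subset(uri):
    """Move the fragment into the query so the server can extract it.

    The "subset" parameter name is an assumption of this sketch.
    """
    parts = urlsplit(uri)
    if not parts.fragment:
        return uri
    extra = "subset=" + quote(parts.fragment, safe="")
    query = parts.query + "&" + extra if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def requested_subset(uri):
    """Server-side view: recover the subset specifier from the query."""
    values = parse_qs(urlsplit(uri).query).get("subset", [])
    return values[0] if values else None
```

With a URI such as `http://daac.example/landsat/scene42?format=hdf#rows=0-511`, the first function returns the URI without `#rows=0-511`, so a server never learns which rows were wanted. The second function carries the same specifier in the query, where a gateway could act on it before any data crosses the network.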

Figure 3: Integrated Approach to EOSDIS Development

While we recognize the tremendous rate of progress in the WWW community, we cannot be sure of the design trades that will be made over time in this community. A hybrid approach is depicted in Figure 3. We currently believe a hybrid model of access and distribution within EOSDIS will be appropriate, with a variety of data and services being available to the community through both HTTP-based browsers, and EOSDIS interfaces and object browsers.

Additional Data Distribution Approaches

These considerations for resource management are important in a resource constrained environment with very focused scientific objectives. As mentioned previously, EOSDIS also hopes to serve a much larger community through more widespread data availability. A wide variety of users can be supported through value added providers in the EOSDIS architecture. These providers would likely employ Web technology for widespread data distribution. In this model, secondary providers will subscribe to EOSDIS data delivery services. These providers will then generate HTML-based descriptions of their data holdings, either periodically, or dynamically, in response to forms-based user specified selections.

A dynamic "fan out" approach can be used to provide feeds to a limited set of secondary providers, reducing the "pull" on the primary system resources while leveraging the delivery abilities of the secondary providers. Such approaches have been developed for distribution of weather data to the educational community [Unidata], and could be augmented to support broad distribution of commonly accessed data from EOSDIS.
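
The effect of the "fan out" approach on load can be sketched in a few lines of Python. The classes `PrimarySystem` and `SecondaryProvider`, their method names, and the counters are hypothetical and exist only to show that the primary system makes one delivery per subscribed provider, however many end users later request the product.

```python
class SecondaryProvider:
    """Value added provider that serves end users from its own copy."""

    def __init__(self, name):
        self.name = name
        self.holdings = {}
        self.requests_served = 0

    def receive(self, product_id, payload):
        """Accept a product pushed by the primary system."""
        self.holdings[product_id] = payload

    def fetch(self, product_id):
        """Serve an end user request without touching the primary system."""
        self.requests_served += 1
        return self.holdings[product_id]


class PrimarySystem:
    """Primary archive that pushes each product to a limited set of providers."""

    def __init__(self):
        self.subscribers = []
        self.deliveries = 0

    def subscribe(self, provider):
        self.subscribers.append(provider)

    def publish(self, product_id, payload):
        """Fan the product out: one delivery per subscribed provider."""
        for provider in self.subscribers:
            provider.receive(product_id, payload)
            self.deliveries += 1
```

If two providers subscribe and one product is published, the primary system records two deliveries. Ten subsequent user requests, split between the providers, add nothing to that count.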

Author Biographies

Theodore Meyer is the Information Architect for the Earth Science Data and Information System Project, Goddard Space Flight Center, NASA. In this role he is responsible for developing data standards and implementing a fully integrated data system. Mr. Meyer has represented NASA in international and national forums and technical meetings related to data formats, data processing, data access and management. Mr. Meyer has an M.S. in Aerospace Engineering from the University of Texas at Austin and B.S. degrees in Physical Sciences and Computer Science from the University of Maryland.

Ramachandran Suresh is currently working as a Principal Scientist at Hughes STX. He has over 14 years of research experience in the areas of remote sensing for Earth science applications, geophysics, geology, advanced technologies, mass storage, data modeling, and data format standards. He has been working with the EOS project for the last seven years in various capacities. He is the architect behind the implementation of the standard data format for the EOSDIS V0 system. At present he is involved in the development of the ECS-HDF standard, data types, and data product design in HDF.

Doug Ilg is currently working as a Senior Programmer/Analyst at Hughes STX. He is expecting a Master's degree in computer science from the University of Illinois in 1995. He has over eight years of experience in scientific programming in C in the UNIX environment. He worked at the National Center for Supercomputing Applications and was involved in the development of the HDF library. He has been working with the EOS project for over one and a half years and has supported many data producers and users in the use of HDF, Mosaic, and Collage. He is also involved in the development of ECS standards, data modeling and data structures.

Bruce Moxon has an M.S. in Computer Engineering from the University of Southern California. Mr. Moxon is the Manager of Advanced Technology Projects within the Science Data Processing Segment (SDPS) of Hughes Applied Information Systems (HAIS). In this capacity, he is responsible for the identification, tracking, prototyping, and infusion of distributed information technologies to support the Earth Observing System Data and Information System (EOSDIS) Core System (ECS). In this role, Mr. Moxon works on developing collaborative relationships with external organizations and research projects.

For more information, please contact R. Suresh at suresh@ulabsgi.gsfc.nasa.gov