Theodore Meyer, Information Architect, Earth Science Data and Information System Project, NASA GSFC
Ramachandran Suresh, Principal Scientist, Hughes STX
Douglas Ilg, Senior Programmer/Analyst, Hughes STX
Bruce Moxon, Manager of Advanced Technology Projects, Science Data Processing Segment, HAIS
The National Aeronautics and Space Administration is developing the Earth Observing System Data and Information System (EOSDIS) to receive, process, and distribute Earth and environmental science data from the EOS series of sensors to be launched beginning in 1997 as a part of the Mission to Planet Earth program. This paper describes how Mosaic, HDF, and NCSA's Collage can support access and sharing of Earth science data by users of this system. Additionally, the paper presents the relationship of WWW and EOSDIS development.
Mission to Planet Earth (MTPE) is a program to support the collection and distribution of reliable Earth and environmental science data to study global change. Beginning in 1997, a series of Earth science remote sensing satellites, known as the Earth Observing System (EOS), will be launched as NASA's centerpiece in support of the MTPE. The EOS Data and Information System (EOSDIS) will process the large amount of data generated by the sensors from these satellites. EOSDIS is a distributed system currently being developed by NASA, which will receive, compile, process, and distribute the massive amount of data collected by the EOS instruments and related Earth science programs. EOSDIS will be one of the largest data systems in the world and will pose technological challenges in the areas of data processing, access, and distribution. This system will need to accommodate a very large and diverse user community. Software tools like Mosaic and WWW servers can play a major role in providing additional access to information and allowing researchers to share science data during the EOS era.
An operational prototype of EOSDIS called Version 0 (V0) was built using existing Earth science data at 9 Distributed Active Archive Centers (DAACs). This system, built by NASA and the DAACs, is currently operational and is accessible to users. Some components of the EOSDIS V0 system use Mosaic to provide access to information describing data sets. The section in this paper titled "EOS Home Pages" provides sources of information that describe MTPE, EOS, and related projects more completely. Additionally, a variety of documents describe in full the requirements and architecture of EOSDIS, the locations and host institutions of the DAACs, and methods for accessing the current V0.
The Hierarchical Data Format (HDF) is a data format developed at NCSA to support the transport of scientific data between computing platforms. HDF was adopted as the standard data format for data distribution for the V0 system in 1991. Since then, a large number of Earth science data sets representing different data structures have been implemented in HDF and distributed to the users. Initially these data sets were distributed on CD-ROMs, tapes and on-line with various software tools.
Recently, an interface between HDF and Mosaic was developed, with links to NCSA's Collage application. As a result, many of these data sets are available on-line to a larger number of users. The V0 Information Management System (IMS) Guide subsystem also uses the Mosaic library to provide links to the DAACs.
Mosaic can be used to access HDF data and determine the contents of HDF files. Through Mosaic and Collage most HDF data sets can be displayed and analyzed. A combination of HDF, Mosaic, and Collage provides a powerful toolkit for EOSDIS users to access and share Earth science data. Since both Mosaic and Collage work in a multi-platform environment (Workstations, Macs, and PCs), a large number of users will benefit. A detailed discussion of Mosaic, HDF and links to Collage is given in a separate section.
Based on the experience of using HDF for EOSDIS V0 and many other projects, HDF was adopted for use in later EOSDIS releases. It was recognized that HDF would need to be extended to fully meet the requirements of the complete system. The full release of EOSDIS will inherit some components of the V0 system. Mosaic and WWW servers are already being used to distribute project and system related information to scientists and general users.
Figure 1: Mosaic and NCSA Collage Access to HDF Data

Beginning with version 2.0, NCSA X Mosaic has incorporated some rudimentary, but very useful, HDF capabilities. Mosaic recognizes files with the extension ".hdf" as HDF files and displays them using the "Scientific Data Brows-o-rama" viewer, which is currently built into the X Windows version of Mosaic. Currently, Mosaic supports HDF version 3.3 release 3 and all earlier versions.
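The recognition step can be illustrated with a minimal Python sketch. This is not Mosaic's code: the function names are hypothetical, and the case-insensitive comparison is an assumption. The extension test mirrors the behavior described above; the second test is an optional confirmation based on the four-byte signature (Ctrl-N, Ctrl-C, Ctrl-S, Ctrl-A) that begins HDF files of this generation.

```python
import os

# Leading four bytes of an HDF file of this generation:
# Ctrl-N, Ctrl-C, Ctrl-S, Ctrl-A.
HDF_MAGIC = b"\x0e\x03\x13\x01"


def has_hdf_extension(path):
    """Return True when the file name ends in .hdf (case-insensitive; an assumption)."""
    return os.path.splitext(path)[1].lower() == ".hdf"


def has_hdf_magic(first_bytes):
    """Return True when a byte string starts with the HDF signature."""
    return first_bytes[:4] == HDF_MAGIC


def looks_like_hdf(path):
    """Hypothetical helper: check the extension, then confirm with the leading bytes."""
    if not has_hdf_extension(path):
        return False
    with open(path, "rb") as handle:
        return has_hdf_magic(handle.read(4))
```

Checking the signature as well as the extension guards against a mislabeled file being handed to the HDF viewer.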
The "Scientific Data Brows-o-rama" converts the structure of the HDF file to HTML on the fly and presents a logically formatted and easily readable representation of the file in the document view window. Brows-o-rama displays all file level and object level binary attributes and text annotations as well as producing in-line displays of raster images and palettes. Several samples of HDF files can be viewed from the Mosaic Demo Document available from Mosaic's "Help" menu.
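The idea of turning a file's structure into HTML on the fly can be sketched as follows. This is only an illustration, not the Brows-o-rama implementation: the input is a hypothetical in-memory list of (kind, name, annotation) tuples standing in for the objects of an HDF file, and the markup is an assumption.

```python
from html import escape


def render_hdf_summary(file_name, objects):
    """Build a small HTML page from (kind, name, annotation) tuples.

    The tuples are a hypothetical stand-in for the objects found in an
    HDF file; a real viewer would read them from the file itself.
    """
    lines = ["<html><body>", "<h1>%s</h1>" % escape(file_name), "<ul>"]
    for kind, name, annotation in objects:
        # Escape every field so annotation text cannot break the markup.
        lines.append(
            "<li><b>%s</b>: %s (%s)</li>"
            % (escape(kind), escape(name), escape(annotation))
        )
    lines.append("</ul>")
    lines.append("</body></html>")
    return "\n".join(lines)
```

A call such as render_hdf_summary("sample.hdf", [("SDS", "temperature", "units: K")]) returns a page with one list item per object, which a browser can then display in its document view.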
In addition to its native capabilities for display of certain types of HDF data, Mosaic can also be linked to NCSA's Collage (version 1.3 or later is recommended), a collaborative data display and analysis application, through the Data Transfer Mechanism (DTM). In this way, Mosaic can be used as a data discovery and retrieval front-end to a powerful data visualization package. From the EOSDIS point of view, the most important benefit of this linkage is the ability to display and manipulate large arrays of science data stored in HDF Scientific Data Sets (SDS). The only major HDF data object that is not supported is the Vdata. It is also hoped that more complete support of SDS features can be incorporated into Collage.
Using a combination of Mosaic and Collage, a scientist can locate and retrieve an HDF file, browse through its contents and hierarchical structure, read the descriptive text fields and binary attributes in the file, and view and analyze raster images, palettes and scientific data sets. Currently, Collage does not fully support all the features of the SDS and neither application directly supports Vdatas. It is hoped that these features can be incorporated into future versions of Collage or Mosaic. Figure 1 shows an example of Mosaic and Collage accessing an HDF File.
Connecting Mosaic and Collage
To use Mosaic as a data discovery and retrieval front-end for Collage:
1) Start up Mosaic and Collage.
2) Initiate a collaborative session in Collage by choosing "Begin Session" from the "Collaborate" menu in the Collage window. Collage will prompt you for a port number to use for the session. This number can be any unused port number from 1025 to 65535. Collage will then pop up a text box notifying you that the collaborative session has been established.
3) Connect Mosaic to Collage's collaborative session by choosing "Open DTM Outport" from Mosaic's "File" menu and supplying the same port number that was used to start the Collage session. Collage should now show "Mosaic" as a user in the session.
4) Open an HDF file using any standard Mosaic method. You should see hyperlinks in parentheses following the text describing each raster image, palette, and SDS in the file. The hyperlinks look like this: (To broadcast this data set over DTM, click here.)
5) Now simply click on a hyperlink to view the data using Collage.
Mosaic is currently used widely in the EOS project by the Earth science user community, the DAACs, and many related organizations to disseminate information on software, data, and EOS project related news. This is evident from the many home pages related to EOS. Table 1 gives a list of current EOS home pages.
Table 1: List of Some EOS Related Servers Accessible Through the Earth Observing System Mosaic Information Server
Server/Document Name        Description
--------------------------  ---------------------------------------------------------------

EOS Project Science Office  This service allows you to discover, retrieve and display
                            documents and data about the EOS from all over the internet.
                            Issues of the newsletter, The Earth Observer, the Payload Panel
                            Report and the EOS Reference Handbook are currently available.

EOSDIS V0 IMS               This service allows a user to link to other DAACs and ADCs,
                            through which it provides access to data and information at
                            each DAAC.

ECS Data Handling System    This is maintained by the Hughes EOSDIS Core System team, which
                            provides meeting notes, white papers, and a variety of documents
                            related to the EOSDIS V1 system.

Oak Ridge National          Provides data on carbon dioxide research, biogeochemical
Laboratory                  dynamics, and cycle.

MTPE - Mission to Planet    This is an experimental service provided by Mission to Planet
Earth                       Earth staff at NASA Headquarters in Washington, DC. The
                            information provided by this web includes MTPE and NASA
                            resources along with Internet tools and resources useful to
                            MTPE personnel.

Volcanology Team, EOS       This provides information about the project activities of the
Interdisciplinary Science   NASA EOS Interdisciplinary Science (IDS) Investigation
                            Volcanology Team. This will provide information on Global
                            Assessment of Active Volcanism, Volcanic Hazards, and Volcanic
                            Inputs to the Atmosphere from the Earth Observing System.

Alaska SAR Facility         Provides data and information to support research activities to
                            improve the knowledge and understanding of the effects of global
                            change/warming on the polar regions and the role of the polar
                            regions in global change/warming. In particular, it provides a
                            variety of SAR data products.

EROS Data Center            The Land Processes Distributed Active Archive Center (LP-DAAC)
                            at the EDC provides data and information on land related data
                            sets such as Landsat MSS, TM, AVHRR and aircraft SAR data.

Goddard Space Flight        The Goddard DAAC provides data and information related to
Center                      upper atmosphere, global biosphere, atmospheric dynamics and
                            geophysics research.

Jet Propulsion Laboratory   The Physical Oceanography Distributed Active Archive Center
                            (PO.DAAC) archives and distributes data relevant to the physical
                            state of the oceans, in support of oceanographic and geophysical
                            sciences research.

Langley Research Center     The Langley DAAC provides access and information in the areas of
                            radiation budget, clouds, aerosols, and tropospheric chemistry.

Marshall Space Flight       The George C. Marshall Space Flight Center (MSFC) DAAC provides
Center                      data and information related to the hydrological cycle.

National Snow and Ice Data  NSIDC provides data and information on snow and ice, their
Center                      properties, characteristics and contexts, and their
                            significance for human activity.

NOAA SAA                    The Satellite Active Archive provides access to data and
                            information from NOAA satellites.

Global Change Master        The Global Change Master Directory is a comprehensive source of
Directory                   information about Earth science, environmental, climate, and
                            global change data holdings available to the scientific
                            community throughout the world.

EOS Project Office          Provides EOS project related information and links to EOSDIS V0
                            IMS and DAACs.

SeaWiFS                     This page provides access to the background, status and
                            documentation for NASA's upcoming global ocean color monitoring
                            mission called SeaWiFS.

TSDIS                       This server provides information for the Tropical Rainfall
                            Measuring Mission (TRMM) Science Team Members (TSTMs) interested
                            in the current status of the TRMM Science Data and Information
                            System (TSDIS).
The V0 IMS has a home page, which provides connections to DAACs and ADCs (Affiliated Data Centers). In addition to many EOS home pages, there are other Earth science related home pages from different EOS-related projects such as SeaWiFS, EOS Pathfinders and institutions like NOAA, USGS, and NCAR. These home pages provide a variety of information about Earth science data systems. The Earth Observing System Mosaic Information Server is the starting point for EOS Earth science related information. This home page provides links to many other servers of interest.
EOSDIS V0 IMS is currently operational and is the result of a collaborative effort between representatives of the IMS team at GSFC, the DAACs, and ADCs. The IMS interface is available through both graphical and character user interfaces. The graphical interface to the IMS guide is a Mosaic based application for providing access to information, whereas the character interface is based on the CERN Line Browser. The IMS home page itself provides some links to other home pages that describe data holdings, platforms, sensors, data sets, and the data centers in EOSDIS V0. Figure 2 presents an example of the V0 guide function. Access to the IMS interface through WWW has been successfully tested and will soon be available from the EOSDIS V0 IMS home page.
Figure 2: Example of EOSDIS Version 0 Guide Page

In this section, we first present a high level discussion of key EOSDIS architectural features. We then explore the suitability of both WWW-based and object-oriented approaches, and identify opportunities for additional WWW-based research.
The design and development of future releases of EOSDIS are under way. These releases will focus on the development of a robust, scalable information system whose primary mission is to support scientists and policy makers in understanding the extent, causes, and regional consequences of global climate change. The system functionality is centered around the needs of research teams to effectively and efficiently find, access, use, and share the results of their research, which include data, computer models, modeling results, and observations and analyses.
This community is, by nature, a distributed one, with widely varying scientific disciplines and approaches, and a diverse set of analysis tools. Yet this community has a number of common needs in accessing and managing a complex data repository. These include:
* Direct access to complex data objects and services
This includes a wide variety of sampling, subsetting, formatting, and model-based product generation algorithms, provided both by the DAACs, and by an extended provider network of scientists. Because of the potentially complex and dynamic distributed environment, these objects and services need to be provided in a location-transparent manner.
* Incremental, distributed search capabilities
The data holdings within the community tend to be discipline-specific at any one site, both in terms of their scope, and their content (e.g. spatial resolution). Interdisciplinary collaboration requires complex distributed search approaches to provide spatially and temporally collocated data products with adequate content, and at appropriate resolutions for interdisciplinary modeling. These searches are often carried out interactively in an incremental approach, requiring sessions to hold interim results. Attribute-based searches for services that meet a user's needs are also required.
* Flexible data management system
Scientists are continually augmenting, categorizing, and re-categorizing data collections. These collections and their services must be made available to the community at large (published and advertised), and must include detailed information about the manner in which they were produced (data lineage). The research nature of this work virtually assures that the attributes of these products will change over time. A flexible data management system that can dynamically adapt to changes in data product attributes (metadata) and can incorporate computational methods to instantiate attributes or generate new products is required.
* Interactive, collaborative environment
The development of many of these products will be accomplished through interdisciplinary collaboration among geographically distributed colleagues. An environment for easily sharing, exchanging, and collaboratively developing new data collections is required.
* Event-driven processing and distribution scenarios
Both research scenarios and quality control (QC) activities require event-driven data distribution and subsequent processing.
* Equitable resource distribution
In order to provide an equitable balance of resources across funded research teams, the system incorporates a resource-based accounting service. It employs dynamic service "pricing" as a means to naturally manage user demand.
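The incremental search requirement in the list above, where a session holds interim results and each step narrows them, can be illustrated with a small sketch. It is not the EOSDIS design: the Granule and SearchSession classes, their fields, and the catalog entries are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Granule:
    """Hypothetical catalog entry with a location and an acquisition year."""
    name: str
    discipline: str
    lat: float
    lon: float
    year: int


@dataclass
class SearchSession:
    """Holds the interim result set so each refinement narrows the previous one."""
    results: list
    history: list = field(default_factory=list)

    def refine(self, label, predicate):
        # Keep only the interim results that satisfy the new constraint.
        self.results = [g for g in self.results if predicate(g)]
        self.history.append((label, len(self.results)))
        return self


# Made-up catalog entries, for illustration only.
catalog = [
    Granule("sst_pacific_1992", "oceanography", 5.0, -140.0, 1992),
    Granule("ndvi_amazon_1991", "land", -3.0, -60.0, 1991),
    Granule("ice_arctic_1992", "cryosphere", 80.0, 10.0, 1992),
    Granule("sst_pacific_1985", "oceanography", 4.0, -150.0, 1985),
]

session = SearchSession(list(catalog))
session.refine("tropical band", lambda g: abs(g.lat) <= 23.5)
session.refine("1990 or later", lambda g: g.year >= 1990)
```

Because the session keeps its history, a user can see how each constraint reduced the candidate set before deciding on the next refinement. A distributed version would apply the same refinement at each site and merge the interim results.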
Secondary objectives for EOSDIS focus on more widespread availability of EOS data to support additional environmental and earth science research, education at all grade levels, and possible commercial applications. Requirements for this portion of the user community are perhaps less aggressive than those described above, aiming mainly at distribution of EOSDIS data holdings according to a more traditional "publishing" metaphor.
These and other related requirements have led us to the design of a distributed objects- and services-based architecture for EOSDIS. The requirements include fairly rigorous data management and transaction semantics consistent with developments in distributed object technology (e.g., the Object Management Group's CORBA) and object-relational or object-oriented database management systems (ORDBMS or OODBMS).
Some of the required user functionality described above is addressed well by current WWW implementations and planned extensions. Other functionality may require a set of "interoperable" WWW- and object-based solutions, each of which provides its own benefits to a selected set of users. Some of the more interesting areas of interoperability under investigation include:
* Persistent, location independent object naming

While we recognize the tremendous rate of progress in the WWW community, we cannot be sure of the design trades that will be made over time in this community. A hybrid approach is depicted in Figure 3. We currently believe a hybrid model of access and distribution within EOSDIS will be appropriate, with a variety of data and services being available to the community through both HTTP-based browsers, and EOSDIS interfaces and object browsers.
These considerations for resource management are important in a resource-constrained environment with very focused scientific objectives. As mentioned previously, EOSDIS also hopes to serve a much larger community through more widespread data availability. A wide variety of users can be supported through value-added providers in the EOSDIS architecture. These providers would likely employ Web technology for widespread data distribution. In this model, secondary providers will subscribe to EOSDIS data delivery services. These providers will then generate HTML-based descriptions of their data holdings, either periodically or dynamically, in response to forms-based, user-specified selections.
A dynamic "fan out" approach can be used to provide feeds to a limited set of secondary providers, reducing the "pull" on the primary system resources while leveraging the delivery abilities of the secondary providers. Such approaches have been developed for distribution of weather data to the educational community [Unidata], and could be augmented to support broad distribution of commonly accessed data from EOSDIS.
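The fan-out idea can be sketched in a few lines. This is an illustration under stated assumptions, not the EOSDIS or Unidata design: the FanOutFeed and SecondaryProvider classes, the product name, and the payload are hypothetical.

```python
class FanOutFeed:
    """Primary system: pushes each new item once to every subscribed provider."""

    def __init__(self):
        self.subscribers = {}   # product name -> list of delivery callbacks
        self.primary_sends = 0  # how many deliveries the primary system made

    def subscribe(self, product, deliver):
        self.subscribers.setdefault(product, []).append(deliver)

    def publish(self, product, payload):
        for deliver in self.subscribers.get(product, []):
            deliver(payload)
            self.primary_sends += 1


class SecondaryProvider:
    """Keeps a local copy of the feed and answers end-user requests from it."""

    def __init__(self, name):
        self.name = name
        self.cache = []
        self.requests_served = 0

    def receive(self, payload):
        self.cache.append(payload)

    def serve(self):
        # End users are served from the local copy, not from the primary system.
        self.requests_served += 1
        return list(self.cache)


feed = FanOutFeed()
providers = [SecondaryProvider("provider-a"), SecondaryProvider("provider-b")]
for provider in providers:
    feed.subscribe("daily_browse_image", provider.receive)

feed.publish("daily_browse_image", "image for day 1")

# Many end-user requests are answered without touching the primary system.
for _ in range(3):
    providers[0].serve()
for _ in range(2):
    providers[1].serve()
```

In this example the primary system makes two deliveries while the secondary providers answer five user requests, which is the reduction in "pull" that the fan-out approach aims for.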
Theodore Meyer is the Information Architect for the Earth Science Data and Information System Project, Goddard Space Flight Center, NASA. In this role he is responsible for developing data standards and implementing a fully integrated data system. Mr. Meyer has represented NASA in international and national forums and technical meetings related to data formats, data processing, data access and management. Mr. Meyer has an M.S. in Aerospace Engineering from the University of Texas at Austin and B.S. degrees in Physical Sciences and Computer Science from the University of Maryland.
Ramachandran Suresh is currently working as a Principal Scientist at Hughes STX. He has over 14 years of research experience in the areas of remote sensing for Earth science applications, geophysics, geology, advanced technologies, mass storage, data modeling, and data format standards. He has been working with the EOS project for the last seven years in various capacities. He is the architect behind the implementation of the standard data format for the EOSDIS V0 system. At present he is involved in the development of the ECS-HDF standard, data types, and data product design in HDF.
Doug Ilg is currently working as a Senior Programmer/Analyst at Hughes STX. He is expecting a Master's degree in computer science from the University of Illinois in 1995. He has over eight years of experience in scientific programming in C and UNIX environments. He worked at the National Center for Supercomputing Applications and was involved in the development of the HDF library. He has been working with the EOS project for over one and a half years and has supported many data producers and users in the use of HDF, Mosaic, and Collage. He is also involved in the development of ECS standards, data modeling and data structures.
Bruce Moxon has an M.S. in Computer Engineering from the University of Southern California. Mr. Moxon is the Manager of Advanced Technology Projects within the Science Data Processing Segment (SDPS) of Hughes Applied Information Systems (HAIS). In this capacity, he is responsible for the identification, tracking, prototyping, and infusion of distributed information technologies to support the Earth Observing System Data and Information System (EOSDIS) Core System (ECS). In this role, Mr. Moxon works on developing collaborative relationships with external organizations and research projects.
For more information, please contact R. Suresh at suresh@ulabsgi.gsfc.nasa.gov