By Jason Cooper, Archivist, National Climatic Data Center, U.S. National Oceanic and Atmospheric Administration (NOAA)
Data repositories and archives play a critical role as the source for the observational data used in the study of weather and climate. After more than two centuries of recording observations on paper, most observations in the last 20 years have been collected digitally. Repositories around the world now hold a mix of data on physical media, such as paper and microfilm, and digital media, such as computer disks, magnetic tapes and DVDs. There is risk of data loss regardless of storage media. Repositories must identify, appraise, acquire, archive and migrate data to new technologies to make it available for research. These acts of stewardship and conversion to common electronic formats are vital to preserving and enhancing the global climate record.
Creating homogeneous, complete data sets from disparate collections is a fundamental challenge facing the climate research community. This is best achieved by leveraging partnerships to share resources and areas of expertise. The goal is to bring more information into the hands of scientists who seek to understand the global climate system and anticipate changes to the climate.
Observational data today
In the twenty-first century, weather observations are overwhelmingly collected digitally. The U.S. National Climatic Data Center’s (NCDC or the Center) digital collection began in earnest in the late 1970s when satellite data transitioned to digital formats. In the 1990s, the National Weather Service (NWS) deployed new systems for in situ and radar data, and an aging network of weather radar sites was replaced with the Next Generation Weather Radar (NEXRAD) network. Digital imagery began flowing directly to NCDC. Today, the Center holds twenty years of NEXRAD data in its digital repository, while maintaining over 20 million pre-NEXRAD images on microfilm.
Since the start of hourly weather observations at airports in the late 1920s, NCDC has archived tens of millions of forms from stations nationwide. In situ data collection also moved toward digital in the 1990s with the NWS transition to the Automated Surface Observing System (ASOS). ASOS transmits digital data directly to NCDC, largely ending the creation of paper observation forms. New digital data submission practices were also established for the volunteer Cooperative Observers network as citizens connected to the Internet by the millions.
Today, there are few new paper forms or analog charts. The only new hard-copy data arriving at NCDC comes from ships at sea, stations with intermittent transmission issues, and a very small number of stations that have not made the conversion to digital data submission. NCDC’s large collection of records on physical media (referred to from here onward as physical records) is not growing very quickly. Occasionally, the Center still receives a small collection of records found at an NWS forecast office or state climate office, or paper and portable digital media, such as floppies and CD-ROMs, from outgoing employees. Some data has been forgotten or neglected for decades.
With the mass transition of weather observations to digital formats, repositories across the globe are left with aging paper and microfilmed records. Data rescue efforts aim to make more of this historical data readily available. In the U.S., a hugely successful digitization effort known as the Climate Database Modernization Program (CDMP) converted more than fifty million observational records to digital image formats such as PDF, TIFF and JPEG. The program also supported the keying of hundreds of millions of weather elements into digital datasets. As a result of the success of the program, hourly observations for long-running stations in the U.S. are available in image formats as well as keyed data files, and the U.S. has a dense coverage of data that dates to the 19th century.
The face of data rescue and digitization looks different today. CDMP ended in 2011, and there are still physical records to digitize. NCDC actively seeks partnerships to continue to rescue and convert data to digital formats ready for use in climate research. NCDC has engaged in partnerships to scan early satellite records that reveal sea-ice extent in the late 1960s and early 1970s. The National Hurricane Center has begun to use early geostationary and polar-orbiting satellite imagery, previously available only in analog format, in the reanalysis of the Atlantic hurricane seasons from 1966 to 1975. Opportunities exist for reanalysis of the Pacific and Indian Ocean basins, possibly through the International Best-Track Archive for Climate Stewardship (IBTrACS) initiative.
NOAA's archives / © NOAA
As part of its engagement as a WMO World Data Center, NCDC has over two thousand boxes of foreign paper records and over 5 000 reels of microfilm in its own archive. It also holds observational records taken by the U.S. military during wartime, from World War II to the early 1990s conflicts in the Balkans and Kuwait. For the first time, users can discover these and all other records on physical media at NCDC, as the database of physical holdings is now publicly available through the Web-enabled Archive Records Tracking and Inventory System (WebARTIS) interface. Opportunities exist to capture and use global in situ observations as well. Those requesting conversion of records from physical media to digital formats are asked to bear some of the cost of fulfilling the request. This is especially necessary for large requests. The government of New Caledonia recently funded the scanning of World War II era observations taken on their island in order to fill a large gap in their climate record. The Hong Kong Observatory discovered that the Center holds original records from their observatory taken in the 1930s. NCDC is creating digital copies of these records.
NCDC is not unique in possessing historic weather records. All across the globe, countries hold valuable weather records that, if digitized, would bolster the climate record and allow a greater understanding of climate. NCDC maintains an active relationship with the International Environmental Data Rescue Organization (IEDRO). IEDRO works to identify and rescue weather records in countries that hold large amounts of analog weather records and need assistance preserving and digitizing them. NCDC is working with IEDRO to define common data formats and recommend image formats for records that non-profit organizations identify and rescue. The data rescue realm includes those who possess and preserve physical records, those who seek records for scanning, those who digitize the data, and those who preserve and provide access to digital collections that comprise the modern climate record.
NCDC routinely accepts and cares for paper, microfilm and many digital data formats. However, some formats are more difficult to preserve and access. While the Center has the capacity to read standard 16mm and 35mm film and even micro cards, it has a very small number of less common five-inch microfilms that require more specialized equipment to read. A more common problem that repositories must address is data on outdated portable media and data in outdated file formats. NCDC is fortunate to maintain a machine capable of reading and handling 3.5- and 5.25-inch floppy disks, but must seek help reading “Zip disks,” a medium that was very popular for a short time in the late 1990s. CD-ROMs and DVDs are already fading in popularity. Because of rapid obsolescence, NCDC makes it a policy to move data from portable media onto a managed, networked digital archiving system to continue preservation.
Another digital preservation issue is the use of outdated or unique file formats. Many formats are proprietary, owned by software companies. Once the company ceases operations or moves to a newer software application, support for the formats generated by the older software begins to decline. When this happens, repositories must either convert the files to a better-supported format or maintain documentation to assist users in doing so. Digital repositories must monitor the file formats in their holdings and migrate from those that appear to be losing support. NCDC policy is to use as few formats as possible in order to minimize the number of formats that must be monitored. The broader archival community offers suggestions for lasting image formats, and scientific communities advocate for standard self-describing digital formats such as NetCDF.
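As a rough illustration of format monitoring, the sketch below counts files by extension and lists the extensions that fall outside a preferred set. It is a minimal example, not a description of NCDC's system: the preferred list is an assumption built from the formats named in this article (PDF, TIFF, JPEG, NetCDF), and the function names and sample file names are hypothetical.

```python
from collections import Counter
from pathlib import PurePath

# Assumption: an illustrative preferred list based on formats named in the article.
PREFERRED = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg", ".nc"}


def inventory_formats(filenames):
    """Count files per lower-cased extension ('' for files with no extension)."""
    return Counter(PurePath(name).suffix.lower() for name in filenames)


def formats_to_review(counts, preferred=PREFERRED):
    """Return (extension, count) pairs outside the preferred list, most numerous first."""
    flagged = [(ext, n) for ext, n in counts.items() if ext not in preferred]
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical holdings, including two legacy proprietary formats.
    holdings = ["obs_1931.TIF", "obs_1932.tif", "grid.nc",
                "station.wk1", "log.wk1", "notes.wpd"]
    for ext, n in formats_to_review(inventory_formats(holdings)):
        print(f"{ext}: {n} file(s) to review for migration")
```

An extension is only a first hint of a file's format. A production inventory would more likely rely on a format identification tool and a registry of format risk.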
Whether the collection is physical or digital, an archival process known as appraisal occurs. Appraisal is a subjective judgment of whether a record, dataset or collection belongs in a given repository. NCDC appraisal considerations fall into two primary categories. The first is appropriateness for archival, which is determined by asking the following (note that the word “collection” could easily be replaced with “data” or “records,” depending on the media):
- Is the collection already archived elsewhere? If yes, is it in a national, university, or corporate archives? If so, should we enter into an agreement to take possession should the host of the archives decide to dispose of them?
- Is the Center the appropriate repository for the collection? If not, what are the alternatives?
- Is this where users would expect to find the collection? If not, where would they look? Would that repository be able to steward it?
- Would NCDC customers be interested in the collection? Producers are asked to identify who they believe the user community would be for their collection.
- Does the collection complement our existing holdings?
- Does the collection support work done at NCDC such as product development, research or publishing?
- Is the collection produced by the National Oceanic and Atmospheric Administration (NOAA)? NCDC’s first and foremost mission is to serve as the repository for its parent organization.
It would be ideal if NCDC could accept all records that satisfy the questions above. However, resources and ability to steward the new data or physical records also play a part in appraisal. Therefore, more questions need to be answered:
- Can NCDC handle the volume, both in transfer and in storage?
- Can NCDC prepare the data for archival? Will the producer assist with transfer, metadata creation and file naming?
- Is the data in a preferred format, or can NCDC accommodate the format?
- Can NCDC provide access to the records?
- Is there subject matter expertise at NCDC to help answer user questions?
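One way to apply the two sets of questions consistently is to record the answers in a fixed structure for every candidate collection. The sketch below is an assumption made for illustration, not NCDC's tooling: the class, field and key names are hypothetical. Because appraisal is a subjective judgment, the code does not compute a decision. It only lists the questions answered "no" or left unanswered so that a reviewer can follow up.

```python
from dataclasses import dataclass, field


@dataclass
class Appraisal:
    """Answers to appraisal questions for one candidate collection (hypothetical structure)."""
    collection: str
    # Each answer is True (yes), False (no) or None (not yet answered).
    appropriateness: dict = field(default_factory=dict)
    capacity: dict = field(default_factory=dict)

    def open_items(self):
        """Return (category, question, answer) for every answer that is not a clear yes."""
        items = []
        for category, answers in (("appropriateness", self.appropriateness),
                                  ("capacity", self.capacity)):
            for question, answer in answers.items():
                if answer is not True:
                    items.append((category, question, answer))
        return items


if __name__ == "__main__":
    # Hypothetical example record with shortened question labels.
    record = Appraisal(
        collection="Station logbooks (example)",
        appropriateness={"complements existing holdings": True,
                         "already archived elsewhere": None},
        capacity={"volume manageable": True,
                  "preferred or supported format": False},
    )
    for category, question, answer in record.open_items():
        status = "unanswered" if answer is None else "no"
        print(f"[{category}] {question}: {status}")
```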
When appraisal is conducted in a consistent manner, a repository develops a strong identity, and the user community begins to perceive the types of data and physical records to be found there. Conversely, a haphazard appraisal practice leads to disjointed collections and uncertainty about the mission of the repository, from both the community it serves and even its own employees.
The repository and climate research
NCDC serves as a state-of-the-art digital repository for records that have been converted to image and keyed data formats. NCDC’s role as a World Data Center, as well as its commitment to digital stewardship, makes it an ideal place to submit data for archival. NCDC is actively involved in the International Surface Temperature Initiative, the global repository for land surface observations from the 1800s to present. Similarly, NCDC participates in the International Surface Pressure Databank, the world’s largest collection of pressure observations. These federations of data represent the final instance of the global data rescue and partnership effort: aggregated datasets that are as complete and homogeneous as possible given the disparate origins of the original observational records.
Data rescue and preservation in the twenty-first century requires partnerships to share resources and areas of expertise. In this way, data rescue looks more like climate research than ever before, with dozens of nations, non-profit organizations and universities contributing to the improvement of the climate record. Provided with richer, more complete datasets, the global climate research community will continue to advance understanding of the global climate system and anticipate changes to the climate in the future.
National Climatic Data Center (NCDC)
International Surface Temperature Initiative (ISTI)
The International Surface Pressure Databank (ISPD)
Global Observing Systems Information Center (GOSIC)
World Data System (WDS)
International Environmental Data Rescue Organization (IEDRO)
National Archives and Records Administration (NARA) archive environmental guidelines