To better anticipate future weather and climate impacts on the Earth system and society, there is an ever-increasing demand for longer and higher resolution terrestrial and marine databases of the weather. The construction of these baseline climate data resources requires a massive effort to recover and translate handwritten records to digital format, and then quality control, integrate and serve huge amounts of historical weather data to a new generation of modelling and retrospective analysis (reanalysis) systems running on the world’s most powerful computers.
Over the last decade there has been growing recognition of the importance of historical marine weather data to fill major gaps in existing data coverage. Marine data, covering the 70% of the Earth that is ocean, are a critical (and for most of history the only) means to quantify the various roles that the global ocean plays in climate regulation over time, and hence provide the best means to foresee the future trajectory of the climate and its likely impact on every aspect of life. Indeed, the urgent need to anticipate future climate, combined with increasingly capable models and data-driven reanalysis systems, has transformed the value of historical weather data to climate science.
The main sources of historical marine data are weather anecdotes, remarks and observations recorded in logbooks and diaries written aboard ships that sailed local seas or crossed the oceans of the world for centuries. Standardized tabulations of non-instrumental information and measurements of marine weather appeared at the beginning of the sixteenth century, while more systematic observations using high-quality meteorological instruments commenced in the mid- to late eighteenth century. The first efforts to establish international coordination and standardization in marine meteorology arose with the Brussels Maritime Conference (1853). It is from this period that naval and merchant ships of many nations began to systematically collect and record weather and sea-surface observations by the millions. Today these records provide the data needed to drive state-of-the-art models and reanalyses.
COADS to ICOADS
In the marine data rescue field, a major effort was initiated in the 1980s to produce the most complete collection of surface marine weather observations. The Comprehensive Ocean-Atmosphere Data Set (COADS) was the outcome of those efforts and included newly available repositories of digitized marine weather observations from multiple sources, typically produced and stored on punch cards at that time. Following increased international support and contributions to the dataset’s development over the years, the project was renamed the International Comprehensive Ocean-Atmosphere Data Set (ICOADS) in 2002 to better reflect the important contributions made by international partners and global data managers.
Figure 1: Major historical digitized and external archive marine data sources added to ICOADS R3.0, from 1800 to 2014. Horizontal black lines show the time range of the original marine data sources. The annual numbers of reports are plotted as curves (logarithmic scales on the vertical axis), blue for the previous ICOADS R2.5, and red for ICOADS R3.0. Marine data coverage prior to 1800 is sparse, and that following 2007 continues to grow annually. Source: Freeman et al. (2017).
Over its existence, ICOADS has also achieved recognition as the major repository and access point for historical marine weather observations recovered from digitization efforts, small and large. This includes historical weather data recovered and digitized by the CDMP (Climate Database Modernization Program: 2000-2011), CLIWOC (Climatological Database for the World's Oceans 1750-1850: 2001-2003), RECLAIM (RECovery of Logbooks And International Marine data: 2004->), the international ACRE (Atmospheric Circulation Reconstructions over the Earth: 2007->) initiative (Allan et al., 2016) and the CoRRaL (UK Colonial Registers and Royal Navy Logbooks: 2008-2009) projects.
The most current version of ICOADS is Release 3: ICOADS R3.0 (covering 1662-2014) (Freeman et al., 2017), with monthly near-real-time extensions from 2015 to the present. Figure 1, from the latter publication, provides a comparison of years 1800 to 2014 between ICOADS Releases R2.5 and R3.0, showing the gains made from the numerous data recovery efforts. The figure shows the significance of these rescue efforts in both volume and temporal coverage; they are critical to further expanding this major collection and providing public access to more ocean data. As ICOADS looks to modernize and expand its collections in the near future for a new dataset release, historical data rescue and digitization efforts will be vital in providing new sources of data for the dataset, further enabling better scientific understanding of historical environmental conditions over the global oceans.
Since the release of ICOADS R3.0, concerted efforts have been made to expand the recovery, imaging and digitization of historical global marine weather data. Much has been undertaken by a mix of ongoing and new data rescue projects and citizen science activities under Deutscher Wetterdienst (DWD, German Weather Service), the National Oceanic and Atmospheric Administration (NOAA) and the University of Washington working with the U.S. National Archives, and the efforts of the Global Surface Air Temperature (GloSAT: 2019->) project, or linked to the international ACRE initiative and its ACRE Oceans chapter (e.g. the EU Copernicus C3S Data Rescue Service [DRS] and the UK Newton Fund projects of ACRE China under CSSP China, ACRE/C3S DRS/WCSSP South Africa, ACRE/C3S DRS Argentina and ACRE/C3S DRS Antarctica). These initiatives have included new, interlinked marine citizen science data rescue foci under Old Weather (2013->), Weather Detective (2014-2017) and Southern Weather Discovery (2018->). The Danish National Archives has also identified a collection of more than 7000 archive boxes of ship-based weather data dating from 1650 onward that are suitable for digital imaging and transcription. All the marine data digitized by the above will be provided to ICOADS and the new EU Copernicus Global Land and Marine Observations Dataset (GLAMOD) (Thorne et al., 2017).
The great bulk of the data rescued (imaged/scanned and catalogued) by ACRE Oceans was achieved through the efforts of just two individuals, focusing on three different archives in the United Kingdom – The Met Office (UKMO), Hydrographic Office (UKHO) and National Archives (TNA) – and working with a number of other repositories around the world (Argentina, Australia, Chile, New Zealand, Scandinavia, South Africa and the U.S.). Some of these historical marine data have since been digitized using both traditional keying and citizen science initiatives. As of 2019, ACRE Oceans had scanned 2.6 million, and digitized 1.5 million, historical marine observations. For just the Antarctic and the Southern Ocean regions, the following tabulation provides a comprehensive picture of the imaging/scanning and digitization that has been undertaken. It should be noted that much of the rescued data was recovered from just a few archives, and much more has been uncovered but not imaged. There are a number of other archives around the world that could potentially hold such data but have not yet been visited.
It is important to recognize that historical marine data are not only found in naval and merchant shipping logbooks. There are meteorological and oceanographic data in marine surveying and hydrographic documents (e.g. Remark Books), material connected with the regulation of whaling and fisheries, marine cable laying, transportation of mail (packet ships), yachts, vessels carrying convicts and settlers and many other types of documents other than ship logbooks. The bulk of this material still needs to be addressed – imaged/scanned and/or digitized, catalogued and archived.
It is also worth mentioning that new archives and new types of documentation are always coming to light. For instance, at the UKHO, ACRE Oceans found that workbooks used to compile observations of meridian distances to establish the longitude of places also contained twice-daily pressure and air temperature observations. These had been overlooked before because the observations were obscured by all the other figures around them.
DWD holds in its Seewetteramt (Hamburg Marine Meteorological Office) an archive of several collections of original historical worldwide weather records from ships and also German coastal and overseas land stations. The archive originates from the Deutsche Seewarte (German Marine Observatory), a predecessor of DWD that existed from 1868 to 1945 in Hamburg. With a stock of more than 37 000 meteorological ship logbooks, it is one of the world's largest archives of this kind.
On Frigate Hummer (Lobster), sailing from Nyborg on Funen to Copenhagen, a westerly breeze (Beaufort 3) was noted on Monday, 14 December 1733.
The historical archive of ship logbooks consists of several logbook collections starting in 1828. The first observations are from regular nautical logbooks. All other collections consist of standardized meteorological logbooks that were introduced by Maury (1840 to 1860). Starting in 1868, the German Marine Observatory provided their own meteorological journals to German merchant ships. The weather observations from these logbooks were used to produce charts of weather, winds and currents. Based on this climatological knowledge and the experience of the sailors, the German Marine Observatory compiled sailing instructions for merchant ships in return for their voluntary observations – a system that is still in existence with the International Voluntary Observing Ship (VOS) Scheme.
The overall number of marine observations in the historical archive of the Seewarte is estimated to be at least 23 million, and likely to be considerably more. Efforts to digitize the logbooks started in the early 1940s and millions of observations were transferred to punch cards at that time. Since then, the digitization effort has been continued at DWD in several phases. To date, about 15 million observations have been digitized and added to a digital database.
A sophisticated workflow has been established to digitize the contents of the ship journals, consisting of several steps: gathering all metadata for a specific logbook, optical scanning of the logbooks and finally transcription (keying) of the contents. All digitized data from each step are stored in a database system. Finally, quality-controlled data are included in the DWD’s marine meteorological archive as well as ICOADS.
Digitization efforts, not only at DWD, have been ongoing for decades in different projects, most of which have been detailed in this piece. Consequently, the different data archive contents are fragmented, e.g. some logbooks were only partly digitized, or the database entries originate from different digitization periods. For other datasets, the links to the respective metadata records have been lost over time. In the course of data exchange programmes, some data have been duplicated in several archives.
Hence, a further challenge, in parallel with the integration of newly digitized data, is the consolidation and homogenization of the existing data archives. A priority for DWD in this context is the assignment of missing ship IDs to each observation. This will allow data gaps to be identified and better quality control to be applied to entire ship voyages, leading to a significantly improved quality of the existing data collections.
The efforts to digitize the contents of DWD’s historical archives are ongoing. Apart from meteorological ship logbooks, several archives of land stations are currently being digitized, quality controlled and submitted to international databases (more information on DWD data rescue activities is available from DWD).
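One simple way to think about the consolidation task is to look for reports from different archives that share a timestamp and a near-identical position. The following is a minimal sketch of that idea only, not a description of the procedure used by DWD or ICOADS; the record fields ("time", "lat", "lon", "ship_id", "source") and the function name are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def find_candidate_duplicates(observations, position_decimals=1):
    """Group reports that share a timestamp and a rounded position.

    Each observation is a dict with hypothetical keys "time", "lat" and "lon".
    Groups with more than one member are returned as candidate duplicates.
    """
    groups = defaultdict(list)
    for obs in observations:
        key = (
            obs["time"],
            round(obs["lat"], position_decimals),
            round(obs["lon"], position_decimals),
        )
        groups[key].append(obs)
    return [group for group in groups.values() if len(group) > 1]


# Illustrative records only, not real archive data.
reports = [
    {"time": "1885-03-02T08:00", "lat": 54.02, "lon": 7.91,
     "ship_id": "A123", "source": "punch cards"},
    {"time": "1885-03-02T08:00", "lat": 54.04, "lon": 7.88,
     "ship_id": None, "source": "rekeyed logbook"},
    {"time": "1885-03-02T12:00", "lat": 54.30, "lon": 7.50,
     "ship_id": "A123", "source": "punch cards"},
]
duplicates = find_candidate_duplicates(reports)
```

In this toy example the first two reports fall into one group, which also suggests how a missing ship ID might be proposed from a matching record. Rounding is a crude match: two positions that straddle a rounding boundary would be missed, so a practical scheme would need tolerance-based comparison and a check of the observed values themselves.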
The DWD digitization effort still relies mostly on keying the observations by hand. A variety of old German handwriting and unusual data sheet layouts are a persistent challenge for automatic text recognition systems. Future machine-based transcription may significantly speed up the transcription. However, the (meta)data management of the rescued data, as well as the handling and scanning of the old and fragile documents still require a lot of careful work to create high quality modern data sets originating from these valuable historic data sources.
Figure 2: Ships’ positions where new-to-science marine-meteorological and sea-ice observations have been recovered from historical records (a. Northern Hemisphere, b. Southern Hemisphere). Data extracted via the Old Weather citizen-science project from U.S. federal sources, primarily Navy and Coast Guard vessels, are shown in yellow. Positions in orange are data extracted from logbooks by the Southern Weather Discovery (SWD) citizen-science project run by the New Zealand National Institute of Water and Atmospheric Research (NIWA). Approximately 1-2% of the ~130K images with maritime weather observations that have been supplied to NIWA have been keyed through SWD; keying is expected to accelerate in coming years to improve Southern Hemisphere spatiotemporal coverage.
NOAA/University of Washington/U.S. National Archives
The United States Steamer “Powhatan” in a cyclone off Hatteras – From a sketch by G.T. Douglass, U.S.N. – [See Page 374.] in Harper’s Weekly, May 12, 1877.
NOAA and the University of Washington (Cooperative Institute for Climate, Ocean & Ecosystem Studies) have been collaborating with the U.S. National Archives since 2011. During this period, the project has produced high-resolution digital images of 4 618 volumes of federal ship logbooks, dating between 1844 and 1955. These are all publicly available worldwide on the National Archives Catalog. These assets have so far produced about 1.5 million new-to-science hourly weather records via the Old Weather citizen-science project. As shown in Figure 2a, more than 600 000 weather and sea-ice observations pertaining to the Arctic have been further enhanced by painstaking reconstruction of ship tracks to hourly resolution using the ‘dead reckoning’ and pilot information contained in the logbooks (i.e. data on the ship’s course and distance run, bearings and ranges from known landmarks).
The process of creating digital surrogates, transcribing and quality-controlling weather data from them, and passing these data into ICOADS and the International Surface Pressure Databank (ISPD) is ongoing. Sea-ice data recovered by Old Weather have been used for validation of a model-based reconstruction of Arctic sea-ice volume over the last century (Schweiger et al., 2019; Wood et al., 2019), and transcribed weather data are also available for machine learning research on handwritten character recognition (HCR).
The enormous potential for data rescue is illustrated by the size of the U.S. collection that remains largely unutilized. Beginning in 1847, the logbooks of the U.S. Navy, Coast Guard/Revenue Cutter Service and Coast Survey contain 24 hourly weather records per day, and include 7–10 variables per hour, although not all variables were uniformly acquired until after the U.S. Civil War (1861–1865). There are roughly 22 700 logbooks in the National Archives that date between 1801 and 1941. Until 1915, most volumes contain about one year’s worth of observations, and then from 1915–1941 logbooks were generally bound in monthly volumes. A conservative estimate, assuming that only half of these logbooks contain all 24 hourly observations, would amount to 75 500 000 weather records to be recovered. There are undoubtedly tens of millions more unrecovered weather records from the World War II era and after.
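To illustrate the kind of calculation that dead reckoning involves, the sketch below advances a position hour by hour from a logged course and distance run. It is a minimal example under stated assumptions, not the method used by the project: it uses the plane (mid-latitude) sailing approximation, treats one minute of latitude as one nautical mile, ignores longitude wrap-around at the date line, and breaks down close to the poles. The function names are hypothetical, and corrections from bearings and ranges to known landmarks are not modelled.

```python
import math


def dead_reckon(lat, lon, course_deg, distance_nm):
    """Advance a position by one logged leg using plane (mid-latitude) sailing.

    lat, lon are decimal degrees (north and east positive), course_deg is the
    course in degrees true, distance_nm is the distance run in nautical miles.
    """
    course = math.radians(course_deg)
    # One nautical mile is taken as one minute (1/60 degree) of latitude.
    dlat = distance_nm * math.cos(course) / 60.0
    # Meridians converge with latitude, so scale by the cosine of the mid-latitude.
    mid_lat = math.radians(lat + dlat / 2.0)
    dlon = distance_nm * math.sin(course) / (60.0 * math.cos(mid_lat))
    return lat + dlat, lon + dlon


def hourly_track(start, legs):
    """Build a track from a start (lat, lon) and one (course, distance) leg per hour."""
    track = [start]
    for course_deg, distance_nm in legs:
        last_lat, last_lon = track[-1]
        track.append(dead_reckon(last_lat, last_lon, course_deg, distance_nm))
    return track


# Illustrative values only: two hours at 6 knots, first due north, then due east.
track = hourly_track((60.0, -170.0), [(0.0, 6.0), (90.0, 6.0)])
```

Errors in a track built this way accumulate with every leg, which is one reason why fixes from landmarks or celestial observations, where the logbook records them, matter for a careful reconstruction.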
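The scale of this estimate can be checked with simple arithmetic. The text does not say how the 22 700 logbooks divide between roughly annual and monthly volumes, so the sketch below only brackets the figure between two extreme cases; the variable names and the two bounding cases are assumptions made for illustration.

```python
LOGBOOKS = 22_700
COMPLETE_LOGBOOKS = LOGBOOKS // 2          # "only half" contain all hourly observations
RECORDS_PER_ANNUAL_VOLUME = 24 * 365       # 24 hourly records per day for one year
RECORDS_PER_MONTHLY_VOLUME = RECORDS_PER_ANNUAL_VOLUME // 12

# Assumed bounding cases: every complete volume spans a year, or only a month.
upper_bound = COMPLETE_LOGBOOKS * RECORDS_PER_ANNUAL_VOLUME    # 99,426,000
lower_bound = COMPLETE_LOGBOOKS * RECORDS_PER_MONTHLY_VOLUME   # 8,285,500

QUOTED_ESTIMATE = 75_500_000
within_bounds = lower_bound < QUOTED_ESTIMATE < upper_bound
```

The quoted 75 500 000 records falls between the two bounds and sits closer to the annual-volume case, consistent with most of the collection predating the switch to monthly volumes in 1915.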
Danish Meteorological Institute/National Archives of Denmark
The National Archives of Denmark contain huge collections of logbooks. More than 7 000 archive boxes have been identified, dating from as early as the mid-seventeenth century and filling more than 700 metres of shelves with logbooks and other maritime data. Only a very small part of this data has been digitized to date.
Data has been made available by many seafaring nations recently, but the Danish data, apart from its sheer age, is special in two respects:
- There were regular ship connections between Denmark and the other parts of the Realm of Denmark. This enables us to obtain a wealth of information about wind, weather, temperature and ice extent en route to Greenland and Iceland and back.
- The Øresund duty was a tax that every ship passing through the Øresund between Denmark and today’s Sweden (Danish at the time) had to pay. In certain years, this duty made up about a third of the Danish national budget. Therefore, the king decreed that ships would not pass without paying their duties, and ships were stationed at several places along the sound and at the Great Belt to enforce this. The logbooks of these ships are interesting because they have a high temporal resolution and go back to the seventeenth century.
The National Archives and the Danish Meteorological Institute are setting up a project, named ROPEWALK (Rescuing Old data with People's Efforts: Weather and climate Archives from LogbooK records), to digitize this enormous amount of data. Machine-learning techniques will be used as much as possible, then the remaining data will be digitized by volunteers as has been the case in other comparable projects. The digitized data will be quality-checked and made available to the scientific community.
National Institute of Water and Atmospheric Research
The National Institute of Water and Atmospheric Research (NIWA) in New Zealand have been conducting meteorological data rescue as a contribution to ACRE (via ACRE Pacific and ACRE Antarctica) since 2009. They also have been passing data to the ISPD through that channel. During the past decade, their focus has primarily been on recovery of southwest Pacific and Southern Hemisphere high latitude meteorological observations spanning the period 1800–1950.
NIWA hold millions of observations dating back to the mid-1850s, and ongoing efforts are creating digital surrogates and a metadata catalogue to verify physical document holdings and keyed data held in digital archives. Several other high-value historic meteorological documents have been located in New Zealand (Lorrey and Chappell, 2016), which have been used to reconstruct synoptic weather patterns and have been compared to marine observations rescued by other scientific organizations.
In recent years, NIWA has driven the Southern Weather Discovery (SWD) citizen science platform hosted on Zooniverse (southernweatherdiscovery.org), recovering ~250 000 Southern Hemisphere marine weather observations, promoting meteorological data rescue and completing experimentation on replicated data keying (Fig. 2b). They are also actively collaborating with Microsoft on an Artificial Intelligence (AI) for Earth project that is comparing manually transcribed observations and those completed by automatic means.
Challenges and actions
The principal challenges faced by the marine data rescue community fall largely into two categories: access to historical records and conversion from manuscript to digital format.
In the first category, ship logbooks, which tend to have the largest quantities of high-quality marine weather data, are often 100 or more years old and considered documents of national significance. The state archives that typically have responsibility for the care and preservation of these sometimes-fragile documents are understandably cautious about handling. However, it is not uncommon to encounter other barriers, such as monetization of access (beyond the cost of the imaging itself), or other embargoes on access that effectively limit uses at the scale necessary for data rescue.
In the second category, conversion to an actionable digital format is also a significant bottleneck. At present, this step relies on manual transcription, either via double-blind keying or citizen-science (crowdsourcing). These approaches are quite worthwhile if targeting particular regions or time periods with sparse data, such as the Arctic or Southern Ocean, or a discrete research question. However, a large-scale conversion of the vast quantities of unused marine weather data that are known to exist will require an efficient AI/machine-learning solution.
Finally, digitized records must be as complete as possible and, wherever possible, accompanied by detailed metadata. This is especially important when dealing with data biases, which depend on knowledge about parameters such as solar radiation, wind speed and direction, humidity and air temperature. For metadata, information like where thermometers are stored or sheltered, locations of screens, observing platforms and details of other instrumentation can all be important. For sea surface temperatures, there is rarely information about instrumentation for engine room measurements or what type of ocean sampling bucket was used, and often little about how measurements were made (Kent and Kennedy, 2021). Documentation of methods, such as can sometimes be found in marine observer handbooks and similar sources, is also important. In conjunction with the above is the need to reprocess legacy data so that as much can be obtained from them as possible, but also to assess what is complete and what is not. All of these efforts would benefit immeasurably from access to more sustainable funding sources.
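The core idea of double-blind keying is that two people transcribe the same entry independently and only the disagreements are sent for review. The sketch below shows that comparison in its simplest form; it is an illustration under assumptions, not the procedure of any project named here, and the field names and function name are hypothetical.

```python
def _clean(value):
    """Strip surrounding whitespace from keyed strings; leave other values as is."""
    return value.strip() if isinstance(value, str) else value


def compare_keyings(first, second):
    """Return the fields on which two independent transcriptions disagree.

    first and second map a field name to the value keyed for the same log entry.
    A field missing from one keying is reported with None for that keying.
    """
    disagreements = {}
    for field in sorted(set(first) | set(second)):
        a = _clean(first.get(field))
        b = _clean(second.get(field))
        if a != b:
            disagreements[field] = (a, b)
    return disagreements


# Illustrative values only: the keyers disagree on one digit of the temperature.
keyer_a = {"pressure_inHg": "29.92", "air_temp_F": "41", "wind_dir": "NW"}
keyer_b = {"pressure_inHg": "29.92", "air_temp_F": "47", "wind_dir": "NW "}
to_review = compare_keyings(keyer_a, keyer_b)
```

Here only the air temperature is flagged, since the trailing space in the wind direction is treated as insignificant. Agreement between two keyers does not prove a value is right, because both can misread the same ambiguous handwriting, so physical range and consistency checks remain useful afterwards.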
Actions which are beginning to address the above needs include:
- In the U.S., NOAA has recently elevated both citizen-science and machine auto-transcription development with targeted funding opportunities for small business and via the NOAA High Performance Computing and Communications Program’s Information Technology Incubator.
- Private philanthropy has been increasingly engaged in supporting climate science in areas where funding has traditionally been challenging.
- The coming together of EU Copernicus, WMO, ACRE, DWD, UK Newton Fund, NOAA, NIWA and similar initiatives and funding streams, which are all increasingly working in conjunction with the National Weather Services, to rebuild and enhance fundamental data infrastructure to meet the needs of high-performance reanalysis and emerging AI applications in this domain.
Once in digital format, the technology exists to assimilate every marine weather observation collected by every ship, every day, for the past two centuries. What is learnt about the long-term state and future of the Earth System from such a comprehensive reanalysis may prove to be of the utmost importance in the future.
Rob Allan, Atmospheric Circulation Reconstructions over the Earth (ACRE) Manager, Met Office Hadley Centre, United Kingdom
Kevin Wood, University of Washington, Cooperative Institute for Climate, Ocean & Ecosystem Studies, and NOAA Pacific Marine Environmental Laboratory, U.S.
Eric Freeman, University of Maryland, Cooperative Institute for Satellite Earth System Studies, and NOAA National Centers for Environmental Information, U.S.
Clive Wilkinson, ACRE OCEANS/CSW Associates-Data Services, U.K.
Axel Andersson, Deutscher Wetterdienst, Germany
Andrew Lorrey, National Institute of Water and Atmospheric Research, New Zealand
Philip Brohan, Met Office Hadley Centre, U.K.
Martin Stendel, Danish Meteorological Institute, Denmark
John Kennedy, Met Office Hadley Centre, U.K.
Rob Allan leads the international ACRE initiative and is supported by funding from the U.K. Newton Fund [which is managed by the U.K. Department for Business, Energy and Industrial Strategy (BEIS)], under its Climate Science for Service Partnership (CSSP) China (a collaborative climate science initiative between research institutes in the UK and China) and the Weather and Climate Science for Service Partnership (WCSSP) South Africa (a collaborative initiative between research institutes in the UK and South Africa) projects, plus the EU Copernicus C3S Data Rescue Service. He also acknowledges the University of Southern Queensland, Toowoomba, Australia, and the Centre for Maritime Historical Studies, University of Exeter, Exeter, United Kingdom, where he is an Adjunct and Honorary Professor, respectively.
Kevin Wood is supported in part by the Cooperative Institute for Climate, Ocean & Ecosystem Studies (CICOES) under NOAA Cooperative Agreement NA20OAR4320271, and by a Digitizing Hidden Collections grant from the Council on Library and Information Resources (CLIR). This grant program is made possible by funding from The Andrew W. Mellon Foundation.
Eric Freeman’s base funding comes from NOAA, and he would like to thank all ICOADS contributors and users for continued efforts to expand and improve the dataset.
Clive Wilkinson leads the activities of ACRE Oceans which have been funded by various institutions and funding bodies including the National Institute of Water and Atmospheric Research (NIWA), New Zealand, Hokkaido University in Japan, Reading University in the UK, CSSP China and the EU Copernicus Science Programme.
Axel Andersson works for Deutscher Wetterdienst (DWD) which supports several data rescue activities in its Climate and Environment business area. The digitization of marine weather observations is conducted by the Marine Climate Monitoring unit of DWD.
Andrew Lorrey is a Principal Scientist in Climate and Environmental Applications and Programme Leader of Southern Hemisphere Climates and Environments at the National Institute of Water and Atmospheric Research (NIWA), New Zealand. He is also the Project Manager of both ACRE Pacific and Antarctica, and a Task Leader on analysis of observations.
Philip Brohan is supported by the Joint BEIS and Department for Environment Food & Rural Affairs (DEFRA) Integrated Climate Programme, Department of Energy & Climate Change (DECC)/DEFRA (GA01101) and the Climate Science for Service Partnership China.
Martin Stendel works for and is supported by the Danish Meteorological Institute (DMI).
John Kennedy was supported by the Met Office Hadley Centre Climate Programme funded by U.K. BEIS and DEFRA.
Allan, R., Endfield, G., Damodaran, V., Adamson G., Hannaford, M., Carroll, F., Macdonald, N., Groom, N., Jones, J., Williamson, F., Hendy, E., Holper, P., Arroya, P., Hughes, L., Bickers, R. and Bliuc, A-M., 2016: Towards integrated historical climate research: the example of ACRE (Atmospheric Circulation Reconstructions over the Earth). WIREs Climate Change, 7, 164–174. https://research-information.bristol.ac.uk/files/56642601/MELD_Clean_v4.pdf doi: 10.1002/wcc.379.
Brussels Maritime Conference (1853). Maritime Conference held at Brussels for devising an uniform system of meteorological observations at sea, August and September 1853 = Conférence maritime tenue à Bruxelles pour l'adoption d'un système uniforme d'observations météorologiques à la mer, aout et septembre 1853. https://hdl.handle.net/2027/uc1.$c186788
Freeman, E., Woodruff, S.D., Worley, S.J., Lubker, S.J., Kent, E.C., Angel, W.E., Berry, D.I., Brohan, P., Eastman, R., Gates, L., Gloeden, W., Ji, Z., Lawrimore, J., Rayner, N.A., Rosenhagen, G. and Smith, S.R., 2017: ICOADS Release 3.0: a major update to the historical marine climate record. Int. J. Climatol., 37: 2211-2232. https://doi.org/10.1002/joc.4775
Kent, Elizabeth C. and Kennedy, John J., 2021: Historical Estimates of Surface Marine Temperatures. Annual Review of Marine Science, 13:1, 283-311.
Lorrey, A.M. and Chappell, P.R., 2016: The "Dirty Weather" diaries of Reverend Richard Davis: insights about early Colonial-era meteorology and climate variability for Northern New Zealand, 1839–1851. Climate of the Past, 12, 553-573, doi:10.5194/cpd-12-553-2016
Schweiger, A.J., Wood, K.R. and Zhang, J., 2019: Arctic Sea Ice Volume Variability over 1901–2010: A Model-Based Reconstruction. J. Climate, 32, 4731-4752, https://journals.ametsoc.org/doi/pdf/10.1175/JCLI-D-19-0008.
Thorne, P.W., R. J. Allan, L. Ashcroft, P. Brohan, R.J.H Dunn, M.J. Menne, P. Pearce, J. Picas, K.M. Willett, M. Benoy, S. Bronnimann, P. Canziani, J. Coll, R. Crouthamel, G. Compo, D. Cuppett, M. Curley, C. Duffy, I. Gillespie, J. Guijarro, P. Hechler, S. Jourdain, E.C. Kent, H. Kubota, T. Legg, J. Matsumoto, C. Murphy, L. Qingxiang, N. Rayner, E. Rustemeier, L. Slivinski, V. Slonosky, A. Squintu, B. Tinz, M.A. Valente, S. Walsh, X. Wang, N. Westcott, K. Wood, S. Woodruff and S. Worley, 2017: Towards an integrated set of surface meteorological observations for climate science and applications. Bull. Amer. Meteor. Soc., 98, 12, 2689-2702. https://doi.org/10.1175/BAMS-D-16-0165.1.
Wood, K.R., M. Purves, J. Arthur, M. Davis, A. De Havilland, et al., 2019: Pacific arctic sea-ice observations from U.S. Federal logbooks (1900-1938). Arctic Data Center. doi:10.18739/A2S46H60V.