On February 11, Science magazine published a special issue dedicated to the challenges that research communities face as they produce increasing quantities and types of data. One of the articles tells the story of particle physicist Siegfried Bethke, who wanted to reanalyze the data from an experiment conducted twenty years earlier. He discovered that no organized effort had been made to preserve the data, and it took him, his secretary, and a graduate student two years to find it all and rewrite the now-obsolete code needed to read it.
“The problem starts when the experiment is over, and the data used by one group of people is only understood by those people,” [Cristinel] Diaconu says. “When they go off and do other things, the data is orphaned; it has no parents anymore.” The orphan metaphor only goes so far: After a certain point, orphaned data can’t be adopted by later researchers who weren’t part of the original team. Even given the raw data, only someone intimately involved in the original experiment can make sense of it.
A particle physics study group has recommended that every large experiment hire a ‘data archivist,’ a sort of Receiver of Memory who would be responsible for making sure that data remains intelligible and accessible long after ‘the end’ of a project.
A data archivist would be a mix of librarian, IT expert, and physicist, with the computing skills to keep porting data to new formats but savvy enough about the physics to be able to crosscheck old results on new computer systems.
As another article in the special issue, "Climate Data Challenges in the 21st Century," points out, scientists need to make their data accessible not only to colleagues and to researchers of the future, but also to non-researchers of the present. As managers and policy-makers move to address time-sensitive issues such as climate change, the long-term soundness of their decisions will depend at least partly on the information available to them.
Increased support from the funding agencies is needed to enhance data access, manipulation, and modeling tools; improve climate system understanding; articulate model limitations; and ensure that the observations necessary to underpin it all are made. Otherwise, climate science will suffer, and the climate information needed by society—climate assessment, services, and adaptation capability—will not only fall short of its potential to reduce the vulnerability of human and natural systems to climate variability and change, but will also cause society to miss out on opportunities that will inevitably arise in the face of changing conditions.