
Escaping the Digital Dark Age

Digital storage is easy; digital preservation is not.

Due to the relentless obsolescence of digital formats and platforms, along with the ten-year life spans of digital storage media such as magnetic tape and CD-ROMs, there has never been a time of such drastic and irretrievable information loss as right now. If that claim seems extravagant, consider the number of literate people in the world and how much work is "knowledge" work, which increasingly means computer work. The world economy itself has become digital. This is a civilizational issue.

Information lives in two major dimensions--space and time. With digitization and the Internet, all information is now potentially global. The space dimension for data will keep exploding, but the time dimension is shrinking. The half-life of data is currently about five years. There is no improvement in sight because the attention span of the high-tech industry can only reach as far as next year's upgrade, and its products reflect that. But civilizational time is measured in centuries. A major disconnect is in progress. Loss of cultural memory has become the price of staying perfectly current.

Nothing like acid-free paper

The loss is already considerable. You may have noticed that any files you carefully recorded on 5 1/4" floppy disks a few years ago are now unreadable. Not only have those disk drives disappeared, but so have the programs, operating systems, and machines that wrote the files (WordStar in CP/M on a Kaypro?). Your files may be intact, but they are as unrecoverable as if they never existed. The same is true of Landsat satellite data from the 1960s and early 1970s on countless reels of now-unreadable magnetic tape. All of the early pioneer computer work at labs such as MIT Artificial Intelligence is similarly lost, no matter how carefully it was recorded at the time. The pioneer work of today is just as doomed, because the rate of digital obsolescence keeps accelerating, and the serious search for a long-term strategy for storage has yet to begin.

There is still nothing in the digital world like acid-free paper. Former University of California, Berkeley librarian Peter Lyman points out, "When we know a book is important, we...tell a publisher: print it on acid-free paper. And with decent library air-conditioning it will last 500 years. If you want to preserve something else, like a newspaper, microfilm it. We know there is a 500-year life to microfilm properly cared for. But what do we do with digital documents? What we do today is we refresh them every time there's a change in technology--or every 18 months, whichever comes first. This is an expensive approach! We need a digital equivalent to microfilm, a 500-year solution."

Losing our collective memory

Supercomputer designer Danny Hillis also put the problem in perspective at a conference on "Digital Continuity" held at the Getty Center in Los Angeles in February 1998. "Back when information was hard to copy," said Hillis, "people valued the copies and took care of them. Now, copies are so common as to be considered worthless, and very little attention is given to preserving them over the long term." He noted that thousands of years ago we recorded important matters on clay and stone that lasted thousands of years. Hundreds of years ago we used parchment that lasted hundreds of years.

As a result, Hillis suggests, we are now in a period that may be a maddening blank to future historians--a Dark Age--because nearly all of our art, science, news, and other records are being created and stored on media that we know can't outlast even our own lifetimes. We arrived at this situation partly because digitization otherwise offers so many profound benefits. We can now store, search, and cross-correlate literally everything. In fact, according to estimates by Bellcore's Michael Lesk, who calculated the total amount of data there is in the whole world, storage has now surpassed data, probably permanently. There is more room to store stuff than there is stuff to store. We need never again throw anything away. That particular role of archivists and curators has become obsolete.

A new history

If raw data can be kept accessible as well as stored, history will become a different discipline, closer to a science, because it can use marketers' data-mining techniques to detect patterns hidden in the data. You could fast-forward history, tease out correlated trends, zoom in on particular moments. Watershed events might be studied in the original--the actual force-feedback virtual-reality experiment that showed a new way to fold a protein that transformed medicine, plus the lab surveillance camera images of the event, as well as the phone calls, E-mail, and web searches that surrounded the discovery. Note, there are both passive and active digital records in that example.

The E-mail, phone calls, and photographs are passive; all you have to do is keep them readable. But the virtual-reality experiment is active--it was probably run on some experimental one-off piece of cobbled-together lab equipment. Without that complex of then-current hardware, you can't replay the experiment. Preservation of such hardware-dependent digital experiences is nearly impossible. For instance, the elaborate virtual-reality model of Berlin that has been used for planning that city for years will almost certainly be lost, as will the U.S. Army's famous computer model of the pivotal tank battle in the Gulf War.

Storage vs. preservation

Digital storage is easy; digital preservation is not. Preservation means keeping the stored information cataloged, accessible, and usable on current media, which requires constant effort and expense. Furthermore, while contemporary information has economic value and pays its way, there is no business case for archives, so the creators or original collectors of digital information rarely have the incentive--or skills, or continuity--to preserve their material. It's a task for long-lived nonprofit organizations such as libraries, universities, and government agencies, which may or may not have the mandate and funding to do the job.

University of California, Berkeley, archivist Howard Besser points out that digital artifacts are increasingly complex to revive. For starters you've got the viewing problem--a book displays itself, but the contents of a CD-ROM are invisible until opened on something. Then there's the scrambling problem--the innumerable ways that files are compressed and, increasingly, encrypted. There are interrelationship problems--hypertext or web site links active in the original but now dead ends. And translation problems occur in the way different media behave--just as a photograph of a painting is not the same experience as the painting, looking through a screen is not the same as experiencing an immersion medium; watching a game is not the same as playing the game.

For all these reasons, archivists now encourage tagging all digital artifacts with a rich supply of "metadata"--digital information about the artifact telling what it is and how it works. A number of professional organizations are working on setting consistent (and expandable) standards for metadata. Gradually a set of "best practices" is emerging for ensuring digital continuity: use the most common file formats, avoid compression where possible, keep a log of changes to a file, employ standard metadata, make multiple copies, and so forth.

And don't forget atomic backup--while the durability of bits is still moot, the atoms in ink on paper have great stability.
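As a rough illustration only, several of the best practices listed above (uncompressed copies, multiple locations, a metadata record, a change log) might be sketched in Python as follows. Nothing here comes from the essay or from any archival standard: the function names `preserve` and `verify`, the `.metadata.json` sidecar file, the `changes.log` file, and the use of a SHA-256 checksum to detect damaged copies are all assumptions made for the example.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve(source: Path, archive_dirs: list[Path], description: str) -> dict:
    """Hypothetical helper: copy `source` uncompressed into every archive
    directory, write a metadata sidecar beside each copy, and append a
    line to a per-directory change log."""
    checksum = sha256_of(source)
    metadata = {
        "filename": source.name,
        "description": description,
        "format": source.suffix.lstrip(".") or "unknown",
        "size_bytes": source.stat().st_size,
        "sha256": checksum,
        "archived_at": datetime.now(timezone.utc).isoformat(),
    }
    for directory in archive_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy_path = directory / source.name
        shutil.copy2(source, copy_path)  # plain copy, no compression
        if sha256_of(copy_path) != checksum:
            raise IOError(f"copy at {copy_path} does not match the original")
        sidecar = directory / (source.name + ".metadata.json")
        sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
        with (directory / "changes.log").open("a", encoding="utf-8") as log:
            log.write(
                f"{metadata['archived_at']} archived {source.name} "
                f"sha256={checksum}\n"
            )
    return metadata


def verify(directory: Path, filename: str) -> bool:
    """Hypothetical helper: check a stored copy against its sidecar checksum."""
    sidecar = directory / (filename + ".metadata.json")
    recorded = json.loads(sidecar.read_text(encoding="utf-8"))
    return sha256_of(directory / filename) == recorded["sha256"]
```

A call such as `preserve(Path("essay.txt"), [Path("copy_a"), Path("copy_b")], "draft essay")` would leave two independent copies, each with its own description and log, and `verify` could be rerun periodically to notice a copy that has silently decayed. A sketch like this addresses only storage hygiene; it does nothing about the obsolescence of the formats and machines needed to read the file.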

Net: haven or horror?

What about the net? Everything can be dumped there, everything can be retrieved there, and fairly universal standards such as TCP/IP emerge there. New talents emerge there as well. The net is responsible for the legions of "emulators" who keep finding new ways to revive old games such as "Pac Man" and "Frogger" for play on new computers. Vernacular archivists such as the emulators are one hopeful wave of the future. Massively distributed research like that can convene enormous power. Another example: thanks to the current interest in family genealogy, the thousands of users of a program called "Family Tree Maker" are linking their research into a "World Family Tree" on the web. So far it has tied together 75,000 family trees, a total of 50 million names. The goal, once unthinkable, is to eventually document and link every named human who ever lived. With the net, preservation goes fractal--infinitely branched instead of centralized. But that leaves the question: Is the net itself profoundly robust and immortal, or is it the most ephemeral digital artifact of all? At present the web has a "memory" of about two months, says web archivist Brewster Kahle.

What is the solution? We cannot reverse the digitization of everything. What we have to do is convert the design of software from brittle to resilient, from heedlessly headlong to responsible, and from time-corrupted to time-embracing. These are intractable problems. For certain, none of them can be solved in a year, but all of them can yield to decades of focused work, if the health of civilization is understood to be at stake.

Thinking in the long term

"The real problem," says computer designer Hillis, "is not technological. We have the technical understanding to solve problems such as digital degradation. What we don't have yet in our digital culture is the habit of long-term thinking that supports preservation .... In the early 2000s people will realize that we're not at the end of something--we're at the beginning. There really will be a year 3000 and 4000 and so on. Once that idea is more widely accepted, the engineers who are thinking about the next digital medium will naturally think about how it lasts ...."

Hillis is more of an optimist than I am. I think it will take insistent, knowledgeable, unremitting demand from librarians and archivists for long-lived digital media, or the engineers will never take the problem seriously enough. If that happens, then librarian Lyman's hope might be realized: "I'd say that what's motivating us is not just a fear of losing what we have, but of being able to build something new out of this digital rubble that we've created--to build something that's really quite amazing, that may be as much of a landmark on our civilization as the Library of Alexandria was in the ancient world."

This essay was first published in Library Journal, vol. 124, issue 2, pp. 46-49, February 1999, and is republished here for educational purposes.
