We live in perilous times, and increasingly depend on digital technology to preserve our access to knowledge, as our human read/write memories inevitably fade. Yet the digital storage technologies on which datacentres rely have notorious weaknesses. Software environments and file formats quickly become obsolete. Hard drives fail and flash loses data; LTO tapes have a 30-year life span; even archive discs only last 50.
What if part of the answer lies in rethinking the basic building blocks of enterprise storage? Computer Weekly spoke to US non-profit the Arch Mission Foundation (AMF) about the ultra-long-term storage in its Billion-Year Archive project.
Billed as civilisation’s backup plan, the initiative ultimately seeks to build an interplanetary cloud for apocalypse-level disaster recovery, with distributed data repositories on Earth, the moon, near-Earth asteroids, orbiting the sun, and other places in the solar system.
Nova Spivack, AMF co-founder (with Nick Slavin), says the vision is to ensure that human history, culture, science and technology are backed up and can be retrieved, whether by humans or another species, even in an extinction-level event.
“If you’re creating a digital preservation strategy, key principles include redundancy and multiple off-site backups. You cannot just print everything out, so I started looking at all the media typically used. None had a real shelf-life of longer than 10 years, or nobody will certify them for longer than 10 years. Even M-DISC – meant to last 1000 years – doesn’t, because the plastic housing the ceramic layer falls apart,” he says.
Spivack’s original idea was to put a backup of Wikipedia on the moon. Any Earth civilisation, he reasons, is going to look at the moon and eventually want to go there. The questions for years were how to do it, and what storage media to use. Anything on the moon must survive diurnal cycles of boiling to sub-zero temperatures as well as massive bursts of radiation: a USB key would be destroyed in a month.
Eventually, Spivack teamed up with Professor Peter Kazansky at the University of Southampton’s Optoelectronics Research Centre, who had invented 5D optical data storage or “Superman memory crystals”. This uses femtosecond laser writing on nanostructured quartz glass, delivering 360TB of capacity per data disc capable of withstanding 1000°C and with a lifetime of 13.8 billion years at 190°C.
The technique was first demonstrated in 2013 using a 300kB text file. Layers of voxel micro-storage are written into such crystals, which can then be read with an optical microscope and polariser, using machine learning algorithms that interpret polarised light as it is shone through the glass.
It’s 5D because information is encoded with the size and orientation as well as the position of the nanostructures in three dimensions. When the research was presented in San Jose in 2013, it was hailed as opening the door to “unlimited lifetime data storage”.
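To make the “five dimensions” concrete, here is a toy sketch in Python. It is not the Southampton team’s actual scheme: the number of orientation and size levels, the grid dimensions and the names `Voxel`, `encode` and `decode` are all assumptions made for illustration. Each voxel carries three position coordinates plus an orientation and a size, and the last two hold the data bits.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical level counts, chosen only for illustration.
ORIENTATION_LEVELS = 4   # 4 distinguishable orientations -> 2 bits
SIZE_LEVELS = 2          # 2 distinguishable sizes -> 1 bit
BITS_PER_VOXEL = 3       # 2 bits + 1 bit


@dataclass(frozen=True)
class Voxel:
    x: int            # position within a layer
    y: int
    z: int            # layer index
    orientation: int  # 0 .. ORIENTATION_LEVELS - 1
    size: int         # 0 .. SIZE_LEVELS - 1


def encode(data: bytes, width: int = 8, height: int = 8) -> List[Voxel]:
    """Spread the bits of `data` over voxels, three bits per voxel."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zeros so the bit string splits evenly into 3-bit symbols.
    bits += "0" * (-len(bits) % BITS_PER_VOXEL)
    voxels = []
    for start in range(0, len(bits), BITS_PER_VOXEL):
        n = start // BITS_PER_VOXEL
        symbol = int(bits[start:start + BITS_PER_VOXEL], 2)
        orientation, size = divmod(symbol, SIZE_LEVELS)
        layer, rest = divmod(n, width * height)
        y, x = divmod(rest, width)
        voxels.append(Voxel(x, y, layer, orientation, size))
    return voxels


def decode(voxels: List[Voxel], length: int) -> bytes:
    """Recover `length` bytes; position alone fixes the reading order."""
    ordered = sorted(voxels, key=lambda v: (v.z, v.y, v.x))
    bits = "".join(
        f"{v.orientation * SIZE_LEVELS + v.size:03b}" for v in ordered
    )
    return bytes(int(bits[i:i + 8], 2) for i in range(0, length * 8, 8))


message = b"Foundation"
crystal = encode(message, width=2, height=2)
assert decode(crystal, len(message)) == message
```

In this toy version the voxels can be shuffled and the message still decodes, because the reading order comes from position alone. A real reader would have to infer orientation and size from polarised light, which is where the machine learning mentioned above comes in.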
“But then, years ago, even writing a MB took like a month of work. We tested something smaller but symbolic: Isaac Asimov’s sci-fi classic, the Foundation trilogy, which is actually about people doing exactly what we’re trying to do, to preserve civilisation, helping the empire be more resilient to a future dark age,” says Spivack.
Today, writing to 5D optical can be done much faster – for example, using a new process tested by Microsoft. But at that time the AMF created just five copies of Foundation on the quartz, each worth about $1m. After many months of “maximum networking”, the team managed to get one copy sent to Mars in 2018 in the glovebox of Elon Musk’s cherry-red Tesla. Then the SpaceX mission veered off course, ending up orbiting the sun instead of crashing on Mars.
“By missing, they actually extended the shelf life of our first ‘Arch Library’ to 30-50 million years at least, before it hits anything,” says Spivack.
But AMF still wanted to send a copy of Wikipedia. After another couple of years, Spivack met former Kodak scientist Bruce Ha, who invented nanofiche – commercialised under the NanoArchival brand. Fine nickel foil is etched with text with a point size of 0.003; glass master files have a resolution of 300,000 dpi.
“It’s like microfiche but much smaller,” says Spivack. “Nickel is an element that doesn’t decay or oxidise and has a very high melting point. You write physical characters at incredibly high speed into nickel. You first write it into glass, then grow the nickel onto the glass, atom by atom.”
This was still “very expensive”, and it costs another $1m per kg just to send the resulting payload into space. To bring the cost down further, AMF devised a multi-layered system with 20,000 pages or images on an analogue layer, at nanoscale. This can be read by anything that will deliver magnification of at least 200x, such as an 18th-century microscope, but “ideally up to 1000x or even a little bit more”. The earliest files in the primer have been said to only require a magnifying glass.
Backing up everything?
The first four layers would be etched with 80,000 analogue images of carefully chosen information – a kind of picture-based primer to key concepts that would help someone with no prior knowledge of humanity build a computer for decoding and interpreting the next 21 layers of digital storage. Those digital layers comprise a 30 million page library, including Wikipedia, Project Gutenberg, and some 30,000 other books from private collections and academic libraries including codices to ancient civilisations, encyclopaedias, dictionaries to 5000 languages and more.
In 2000, IBM researchers including Raymond A Lorie proposed that a universal virtual computer could be developed to run any software in the future on a yet-unknown platform – so the idea is not at all outrageous.
“These are essentially DVD masters, thinner than a piece of paper, with each storing about 4.75GB, like a DVD. So it’s a little stack of 25 discs, only as thick as a single DVD and pure nickel,” says Spivack. “We don’t have every book ever published. But do we have every subject? 100% yes. We realised we have to communicate this to someone who may be a future human, or something that evolved on earth that’s not human, or an alien culture.”
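The figures quoted for the lunar library can be cross-checked with a little arithmetic. The sketch below uses only numbers from the article, but rests on two assumptions: that the 4.75GB figure applies to the 21 digital layers only, and that 1GB means 10^9 bytes. The constant names and the `archive_summary` function are illustrative.

```python
# Figures quoted in the article.
ANALOGUE_LAYERS = 4
DIGITAL_LAYERS = 21
IMAGES_PER_ANALOGUE_LAYER = 20_000
GB_PER_DIGITAL_LAYER = 4.75      # "about 4.75GB, like a DVD"
LIBRARY_PAGES = 30_000_000       # "30 million page library"

BYTES_PER_GB = 10**9             # assumption: decimal gigabytes


def archive_summary() -> dict:
    """Derive totals from the per-layer figures above."""
    digital_gb = DIGITAL_LAYERS * GB_PER_DIGITAL_LAYER
    return {
        "total_layers": ANALOGUE_LAYERS + DIGITAL_LAYERS,
        "analogue_images": ANALOGUE_LAYERS * IMAGES_PER_ANALOGUE_LAYER,
        "digital_capacity_gb": digital_gb,
        # Average budget per page if the digital layers were exactly full.
        "bytes_per_page": digital_gb * BYTES_PER_GB / LIBRARY_PAGES,
    }


for name, value in archive_summary().items():
    print(f"{name}: {value}")
```

The layer counts add up to the 25 discs Spivack describes, and four analogue layers of 20,000 images give the 80,000 quoted. The digital layers come to just under 100GB, or a few kilobytes per page on average, which is plausible only for compressed text rather than page images.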
Time went on, and when SpaceIL’s Beresheet lander crashed in the Sea of Serenity in April 2019, it had an Arch Library on board. After a lot of post-crash analysis, though, it was deduced that the Arch Library probably survived – although it might be in fragments, in which case the digital layers would have been destroyed.
“In that case, the analogue data could be pieced together, like pottery from an archaeological dig. The first 80,000 pages would still be retrievable – a huge amount of knowledge,” he says.
“Best case is that there is a solid block of nickel, layers stuck together with epoxy resin, on the moon.”
Signature of life on Earth
Within the epoxy, the team included a few things that could be thought of as a “signature of life on Earth”, or even a traditional ship’s blessing – tiny bits of dust, sand and bone fragments from holy sites. Additionally, there was DNA from the hair follicles of 25 individuals of different sexes, ages, races and haplogroups, and scientists had donated a few tardigrades in suspended animation on a piece of paper, which may or may not be retrievable, or even revivable.
“We were never able to verify the tardigrades were there. But it caused a huge scandal: people had fantasies of herds of tardigrades eating all the cheese, then mutating and building a ray gun and destroying the Earth,” says Spivack.
The tardigrades might act as a beacon for attention in the future, sparking interest in seeking out the billion-year archives and the knowledge they contain, he suggests.
AMF is also working on an assortment of future space missions. With Hypergiant Galactic Systems, it’s helping launch AI-equipped satellites that will transmit data, using blockchain technology to solve “packet” synchronisation problems, to other space-based internet nodes including Earth itself. First proof of concept will be a transmission of an Arch Library. Another copy of Wikipedia is orbiting Earth on a SpaceChain miniature satellite, with another moon landing alongside Astrobotic scheduled to update the lunar library in 2021 or so.
“We will be looking at ways to encode knowledge into DNA – it’s nature’s digital so it’s more likely that future life will be able to understand it, rather than some file format from Microsoft,” says Spivack. “That might be easier than building a computer, and a computer industry, and a semiconductor industry. We’re looking at ways to preserve the data and maybe enable read/write access – so you know, maybe we do put a server on the moon, and then another one on Mars.”
Lessons for datacentres
Craig Lodzinski, chief technologist at solution provider Softcat, agrees that Billion-Year Archive technologies might help rethink the enterprise datacentre, which has been around for less time at this point than many of the media formats used in it.
“As data volumes continue to grow, this will be of increased importance, and enable new services for storing data that is not essential, but could be useful as circumstances change. An example of this could be medical data, with a large regulatory retention requirement. And a vast trove of data not presently regarded as useful could in future prove vital as medical science advances.”
5D crystal, he points out, addresses three issues: durability, capacity and compatibility. Etched silica glass at scale might offer low-cost, low-maintenance “write and forget” archival storage. With advancements in AI, quantum computing and more, a cost-effective capability to store giant datasets for long periods might open up new possibilities in data-intensive applications and their outcomes.
Lodzinski notes that Cisco predicts video will soon account for more than 80% of web traffic. One 4k camera streaming video for a year will generate more than 23TB of data – costing $280 a year to store, even with services like AWS Glacier Deep Archive. “If technologies like 5D crystal can drop this cost dramatically, we may enter an era where we retain as much data as possible in case it becomes useful in the future.”
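The camera figures are easy to sanity-check. The Python sketch below assumes a deep-archive list price of $0.00099 per GB per month, which is not stated in the article and varies by region and date. It also assumes the full 23TB is held for 12 months and that storage is billed in 1024GB terabytes. The function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

# Assumed price, not from the article: USD per GB per month.
ASSUMED_PRICE_PER_GB_MONTH = 0.00099


def implied_bitrate_mbps(data_tb: float) -> float:
    """Average stream bitrate implied by a year's data volume.

    Assumes decimal terabytes (10**12 bytes) and a continuous stream.
    """
    bits = data_tb * 10**12 * 8
    return bits / SECONDS_PER_YEAR / 10**6


def annual_storage_cost_usd(
    data_tb: float,
    price_per_gb_month: float = ASSUMED_PRICE_PER_GB_MONTH,
) -> float:
    """Cost of keeping `data_tb` in cold storage for 12 months.

    Assumes 1TB is billed as 1024GB and ignores retrieval and request fees.
    """
    return data_tb * 1024 * price_per_gb_month * 12


print(f"Implied bitrate: {implied_bitrate_mbps(23):.2f} Mbit/s")
print(f"Annual storage cost: ${annual_storage_cost_usd(23):.2f}")
```

Under these assumptions, 23TB a year works out to a stream of a little under 6Mbit/s, and the storage bill lands at roughly the $280 Lodzinski cites. Retrieval charges, which cold-storage services typically levy separately, would come on top.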
But what about nanofiche? Could it be worth taking another look at analogue file formats for long-term storage in the context of the future datacentre?
“Nanofiche gives me a weird nostalgia kick – as a child I was fascinated by the microfiche reader at my dad’s work,” says Lodzinski. “And one of our more durable long-term storage options even in 2020 is punched paper tape; we mustn’t be distracted by the new. A physical, analogue medium that only requires magnification to read offers outstanding longevity.”
The potential for something like nanofiche as a high-density Rosetta Stone guiding the use of other long-term technologies holds promise, he suggests. Most technologies require a degree of base understanding, especially since many stem from an amalgam of numerous fields. Removing layers of complexity and abstraction between the user and the data can enhance understanding and extend the useful half-life of stored data.
“For much of our stored data presently, the application or protocol layer is the key to unlocking the understanding and value of the data. By exploring the combination of analogue and digital formatting, new options are available to build services with different architectures,” says Lodzinski.
Long-term cold storage remains a challenge across the entire datacentre space, especially as data volumes continue to grow. Computing technology and datacentres consume large amounts of power, relying on components that are resource-intensive, often using rare earth minerals and other scarce or hard-to-extract materials. As global society lurches from one crisis to another, technologies that allow us to do more with data and computing while creating less waste and using fewer finite resources should be prioritised, he says.
“Computing is still a very new field focused on the newest, latest, and agile iteration. An alternative perspective on how we plan technology past the normal three to five year cycles opens up new avenues. Looking at how we make technology comprehensible over the very long term can change how we think about design and user experience. Thinking about data retention and durability in decades and centuries changes how we think about data classification, about the dependencies we place upon data, and our deployments.”
However, while an interplanetary cloud is certainly useful as a deep archive of last resort, such projects have a long way to go before there could be useful enterprise applications. The AMF has travelled some way towards working out how the information stored therein might be retrieved by anybody in future, but whether this ever works remains something of a gamble.
“Ensuring that an interplanetary cloud can be used and understood centuries into the future, and potentially after a significant critical event, is a unique and significant challenge. I am confident that with the right investment and application, long term vaults of knowledge in interplanetary clouds can be established. What is far harder is ensuring that, when that data is needed or simply discovered, it is clear what it is, why it exists, and how to use it,” says Lodzinski.