27 Apr 2018
DNA storage

After six decades, present data storage technology may become obsolete. Humanity created more data in the past 2 years than in all of preceding history. As researchers try to cope with this data “flood”, they have come up with new ways to store digital data more efficiently and for longer periods of time. One such way is to encode digital data into DNA, creating the highest-density large-scale data storage scheme devised so far. In theory, DNA can store over 200 petabytes (200 million gigabytes) in a single gram, and it can last hundreds of thousands of years if kept in a cool, dry place.

Scientists have been storing digital data in DNA since 2012. That was when Harvard University geneticists George Church, Sri Kosuri, and colleagues encoded a 52,000-word book in thousands of snippets of DNA, using DNA’s four-letter alphabet of A, G, T, and C to encode the 0s and 1s of the digitized file. Their particular encoding scheme was relatively inefficient, storing just over 1 petabyte per gram of DNA, and others have since done much better. Unlike other high-density approaches, such as manipulating individual atoms on a surface, the latest DNA data storage technologies can write and read large amounts of DNA at a time, which allows them to be scaled up.

The latest DNA-based data storage method converts files into binary strings of 0s and 1s, compresses them into one master file, and then splits the data into short strings of binary code. The researchers behind it devised a new algorithm, which they named DNA Fountain. DNA Fountain randomly packages the strings into “droplets”, to which extra tags are added to help reassemble them in the proper order later. Using this approach, they generated a digital list of 72,000 DNA strands, each 200 bases long, that together encode the master file and the files compressed into it.
DNA data storage technology is in its infancy, and encoding data into DNA strands can be done only in specialized laboratories. One such lab is operated by Twist Bioscience, a San Francisco startup, which synthesized the DNA strands for this work. To decode the synthesized strands, one sequences them with modern DNA sequencing technology and then uses the tags and binary data to compute the original files. Most impressively, the decoded files contained no errors whatsoever. Another advantage is that standard DNA copying techniques, such as polymerase chain reaction, can create an almost unlimited number of copies. The scheme also stores 1.6 bits of data per nucleotide.
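
To make the encoding idea more concrete, here is a minimal Python sketch of the general approach described above: split the data into short segments, combine a randomly chosen subset of them into a "droplet", tag the droplet with the seed that identifies that subset, and write the result in the letters A, C, G, and T. This is an illustration only, not the researchers' actual algorithm. The 2-bits-per-base mapping, the 2-byte seed tag, the segment size, and every function name below are assumptions made for the example, and the sketch leaves out error correction and any screening of sequences that a real scheme would need.

```python
import random

# Assumed mapping for illustration: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BASES = "ACGT"


def bytes_to_dna(data):
    """Map every 2 bits to one nucleotide, most significant bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand):
    """Inverse of bytes_to_dna; expects a length that is a multiple of 4."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def split_segments(data, size):
    """Split data into fixed-size segments, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def make_droplet(segments, seed):
    """XOR a seed-determined random subset of segments, then prepend the seed.

    The seed acts as the tag: a decoder that knows the seed can regenerate
    the same subset and work out which segments the payload combines.
    """
    rng = random.Random(seed)
    count = rng.randint(1, len(segments))
    chosen = rng.sample(range(len(segments)), count)
    payload = bytearray(len(segments[0]))
    for idx in chosen:
        for j, value in enumerate(segments[idx]):
            payload[j] ^= value
    tag = seed.to_bytes(2, "big")  # hypothetical 2-byte tag
    return bytes_to_dna(tag + bytes(payload))


segments = split_segments(b"Hello, DNA!", 4)
droplets = [make_droplet(segments, seed) for seed in range(1, 6)]
for strand in droplets:
    print(strand)  # each droplet is 24 bases: 8 for the tag, 16 for the payload
```

A plain 2-bit mapping like this one stores at most 2 bits per nucleotide, which gives a sense of scale for the 1.6 bits per nucleotide mentioned above.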

At about $10,000 to synthesize and read roughly 2 megabytes of data, DNA storage today is expensive, much as hard drive storage was in the 1950s. Writing to and reading from DNA is also relatively slow compared with hard drives, so the technology is likely to be used first for archival applications.
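
To put that price in perspective, the short calculation below works out the cost per megabyte and per gigabyte from the two figures quoted above. The variable names are just for illustration, and the gigabyte figure assumes 1 gigabyte = 1,000 megabytes and a cost that scales linearly, which is a simplification.

```python
cost_usd = 10_000     # approximate cost to synthesize and read, from the text
data_megabytes = 2    # approximate amount of data stored, from the text

cost_per_megabyte = cost_usd / data_megabytes
cost_per_gigabyte = cost_per_megabyte * 1_000  # assuming 1 GB = 1,000 MB

print(f"${cost_per_megabyte:,.0f} per megabyte")  # $5,000 per megabyte
print(f"${cost_per_gigabyte:,.0f} per gigabyte")  # $5,000,000 per gigabyte
```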

Until then, back up your data and keep your hard drives safe. And if you ever need data recovered from any type of data storage device, Data Analyzers will be happy to assist, including with data stored in synthetic DNA!