A pair of researchers from Columbia University and the New York Genome Center (NYGC) has found a way to encode information using nature’s storage system: DNA.
Yaniv Erlich and Dina Zielinski, the duo behind the DNA data storage technology. (image via New York Genome Center)
Deoxyribonucleic acid, or DNA, is the hereditary material in humans and almost every other living organism. It contains the instructions for how we are assembled and maintained, and is coded using four chemical bases: adenine (A), thymine (T), cytosine (C), and guanine (G); A pairs with T, and C pairs with G. Each base is attached to a sugar molecule and a phosphate molecule, and together the three form what is called a nucleotide. DNA takes the form of a double helix, which looks somewhat like a twisted ladder: the base pairs form the rungs, and the sugar and phosphate molecules form the strands that hold the rungs in place. This natural information storage system has now been adapted to store digital data, and Erlich and Zielinski used it to encode six files: a $50 Amazon gift card, a Pioneer plaque, an 1895 French film, a computer virus, a 1948 study by information theorist Claude Shannon, and a full operating system.
The data from these files were split into strings of binary code (zeros and ones). Using an “erasure-correcting algorithm” of a type known as “fountain codes,” the strings were randomly packaged into “droplets,” which were then encoded using the four nucleotide bases in DNA. Although DNA storage is theoretically limited to two binary digits (bits) per nucleotide, and practically limited to about 1.8 bits per nucleotide, Erlich and Zielinski packed an average of 1.6 bits per nucleotide, which is still 60% more than any previously published method. The algorithm excluded letter combinations that were known to cause errors and attached a barcode to every droplet to help reassemble the files later using DNA sequencing technology.
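The encoding steps described above can be illustrated with a minimal Python sketch. This is not the researchers’ actual software: the bit-to-base mapping (00→A, 01→C, 10→G, 11→T), the uniform choice of how many segments go into each droplet, the 4-byte seed used as a barcode, and the screening thresholds (no more than three identical bases in a row, 45–55% G+C content) are all assumptions made for illustration, and every function name is hypothetical.

```python
import random

BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def bytes_to_bases(data):
    """Map each byte to four bases, two bits per base, high bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def bases_to_bytes(seq):
    """Inverse of bytes_to_bases; len(seq) must be a multiple of 4."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def passes_screen(seq, max_run=3, gc_min=0.45, gc_max=0.55):
    """Reject long single-base runs and unbalanced G+C content.

    The thresholds are illustrative assumptions, not values from the article.
    """
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if prev == cur else 1
        if run > max_run:
            return False
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return gc_min <= gc <= gc_max


def make_droplet(segments, seed):
    """XOR a seed-selected subset of equal-length segments together and
    prefix the seed as a 4-byte barcode, then translate to bases."""
    rng = random.Random(seed)
    degree = rng.randint(1, len(segments))  # uniform degree: a simplification
    chosen = rng.sample(range(len(segments)), degree)
    payload = bytearray(len(segments[0]))
    for idx in chosen:
        for j, value in enumerate(segments[idx]):
            payload[j] ^= value
    return bytes_to_bases(seed.to_bytes(4, "big") + bytes(payload))


def encode(segments, n_droplets, seed_source=0):
    """Draw random 32-bit seeds and keep only droplets that pass the screen."""
    seed_rng = random.Random(seed_source)
    droplets = []
    while len(droplets) < n_droplets:
        candidate = make_droplet(segments, seed_rng.getrandbits(32))
        if passes_screen(candidate):
            droplets.append(candidate)
    return droplets
```

Because each droplet carries its own seed, a decoder can read the first 16 bases, recover the seed, and work out which segments were combined. Droplets that fail the screen are simply discarded and a new seed is tried, which is one plausible way to read the article’s point that error-prone letter combinations were excluded.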
What’s more, this form of coding, storage, and retrieval is extremely reliable. In total, 72,000 DNA strands, each 200 bases long, were generated and sent as a text file to Twist Bioscience, a San Francisco DNA-synthesis startup. Twist specializes in transforming digital data into biological data. After two weeks, Erlich and Zielinski received a vial containing the freshly coded DNA molecules, and the files were ultimately recovered without a single error. This technology is important not only because it is compact but also because DNA is easy to replicate and resistant to degradation. Unfortunately, the process is expensive and therefore might not replace current data storage methods just yet, but it is a promising leap in information storage technology.
Have a story tip? Message me at: cabe(at)element14(dot)com