The concept behind storing trinary data in DNA (via Harvard)
A single gram of DNA, built from four bases (adenine, thymine, cytosine and guanine), can store roughly as much information as 100 billion DVDs. Scientists have known this for some time. Researchers from the Wyss Institute at Harvard University encoded a 53,400-word book, amounting to 5.27 million bits, into DNA. They used a binary method that assigned a “0” to the A or C bases and a “1” to the G or T bases. The downside of DNA storage is that the technology to encode and decode DNA is expensive and time-consuming, and until now there had been little progress toward a scalable, cost-effective method. Scientists from the European Bioinformatics Institute in the UK have now developed a method that they say is highly scalable and could become cost-effective within the next 10 years. The encoding algorithm they use includes strategies for error correction.
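The binary mapping attributed to the Harvard group above (A or C for “0”, G or T for “1”) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example, and picking at random between the two allowed bases is an assumption, since the article does not say how the choice was made.

```python
import random

# Mapping taken from the article: 0 -> A or C, 1 -> G or T.
ZERO_BASES = "AC"
ONE_BASES = "GT"


def text_to_bits(text):
    """Turn text into a string of '0'/'1' characters (UTF-8, 8 bits per byte)."""
    return "".join(format(byte, "08b") for byte in text.encode("utf-8"))


def bits_to_text(bits):
    """Inverse of text_to_bits."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def bits_to_dna(bits, rng=None):
    """Encode each bit as one base.

    Assumption: the base is picked at random from the two allowed for
    that bit; the article does not describe the actual selection rule.
    """
    rng = rng or random.Random(0)
    return "".join(
        rng.choice(ONE_BASES if bit == "1" else ZERO_BASES) for bit in bits
    )


def dna_to_bits(dna):
    """Decode: G or T reads as 1, A or C reads as 0."""
    return "".join("1" if base in ONE_BASES else "0" for base in dna)


if __name__ == "__main__":
    dna = bits_to_dna(text_to_bits("DNA"))
    print(dna, bits_to_text(dna_to_bits(dna)))
```

With one bit per base, this scheme uses half of the two bits a four-letter alphabet could carry, but it leaves the encoder free to choose between two bases at every position.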
Researchers Nick Goldman, Paul Bertone, Christophe Dessimoz, Botond Sipos and Ewan Birney are among the co-authors of a paper published in the journal Nature that describes this DNA coding method. The team used a ternary encoding, meaning one built on the digits 0, 1 and 2. Rather than assigning each digit to a fixed base, they choose the base for each digit according to the base written immediately before it, so the same base never appears twice in a row. This rules out long runs of a single base, which are a leading cause of error when DNA is read back.
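A minimal sketch of this kind of rotating ternary code follows. The exact lookup table from the paper is not reproduced here: this version assumes the three candidate bases are simply the bases other than the previous one, taken in alphabetical order, and that encoding starts as if the previous base were “A”. The fixed six-digit ternary form of a byte is also an assumption made for illustration.

```python
BASES = "ACGT"


def byte_to_trits(value, width=6):
    """Write a byte as `width` ternary digits (3**6 = 729 covers 0..255).

    Assumption: fixed-width conversion, chosen for simplicity.
    """
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 3)
        digits.append(remainder)
    return digits[::-1]


def trits_to_dna(trits, start="A"):
    """Encode ternary digits so that no base ever repeats.

    Each digit picks one of the three bases that differ from the
    previously written base (hypothetical ordering: alphabetical).
    """
    previous = start
    out = []
    for trit in trits:
        candidates = [base for base in BASES if base != previous]
        previous = candidates[trit]
        out.append(previous)
    return "".join(out)


def dna_to_trits(dna, start="A"):
    """Invert trits_to_dna by replaying the same candidate lists."""
    previous = start
    trits = []
    for base in dna:
        candidates = [b for b in BASES if b != previous]
        trits.append(candidates.index(base))
        previous = base
    return trits


if __name__ == "__main__":
    print(trits_to_dna([0, 0, 0, 0]))  # CACA
```

Even a run of identical digits, such as four zeros, comes out as alternating bases, which is the property the paragraph above describes.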
The researchers encoded their files into thousands of individual sections of DNA, each 117 bases long. Of these, 100 bases encode part of the file’s information and the other 17 bases form an index that records where each fragment belongs in the complete file. The index also includes a parity check, similar to the parity bits used in conventional computing. As a further safeguard against error, the sections are staggered in steps of 25 bases, so that each stretch of data also appears in three other sections and the copies can be checked against one another.
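The overlapping layout can be sketched as a sliding window. The sketch below covers only the 100-base payloads and their positions; the 17-base index and its parity check are not modeled, and the function names are made up for this example. It assumes the encoded sequence has a length of 100 plus a multiple of 25.

```python
DATA_LEN = 100  # payload bases per fragment (from the article)
STEP = 25       # offset between consecutive fragments (from the article)


def fragment_count(sequence_length):
    """Number of overlapping windows needed to cover the sequence."""
    if sequence_length < DATA_LEN or (sequence_length - DATA_LEN) % STEP:
        raise ValueError("length must be 100 plus a multiple of 25")
    return (sequence_length - DATA_LEN) // STEP + 1


def split_into_fragments(sequence):
    """Return (index, payload) pairs of overlapping 100-base windows."""
    return [
        (i, sequence[i * STEP:i * STEP + DATA_LEN])
        for i in range(fragment_count(len(sequence)))
    ]


def coverage(sequence_length):
    """Count how many fragments contain each position of the sequence."""
    counts = [0] * sequence_length
    for i in range(fragment_count(sequence_length)):
        for position in range(i * STEP, i * STEP + DATA_LEN):
            counts[position] += 1
    return counts


if __name__ == "__main__":
    demo = "".join("ACGT"[i % 4] for i in range(300))
    print(len(split_into_fragments(demo)), max(coverage(300)))
```

In this simplified layout, positions away from the two ends of the sequence are covered by four fragments, while the first and last 75 bases are covered by fewer.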
Using this method, the UK scientists encoded all 154 of Shakespeare’s sonnets, an MP3 containing 26 seconds of Martin Luther King’s “I Have a Dream” speech, the paper by James Watson and Francis Crick describing the double-helix structure of DNA, and even a photo of the European Bioinformatics Institute. Their ternary, error-correcting method allowed this data to be decoded with 100% accuracy, a feat never achieved before.
To read the information back, many copies of each section are made and run through a DNA sequencing machine, and the index encoded in each fragment shows where it belongs in the file.
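One way to picture the read-back step is to place every sequenced copy at the position its index gives and let the copies vote on each base. The sketch below assumes reads arrive as (index, payload) pairs with 100-base payloads offset by 25 bases; the majority vote and the function name are assumptions made for illustration, not a description of the team’s actual decoding pipeline.

```python
from collections import Counter


def reassemble(reads, data_len=100, step=25):
    """Rebuild a sequence from (index, payload) reads in any order.

    Assumption: conflicts between copies are settled by a simple
    per-position majority vote.
    """
    reads = list(reads)
    if not reads:
        raise ValueError("no reads to reassemble")
    total = max(index for index, _ in reads) * step + data_len
    votes = [Counter() for _ in range(total)]
    for index, payload in reads:
        for offset, base in enumerate(payload[:data_len]):
            votes[index * step + offset][base] += 1
    if any(not vote for vote in votes):
        raise ValueError("some positions are not covered by any read")
    return "".join(vote.most_common(1)[0][0] for vote in votes)


if __name__ == "__main__":
    original = "".join("ACGT"[(i * 7 + i // 3) % 4] for i in range(200))
    reads = [(i, original[i * 25:i * 25 + 100]) for i in range(5)] * 3
    print(reassemble(reversed(reads)) == original)
```

Because each position is seen in several fragments and each fragment in several copies, an occasional misread base is outvoted by the correct ones.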
With the binary method, the Harvard researchers achieved a density of 798 terabytes (1 TB = 10^12 bytes) per gram. The UK team packed in 2.2 petabytes (1 PB = 10^15 bytes) per gram, and set a record by encoding 739.3 kilobytes of completely original information into DNA. Considering that a single strand of human DNA contains all the information needed to make one of us, storage densities of this order are not surprising. Still, it is exciting to see that we can put them to use after decades of DNA research.
The drawbacks of DNA memory are speed and cost. It took the scientists two weeks to decode their files with their current machines, but they say this could be shortened to one day with more advanced equipment and more DNA sequencing machines. The team’s method cost around $12,400 per megabyte, which is vastly more expensive than current storage. It sounds like something from the early days of data storage.
The team claims that if current trends in DNA encoding and decoding technology continue, their method will be cost-effective within 10 years for archives intended to last 50 years or more. Storing DNA properly is simple: it only requires a dark, dry, cool environment. In this state, DNA can last thousands of years, and it could most likely be read by any advanced life form. Now, science fiction writers, scramble to create a story around this concept.