Wednesday, January 30, 2013

DNA disk

        Animals use urine to mark territory. With urine they broadcast a message: I am the boss of this place! Other animals read from the urine that the place is already owned by someone else, and the newcomers may then choose to leave or to stay and fight for the territory. In this case, urine is used to store information. Urine is so cheap that it is a practical way for animals to claim territory. However, urine cannot hold much information. For example, it is hard for newcomers to tell from it how strong the settlers are, which they would need to know to decide whether to stay and fight. Urine does not last long either, so animals have to pee frequently to maintain a territory claim. To sum up, urine is a practical way to store simple information, for a short while, in a form that other animals can read. This example teaches a lesson about how humans store information. When we store information, we usually consider four factors: practicality (how easily the information can be produced), capacity (how much information can be accommodated), maintenance (how long the information can be kept), and readability (how easily the information can be read).
        Books written on paper were the first milestone of information storage. Books are practical because paper is cheap and easy to handle. Books can last many years. People of normal intellect can learn to read and write without too much difficulty. With the emergence of silicon chips, humans made tremendous progress in information storage. Simple symbols, namely 0s and 1s, are used to represent everything. Silicon chips beat books in every aspect: higher capacity, easier to maintain, and easier to read. Smaller hard drives with higher capacity are created continuously.
        However, we still face an unmet need for information storage. More and more information is being produced, especially in biology. For example, genome sequencing has produced, and is still producing, a vast volume of information. How to store these massive amounts of information poses a huge challenge.
        What should be the next-generation carrier for the information deluge, after books and silicon chips? DNA may be a qualified candidate, since nature has used it to store information for millions of years. Scientists have now figured out how to use DNA to store information.
        Nick Goldman and colleagues designed a method to store information in DNA [1]. They stored in DNA Shakespeare’s sonnets (ASCII text), Watson and Crick’s paper describing the DNA double helix (PDF), a color photo of the European Bioinformatics Institute (JPEG 2000), Martin Luther King’s “I have a dream” speech (MP3), and a Huffman code. These documents were first encoded as binary text, namely 0s and 1s. The binary text was then translated into base-3 files (0s, 1s and 2s), which correspond to long DNA sequences. A long DNA sequence can store the information, but it is hard to read by sequencing. Thus, short DNA sequences with overlapping segments were designed to represent the long DNA. Moreover, indexing information was added to the short DNA sequences so that it is possible to tell which DNA corresponds to which document. Overall, the five files listed above were represented by a total of 153,335 strings of DNA, each comprising 117 nucleotides (nt). The DNA was synthesized and lyophilized for shipment at ambient temperature from the USA to Germany via the UK. In Germany, the DNA was resuspended, amplified, purified and sequenced. The full-length DNA sequence corresponding to each original file was then reconstructed from the sequencing results. For all five files, the reconstruction was 100% accurate. Martin Luther King's speech is still clear and heart-stirring.
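The pipeline described above (binary, then base 3, then DNA, then short overlapping indexed fragments) can be illustrated with a small Python sketch. This is not the paper's exact scheme; several details are assumptions made only for illustration: each byte is written as a fixed six base-3 digits, each digit picks one of the three nucleotides that differ from the previous one (so no letter repeats back to back), and the fragment length and step in `fragment` are arbitrary example values.

```python
NUCLEOTIDES = "ACGT"
TRITS_PER_BYTE = 6  # assumption: fixed width, since 3**6 = 729 >= 256


def bytes_to_trits(data: bytes) -> list[int]:
    """Write each byte as six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Inverse of bytes_to_trits."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for t in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


def encode(data: bytes, previous: str = "A") -> str:
    """Map each trit to one of the three nucleotides unlike the previous one."""
    dna = []
    for t in bytes_to_trits(data):
        choices = [n for n in NUCLEOTIDES if n != previous]
        previous = choices[t]
        dna.append(previous)
    return "".join(dna)


def decode(dna: str, previous: str = "A") -> bytes:
    """Recover the trits from the nucleotide transitions, then the bytes."""
    trits = []
    for n in dna:
        choices = [c for c in NUCLEOTIDES if c != previous]
        trits.append(choices.index(n))
        previous = n
    return trits_to_bytes(trits)


def fragment(dna: str, length: int = 100, step: int = 25) -> list[tuple[int, str]]:
    """Cut the long sequence into overlapping windows, each tagged with an index.

    The final windows may be shorter than `length`.
    """
    return [(i // step, dna[i:i + length]) for i in range(0, len(dna), step)]


if __name__ == "__main__":
    dna = encode(b"DNA disk")
    print(dna)
    print(decode(dna))
    print(fragment(dna, length=20, step=5)[:3])
```

Because neighbouring fragments overlap, most positions of the long sequence appear in several short strings, so a fragment that is lost or misread can in principle be covered by its neighbours, and the index says where each fragment belongs.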
        Needless to say, DNA can hold much more information and can be maintained more easily than magnetic tape. But is DNA storage practical? Currently, DNA storage costs about $12,400/MB to encode information and $220/MB to decode it, which is much higher than magnetic tape. However, information stored on magnetic tape needs to be copied frequently so that it does not become unreadable, because magnetic tape does not last very long. By contrast, synthesized DNA can last thousands of years under normal maintenance. DNA encoding will also become cheaper if the current trend in DNA manipulation technology continues. In less than 50 years, DNA storage may become practical. Writing information into DNA and reading it back is not yet competitive in speed with current technology, but it can be accelerated. In summary, DNA storage holds great promise for storing massive amounts of information.
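To get a feel for these prices, here is a rough back-of-the-envelope calculation in Python. It uses only the two per-MB figures quoted above; the function name and the assumption that cost scales linearly with size and with the number of reads are mine, for illustration only.

```python
WRITE_COST_PER_MB = 12_400  # US dollars per MB to encode, as quoted above
READ_COST_PER_MB = 220      # US dollars per MB to decode, as quoted above


def dna_storage_cost(size_mb: float, reads: int = 1) -> float:
    """Estimated cost in dollars to write `size_mb` once and read it `reads` times.

    Assumes cost scales linearly with size and read count (a simplification).
    """
    return size_mb * (WRITE_COST_PER_MB + reads * READ_COST_PER_MB)


if __name__ == "__main__":
    print(dna_storage_cost(1))              # one MB, written and read once
    print(dna_storage_cost(1000, reads=0))  # about one GB, written only
```

At these prices, writing and reading a single MB once comes to $12,620, and writing 1,000 MB without reading it back comes to $12.4 million, which is why the cost trend matters so much.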
        It is possible to connect a DNA synthesizer and a DNA sequencer to a computer. Then we would be able to manipulate information in a DNA-based format.
[1] Goldman, N. et al. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature (2013). doi:10.1038/nature11875. Received 15 May 2012; accepted 12 December 2012; published online 23 January 2013.
