by Ben Novak
On October 8th, 2013, Ryan Phelan, Stewart Brand, and I were graciously allowed to witness a historic moment at Genentech Hall on the University of California San Francisco’s Mission Bay campus. We were in the sequencing facility courtesy of Eric Chow, Joe DeRisi, and Jessica Lund, who manage the facility and conduct fascinating research in the lab.
Our passenger pigeon DNA is in their hands now, as our genome candidate “Passenger Pigeon 1871” (aka ROM 22.214.171.124, aka BN1-1) begins extensive DNA sequencing. The “1871” is for the year the specimen was shot and preserved.
The lab is awesome: machines upon machines for all manner of scientific work. 3-D printers are busy creating custom parts for specialized microscopes. Robots for DNA extraction stand ready for a task alongside incubators and refrigerators and devices that look like they are made of dozens of microtubes carrying reagents for all kinds of enzymatic reactions. And of course there are the big guns — the DNA sequencing machines.
“Passenger Pigeon 1871” was selected as the candidate for the full genome sequence for its superb quality compared to other passenger pigeon specimens. Over the last two years Dr. Shapiro, our colleagues, and I have scrutinized the quality of 77 specimens, including bones and tissues. Our first glimpses of data confirmed that the samples would be able to provide the DNA needed for a full genome sequence, but as we delved into the work, the specimens exceeded our expectations. Not only do we have one specimen of high enough quality for a full genome, we have more than 20 specimens with which to perform population biology research using bits of DNA from all over the genome.
We were coming at last to the final stages of sequencing and assembling the passenger pigeon genome. It is no short or easy task. This genome is 1.3 billion base pairs of genetic code. It has to be pieced together from the fragmented bits of DNA left in museum skins, almost all of them less than 150 base pairs in length. It will take more than 80 million fragments of DNA to put the whole genome together with enough overlapping reads that damage to the DNA is not mistaken for genuine mutations. Next-generation sequencing is the technology that gives us that amount of data.
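To get a feel for those numbers, here is a rough back-of-the-envelope sketch in Python. It is only an illustration: the function name is made up for this post, and it assumes every fragment is a full 150 base pairs long, which the text above gives only as an upper limit. The result is therefore a ceiling on how many times, on average, each position in the genome would be read.

```python
# Figures quoted in the post.
GENOME_BP = 1_300_000_000      # passenger pigeon genome, about 1.3 billion bp
FRAGMENTS_NEEDED = 80_000_000  # "more than 80 million fragments"
MAX_FRAGMENT_BP = 150          # almost all fragments are shorter than this


def average_depth(n_fragments: int, fragment_bp: int, genome_bp: int) -> float:
    """Average number of times each genome position is covered.

    Assumption (not from the post): fragments land evenly across the
    genome and each one is read end to end exactly once.
    """
    return n_fragments * fragment_bp / genome_bp


depth_ceiling = average_depth(FRAGMENTS_NEEDED, MAX_FRAGMENT_BP, GENOME_BP)
print(f"At most about {depth_ceiling:.1f}x average depth")
```

Reading each position several times is what makes it possible to tell damage from mutation: a damaged base shows up in one fragment, while a genuine mutation shows up in all the fragments that overlap that spot.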
The Illumina HiSeq 2500 is an impressive machine. It is capable of producing 600 billion base pairs of DNA sequence in one run. The DNA is sequenced using flow cells that look like glass microscope slides. The slides are two panes of glass separated into “lanes” which are channels for chemicals to flow through. The DNA to be sequenced is injected into the lanes and binds to the glass walls. A series of reactions allows cameras to image the DNA being sequenced, and thanks to modern computing power, the imaging can manage hundreds of millions of individual DNA fragments. The HiSeq pumps out DNA reads in mass quantity and does it fast.
As Jessica checks the machine’s reagents and loads the flow cell carrying our “Passenger Pigeon 1871” DNA, Eric informs us that the data will be generated and processed in less than 48 hours. A typical run would take nearly 10 days, which is still incredibly fast for sequencing billions of base pairs of DNA. Our run is so much faster because we are sequencing much less DNA than in usual runs.
After 48 hours almost 25 billion base pairs of DNA will be sequenced from the DNA extract of our bird. Some of that DNA will be contaminants, from bacteria that were living on dust in the museum drawer or from the human DNA of museum curators over the past century. But if our calculations are in the right ballpark, we should get over 100 million DNA fragments from “Passenger Pigeon 1871,” more than enough to sequence and assemble the whole genome, both nuclear DNA and mitochondrial DNA.
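The same kind of rough arithmetic shows what this run could deliver. The sketch below is illustrative only: the variable names are invented here, and it assumes each pigeon fragment is read once and is no longer than 150 base pairs (the upper limit for almost all of our fragments). Under those assumptions it gives upper bounds, not predictions.

```python
# Figures quoted in the post.
RUN_YIELD_BP = 25_000_000_000   # "almost 25 billion base pairs" from this run
PIGEON_FRAGMENTS = 100_000_000  # "over 100 million DNA fragments" hoped for
GENOME_BP = 1_300_000_000       # genome size, about 1.3 billion bp

# Assumption: no pigeon fragment is longer than 150 bp.
MAX_FRAGMENT_BP = 150

# Most pigeon DNA (in base pairs) that 100 million fragments could hold.
pigeon_bp_ceiling = PIGEON_FRAGMENTS * MAX_FRAGMENT_BP

# Share of the run's output that this would represent; whatever is left
# over would be contaminants or other unusable sequence.
pigeon_share_ceiling = pigeon_bp_ceiling / RUN_YIELD_BP

# Average number of times each genome position would be read.
depth_ceiling = pigeon_bp_ceiling / GENOME_BP

print(f"Pigeon DNA: up to {pigeon_bp_ceiling / 1e9:.0f} billion bp "
      f"({pigeon_share_ceiling:.0%} of the run)")
print(f"Average depth: up to {depth_ceiling:.1f}x")
```

Real fragments are mostly shorter than 150 base pairs, so the actual figures would come in below these ceilings.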
In just two days we will have billions of base pairs of DNA from the extinct passenger pigeon. It’s still hard to comprehend, because just ten years ago the first DNA sequences of this bird amounted to what was, at the time, an impressive 1,448 base pairs of DNA. When we started our work, the total DNA sequenced from this bird was 1,892 base pairs, painstakingly sequenced from hundreds of PCRs. And now, in a matter of six months, we have increased that count 50-million-fold.
The power of having an entire genome to research is as limitless as the future of the science. The code of the genes can be studied for paleogenomic analysis of the bird’s evolutionary history. Some of the genes can be synthesized in bacteria to test the chemical reactivity of enzymes and proteins unique to passenger pigeons. The mutation diversity in the genome can be used to calculate how the entire species’ population levels fluctuated over the past 100,000 years. With that population curve we can analyze how passenger pigeons reacted to climate changes, changes in forest composition, and perhaps even how the birds reacted to the first arrival of humans in North America thousands of years ago.
But of course the big goal for us is to understand the genes and the regions of DNA that evolved to make a passenger pigeon the bird that it is, and to begin recreating those elements in a living pigeon genome, the band-tailed pigeon.
In these first stages of the grand process of de-extinction, “finishing” the genome sequence (taking the raw code and making sense of it into an assembled genome) is now the major challenge. That’s the focus of our work for the coming months.