Illustration by Lauren Nicholas

UF’s biology department is leading the charge in collecting specimens online

When Charlotte C. Germain-Aubrey, a researcher at the Florida Museum of Natural History, decided to study where plants will migrate as a result of climate change, she knew she would need as much data as possible to build a powerful model.

For most ecologists, “as much data as possible” is usually on the order of hundreds or possibly thousands of records. But as UF spearheads the effort to digitize and integrate the nation’s biological specimens, Germain-Aubrey soon became able to work with data in the hundreds of thousands.

Only a couple of years ago, organizing that much data would have been a person’s life’s work, with years of ’round-the-clock labor devoted to inputting every specimen into a computer. But Germain-Aubrey’s study will only take a couple of years. And hers is one of the first to tap into the potential of the 22-million-and-growing specimens cataloged through the digitization project, called iDigBio, which will make future studies simpler to conduct and their results more powerful.

Four years ago, UF received a grant from the National Science Foundation to set up a central, online hub where all digitized data from around the country could be stored and accessed, said Pam Soltis, director for research activities for iDigBio. Biologists have been digitizing their specimens for at least a decade, she said, but before iDigBio, finding and acquiring a specific specimen was nearly as time-consuming as it would have been without a digital database. A researcher would have to call an institution, ask for the index and make sure he or she received it. With iDigBio, a scientist can pull up, for example, every bison record in America in 10 seconds.

The two largest university collections — UF’s and Harvard’s — have a combined 110 million specimens, which are gradually being added to iDigBio’s online repository. The Smithsonian’s collection of about 126 million specimens will be added. And hundreds of other institutions around the country will also contribute.

Physicists have long been using technology to collaborate on massive scales; the Large Hadron Collider, for example, has allowed scientists to study the most basic parts of the universe. Then again, it’s easier to quantify tiny particles of matter than complex, organic life.

But now that technology has finally caught up, biology is pulling itself out of the era of drawn diagrams and filing cards.

Digitizing all of this work has not been a trivial task. Even the simplest specimens, perhaps a 100-year-old plant, must be pressed flat to a page and scanned using technology that can run into the hundreds of thousands of dollars. So how do you digitize a more complex specimen? A fish preserved in formaldehyde, or the pelt of an extinct species? How do you boil down the essence of a specimen?

Researchers are currently pioneering this work. In 2010 a Chilean construction crew widening a highway came upon a series of 3-million- to 5-million-year-old whale fossils. If Cerro Ballena (or Whale Hill, as it would come to be known) had been discovered a few years earlier, the researchers would have been able to preserve only the fossils themselves, but not vital information like the bones’ exact orientation in the ground. But luckily, researchers had the technology, and the entire fossil site was recorded using massive quantities of high-quality pictures, which were compiled to make a 3-D rendering. All this went online. And with iDigBio, you can access the data file and print it out in UF’s 3-D printing lab.

Now that this firehose of data has been opened, a new challenge has appeared. How do you work with such enormous mountains of information?

Soltis said that because they’re getting information from so many places, most of the work involves getting around the variety of data. Researchers have to figure out what to do with everything from a map to genetic sequences to phylogenetic trees.

“They all require different software and database resources,” Soltis said. “That disparity, that heterogeneity, can really be intimidating to most people.”

Because of this, Soltis, along with disciplinary and multidisciplinary teams, is developing software so that biology students can quickly integrate the myriad types of data.

With all this easily accessible information at hand, researchers can now conduct ambitious studies that no one thought possible just a few years ago. And the findings are essential to understanding our quickly changing world. Look at Germain-Aubrey: Her study on how climate change will affect Florida plant distribution had significant results. She found that by 2050, the state’s ecological landscape will dramatically change. Knowing this, she said, we can prepare for the future.

Behind museums’ mysterious personnel-only doors lie miles of hallways filled with specimens that have been collected over millennia. Darwin’s finches, the birds that led him to the theory of evolution, are stored at the Natural History Museum in London. Lucy, our famous 3.2-million-year-old ancestor, just received a high-quality CT scan at the University of Texas.

And soon, all this information will be a 10-second download away.