Got big data?

What can one researcher do with a terabyte of raw data? Not much. But an Excel spreadsheet of data? That’s much more manageable.

The Bioinformatics Division of the Iowa Institute of Human Genetics (IIHG) helps researchers across the university analyze, reduce, and focus large data sets to further their research. “We take a terabyte of data down to an Excel spreadsheet,” Bioinformatics Director Tom Bair, PhD, explains.

Research Assistants Yani Chen and Melissa Kurtz in Mary Wilson’s lab

The division provides the expertise and equipment to analyze complex data sets. It also trains researchers who want to do their own analyses and provides the software needed to do so.

Mary Wilson, MD, is a long-time user of the Bioinformatics Division for her research on leishmaniasis, a tropical parasitic infectious disease transmitted by sand flies. She hopes to learn how to prevent and better treat the disease by studying its pathogenesis, including:

  • immune responses in humans and in mouse models
  • genetic differences between parasite strains causing different disease forms
  • the influence of bacteria on the course of leishmaniasis

“The most valuable thing they’ve provided for us is teaching myself and members of my research group which programs to run and how to actually run the programs ourselves. You really get the best understanding [of the data] if you do some of the analysis yourself,” Wilson explains. “It’s also been very helpful to work with them through the entire process, including study design.”

Partnering with researchers from Brazil and India, Wilson’s research team works closely with the Bioinformatics Division on its data analyses. The team gathers samples and information from parasites and from infected humans and dogs around the world: information about people with leishmaniasis and their families; DNA from parasite isolates, patients, and patients’ families; RNA from human tissues and parasites; and samples from infected sand flies.

Because access to samples is limited, the Bioinformatics Division helps the team choose platforms that work with suboptimal samples, select and use tools for comparing genomes, gene expression data, and microbiomes, and implement these programs.

Services and increased computing power

The Bioinformatics Division has recently added personnel so it can handle more projects and assist more researchers. Its updated website now has information on services, software, training resources, and fees. The initial consultation is free, and more complex analyses are available for a fee.

A few of the services include:

  • RNA-Seq (measuring differences in RNA expression)
  • ChIP-Seq (looking for where a particular protein is binding in the genome of an organism)
  • Methyl-Seq (looking for altered methylation patterns responsible for regulation)
  • Exome re-sequencing
  • Targeted re-sequencing
  • Microarray analysis
  • Pathway analysis
  • Other custom analyses

A Bioinformatics short course will be held May 18 to 20

The division often uses Galaxy, a web-based tool that makes it easy to analyze and reduce large volumes of data. It’s a framework on which other software can run, and it can be customized as needed, potentially saving a researcher dozens of time-consuming steps.

For larger jobs, they take advantage of the Amazon computing cloud.

“We meet the investigators in advance to find out what the experiment is trying to accomplish and then suggest the best way to do that,” Bair says. Galaxy and the storage resources allow investigators to run their workflows on a high-performance computing cluster and stay in control of the experiment without becoming experts in processing large quantities of data.

Even after the initial analysis is done, a thousand-line spreadsheet can still be an intimidating amount of data. The division provides tools like IPA (Ingenuity Pathway Analysis), which narrows the data further so researchers can look at specific gene pathways and understand the core pathways involved in an experiment.

“A lot of really cool experiments can be done with next generation sequencing,” Bair says. “We’ve gone from about 96 reads per experiment 10 years ago to hundreds of millions of reads per experiment now.”

Empowering researchers

The Bioinformatics Division aims to empower researchers who want to do their own data analyses. The division:

  1. Provides software for data analyses and reduction, which could otherwise be very expensive
  2. Provides training to teach researchers how to do the analyses without needing their own bioinformatician
  3. Offers a Bioinformatics short course each summer, this year May 18 to 20, which includes hands-on sessions and speakers

Engaging students

A group of students learn about the compute cluster during Genetics Career Day

One mission of the IIHG is to introduce students to careers in bioinformatics, an expanding field with many opportunities, and to inspire them to pursue those careers. Highly trained professionals are in great demand, and those with an interest in both biology and computers often fit the bill. People from the departments of math, business, computer science, biology, nursing, engineering, and others have shown an interest in the career path.

The division offers a number of educational opportunities throughout the year, including a career event, summer course, workshops, user groups, and internship opportunities.