Mining the Data Boom

The role data science plays in new medicine development.

DNA becomes data in genetic research

This is the second article in a three-part series about data scientists.

Over the past decade, data science has transformed biomedical research. By sifting through mountains of information, from medical records to DNA sequences, researchers are gaining a deeper understanding of biology, finding ways to target treatments more precisely, and speeding up medicine development.

At CSL, Milica Ng has led this transformation. In 2013, she was an intern with a fresh Ph.D., recruited to update the organization’s data storage and analysis systems. She quickly built up CSL’s capacity for fast and accurate genetic sequencing and analysis, and today she’s the head of a growing 11-member data science unit.

“We have mathematicians, statisticians and software people, and we work closely with biological scientists,” Ng says. “Bringing it all together is more like engineering, solving problems.”

Ng brought with her a unique set of skills in computing, business and bioinformatics. Leaving her native Yugoslavia during the civil war of the 1990s, she immigrated with her family to Adelaide, South Australia, where she studied computer engineering. She then took a fast-paced job as a management consultant, travelling from place to place to help companies solve their computer problems.

When she had children, she decided to cut back on travel and instead started a Ph.D. in bioinformatics. Her project was to analyze the metabolism of the parasites that cause the common tropical disease leishmaniasis, in the hope of finding new medicine targets. She used novel algorithms to track carbon as it moves through the bodies of the tiny creatures.

Ng and her colleagues deal with formidable amounts of data. A high-speed RNA sequencing machine might produce a terabyte of information with each experiment, enough to fill the hard drive of an average desktop computer. A technique for imaging proteins, cryogenic electron microscopy, produces a thousand times as much data again.

Even transmitting and storing this much data is a major task, and CSL has invested significant resources in improving its data infrastructure.

It’s worth the effort, Ng says, because the applications are almost limitless.

“There is systems biology – that’s understanding how whole biological systems work. And if you can do that, you can understand exactly what is going wrong in a disease and just treat that precise issue.”

Autoimmune diseases like rheumatoid arthritis and multiple sclerosis are one area where bioinformatics might help. Each of these diseases has different types that can be hard to tell apart but require different treatments.

By comparing the DNA of people with different types of a disease, researchers may be able to understand the differences and quickly determine the best treatment for each individual.

Similar analysis may also deliver targeted treatments for cancer, lupus and other diseases.

“We want to know in advance who will benefit from a medicine,” Ng says. “And the next step is to move to gene therapy – instead of just treating symptoms you can get to the cause. We are doing that already with diseases that involve a single gene, but more complex conditions will follow.”

Data science is deployed all along the drug production pipeline.

“You can’t do medicine development without data science these days. There is a tsunami of data to manage and analyze. It makes discovery faster, but even after discovery you need more data for approvals.”