Interview with a Bioinformatician: Dr. Ryan Mills, Ph.D.

Author: Aiman Faruqi

Institution: University of Michigan

Dr. Ryan Mills, Ph.D., is an Assistant Professor in the Department of Computational Medicine & Bioinformatics, and the Department of Human Genetics, at the University of Michigan Medical School. Dr. Mills earned his Bachelor’s degree in biology at Wabash College, his Ph.D. in bioinformatics at Georgia Tech, and was a postdoctoral fellow at both Emory University and Harvard Medical School. His research is focused on developing algorithms for identifying structural variation in human genomes and assessing its role in various disease phenotypes.

JYI: What is your educational background? What interested you as a high school/college student and what did you major in?

Dr. Mills: I received an A.B. in biology from Wabash College in Indiana, with minors in chemistry and computer science. Through high school and most of my college years, I was dead set on attending medical school and becoming a physician, so biology and chemistry were natural areas to focus on. However, I have also been very interested in computers; my uncle gave us an old TI-99 in the mid 1980s along with some books on BASIC and ever since then I have been fascinated with the logical structure of programs and what you can do with them.

Toward the end of my college studies, two professors who knew my interest in both genetics and computer science approached me and suggested that I take a look at this emerging field called bioinformatics that combined both. This seemed like a perfect fit for me, and I sought out graduate programs that aligned with my interests. I ended up joining the lab of Mark Borodovsky at Georgia Tech who was renowned for his work in gene prediction algorithms and applications and taught me much about modeling and genetic sequence analysis.

JYI: What is your current research about? What interested you in this avenue of inquiry during your graduate/post-graduate work?

Dr. Mills: It was during my second year as a Ph.D. student that the Human Genome Project released its first draft, and it was of course a very exciting event. I started to become interested in ways to identify and analyze genetic variation between individuals, and my first postdoctoral position was focused on looking at small insertions and deletions of genetic material. Afterward, I joined another laboratory that was studying copy number and structural variation, which consists of large insertions and deletions but also includes balanced rearrangements such as inversions and translocations. My current research has built on this, and my research group is actively developing methods for identifying complex structural rearrangements whereby a portion of the genome may undergo two or more of these types of events simultaneously at the same position.

We are also interested in other types of genetic variation; for example, we just published a paper where we track down pieces of the mitochondrial genome that have been copied and inserted into chromosomal DNA and are differentially present between different people. We even found one individual who had an entire mitochondrial genome inserted in their nuclear genome! We are exploring the potential functional consequences of these types of events, but it is still exciting to think that such variation between individual people still exists to be found.

JYI: What is bioinformatics? What advances in biology laid the foundation for the emergence of the field?

Dr. Mills: There are a number of textbook definitions for Bioinformatics, but in a broad sense it is the development and application of computational approaches to study biological systems, with a particular emphasis on modeling the information that is stored and processed within various organic molecules (e.g. DNA, RNA and protein sequences). In the early days of bioinformatics when DNA and protein sequencing were in their infancy, many of these data were actually compiled and published in print, as was the case with Margaret O. Dayhoff’s Atlas of Protein Sequence and Structure in 1965. The next few decades had tremendous advances in both sequencing technologies and computing hardware, which together enabled the collection and analysis of an ever-growing amount of biological data, and the now widespread usage of the Internet has allowed for the creation of databases and repositories to store and disseminate data.

JYI: What role did bioinformatics play in the sequencing of the first human genome? Would such a tremendous scientific feat have been possible without bioinformatics?

Dr. Mills: Bioinformatics was a crucial aspect of the Human Genome Project (HGP). When the project first began in 1990, the technology already existed for sequencing small sections of an individual genome at a time, though of course there were expectations for additional advancements in this area by project end. However, there were many analytical aspects of the project that still needed to be developed. For example, the Phred and Phrap algorithms, which are used for identifying and scoring individual nucleotides from automated DNA sequencers and then using this information to assemble these sequences into contiguous stretches of DNA, were developed during this time period and are still in widespread use today. Without these and other bioinformatic software advances (e.g. BLAST, GigAssembler), the HGP would have ended up a 3 million-piece puzzle set where most of the pieces are almost the exact same color and shape.

JYI: What sub-fields of biology are bioinformaticians involved in (e.g., evolutionary biology, molecular and developmental biology, etc.)? What are some of the newer fields bioinformatics has begun to have larger influence in?

Dr. Mills: While genetics and genomics are probably the first thing that people think of when they hear ‘bioinformatics’, there are many other sub-fields of biology that make use of computational strategies to either conduct or support their research. Aspects of structural biology combine physics, biochemistry and bioinformatics to identify patterns in sequences that define their form. Evolutionary biology makes use of bioinformatics to analyze genetic variation within and between species to examine adaptation to various phenomena.

However, we should be a little careful, as there are distinct differences in fields that make use of computational or mathematical biology, which are more theory based, as compared to bioinformatics, which tends to be more focused on applied methodology and analysis. There are also branched fields of study, such as Biomedical Informatics, which includes aspects of bioinformatics that are more directly translational into clinical applications (e.g. automated analysis of fMRI images). More recently, bioinformatics is playing a large role in the emergence of the ‘data scientist’ who will be critical in extracting knowledge from ‘Big Data’ collections of biologically related information. This can include longitudinal data, such as metabolomic and gene expression data across multiple hospital visits, as well as the incorporation of other meta-information such as eating habits, exercise, and other lifestyle choices.

JYI: What are some of the “hot-topic” questions within the larger field of biology that bioinformatics may help answer in the coming years?

Dr. Mills: It’s hard to pick just a few questions, given how ubiquitous bioinformatics is becoming in so many fields. I think we will see a lot more longitudinal type studies where researchers ask how a particular system changes over time using high throughput sequencing and other technologies. There have already been a number of early small-scale projects showcasing the benefits of such analysis (e.g. the Snyderome) and other initiatives such as the 4D Nucleome are beginning to emerge with ‘time’ as a key feature. There are also a number of ongoing projects to examine somatic mutations within individual cells and their impact on various diseases such as schizophrenia and bipolar disorder, with the idea that even as an individual we are still a mosaic of distinct cells, each with their own genome with its corresponding mutational and gene expression profiles. All of these types of large-scale endeavors will require skilled bioinformaticians to help give a biological perspective to the vast amounts of data generated.

JYI: How has evolving computing technology changed the field of bioinformatics? Have entirely new avenues of scientific inquiry been opened as a result of this?

Dr. Mills: There is no question that advances in high throughput computing have revolutionized the way we study biology. For example, instead of studying the effect of a single microbe in a system, we can now study the collective effect of all microbes, including those that had never been seen before, because we now have the computational power to sort through these types of heterogeneous data. Improved and cheaper data storage solutions have enabled the creation of large, multi-faceted computational platforms such as transMart that allow for greater sharing, integration and collaborative analysis across the greater biomedical community.

However, there are also many historical questions that we are beginning to answer that have their roots in the work of scientists over the past century. For the first time, we are able to explore hypotheses made by Haldane in the 1930s regarding mutations and selection at the level of entire populations through large initiatives like the HapMap and 1000 Genomes Projects. Around the same time period, observations were made regarding metabolism changes in tumors (Warburg Effect), and improvements in computing and sequencing technology have provided the infrastructure for determining that this is due in part to mutations in proliferating cancer cells. So, as technology evolves we are not only able to explore new areas of scientific discovery, but also revisit old questions.

JYI: What are some of the ways bioinformaticians collaborate with biologists in more “traditional” research settings (e.g., wet labs)?

Dr. Mills: This is something I actually have a lot of experience with, as my postdoctoral work was done as the only bioinformatician (initially) in two different “bench” laboratories, one in a Biochemistry department and the other in Pathology. It has been interesting to watch how these interactions have changed over the past decade or so; in the early days, there seemed to be some hesitation on the part of traditional experimental scientists who were perhaps not aware of what types of bioinformatics analysis could enhance their research, or to what extent. This has gradually changed, and now most bench scientists have at least a rudimentary knowledge of how to incorporate bioinformatics in their studies, even if they do not know the specific methodologies that are utilized. Still, these types of environments are why having the ‘bio’ in bioinformatics is so important, as it enables proper communication between different members of the research team. In many cases, it is crucial for the bioinformatician to understand the biology and bench experiments that are being conducted in order to develop analytical hypotheses and suggest the proper computational approaches to be used.

JYI: What does a typical workday look like for you?

Dr. Mills: I will typically arrive at my office each morning around 9AM, and spend the first 30 minutes going through email that will consist of a mix of collaborative and laboratory updates, lecture questions from students, and departmental committee responsibilities depending on the time of year. I will then either spend some time in the lab talking with my students and postdoc about their progress on various projects or work on ongoing grant proposals and manuscript drafts. I will also sometimes start new projects myself that I think might make a good rotation or undergraduate research project for a new student but that may need to be matured to a certain point before handing them off. After lunch, there are typically either lectures or lab sessions that I am either leading or attending, followed by various departmental and student seminars. I try to make an effort to get home by 6PM every night so I can spend some time with my children, but I am often back online by 10PM checking email and submitting or checking jobs on our computer cluster. This varies from week to week, however; for example, right now is the time for graduate admissions so I am spending a lot of time assessing and scoring applications to our program. Every week is something new!

JYI: Do you teach classes on top of doing research?

Dr. Mills: The primary course I co-teach is BINF527, “Introduction to Bioinformatics”. This serves as our department’s primary survey course and typically consists of new graduate students in our program, students from other programs with an interest in bioinformatics, and also some senior undergraduates. It is a challenging course to teach because the students can come from very diverse backgrounds (biology, mathematics, engineering, computer science, etc.) with previous experience in some areas of bioinformatics but not others. We cover a broad area of both biological and computational concepts in order to provide a uniform foundation for the students as they choose electives in future years, and are constantly tweaking our lectures every year to meet the fast pace of this evolving field. Indeed, we are currently exploring how to modify our teaching style to best work with such a multidisciplinary class, with thoughts towards ‘flipping’ the classroom so that there will be more supervised hands-on computer time instead of just a traditional lecture. We are also developing an online Coursera version of the course that should be available in the next few months.

JYI: What kinds of students would you encourage to look into the field of bioinformatics as a career in terms of their academic and personal qualities?

Dr. Mills: In my experience, there are primarily two types of people that you will find in a research group: those that focus heavily on a single problem (depth), and those that are interested in multiple questions (breadth). Both types are essential for advancing science, but bioinformaticians tend to be overwhelmingly in the latter group since by definition they already need to have some varied expertise in biology, statistics, and computer science to be successful. Also, you need to really enjoy working with data; there is a subsection of a website (Reddit) called “DataIsBeautiful” that I subscribe to because I find it fascinating to see the different ways that you can summarize and display information, even if it typically has nothing to do with my own research. If you are the type of student who gets as excited by generating and analyzing data as by the results themselves, then bioinformatics may be the right career for you.

In terms of job opportunities, bioinformatics is still rapidly growing, and with the continuous generation of data, that is not likely to change anytime soon. About half of the graduates in our department stay in academia and the others find positions in industry, so there are many options depending on what your interests are. In addition, many of the skills you learn as a bioinformatician are transferable to areas other than biology.
