Getting Into The Nuts and Bolts of Bioinformatics and Vaccine Selection

Dr. Yaochu Jin

By analyzing large amounts of data, researchers are learning to solve new and important problems in health and medicine. We’ve gained the ability to predict outbreaks of disease, to determine how genes lead to disease, and to find the right vaccines to address diseases. The general public is becoming increasingly familiar with the concept of “big data,” and more and more engineers are entering the field. The problem is: how do they start? How do they know what to do with the data? These low-level questions are important for the new era of big data engineers. Dr. Yaochu Jin, Professor of Computational Intelligence at the University of Surrey, Head of the Nature Inspired Computing and Engineering (NICE) group, and Finland Distinguished Professor, is a scientist who specializes in bioinformatics and can answer these questions. This week, Dr. Jin will get into the nuts and bolts of bioinformatics and big data. He’ll give us some detailed insights into how people can use bioinformatics to improve vaccine selection and other health outcomes.

What is your background and how did you choose this specialization?

I majored in automatic control as an undergraduate, as industrial automation seemed fascinating to me at the time. During my postgraduate studies, my research topic was intelligent control of complex systems. I was so attracted by the rapid developments in artificial and computational intelligence that I decided to pursue a second PhD in computer science, focusing on evolutionary computation and learning systems. My current research interests lie mainly in understanding natural intelligence, such as evolution, learning and development, and in designing nature-inspired computer algorithms to solve real-world problems.

How can we use computing and new technologies to study and predict health?

This is a very big question, so I will just name a few examples I know of. In general, techniques from computer science and engineering are having an increasing impact on healthcare, in areas now known as health informatics and engineering for healthcare. For example, researchers in my group have been developing bio-inspired algorithms for effectively filtering normal retinal images in diabetic retinopathy screening. We have also been developing software tools for healthcare that collect and analyze large amounts of data from past patients, which medical doctors can then re-use to make better decisions when treating new patients. One of my postdoctoral research associates has developed computer algorithms for optimizing hospital resource allocation based on historical data. The final example I would like to mention is our ongoing collaboration with the Sleep Research Centre of the University of Surrey on identifying biomarkers (genes or gene groups) that are relevant to the regulation of sleep-wake cycles. This would be of great importance for understanding the biochemical principles underlying sleep disorders and other sleep-related diseases.

You are involved in a lot of projects; please tell us a little about your current work in vaccine selection and prediction.

Yes, my current research covers a number of applications, ranging from engineering design, such as aircraft wing design, to self-organizing swarm robots. The current project on vaccine prediction and selection is a collaboration between me, my PhD student Tameera Rahman, and Dr Emma Laing in the Biological Department of the University of Surrey, as well as Dr Mana Mahapatra from the Pirbright Institute (formerly known as the Institute for Animal Health). The main approach so far relies on lab-based virus neutralising tests, which are very costly and unfriendly to animals (the animals used in the tests have to be killed afterwards), particularly when the virus in question shows strong variability in the immunogenically important structural proteins. Our project aims to develop an accurate in silico predictor of suitable vaccines that will significantly reduce response time and thus the overall biological and financial impact of an outbreak.

How do you choose what models to test in vaccine selection and prediction? How do you get feedback on your models?

Choosing the right model for vaccine selection and prediction is critical to the success of our research. This involves choosing the model type, e.g., a linear regression model, a nonlinear regression model, or a black-box model such as an artificial neural network or another machine learning model, and then determining the parameters of the chosen model. The main challenge in this research lies in the fact that more than 300 amino acids are believed to be relevant, yet only a few dozen historical data points are available for determining the model parameters. This poses the notorious problem known as the curse of dimensionality. In choosing the model, we aim to build a model that is powerful yet as simple as possible. We used two typical approaches to validate the model. The first, which is very popular for model selection in machine learning, uses only part of the available data to train the model and holds out the rest as “unseen” data for testing. These techniques are often known as cross-validation. The second approach, which is also the ultimate goal of model building, is to test the model against experimental data.
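The hold-out idea described above can be sketched as k-fold cross-validation. This is a minimal, hypothetical illustration rather than the group's actual code: a one-feature straight-line model stands in for the real amino-acid model, and the names `k_fold_splits`, `fit_line` and `cross_validated_mse` are invented for this example. Each sample is held out exactly once, so every error is measured on data the model did not see during training.

```python
import random
from typing import List, Sequence, Tuple


def k_fold_splits(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices and return k (train, test) index pairs.

    Every index lands in exactly one test fold, so each sample is
    treated as "unseen" data exactly once.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training set = every fold except the one currently held out.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for the one-feature model y = a + b * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def cross_validated_mse(xs, ys, k: int = 5, seed: int = 0) -> float:
    """Mean squared error on held-out folds only; the model is refit for each fold."""
    sq_errors = []
    for train, test in k_fold_splits(len(xs), k, seed):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_errors.extend((ys[i] - (a + b * xs[i])) ** 2 for i in test)
    return sum(sq_errors) / len(sq_errors)


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.uniform(0, 10) for _ in range(30)]  # synthetic inputs
    ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]  # hypothetical noisy target
    print("5-fold CV MSE:", cross_validated_mse(xs, ys, k=5))
```

Setting `k` equal to the number of samples gives leave-one-out cross-validation, which is often used when only a few dozen samples are available.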

One of your papers suggests evolutionary algorithms are the best fit for vaccine selection. Can you give some details of that study and describe why you preferred evolutionary algorithms over other types of algorithms?

To address the curse of dimensionality, it is of paramount importance to choose a capable yet compact model, which requires a tool that can determine the structure and the parameters of the model simultaneously. This is intractable for most existing statistical machine learning techniques. By contrast, evolutionary algorithms, inspired by biological mechanisms in natural evolution such as crossover, mutation and selection (survival of the fittest), are able to “evolve” both the structure and the parameters of the model.

This has been demonstrated by many successful applications, including my own experience in optimizing models for engineering applications. Our work recently reported in Bioinformatics has again confirmed the power of evolutionary algorithms in creating capable and compact models. In that work, we set out to generate a nonlinear regression model, which is better than the linear regression models used in previous work at describing complex relationships, e.g., whether an amino acid contributes to the effectiveness of a vaccine. Unfortunately, a canonical nonlinear regression model contains many more parameters than its linear counterpart, which degrades the quality of a model trained on a small amount of data. One way to resolve this problem is to remove some of the “redundant” terms from the nonlinear regression model to reduce the number of parameters while maintaining its strong approximation capability. To this end, we employed an evolutionary algorithm to determine both the structure (i.e., which terms can be removed) and the parameters of the model. We compared the model obtained by our method with those obtained by standard methods, as well as with a linear regression model, and the comparative results confirm that the model obtained by the evolutionary algorithm performs best on the test data.
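A toy version of the term-removal idea can be sketched in Python. This is a hypothetical simplification, not the method from the Bioinformatics paper: a genetic algorithm evolves only a bit mask saying which candidate terms of a second-order polynomial are kept, while the coefficients of each candidate structure are fitted by slightly regularized least squares instead of being evolved. The fitness (validation error plus a per-term penalty), the operators (tournament selection, uniform crossover, bit-flip mutation, elitism), the synthetic data and all names are assumptions chosen for illustration.

```python
import itertools
import random
from typing import Dict, List, Sequence, Tuple

Term = Tuple[int, ...]
Data = Tuple[List[List[float]], List[float]]


def candidate_terms(d: int) -> List[Term]:
    """Linear terms (i,) plus all second-order terms (i, j) with i <= j."""
    linear = [(i,) for i in range(d)]
    quadratic = list(itertools.combinations_with_replacement(range(d), 2))
    return linear + quadratic


def design_row(x: Sequence[float], terms: List[Term]) -> List[float]:
    """Feature vector for one sample: intercept followed by the kept terms."""
    row = [1.0]
    for term in terms:
        value = 1.0
        for i in term:
            value *= x[i]
        row.append(value)
    return row


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][k] * w[k] for k in range(r + 1, n))) / m[r][r]
    return w


def fit(X, y, terms, ridge=1e-6):
    """Least squares via the normal equations, with a tiny ridge term for stability."""
    rows = [design_row(x, terms) for x in X]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(ata, aty)


def mse(X, y, terms, w) -> float:
    preds = [sum(f * c for f, c in zip(design_row(x, terms), w)) for x in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


def evolve_structure(train: Data, val: Data, d: int, pop_size: int = 30,
                     generations: int = 40, p_mut: float = 0.1,
                     penalty: float = 1e-3, seed: int = 0):
    """GA over bit masks: bit k says whether candidate term k is kept."""
    rng = random.Random(seed)
    all_terms = candidate_terms(d)
    cache: Dict[Tuple[bool, ...], float] = {}

    def kept(mask):
        return [t for t, bit in zip(all_terms, mask) if bit]

    def score(mask) -> float:  # lower is better
        key = tuple(mask)
        if key not in cache:
            terms = kept(mask)
            w = fit(*train, terms)
            cache[key] = mse(*val, terms, w) + penalty * len(terms)
        return cache[key]

    pop = [[rng.random() < 0.5 for _ in all_terms] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score)
        nxt = ranked[:2]  # elitism: the two best masks survive unchanged
        while len(nxt) < pop_size:
            # Tournament selection of size 3 for each parent.
            a = min(rng.sample(ranked, 3), key=score)
            b = min(rng.sample(ranked, 3), key=score)
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]  # uniform crossover
            child = [not g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    terms = kept(min(pop, key=score))
    return terms, fit(*train, terms)


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(60)]
    y = [1 + 2 * x[0] + 3 * x[0] * x[1] for x in X]  # hypothetical target
    terms, w = evolve_structure((X[:40], y[:40]), (X[40:], y[40:]), d=3)
    print(terms, [round(c, 3) for c in w])
```

The per-term penalty is what pushes the search toward compact models: two masks with similar validation error are separated by how many terms they keep. With real data of only a few dozen samples, both the penalty weight and the validation split would need far more careful tuning than in this toy.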

Nevertheless, this does not mean that the evolutionary algorithm we used is the best. Other meta-heuristics such as evolution strategies, particle swarm optimization and differential evolution may also be able to achieve similar results. In addition, we are currently exploring other machine learning models and techniques to further improve the quality of the model, and very promising results have been achieved in various case studies, including one for the Ebola virus.

What do you think will be the role of the computer scientist in the future of identifying vaccines for public health? Will computer scientists replace or complement existing researchers, like biologists, chemists, and immunologists?

There is no doubt that computer scientists will play increasingly important roles in vaccine selection, and in many other disciplines as well. Notwithstanding that, computer scientists will never replace researchers in other disciplines. In fact, computer scientists need to work closely with researchers in other disciplines in formulating problems, designing experiments and understanding results. In short, computer scientists are complementary to researchers in other disciplines, and vice versa. This is required in all interdisciplinary research.

Is there anything else that you want our readers to know about how this research – or any of your other projects – can affect their lives?

The results we have achieved are promising yet preliminary. I hope that in the near future we can design vaccines tailored to a particular person or group of people that work most efficiently and cause minimal side effects. I also think that our research on engineering design optimization will contribute to cleaner and more environmentally friendly products.

Imagine you’re talking to an entrepreneur who wants to create a new technology that will use machine learning models to improve vaccine selection. What 3 pieces of advice would you give as to how they should build their technology?

Well, I do hope that our techniques can be put into use in the near future, and I would be very keen to discuss this in greater detail if someone is interested in our technologies. The key to success is to integrate cutting-edge knowledge and skills from both computer science and biology.