IIT Madras researchers develop model to detect cancerous mutations
Researchers from the Indian Institute of Technology (IIT) Madras have developed a machine learning model that can use genome sequencing data to detect cancerous mutations in the body.
Cancer is caused by mutations that lead to the uncontrolled growth of cells in the body. While cell mutation is a common phenomenon, only some mutations cause cancer. These are called driver mutations. The rest, which make up the majority of all mutations, are benign and are called passengers.
The study by three researchers from the Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, found that the gene sequences neighbouring driver mutations are significantly different from those neighbouring passengers. The researchers further designed NBDriver, a mathematical model based on artificial intelligence, to identify the pathogenic variants of mutations that can cause cancer. The study, funded by the Department of Biotechnology, government of India, was published in the peer-reviewed journal Cancers in May this year.
“One of the major challenges faced by cancer researchers involves the differentiation between the relatively small number of ‘driver’ mutations that enable the cancer cells to grow and the large number of ‘passenger’ mutations that do not have any effect on the progression of the disease,” said B Ravindran, co-author of the study, head of RBCDSAI and professor in the department of computer science and engineering.
The model developed by RBCDSAI will identify these driver mutations by looking at genome data around the mutations. “Think of it like looking for a spelling error in a sentence. We are feeding a sentence into the model and by looking at the words before and after a word, the model can identify the erroneous word,” said RBCDSAI member Karthik Raman, co-author of the study, associate professor at the Bhupat and Jyoti Mehta School of Biosciences and co-ordinator of the Centre for Integrative Biology and Systems Medicine at IIT Madras.
While global researchers have developed computational methods for distinguishing between driver and passenger mutations, limited literature exists on using raw nucleotide sequences surrounding a particular mutation as potential features for building predictive models.
The researchers at IIT Madras aimed to discover patterns in the DNA sequences (made up of four letters, or bases: A, T, G and C) surrounding a particular site of alteration. “The underlying hypothesis was that these patterns would be unique to individual types of mutations – drivers and passengers. Therefore, these patterns could be modelled mathematically to distinguish between the two classes,” said Raman.
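To make the idea of "patterns in the surrounding sequence" concrete, here is a minimal Python sketch of one way such neighbourhoods could be turned into numbers a model can learn from. It is an illustration only, not the NBDriver implementation: the function names, the window size and the choice of counting short runs of bases (k-mers) are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def neighbourhood(sequence, position, window):
    """Return the bases up to `window` positions before and after `position`.

    The mutated base itself is left out. (Hypothetical helper, for illustration.)
    """
    start = max(0, position - window)
    upstream = sequence[start:position]
    downstream = sequence[position + 1:position + 1 + window]
    return upstream, downstream


def kmer_counts(fragment, k):
    """Count every possible k-letter run of A, C, G, T in `fragment`."""
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(fragment[i:i + k] for i in range(len(fragment) - k + 1))
    return [counts[kmer] for kmer in all_kmers]


def featurize(sequence, position, window=5, k=2):
    """Build one feature vector: upstream counts followed by downstream counts."""
    upstream, downstream = neighbourhood(sequence, position, window)
    return kmer_counts(upstream, k) + kmer_counts(downstream, k)


# Toy, made-up sequence with a mutation site at index 6.
features = featurize("ACGTACGTACGT", position=6, window=3, k=2)
print(len(features))
```

Feature vectors like these, computed for known drivers and known passengers, could then be handed to any standard classifier; which features and classifier the researchers actually used is not described in this report.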
The performance of NBDriver was tested on several open-source cancer mutation datasets. “Our model could distinguish between well-studied driver and passenger mutations from cancer genes with an accuracy of 89%. Furthermore, combining the predictions from NBDriver and three other commonly used driver prediction algorithms resulted in an accuracy of 95%, significantly outperforming existing models,” said Ravindran.
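The report does not say how the predictions from the four tools were combined. As a rough illustration of what combining predictions and measuring accuracy can look like, the Python sketch below uses a simple vote count, which is an assumption for this example and not necessarily the researchers' method. The labels are made-up toy data (1 for driver, 0 for passenger), not results from the study.

```python
def accuracy(predicted, actual):
    """Fraction of mutations whose predicted label matches the known label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def combine_by_vote(per_tool_labels, min_votes):
    """Call a mutation a driver when at least `min_votes` tools say so.

    `per_tool_labels` holds one list of 0/1 labels per tool, all in the
    same mutation order. (Hypothetical helper, for illustration only.)
    """
    return [
        1 if sum(votes) >= min_votes else 0
        for votes in zip(*per_tool_labels)
    ]


# Made-up labels from four hypothetical tools for four mutations.
tools = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]
known = [1, 0, 1, 1]

combined = combine_by_vote(tools, min_votes=2)
print(combined, accuracy(combined, known))
```

An accuracy of 89% in this sense means the model's label matched the known label for 89 of every 100 mutations tested.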
Raman added, “Interestingly, NBDriver could accurately identify 85% of the rare driver mutations from patients diagnosed with Glioblastoma Multiforme (GBM), a particularly aggressive type of cancer affecting the brain or spine.”
NBDriver is available publicly and can be used to obtain predictions on any user-defined set of mutations.
Mumbai-based data scientist Anirvan Chatterjee, who was not a part of the study, said, “Multigenic disorders like cancer pose a significant challenge in understanding the effect of several hundreds of mutations in their contribution to the final disruption on cell repair.”
“In this study, the researchers have initiated a significant new direction in our ability to use massive scale genomic data to understand the heuristic relationship between causal mutations and effect mutations. Exploring the analytical framework suggested here by various research groups globally can potentially bring a ‘crowd-sourced’ high predictive mutation list which can potentially be triangulated with other interpretive methods, and create a robust catalogue,” said Chatterjee, who is also the founder of HaystackAnalytics, a healthcare startup incubated under the Society for Innovation and Entrepreneurship (SINE) at IIT Bombay.
“The tool can be used to identify the likelihood of a mutation being cancerous. Medical professionals can run a genome sequencing on the tool and trace the culprit mutation. This will help design specific treatment strategies for patients,” said Raman.