AI tool to boost genomic data analysis

Press/Media: Research

Description

PhD student Fuyi Li and Associate Professor Jiangning Song.

An explosion in biomedical genomic data, propelled by high-throughput sequencing and a rise in open-access data, has left biologists lagging. Now, a new artificial intelligence (AI) tool developed by a team led by Monash Biomedicine Discovery Institute (BDI) scientists promises to help close the ever-widening gap between sequencing data and the ability to analyse and interpret it.

Associate Professor Jiangning Song said iLearn was the result of a two-year study by a cross-disciplinary bioinformatics team, building on previous software tools. The results of this study were published today in Briefings in Bioinformatics.

Associate Professor Song said the tool will help with the downstream stage of analysing and modelling the three major types of sequence data, namely DNA, RNA and protein.

He said that because proteins have so many different functions, a vast number of protein functional features remain to be characterised once DNA sequencing is complete.

“After sequencing data, the challenge is to characterise the function or sequence-structure-function relationship of the macromolecules – such as proteins – composed of a unique, specific arrangement of their amino acid residues,” Associate Professor Song said.

“It can take years in the lab to follow up the functional characterisation effort,” he said.

“We needed a tool or computer program that can enable users to learn or gain knowledge by building a model from the existing annotations of the data set and applying the learned model to their specific targets, whether they be DNA, RNA or protein.”

iLearn, which integrates five different machine-learning algorithms, was designed so that users need only upload their dataset and select the functions they want to calculate from it. The software then carries out all necessary procedures and selects optimal settings automatically.

“The tool is user-friendly so biologists with limited programming knowledge can effectively use it,” Associate Professor Song said.

“The development and implementation of this AI tool will facilitate biological sequence-based biomarker discovery, prediction algorithm development using next-generation sequencing data, and drug development in the broader contexts of biomedical big data analytics,” he said.

“It will give biologists a very strong pillar to enable and empower them to train their own models by focusing on their specific target datasets.”

“It’s a very useful tool for generating novel knowledge that can be tested by going back to the lab. It allows unlimited hypotheses to be tested.”

First author and long-term collaborator Dr Zhen Chen, an Assistant Professor at Qingdao University in China, said, “In addition to the multi-functionality, iLearn is also powerful in that it can communicate with other computational tools easily with high efficiency. We anticipate that iLearn will serve as a useful tool for sequence analysis and functional annotations.”

Fuyi Li, a PhD student in Associate Professor Song’s group at Monash BDI, said, “iLearn shortens the formidable distance for biologists to use AI and makes the data analysis process easier and accessible. It facilitates biologists to embrace the new world of artificial intelligence for genomic data analytics.”

Associate Professor Song said the researchers expect that iLearn, which was successfully tested in two case studies in the paper, will generate considerable interest in the biomedical community. His team has developed more than 60 bioinformatics toolkits, webservers and software packages, many of them widely used by the international research community.

iLearn, an open-source tool, was named to allude to the spirit of Monash University’s motto Ancora Imparo, that is, “I am still learning”.

Researchers across disciplines from Australia, China, Japan and the US worked on developing this software. Monash BDI Professors Roger Daly and Jian Li were key collaborators. Professor Ian Smith provided crucial technical support, and Fuyi Li and fellow PhD student Jerico Revote were instrumental in the software’s development.

Professor Daly will apply iLearn to building more accurate models for the characterisation of cancer mutations and how they influence growth-regulating signals in cancer cells.

Professor Li’s group is currently working with Associate Professor Song’s team by applying iLearn to develop data-driven machine learning models to identify key antimicrobial resistance (AMR) genes and genetic elements that may lead to better understanding of the AMR mechanisms.

This work was supported by grants from the National Health and Medical Research Council of Australia (NHMRC), the Young Scientists Fund of the National Natural Science Foundation of China, the Australian Research Council (ARC), the National Institute of Allergy and Infectious Diseases of the National Institutes of Health, a Major Inter-Disciplinary Research project awarded by Monash University, and the Collaborative Research Program of Institute for Chemical Research, Kyoto University.

Read the full paper in Briefings in Bioinformatics, titled “iLearn: an integrated platform and meta-learner for feature engineering, machine learning analysis and modelling of DNA, RNA and protein sequence data”.

About the Monash Biomedicine Discovery Institute

Committed to making the discoveries that will relieve the future burden of disease, the newly established Monash Biomedicine Discovery Institute at Monash University brings together more than 120 internationally renowned research teams. Our researchers are supported by world-class technology and infrastructure, and partner with industry, clinicians and researchers internationally to enhance lives through discovery.

Period: 25 Apr 2019

Media contributions

  • Title: AI tool to boost genomic data analysis
    Media name/outlet: Monash BDI News
    Country: Australia
    Date: 25/04/19
    Description: AI tool to boost genomic data analysis
    URL: https://www.monash.edu/discovery-institute/news-and-events/news/2019-articles/ai-tool-to-boost-genomic-data-analysis
    Persons: Jiangning Song, Roger Daly, Jian Li

Keywords

  • BDI
  • AI
  • bioinformatics
  • machine learning
  • genomic data
  • iLearn