Bioinformatics Case Study

Watch the video to learn more.

In this video, Professor Khanh introduces the topic of feature learning. Feature learning involves using machine learning algorithms to learn from the features extracted from data. The course caters to two types of students: those without programming experience, who can use Weka, a tool that provides ready-made machine learning algorithms, and those with programming skills, who can use Python.

This week's content focuses on applying machine learning algorithms to a real problem and demonstrating how to do so in Weka. The professor outlines the four main topics of the lesson. First, he presents a bioinformatics case study on predicting protein function, specifically identifying electron transport proteins, which play a vital role in cellular respiration. Second, he explains how to save and load machine learning models in Weka so they can be reused and validated later. Third, he covers parameter tuning in Weka to get the best performance from a model. Finally, he addresses the challenge of loading CSV files into Weka, whose usual input format is ARFF.
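For readers on the Python track, the fourth topic can be illustrated with a minimal sketch that converts CSV text into ARFF text. It is not the procedure shown in the video. It assumes a header row, numeric feature columns, a single nominal class column, and attribute names without spaces. The function name `csv_to_arff` and the column names `f1`, `f2`, and `label` are hypothetical.

```python
import csv
import io


def csv_to_arff(csv_text, relation, class_column):
    """Convert CSV text (with a header row) into ARFF text.

    Assumptions for this sketch: every column except `class_column`
    is numeric, the class column is nominal, and attribute names
    contain no spaces (so no quoting is needed).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0]
    data = [row for row in rows[1:] if row]  # skip empty lines
    class_idx = header.index(class_column)

    # A nominal attribute lists its allowed labels in curly braces.
    labels = sorted({row[class_idx] for row in data})

    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(header):
        if i == class_idx:
            lines.append(f"@attribute {name} {{{','.join(labels)}}}")
        else:
            lines.append(f"@attribute {name} numeric")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical two-feature example, not taken from the course dataset.
example_csv = "f1,f2,label\n0.12,0.80,ET\n0.45,0.10,nonET\n"
arff_text = csv_to_arff(example_csv, "proteins", "label")
print(arff_text)
```

The ARFF header is what a plain CSV file lacks: it declares each attribute's type up front, including the set of class labels, ahead of the `@data` section.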

Professor Khanh then goes into the details of the first topic, providing an overview of electron transport proteins and their function in cellular respiration. He emphasizes the importance of predicting these proteins, as they are essential for generating energy from glucose. The dataset used in the demonstration comes from his previous paper, where it was analyzed using deep learning techniques. For this demonstration, however, traditional machine learning methods and bioinformatics features are used instead, which allows a comparison with the paper’s results.

The professor concludes by sharing the statistics of the dataset, which includes training and testing data for both electron transport and non-electron transport proteins. The training dataset consists of over 1,100 instances of electron transport proteins and 3,800 instances of non-electron transport proteins. The testing dataset contains more than 200 instances of electron transport proteins and over 700 instances of non-electron transport proteins.
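These counts show that the two classes are not balanced, which matters when reading accuracy figures. The short sketch below works out the class proportions and the accuracy of a classifier that always predicts the majority class. The numbers are the rounded figures quoted above ("over 1,100", "3,800", "more than 200", "over 700"), not the exact counts from the paper, so the results are approximate.

```python
# Rounded counts as quoted in the text; the exact figures are in the paper.
counts = {
    "train": {"electron_transport": 1100, "non_electron_transport": 3800},
    "test": {"electron_transport": 200, "non_electron_transport": 700},
}


def class_balance(split):
    """Return simple imbalance statistics for one data split."""
    pos = split["electron_transport"]
    neg = split["non_electron_transport"]
    total = pos + neg
    return {
        # Share of instances that are electron transport proteins.
        "positive_fraction": pos / total,
        # Accuracy obtained by always predicting the larger class.
        "majority_baseline_accuracy": max(pos, neg) / total,
        # Number of negative instances per positive instance.
        "neg_per_pos": neg / pos,
    }


for name, split in counts.items():
    stats = class_balance(split)
    print(
        f"{name}: positives {stats['positive_fraction']:.1%}, "
        f"majority baseline {stats['majority_baseline_accuracy']:.1%}, "
        f"negatives per positive {stats['neg_per_pos']:.2f}"
    )
```

With these rounded numbers, there are a little over three non-electron transport proteins for every electron transport protein in both splits. A model therefore needs to be judged on more than plain accuracy, since always predicting the negative class already gets most instances right.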

The full study is available in the paper "ET-GRU: using multi-layer gated recurrent units to identify electron transport proteins".

This article is from the free online

AI and Bioinformatics: Genomic Data Analysis

Created by
FutureLearn - Learning For Life