AI general workflow

Here is the general workflow of a machine learning model. After this, I will show you how a general AI model links to an AI-based bioinformatics model. In the general AI workflow, there are two types you need to focus on. The first is the traditional machine learning flow, and the second is the deep learning flow. In the traditional machine learning flow, we use the term handcrafted features, which means features that are manually engineered by data scientists. As you can see, the input data first passes through a feature extractor.
The feature extractor generates many handcrafted features, and you feed these handcrafted features into traditional machine learning algorithms. At this step you can even use deep learning algorithms. After that, you get the output. That is the traditional workflow. However, if you do not want to use the traditional workflow, you can use learned features instead. The difference is that learned features are learned automatically by the model rather than engineered by hand. In deep learning algorithms, for example, you provide the input, and the deep learning model automatically generates the features.
The model then generates the output. That is the difference between deep learning algorithms and traditional machine learning algorithms in how they obtain features. In bioinformatics, you can use both of these techniques to extract features, and I will show you how to apply both traditional and deep learning algorithms in bioinformatics. First, the traditional way: how can machine learning learn from the data? For example, suppose you have a protein sequence and you want to feed it into an AI model. How can the AI understand it? In this case, traditional machine learning algorithms cannot learn features directly from a sequence of letters.
To learn these features, you need to transform the sequence into a vector of numbers. Once the sequence is represented as numbers, you can easily feed it into the AI, and the AI can learn from these vectors, which represent the sequence. Another type of data that you can feed into machine learning algorithms is gene expression data. As I explained for the first case, a sequence cannot be fed into machine learning algorithms directly. Gene expression data, however, can be fed into machine learning algorithms directly. Why?
Because gene expression data already consists of numeric values rather than a sequence of letters, you can apply machine learning to those values directly, and the model can learn from them. That is why, for gene expression data, you do not need to extract features; you need feature extraction only when you work with biological sequences. Next, I will show you the steps for sequencing data. As I mentioned, when you download gene expression data derived from RNA sequencing, you do not need to extract features from it; you can feed it in directly.
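Because each sample in a gene expression matrix is already a vector of numbers, it can go straight into a classifier without any feature-extraction step. The sketch below assumes a tiny, made-up expression matrix (four samples, three genes, invented values and hypothetical "case"/"control" labels) and uses a nearest-centroid classifier purely for illustration; it is not the method used in the course.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical toy data: each row is one sample, each column the expression
# level of one gene. Values are invented for illustration only.
train_samples = [
    [5.1, 0.2, 3.3],
    [4.8, 0.4, 3.0],
    [0.9, 4.7, 1.1],
    [1.2, 5.0, 0.8],
]
train_labels = ["case", "case", "control", "control"]


def fit_centroids(samples, labels):
    """Average the raw expression vectors of each class (no feature extraction)."""
    sums, counts = {}, {}
    for vec, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, sample):
    """Assign the sample to the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], sample))


centroids = fit_centroids(train_samples, train_labels)
print(predict(centroids, [4.9, 0.3, 3.1]))  # prints "case"
```

Any other classifier could take the same rows as input; the point is only that the expression values themselves serve as the feature vector.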
For sequences, however, the traditional workflow requires processing. Starting from a protein sequence, the first step is to extract handcrafted features; this is the feature extraction step. In this step, there are some common features that you can compute for protein sequences, such as amino acid composition (AAC), dipeptide composition (DPC), and the position-specific scoring matrix (PSSM). These are three common features for protein sequences, although you can also use many more advanced features. After you generate the features, you have converted the protein sequence, a string of alphabet letters, into vectors of numbers.
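Two of the handcrafted features named above, AAC and DPC, can be computed with plain counting. This is a minimal sketch assuming the usual definitions (the fraction of each of the 20 standard amino acids, and the fraction of each of the 400 adjacent amino-acid pairs); the function names and the example peptide are hypothetical, and non-standard letters are simply skipped. PSSM is left out because it normally comes from an alignment search tool rather than from the sequence alone.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """20-dim vector: fraction of each standard amino acid in the sequence."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-dim vector: fraction of each adjacent amino-acid pair."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    total = len(pairs)
    counts = {dp: 0 for dp in DIPEPTIDES}
    for p in pairs:
        counts[p] += 1
    return [counts[dp] / total if total else 0.0 for dp in DIPEPTIDES]


seq = "MKTAYIAK"  # hypothetical short peptide
features = amino_acid_composition(seq) + dipeptide_composition(seq)
print(len(features))  # 420
```

Concatenating the two gives a fixed-length 420-number vector for any protein length, which is exactly what a traditional classifier needs.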
You then feed these vectors into machine learning algorithms; at this step you can also use deep neural networks to learn from the features. Here I list some simple algorithms that you can apply at this step, such as radial basis function (RBF) networks and support vector machines (SVMs), or, as I mentioned, deep neural networks. Finally, after applying the machine learning algorithm, you get the outcome. So these are the steps of a bioinformatics study: you convert the protein sequence into features, learn from the features, and then generate the outcome. Here is an article that shows this process.
In their process, you can see three different subprocesses. The first is data collection. After collecting the data, they move on to feature extraction, as I showed in the previous slides. Here they use PSSM profiles to extract the features. Finally, they use a convolutional neural network to learn from the PSSM profiles and generate the final model. This is a very typical workflow in bioinformatics, and you need to follow these steps in order to carry out a bioinformatics study on sequencing data.
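The PSSM-plus-CNN idea can be sketched as one convolution layer sliding along the sequence, followed by pooling. This is a forward pass only, under stated assumptions: the PSSM is assumed to be already computed (in practice it usually comes from PSI-BLAST), so the profile here is random placeholder values; the filters are random rather than trained; and the window size, filter count, and helper names are hypothetical, not the study's code or settings.

```python
import random

random.seed(0)

NUM_AA = 20      # one PSSM column per standard amino acid
WINDOW = 3       # filter width along the sequence (hypothetical choice)
NUM_FILTERS = 4  # number of feature detectors (hypothetical choice)


def random_matrix(rows, cols):
    """Matrix of uniform random values in [-1, 1], used as placeholder data/weights."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def conv1d_relu(profile, filters, biases):
    """Slide each filter down the sequence (no padding) and apply ReLU.

    profile: L rows x 20 scores; each filter: width rows x 20 weights.
    Assumes L >= width. Returns one feature map of length L - width + 1 per filter.
    """
    width = len(filters[0])
    out_len = len(profile) - width + 1
    feature_maps = []
    for f, b in zip(filters, biases):
        fmap = []
        for start in range(out_len):
            total = b
            for k in range(width):
                total += sum(x * w for x, w in zip(profile[start + k], f[k]))
            fmap.append(max(0.0, total))  # ReLU
        feature_maps.append(fmap)
    return feature_maps


def global_max_pool(feature_maps):
    """Keep each filter's strongest response, giving a fixed-length vector."""
    return [max(fmap) for fmap in feature_maps]


# Placeholder "PSSM" for a 10-residue protein (random, not a real profile).
pssm = random_matrix(10, NUM_AA)
filters = [random_matrix(WINDOW, NUM_AA) for _ in range(NUM_FILTERS)]  # untrained
biases = [0.0] * NUM_FILTERS

maps = conv1d_relu(pssm, filters, biases)
learned_features = global_max_pool(maps)
print(len(maps[0]), len(learned_features))  # 8 4
```

Global max pooling turns proteins of any length into a vector of NUM_FILTERS values that a final classification layer can use. Training the filter weights by backpropagation is omitted here.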
Another, very similar example is a study in which the input is a sequence. They use word embedding, a natural language processing technique, to treat the protein sequence as words in natural language, and then generate the features. The features are then fed into a convolutional deep neural network to perform binary classification.
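Treating a protein as text usually starts by cutting it into overlapping k-mers that play the role of words, which are then mapped to dense vectors. The sketch below assumes 3-mers and uses a small random embedding table as a stand-in for vectors that a real study would learn (for example with word2vec); the embedding size, the corpus, and the helper names are hypothetical, not the study's settings.

```python
import random

random.seed(42)

EMBED_DIM = 8  # hypothetical embedding size


def kmer_words(seq, k=3):
    """Split a sequence into overlapping k-mers, the 'words' of the protein."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(sequences, k=3):
    """Give every k-mer in the corpus a random vector (stand-in for trained embeddings)."""
    table = {}
    for seq in sequences:
        for word in kmer_words(seq, k):
            if word not in table:
                table[word] = [random.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
    return table


def embed_sequence(seq, table, k=3):
    """Look up each k-mer; result has shape (num_words, EMBED_DIM).
    K-mers not in the table map to a zero vector."""
    zero = [0.0] * EMBED_DIM
    return [table.get(word, zero) for word in kmer_words(seq, k)]


corpus = ["MKTAYIAKQR", "MKVLAAGIVG"]  # hypothetical sequences
table = build_vocab(corpus)
matrix = embed_sequence("MKTAYIAK", table)
print(kmer_words("MKTAYIAK"))  # ['MKT', 'KTA', 'TAY', 'AYI', 'YIA', 'IAK']
print(len(matrix), len(matrix[0]))  # 6 8
```

The resulting words-by-dimensions matrix has the same layout as the PSSM profile in the previous example, so it can be passed to the same kind of convolutional network for binary classification.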

In this video, Dr. Khanh will explain the difference between the traditional machine learning workflow and the deep learning workflow. Then he will explain how machine learning can use gene expression data directly, because it is already numeric. Finally, he will explain feature extraction step by step.

This article is from the free online course Artificial Intelligence in Bioinformatics, created by FutureLearn.
