
Demonstration on extracting bioinformatics features

15.7
Now that I have explained how to extract features from protein and DNA sequences, I will spend a short session demonstrating how to extract bioinformatics features in practice. I will show you several ways to do this. First, you can use a web server to extract features. Second, if you are familiar with the Python programming language, there are packages you can use, such as iLearn, Pse-in-One, and PyBioMed. And if you don't want to use Python, you can also use the R language.
63.6
There are also R packages that can help you generate features. For each approach, I will give one demonstration of how to generate different features. Here I have already opened the iFeature web server. This is the interface of iFeature, a server that can generate many different kinds of features from protein sequences. In the first step, you need to input your sequences in FASTA format. I explained what the FASTA format is in the last lesson. And here is an example of
105.9
a FASTA file: it includes a header line followed by the sequence, like this. After you insert your FASTA sequences, you can go to the next step, the feature descriptors. There are many feature descriptors, such as AAC, which I explained earlier. You can also use dipeptide composition, tripeptide composition, and so on. Many descriptors are available for generating feature vectors from a sequence. In this demonstration I will keep it very simple and select only AAC. After you select AAC, you don't need to select any of the other options, because we are not using any clustering here.
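As the transcript notes, each FASTA record is a header line (starting with `>`) followed by the sequence, which may span several lines. The snippet below is a minimal Python sketch of reading such records; `parse_fasta` is a hypothetical helper written for illustration, not part of iFeature, and the two sequences are made-up toy examples.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs.

    A record starts with a '>' header line; the following lines, up to the
    next '>', are concatenated into one sequence.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
        # Lines before the first header are ignored in this sketch.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy input: two made-up protein sequences.
example = """>seq1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>seq2
ACDEFGHIKLMNPQRSTVWY
"""

for name, seq in parse_fasta(example):
    print(name, len(seq))
```

Note that the sequence of `seq1` is split over two lines in the input and joined back into one 20-residue string.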
154.4
The same applies to feature selection: because this is a very simple example, you can skip it and just submit. After you submit, your job is processed on the server.
171.9
Wait just a few seconds, and then you get the results. Here are the results: the features created from the corresponding sequence. Because we selected AAC, the web server has generated the composition of amino acids, from A to Y. Each value is the frequency with which that amino acid occurs in the sequence; for example, this one is the frequency of amino acid C. As you can see, the result is displayed as a vector.
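To make the AAC output concrete, here is a minimal Python sketch of amino acid composition: count each of the 20 standard residues (ordered A to Y, as in the results table) and divide by the number of counted residues. This is an illustration of the descriptor, not iFeature's code; in particular, silently ignoring non-standard letters such as `X` is an assumption of this sketch.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, A to Y


def aac(sequence):
    """Amino acid composition: frequency of each standard residue.

    Returns a 20-element list ordered as AMINO_ACIDS. Non-standard
    characters (e.g. X) are ignored when counting, in this sketch.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


vector = aac("MKTAYIAKQR")  # toy sequence
# Show only the residues that actually occur:
print({aa: f for aa, f in zip(AMINO_ACIDS, vector) if f})
# → {'A': 0.2, 'I': 0.1, 'K': 0.2, 'M': 0.1, 'Q': 0.1, 'R': 0.1, 'T': 0.1, 'Y': 0.1}
```

The full `vector` always has 20 entries (zeros for absent residues), which is what makes sequences of different lengths comparable as machine learning input.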
210
You can then feed this vector into a machine learning tool such as Weka, or process it in Python to get your results. You can also download the results as a CSV file and use it for your own purposes. In addition to the web server, you can go to the iFeature website and use the package instead. Here is the iFeature package, which is written in Python, along with instructions that help you install it.
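If you compute features yourself rather than downloading them, you can produce a similar CSV file. The sketch below writes one row per sequence (name, 20 AAC values, class label) using Python's standard `csv` module. The column names, the `write_feature_csv` helper, and the class labels are all hypothetical choices for illustration; they are not iFeature's output layout.

```python
import csv
import io

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence):
    """Frequency of each of the 20 standard amino acids (A to Y order)."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    n = len(residues) or 1  # avoid division by zero for empty input
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


def write_feature_csv(records, labels, handle):
    """Write one row per sequence: name, 20 AAC features, class label."""
    writer = csv.writer(handle)
    writer.writerow(["name"] + [f"AAC_{aa}" for aa in AMINO_ACIDS] + ["class"])
    for (name, seq), label in zip(records, labels):
        writer.writerow([name] + [f"{x:.4f}" for x in aac(seq)] + [label])


# Toy sequences and hypothetical class labels, for illustration only.
records = [("seq1", "MKTAYIAKQR"), ("seq2", "ACDEFGHIKLMNPQRSTVWY")]
labels = ["positive", "negative"]

buffer = io.StringIO()  # use open("features.csv", "w", newline="") for a real file
write_feature_csv(records, labels, buffer)
print(buffer.getvalue())
```

Weka can open CSV files directly; you would typically remove the `name` column before training, since it identifies sequences rather than describing them.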
259.7
After you install the package, you can use it on your own computer without the web server. The package provides functions that generate the features, like this, and all of the descriptors are included. Once it is installed, you can follow the instructions to perform feature extraction yourself. Finally, I want to introduce one more package. You can choose whichever package you prefer; here I will explain a package for extracting features from protein sequences.
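Running a package locally means calling descriptor functions on your own sequences. To show what such a function computes, here is a hypothetical `kmer_composition` (not iFeature's actual API) that generalizes AAC (k = 1, 20 values) to the dipeptide (k = 2, 400 values) and tripeptide (k = 3, 8,000 values) compositions mentioned earlier. Skipping windows that contain non-standard residues is an assumption of this sketch.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence, k):
    """Frequency of every length-k peptide over the 20 standard residues.

    k=1 gives amino acid composition (20 values), k=2 dipeptide
    composition (400), k=3 tripeptide composition (8,000). Windows that
    contain a non-standard residue are skipped in this sketch.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    # Slide a window of length k along the sequence, one residue at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


dpc = kmer_composition("MKTAYIAKQR", 2)  # toy sequence
print(len(dpc))  # 400
```

The vector length grows as 20^k, which is why tripeptide composition is rarely used without some form of feature selection.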
300.7
It is called “protr”, an R package for generating various numerical representation schemes of protein sequences. To use it, you just load the library, “protr”. The package provides functions to read FASTA files. After reading a FASTA file, you can generate many features; here, for example, they extract APAAC (amphiphilic pseudo amino acid composition) features. If you refer to the package introduction, you can see that it provides many kinds of features, such as amino acid composition, autocorrelation, CTD, and so on.
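To give a feel for one of these descriptors, CTD stands for composition, transition, and distribution: residues are mapped to three classes of a physicochemical property, and statistics are computed on the resulting class sequence. The Python sketch below (not protr's code) covers composition and transition for one property, hydrophobicity, assuming the commonly used grouping polar (RKEDQN), neutral (GASTPHY), and hydrophobic (CLVIMFW). protr computes these for several properties and also the distribution part, so treat this only as an illustration of the idea.

```python
# Three-class hydrophobicity grouping commonly used for CTD descriptors
# (assumed here; other properties use different groupings).
HYDROPHOBICITY_GROUPS = {
    "1": "RKEDQN",   # polar
    "2": "GASTPHY",  # neutral
    "3": "CLVIMFW",  # hydrophobic
}


def encode(sequence, groups):
    """Map each residue to its class label; residues in no class are dropped."""
    lookup = {aa: label for label, members in groups.items() for aa in members}
    return "".join(lookup[aa] for aa in sequence.upper() if aa in lookup)


def ctd_composition(sequence, groups=HYDROPHOBICITY_GROUPS):
    """C of CTD: fraction of residues falling in each class."""
    encoded = encode(sequence, groups)
    n = len(encoded) or 1
    return [encoded.count(label) / n for label in sorted(groups)]


def ctd_transition(sequence, groups=HYDROPHOBICITY_GROUPS):
    """T of CTD: fraction of adjacent pairs switching between two classes."""
    encoded = encode(sequence, groups)
    pairs = len(encoded) - 1
    if pairs <= 0:
        return [0.0, 0.0, 0.0]
    result = []
    for a, b in (("1", "2"), ("1", "3"), ("2", "3")):
        # Count both directions, e.g. "12" and "21".
        switches = sum(1 for x, y in zip(encoded, encoded[1:]) if {x, y} == {a, b})
        result.append(switches / pairs)
    return result


print(ctd_composition("MKTAYIAKQR"))  # [0.4, 0.4, 0.2]
print(ctd_transition("MKTAYIAKQR"))
```

For the toy sequence `MKTAYIAKQR` the class string is `3122232111`, so four residues are polar, four neutral, and two hydrophobic.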
345.6
So you just need to call these functions, and the package generates the features from the corresponding sequences. Now let me go back to my slides. After these demonstrations, I hope you will practice generating more features. If you have some sequences, use the web server, Python, or R to generate features on your own. Then you can load those features into Weka to build machine learning models. That is the end of today's lesson. Thank you very much for your attention.

In this video, Dr. Khanh demonstrates several web servers and packages. Please check the websites below.

If you are doing similar research, we encourage you to follow the demo and the installation steps. We hope you will practice generating more features: if you have a sequence, you can use the web server to generate features on your own.

We hope these tools can be helpful for your research.

This article is from the free online course Artificial Intelligence in Bioinformatics, created by FutureLearn.