Collecting bioinformatics data

Next, I will show you how to collect bioinformatics data. I have already explained some of the published resources; to demonstrate data collection, I use four different resources here. The first is NCBI, which provides a large amount of biomedical and genomic information. The second is UniProt, which provides information related to protein sequences. The third is the Gene Ontology website, which you can use to retrieve the functions of genes and proteins. The last is the PDB website, from which you can collect the 3D structures of proteins.
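The lecture only points to the PDB website, but individual structures can also be fetched by script. The following is a minimal Python sketch, assuming the RCSB PDB file-download service at files.rcsb.org and its `<ID>.pdb` / `<ID>.cif` naming. The helper names are hypothetical, and 4HHB is just an example entry.

```python
import re
import urllib.request

# Assumption: RCSB PDB's public file-download host and "<ID>.<ext>" naming.
RCSB_DOWNLOAD = "https://files.rcsb.org/download"


def pdb_download_url(pdb_id: str, fmt: str = "pdb") -> str:
    """Build the download URL for one PDB entry in legacy PDB or mmCIF format."""
    pdb_id = pdb_id.strip().upper()
    # Classic PDB IDs are four characters: one digit, then three letters/digits.
    if not re.fullmatch(r"[0-9][A-Z0-9]{3}", pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    if fmt not in ("pdb", "cif"):
        raise ValueError("fmt must be 'pdb' or 'cif'")
    return f"{RCSB_DOWNLOAD}/{pdb_id}.{fmt}"


def download_structure(pdb_id: str, fmt: str = "pdb") -> str:
    """Fetch the structure file as text (needs network access)."""
    with urllib.request.urlopen(pdb_download_url(pdb_id, fmt)) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # 4HHB (human deoxyhaemoglobin) is only an example entry.
    print(pdb_download_url("4hhb"))  # https://files.rcsb.org/download/4HHB.pdb
```

Very large structures are distributed only as mmCIF, so `fmt="cif"` is the safer choice for those.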
Here, I retrieve information related to a variant, as you can see. The matching items are listed here, and you can download them one by one. On the left-hand side, you can filter the results by the kind of information you want, such as genomic DNA, RNA, messenger RNA (mRNA), or proteins. Through NCBI you can also retrieve data that originate from different resources, such as GenBank, UniProt, or the Reference Sequence (RefSeq) database. So this is how we can collect data with an NCBI query. To download the results, you can choose the GenBank format.
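The same search-then-download workflow can be scripted with NCBI's E-utilities: ESearch returns the IDs that match a query, and EFetch downloads those records as FASTA or as GenBank flat files. This is a minimal Python sketch that only builds the request URLs and leaves fetching to a separate helper. The helper names are hypothetical, and the example query (including the `biomol_mrna[PROP]` filter for mRNA) is illustrative only.

```python
import urllib.parse
import urllib.request

# Public NCBI E-utilities base URL.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL that returns (as JSON) the IDs of records in `db` matching `term`."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def efetch_url(db: str, ids: list[str], rettype: str = "fasta") -> str:
    """URL that downloads records as FASTA ("fasta") or GenBank flat file ("gb")."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urllib.parse.urlencode(params)}"


def fetch_text(url: str) -> str:
    """Download a URL as text (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Illustrative query: human BRCA1 mRNA records in the Nucleotide database.
    print(esearch_url("nuccore", "BRCA1[Gene] AND Homo sapiens[Organism] AND biomol_mrna[PROP]"))
```

Without an API key, NCBI asks clients to send no more than three requests per second, so add a pause between calls when downloading many records.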
You can also download them in FASTA format. Finally, you can view some visualizations through the Graphics function. Next, if you want to collect gene expression profiles, I suggest using data from GEO, the Gene Expression Omnibus. To collect data, first go to the GEO web interface. Then search for gene expression data or GEO Profiles. For example, here I type "lung cancer" into the text box and then download the results.
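GEO is also searchable through NCBI's E-utilities, using the Entrez databases `gds` (GEO DataSets) and `geoprofiles` (GEO Profiles). The following is a minimal Python sketch that builds a search URL for a query such as "lung cancer" and extracts the hit count and IDs from the JSON reply. The helper names are hypothetical, and the code makes no network call itself.

```python
import json
import urllib.parse

# Public NCBI ESearch endpoint; GEO is exposed as Entrez databases.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_search_url(term: str, db: str = "gds", retmax: int = 50) -> str:
    """Search GEO DataSets (db="gds") or GEO Profiles (db="geoprofiles")."""
    if db not in ("gds", "geoprofiles"):
        raise ValueError("db must be 'gds' or 'geoprofiles'")
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urllib.parse.urlencode(params)}"


def parse_esearch(json_text: str) -> tuple[int, list[str]]:
    """Return (total hit count, IDs on this page) from an ESearch JSON reply."""
    result = json.loads(json_text)["esearchresult"]
    # ESearch reports the count as a string, so convert it to an int.
    return int(result["count"]), list(result["idlist"])


if __name__ == "__main__":
    print(geo_search_url("lung cancer"))
```

The returned values are Entrez UIDs, not GSE accessions. Calling ESummary on the same database is one way to look up the accession for each hit before downloading the series files.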
After you get the results, you can download the files one by one, or download multiple files to your computer at once, and then open them. The next resource is the GDC (Genomic Data Commons), where you can collect data from many cancer studies, as I mentioned in the previous session. To collect data from the GDC, you first need to select a project. For example, if you want to study lung cancer, select a project related to lung cancer. Next, select the kinds of data you need.
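The two choices made in the portal, project first and then data type, can also be expressed as a query to the GDC REST API. The following is a minimal Python sketch, assuming the public endpoint `https://api.gdc.cancer.gov`, its `files` and `data` endpoints, and its JSON filter syntax. `TCGA-LUAD` (TCGA lung adenocarcinoma) serves only as an example project ID, and the helper names are hypothetical.

```python
import json
import urllib.parse

# Assumption: public GDC API host.
GDC_API = "https://api.gdc.cancer.gov"


def gdc_files_url(project_id: str, data_category: str, size: int = 100) -> str:
    """URL listing GDC files for one project and one data category."""
    # Both conditions must hold: the file belongs to the project AND the category.
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id", "value": [project_id]}},
            {"op": "in", "content": {"field": "data_category", "value": [data_category]}},
        ],
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_type,data_format",
        "format": "JSON",
        "size": size,
    }
    return f"{GDC_API}/files?{urllib.parse.urlencode(params)}"


def gdc_data_url(file_id: str) -> str:
    """URL that downloads a single file by its GDC file_id."""
    return f"{GDC_API}/data/{file_id}"


if __name__ == "__main__":
    print(gdc_files_url("TCGA-LUAD", "Transcriptome Profiling"))
```

Category names follow the GDC data model, for example "Transcriptome Profiling", "Simple Nucleotide Variation", "DNA Methylation", or "Clinical". If a query returns nothing, compare the spelling with the facet labels shown in the portal. Files marked as controlled access also require an authentication token to download.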
As you can see, the GDC provides many different kinds of data, such as sequencing reads, transcriptome profiles, simple nucleotide variations (SNVs), DNA methylation, and clinical information. In the GDC, the clinical information supports your research: if you want to interpret your results or propose hypotheses, you need clinical information to understand them. The next slide shows how to collect data using the UniProt website. UniProt contains two kinds of entries: reviewed (UniProtKB/Swiss-Prot) and unreviewed (UniProtKB/TrEMBL).
For most studies, you should select the reviewed entries, which have been manually annotated and reviewed by expert curators. Type your query into the UniProt search box, or use the advanced search. After that, you can filter the results and download all of the protein sequences in FASTA format. Okay, this is the end of the lesson for the first week. Thank you.
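The UniProt search, the reviewed-only filter, and the FASTA download can also be combined in a single REST request. The following is a minimal Python sketch, assuming the UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/search` and its `reviewed:true` query field. The example query and the helper names are hypothetical. A small FASTA parser is included so that the downloaded sequences can be read back.

```python
import urllib.parse
import urllib.request

# Assumption: UniProt REST search endpoint (paginated, at most 500 hits per page).
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def uniprot_fasta_url(query: str, reviewed_only: bool = True, size: int = 500) -> str:
    """URL returning UniProtKB hits as FASTA, optionally only reviewed (Swiss-Prot) ones."""
    if reviewed_only:
        query = f"({query}) AND (reviewed:true)"
    params = {"query": query, "format": "fasta", "size": size}
    return f"{UNIPROT_SEARCH}?{urllib.parse.urlencode(params)}"


def fetch_text(url: str) -> str:
    """Download a URL as text (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header without '>', joined sequence) pairs."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Illustrative query: human TP53 entries.
    print(uniprot_fasta_url("gene:TP53 AND organism_id:9606"))
```

Because the search endpoint returns results one page at a time, the `uniprotkb/stream` endpoint, which accepts the same query parameters, is the simpler option when you want every matching sequence in a single file.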

In this video, Dr. Khanh Le will explain how to collect bioinformatics data from public resources.

Here are the resources mentioned in the chart:

He demonstrates how to use these websites. Please take some time to explore each of them; if you perform related studies, they are useful tools.

