


Public resources for bioinformatics

So far, I have shown you some examples of why you need to apply AI in bioinformatics and how people apply it. Now let's move to the next session for today. I want to show you some of the public resources for bioinformatics. Fortunately, for bioinformatics studies there are a lot of public resources that you can use, and many researchers and biologists retrieve their data from these published resources. The first one is a very famous one, the 1000 Genomes Project. The 1000 Genomes Project ran from 2008 to 2015.
It created the largest public catalog of human variation and genotype data. Here is a table that shows the statistics for the 1000 Genomes Project so far. There are three phases, and you can see the figures for the current release. For phase 3, there are around 84.4 million variants from 2,504 individuals across 26 populations. So if you want to study variants from genome information, you can collect a lot of data from the 1000 Genomes Project. The second resource is Ensembl Genomes. It also contains genomic information, together with the annotation, analysis and display of genomes. You can go to the Ensembl Genomes website to explore it.
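1000 Genomes genotypes are distributed as VCF (Variant Call Format) files. As a minimal sketch, the function below reads VCF lines and computes the ALT allele frequency of each biallelic site from the sample genotype (GT) columns. It is an illustration only. It skips multi-allelic sites, and the released files may already carry a precomputed frequency in the INFO column. The names `Variant` and `parse_vcf` are hypothetical, not part of any project tool.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_freq: Optional[float]  # share of called alleles carrying ALT; None if no genotypes


def parse_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield biallelic variants from VCF text, computing the ALT allele
    frequency from the GT field of each sample column."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip "##" meta lines and the "#CHROM" header
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        if "," in alt:
            continue  # multi-allelic site: skipped in this sketch
        alt_count = called = 0
        for sample in fields[9:]:  # sample columns follow the FORMAT column
            gt = sample.split(":")[0]  # the VCF spec puts GT first when present
            for allele in gt.replace("|", "/").split("/"):
                if allele == ".":
                    continue  # missing call
                called += 1
                if allele == "1":
                    alt_count += 1
        freq = alt_count / called if called else None
        yield Variant(chrom, int(pos), ref, alt, freq)
```

Because `parse_vcf` is a generator, it can stream a large compressed file without loading it into memory. For example: `with gzip.open("chr22.vcf.gz", "rt") as fh: common = [v for v in parse_vcf(fh) if v.alt_freq and v.alt_freq > 0.05]`. Here `chr22.vcf.gz` is a hypothetical local download.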
Next, as I already mentioned, gene expression profiles are an important source of data. To collect them, you can go to the Gene Expression Omnibus (GEO). GEO is hosted at NCBI and provides a robust, versatile database in which to efficiently store high-throughput functional genomic data. It offers simple submission procedures so that researchers can deposit their data, and you can then collect data from it for your own research. GEO also provides user-friendly mechanisms that allow users to query, locate, review and download studies and gene expression profiles of interest. So this one is also very useful in bioinformatics studies. The next one I want to introduce is the Gene Ontology Resource.
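GEO records can also be searched programmatically through NCBI's E-utilities. Here is a minimal sketch, assuming the ESearch endpoint with `db=gds` (GEO DataSets) and a JSON response. The helper names and the example search term are hypothetical. The network call is kept separate from the URL building and parsing so that those two parts can be checked offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that queries the GEO DataSets database (db=gds)."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_esearch(payload: str) -> tuple[int, list[str]]:
    """Return (total hit count, list of Entrez UIDs) from an ESearch JSON reply."""
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), result["idlist"]


def search_geo(term: str, retmax: int = 20) -> tuple[int, list[str]]:
    """Run the search over the network (requires internet access)."""
    with urlopen(build_geo_search_url(term, retmax), timeout=30) as resp:
        return parse_esearch(resp.read().decode("utf-8"))
```

For example, `search_geo("breast cancer AND gse[ETYP]")` would restrict hits to GEO series, assuming the `[ETYP]` entry-type field tag behaves as described in the GEO search help. The UIDs that come back are numeric. They can be passed to ESummary with the same `db=gds` to get record details. Without an API key, NCBI limits clients to a few requests per second.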
It is a comprehensive computational model of biological systems, ranging from the molecular to the organism level, across the multiplicity of species in the tree of life. To date, the Gene Ontology is the world's largest source of information on the functions of genes. If you want to understand gene or protein functions, as in the protein function prediction example I showed before, you can use this website to retrieve the functions of proteins and then carry out your research. Next is a very large database, the National Center for Biotechnology Information (NCBI). I think most of you who work in the medical or biomedical field will already know this database.
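Gene Ontology annotations are distributed as tab-separated GAF (Gene Association File) files. In GAF, lines starting with `!` are comments. Among the 0-based columns, 2 is the gene symbol, 3 is the qualifier, 4 is the GO ID and 8 is the aspect (P, F or C). The sketch below groups GO terms per gene and per aspect, which is roughly what you need to look up "the functions of a protein". The function name is hypothetical. Skipping `NOT` annotations is one reasonable design choice, not the only one.

```python
from collections import defaultdict

# Single-letter aspect codes used in GAF column 9 (index 8).
ASPECTS = {"P": "biological_process", "F": "molecular_function", "C": "cellular_component"}


def parse_gaf(lines):
    """Map gene symbol -> aspect name -> set of GO IDs from GAF lines."""
    table = defaultdict(lambda: defaultdict(set))
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment lines start with "!"
        cols = line.rstrip("\n").split("\t")
        symbol, qualifier, go_id, aspect = cols[2], cols[3], cols[4], cols[8]
        if "NOT" in qualifier.split("|"):
            continue  # negative annotation: the gene is explicitly NOT linked to this term
        table[symbol][ASPECTS.get(aspect, aspect)].add(go_id)
    return table
```

You could feed it a downloaded annotation file line by line. One example is a species file from the GO download pages, opened with `gzip.open(path, "rt")`. Then inspect `table["TP53"]["molecular_function"]`, for instance.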
NCBI provides a lot of biomedical information, such as papers, protein sequences, DNA and other nucleotide sequences, and gene expression profiles. It is a big database, so if you want to perform some research, you can go to this website and download information. If you want to do more detailed research on protein sequences, you can go to UniProt. UniProt is a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.
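Sequences from UniProt (and NCBI) are commonly downloaded in FASTA format. Here is a minimal sketch, assuming UniProt's REST URL pattern `https://rest.uniprot.org/uniprotkb/<accession>.fasta`. It downloads one entry and parses the FASTA text into a dictionary. The helper names are hypothetical. The parser only joins wrapped sequence lines and does not validate residues.

```python
from urllib.request import urlopen

UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records: dict[str, str] = {}
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)  # close the final record
    return records


def accession_from_header(header: str) -> str:
    """UniProtKB headers look like 'sp|ACCESSION|ENTRY_NAME description'."""
    return header.split("|")[1]


def fetch_uniprot_fasta(accession: str) -> dict[str, str]:
    """Download one UniProtKB entry as FASTA (requires internet access)."""
    with urlopen(f"{UNIPROT_REST}/{accession}.fasta", timeout=30) as resp:
        return parse_fasta(resp.read().decode("utf-8"))
```

For example, `fetch_uniprot_fasta("P69905")` should return the human hemoglobin alpha entry. The same `parse_fasta` helper also works on nucleotide FASTA returned by NCBI's EFetch (`db=nuccore`, `rettype=fasta`).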
UniProt mostly focuses on protein sequences, and protein sequence analysis is a hot topic today. The last one is for cancer studies. If you study cancer genomics, you can go to the Genomic Data Commons (GDC) Data Portal. It provides a robust, data-driven platform that allows cancer researchers and bioinformaticians to search and download cancer data for analysis. This is a figure that I took from the GDC website. If you study cancer, you can see that there are many cancer types for which you can retrieve data, such as lung cancer, glioblastoma (GBM) and so on.
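The GDC also exposes its metadata through a REST API. Here is a minimal sketch, assuming the `/projects` endpoint at `https://api.gdc.cancer.gov` accepts `fields`, `format` and `size` query parameters. It also assumes each hit carries `project_id`, `name` and a `primary_site` list. The script lists projects and then filters them locally by primary site. The helper names are hypothetical, so check the GDC API documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GDC_API = "https://api.gdc.cancer.gov"


def build_projects_url(size: int = 1000) -> str:
    """Build a /projects query returning a few fields per project as JSON."""
    params = {"fields": "project_id,name,primary_site", "format": "json", "size": size}
    return f"{GDC_API}/projects?{urlencode(params)}"


def parse_hits(payload: str) -> list[dict]:
    """GDC responses wrap results as {"data": {"hits": [...], "pagination": {...}}}."""
    return json.loads(payload)["data"]["hits"]


def filter_by_site(projects: list[dict], site: str) -> list[str]:
    """Return sorted project IDs whose primary_site list mentions `site` (case-insensitive)."""
    site = site.lower()
    return sorted(
        p["project_id"]
        for p in projects
        if any(site in s.lower() for s in (p.get("primary_site") or []))
    )


def list_gdc_projects(size: int = 1000) -> list[dict]:
    """Fetch project metadata over the network (requires internet access)."""
    with urlopen(build_projects_url(size), timeout=30) as resp:
        return parse_hits(resp.read().decode("utf-8"))
```

For example, `filter_by_site(list_gdc_projects(), "lung")` would list lung-related projects. Filtering locally keeps the sketch simple. For large queries, the API's own `filters` parameter is the more efficient route.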

Dr. Khanh Le will introduce you to different public databases for bioinformatics data. These are useful tools for doing research and learning medical knowledge.

The first is the 1000 Genomes Project. It is the largest public catalog of human variation and genotype data. You can check the link provided below.

Have you ever used any of them in your research? Would you share your experience of how you use these resources? Please reply in the comment section below.

This article is from the free online

Artificial Intelligence in Bioinformatics

Created by
FutureLearn - Learning For Life
