Introduction to annotation
In this article we set the scene for our exploration of genome annotation.
DNA contains the code, or instructions, to make proteins and other components of cells. These instructions are organised into genes, which can be considered the basic units of hereditary information. Typically, a gene contains a stretch of protein-coding sequence (coding sequence, or CDS for short), but it may also contain other functional regions that do not encode protein. For example, certain regions of DNA provide a “meeting point” for the components of the transcription machinery, which then move along the gene to make an RNA transcript. These regions are called promoters and can be considered parts of genes.
Another example comes from bacterial genomes, which are usually circular: replication of the bacterial chromosome starts at a specific region of the genome, commonly called ori (origin of replication). Promoters and ori are both examples of important genomic regions that are not protein coding. So how can we record where these sequences are located in a given genome? We will try to answer this and related questions in the coming activities.
One important problem with draft genomes is that their annotation may be incomplete. Remember that draft genomes typically contain a number of gaps (regions of unknown sequence). If a gene falls wholly or partly within a gap, we cannot recover the nucleotides that make up that gene. As a consequence, the gene may be missing from the genome assembly or appear truncated.
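To make the effect of gaps more concrete, here is a minimal sketch in Python. It assumes that gaps in a draft assembly are written as runs of the letter N (a common convention in assembly FASTA files, though not the only one), and it uses a made-up toy contig and made-up gene coordinates. The function names find_gaps and gene_status are hypothetical helpers for illustration, not part of any annotation tool. A gene that does not touch a gap is reported as intact, one that partly overlaps a gap as truncated, and one lying entirely inside a gap as missing.

```python
import re


def find_gaps(sequence: str) -> list[tuple[int, int]]:
    """Return (start, end) of each run of N, 0-based with end exclusive.

    Assumption: gaps are encoded as runs of 'N' (or 'n') in the assembly.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"N+", sequence.upper())]


def gene_status(start: int, end: int, gaps: list[tuple[int, int]]) -> str:
    """Classify a hypothetical gene spanning [start, end) against the gaps."""
    covered = 0
    for gap_start, gap_end in gaps:
        # Length of the overlap between the gene and this gap (0 or negative if none)
        overlap = min(end, gap_end) - max(start, gap_start)
        if overlap > 0:
            covered += overlap
    if covered == 0:
        return "intact"      # no nucleotides of the gene fall in a gap
    if covered >= end - start:
        return "missing"     # the whole gene lies inside a gap
    return "truncated"       # part of the gene is unknown sequence


# Toy draft contig (made up): 15 known bases, a 10-base gap, 9 known bases
contig = "ATGAAACCCGGGTTT" + "N" * 10 + "ATGCCCTAA"
gaps = find_gaps(contig)
print(gaps)  # [(15, 25)]

# Hypothetical gene coordinates on the toy contig
for name, (s, e) in {"geneA": (0, 15), "geneB": (10, 20),
                     "geneC": (16, 24), "geneD": (25, 34)}.items():
    print(name, gene_status(s, e, gaps))
```

On this toy contig, geneA and geneD lie outside the gap and are intact, geneB overlaps the start of the gap and is truncated, and geneC sits entirely inside the gap and is missing.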
For this reason, annotations of finished genomes are much more reliable than those of draft genomes.
© Wellcome Genome Campus Advanced Courses and Scientific Conferences