The Orthologous Gene Database

Welcome to Streptococcus in Toto
In Toto User Guide

In Toto, Latin for "altogether" or "totally", describes the comparative functionality of this site. Streptococcus in Toto presents data from many species of Streptococcus. Orthologous genes were identified across these species and then combined into orthologous groups, which can be explored here.

If you know exactly what you are looking for, consider starting with an Advanced Search. The Advanced Search page contains examples that will assist you in your search. You can access the Advanced Search page from anywhere in the website via the Advanced Search link at the top of the page.

Introduction:

With the rising number of fully sequenced genomes, high-throughput genome annotation is becoming increasingly important. Although several automatic annotation software systems exist, their quality and flexibility are limited. We therefore developed a semi-automatic approach called DENTOTI (etymology: English "dental" + Latin "totus", whole or entire + English "-i") to facilitate annotation of the oral pathogen genomes hosted in the ORALGEN database (http://www.oralgen.lanl.gov/). DENTOTI was designed and implemented to support oral pathogen sequence analysis, metabolic reconstruction and comparative genome analysis.

The DENTOTI approach is based on a common heuristic method of identifying orthologs using ‘bidirectional best hits’ (Overbeek et al. 1999): if the most similar sequence in genome 2 to protein A of genome 1 is B, and the most similar sequence in genome 1 to protein B is A, then A and B are bidirectional best hits and are operationally considered to be orthologs. This relationship is especially strong if the BLAST E-value is very small and if the alignment of the proteins spans a majority of each sequence.

Applying the DENTOTI approach to six closely related genomes from the genus Streptococcus (Streptococcus agalactiae, Streptococcus mitis, Streptococcus mutans, Streptococcus pneumoniae, Streptococcus sanguinis and Streptococcus thermophilus), we were able to construct the Streptococcus in Toto database and transfer gene functional annotation from previously annotated Streptococcus genomes to approximately 70% of the genes in Streptococcus sanguinis and Streptococcus mitis.

Relying on this high-throughput ‘bidirectional best-hit’ criterion for annotation can be problematic, however: in cases of multi-domain protein organization or non-orthologous gene displacement, it would incorrectly predict a functional linkage. Some manual curation is therefore necessary to validate the results. Even so, the approach is very useful for the initial assignment of function to genes in groups of closely related species. The key steps of this approach are described briefly below.

Methods:

A) To isolate genes from different organisms that play identical roles in each organism.

A simple method for predicting orthologous proteins in two organisms is to search for a pair of sequences, Xa in organism Ga and Xb in organism Gb, such that (1) a search of the proteome of Gb with Xa finds Xb as the best hit, and (2) a search of the proteome of Ga with Xb finds Xa as the best hit. This is called the bidirectional best-hit (BBH) method (Tatusov et al. 1997, Overbeek et al. 1999) and is depicted in Figure 1. The relationship is especially strong if the E-value is very small and if the alignment of the proteins spans a majority of each sequence. In this study, an E-value less than or equal to 10^-15 was used as the cut-off. A gene is considered strain-specific if it has no hits with an E-value of 10^-5 or less.

Figure 1

Figure 1: Flowchart of ortholog identification using the bidirectional best-hit (BBH) method.
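
The BBH test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DENTOTI implementation: the hit tuples `(query, query_genome, subject, subject_genome, evalue)`, the function names and the tie-breaking rule (the first hit seen wins) are all hypothetical, and the strain-specific check assumes that only hits against other genomes count. Only the two E-value cut-offs are taken from the text.

```python
# Hypothetical sketch of bidirectional best-hit (BBH) detection.
# Each hit is a tuple: (query, query_genome, subject, subject_genome, evalue).

ORTHOLOG_CUTOFF = 1e-15        # E-value cut-off for BBHs, from the text
STRAIN_SPECIFIC_CUTOFF = 1e-5  # E-value cut-off for strain-specific genes, from the text


def best_hits(hits, cutoff=ORTHOLOG_CUTOFF):
    """Return {(query, target_genome): subject} with the lowest E-value per key."""
    best = {}
    for query, q_genome, subject, s_genome, evalue in hits:
        # Skip hits within the same genome and hits weaker than the cut-off.
        if q_genome == s_genome or evalue > cutoff:
            continue
        key = (query, s_genome)
        # Assumption: on an exact E-value tie, the first hit seen is kept.
        if key not in best or evalue < best[key][1]:
            best[key] = (subject, evalue)
    return {key: subject for key, (subject, _) in best.items()}


def bidirectional_best_hits(hits, cutoff=ORTHOLOG_CUTOFF):
    """Return the set of gene pairs that are each other's best hit."""
    hits = list(hits)
    genome_of = {}
    for query, q_genome, subject, s_genome, _ in hits:
        genome_of[query] = q_genome
        genome_of[subject] = s_genome
    best = best_hits(hits, cutoff)
    pairs = set()
    for (query, _target_genome), subject in best.items():
        # Reciprocal check: the subject's best hit in the query's genome
        # must be the query itself.
        if best.get((subject, genome_of[query])) == query:
            pairs.add(frozenset((query, subject)))
    return pairs


def strain_specific(genes, hits, cutoff=STRAIN_SPECIFIC_CUTOFF):
    """Return genes with no hit in another genome at or below the cut-off."""
    matched = {q for q, qg, _s, sg, e in hits if qg != sg and e <= cutoff}
    return set(genes) - matched
```

For example, if Xa and Xb find each other as best hits but a second gene Ya also finds Xb as its best hit, only the pair Xa and Xb is reported, because the best hit of Xb in genome Ga is Xa rather than Ya.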

B) To cluster genes into Orthologous Groups (OG)

We begin by forming sets of genes that we call “Orthologous Groups (OG).” An orthologous group is a set of genes such that each gene in the set is a bidirectional best hit with at least one other gene in the set, from a different organism. The orthologous group is “connected” in the sense that one could not split the group without separating two bidirectional best hits.

Figure 2

Figure 2: Illustration of BBH relationships in which Xa and Xb are BBHs, as are Xb and Xc, but Xc and Xa are not.

A unique id has been assigned to each orthologous group. The homologous relationships in the gene/protein universe can be represented as a network, in which nodes represent sequences and edges represent similarities between sequence pairs.
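
Read as a network, the "connected" definition above corresponds to the connected components of the BBH graph. The following is a minimal sketch of that reading using a union-find structure; it is not the site's actual clustering code, and the function name and the `OG00001`-style id format are assumptions made for illustration.

```python
# Hypothetical sketch: orthologous groups as connected components of the
# BBH network (nodes are genes, edges are BBH pairs).

def orthologous_groups(bbh_pairs):
    """Group genes linked by BBH edges and give each group a unique id.

    `bbh_pairs` is an iterable of 2-item collections of gene names.
    """
    parent = {}

    def find(gene):
        # Return the representative of the gene's component,
        # shortening the path to the root as we go.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for gene_a, gene_b in bbh_pairs:
        root_a, root_b = find(gene_a), find(gene_b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    components = {}
    for gene in list(parent):
        components.setdefault(find(gene), set()).add(gene)

    # Assumption: ids are numbered in order of each group's smallest
    # gene name so that the output is deterministic.
    ordered = sorted(components.values(), key=min)
    return {f"OG{index:05d}": members
            for index, members in enumerate(ordered, start=1)}
```

Under this reading, Xa, Xb and Xc from Figure 2 land in the same group even though Xa and Xc are not BBHs of each other, because both are linked through Xb.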

The orthologous group may not contain a pair of genes Xa and Xc from organisms Ga and Gc, respectively, such that Xa is a bidirectional best hit with Xb from Gb and Xb is a bidirectional best hit with Xc from Gc, but Xc is not a bidirectional best hit with Xa (see Figure 2). This condition is frequently seen in cases of paralogs.
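
One way to find the situation shown in Figure 2 is to scan the BBH network for "open" triples, where a middle gene is a BBH with two genes that are not BBHs of each other. The sketch below assumes a hypothetical `genome_of` mapping from gene name to organism and is meant only to illustrate the check, for instance to shortlist groups for manual review.

```python
# Hypothetical sketch: list triples (Xa, Xb, Xc) where Xa-Xb and Xb-Xc
# are BBHs but Xa-Xc is not.

from collections import defaultdict
from itertools import combinations


def open_triples(bbh_pairs, genome_of):
    """Return (left, middle, right) triples whose outer genes are not BBHs."""
    neighbors = defaultdict(set)
    pair_set = set()
    for gene_a, gene_b in bbh_pairs:
        neighbors[gene_a].add(gene_b)
        neighbors[gene_b].add(gene_a)
        pair_set.add(frozenset((gene_a, gene_b)))

    triples = []
    for middle in sorted(neighbors):
        for left, right in combinations(sorted(neighbors[middle]), 2):
            # The outer genes must come from two different organisms.
            if genome_of[left] == genome_of[right]:
                continue
            if frozenset((left, right)) not in pair_set:
                triples.append((left, middle, right))
    return triples
```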

C) To transfer the orthologous group annotation for true orthologs

After the true orthologs were identified, the group annotation was transferred for those closely related species during the first-pass genome annotation process, and subsequently subjected to manual review to validate the annotation.
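
The transfer step can be illustrated as follows. The text does not say how a group annotation is chosen, so this sketch assumes the most common annotation among already annotated members is proposed for the unannotated ones, and that every proposal is flagged for manual review. The function name, the record fields and the gene names in the usage example are hypothetical.

```python
# Hypothetical sketch: propose annotations for unannotated genes from the
# annotated members of the same orthologous group.

from collections import Counter


def transfer_annotations(groups, annotations):
    """Return {gene: proposal} for genes that lack an annotation.

    `groups` maps a group id to a set of gene names; `annotations` maps
    gene names to existing functional annotations (missing or empty
    means the gene is unannotated).
    """
    proposals = {}
    for group_id, members in groups.items():
        known = [annotations[g] for g in sorted(members) if annotations.get(g)]
        if not known:
            continue  # nothing to transfer in this group
        counts = Counter(known)
        # Assumption: the most frequent existing annotation is proposed.
        consensus, _ = counts.most_common(1)[0]
        for gene in members:
            if not annotations.get(gene):
                proposals[gene] = {
                    "annotation": consensus,
                    "source_group": group_id,
                    # Members disagree, so a curator should look closely.
                    "conflict": len(counts) > 1,
                    # Every transfer is a first pass pending manual review.
                    "needs_review": True,
                }
    return proposals


# Usage with made-up gene names:
example = transfer_annotations(
    {"OG00001": {"smu_1", "ssa_1", "smi_1"}},
    {"smu_1": "glucosyltransferase"},
)
```

Existing annotations are never overwritten here; only genes without an annotation receive a proposal.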

 



 
 
Los Alamos National Laboratory, Est. 1943
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's NNSA
© Copyright 2006-7 Los Alamos National Security, LLC. All rights reserved.