Scientific Background

The pragmatic species concept for Bacteria and Archaea is ultimately based on DNA-DNA hybridization (DDH). While this technique enables the taxonomist, in principle, to estimate the overall similarity between the genomes of two strains, it is tedious and cannot easily be made reproducible between different labs. Furthermore, it cannot be used to incrementally build up a comparative database. Recent technological progress in genome sequencing calls for bioinformatics methods that replace wet-lab DDH with in-silico genome-to-genome comparison. This web service offers state-of-the-art methods for inferring whole-genome distances that mimic DDH well. These distance functions can also cope with heavily reduced genomes and repetitive sequence regions. Some of them are also very robust against missing fractions of genomic information (due to incomplete genome sequencing). Our digitally derived genome-to-genome distances show a better correlation with 16S rRNA gene sequence distances than DDH values do. Thus, this web service can be used for genome-based species delineation and genome-based subspecies delineation. Moreover, the GGDC reports the difference in G+C content, which can also be reliably used for species delineation. Once you have obtained complete or incomplete assembled genome sequences, use is easy: upload your sequence files via our distance calculation form and let our server calculate intergenomic distances for you. These are converted into similarity values analogous to DDH and sent to you via e-mail to support your decision about the relatedness of your novel strain to known type strains.
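To illustrate the G+C content difference mentioned above, here is a minimal Python sketch. It is not the GGDC implementation: the function names `gc_content` and `gc_difference` are hypothetical, genomes are assumed to be given as lists of contig strings, and ignoring ambiguous bases such as N is an assumption made for this example only.

```python
def gc_content(contigs):
    """Return the G+C content (in percent) pooled over all contigs.

    Assumption: only unambiguous bases (A, C, G, T) are counted;
    ambiguity codes such as N are ignored.
    """
    gc = 0
    at = 0
    for seq in contigs:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / total


def gc_difference(genome_a, genome_b):
    """Return the absolute G+C content difference in percentage points."""
    return abs(gc_content(genome_a) - gc_content(genome_b))


# Toy data: each genome is a list of contigs (hypothetical sequences).
genome_a = ["GGCCNN", "ATGC"]
genome_b = ["ATAT", "ATGC"]
print(gc_difference(genome_a, genome_b))
```

Pooling the counts over all contigs, rather than averaging per-contig percentages, keeps short contigs from being over-weighted in an incomplete assembly.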

The GGDC has been developed entirely independently of the ANI ("average nucleotide identity") concept and is in no way based on it. Indeed, the core of GGDC, the GBDP program for calculating intergenomic distances, was published before the first paper on ANI. GBDP applies a number of corrections that are not found in ANI programs such as JSpecies, and in contrast to them GBDP does not split the sequences into sections of an arbitrary length of about 1000 bp. In the studies listed below, GGDC yielded higher correlations with wet-lab DDH (without mimicking its pitfalls) than the ANI software, and as of version 2.0 GGDC uses statistical models that considerably improve on the linear models used by the ANI software and earlier versions of GGDC. A practical advantage of GGDC over ANI is that GGDC operates on the same scale as wet-lab DDH values, which makes comparisons much easier. Of course, it has always been easy to calculate "average nucleotide identities" with GGDC, too. See the GGDC FAQ for details.

In one section of his acceptance speech for the Bergey Award 2014, Hans-Peter Klenk explained the advantages of GGDC for microbial species delineation over alternatives such as the ANI software (JSpecies etc.).

Important publications about GGDC, GBDP and their diverse applications

The foundations are detailed in these papers (by using the GGDC you agree to cite at least one of them).

The delineation of subspecies with the GGDC was introduced in the following study (by using the GGDC subspecies report you agree to cite this paper):

The taxonomic use of DNA G+C content as calculated from genome-sequence data and its relation to DDH was investigated here (by using the GGDC G+C difference report you agree to cite this paper):

The question of when a DDH experiment should be mandatory in microbial taxonomy, given a certain 16S rRNA gene sequence similarity threshold, was revisited here:

The GBDP procedure has previously been introduced in the following study:

The suitability of GBDP for phylogenetic reconstruction is exemplified by a number of studies: