How to cite the GGDC in your publication

If you are using GGDC 3.0 results in your upcoming publication(s), please always cite:

Meier-Kolthoff, J.P., Sardà Carbasse, J., Peinado-Olarte, R.L., Göker, M. TYGS and LPSN: a database tandem for fast and reliable genome-based classification and nomenclature of prokaryotes. Nucleic Acids Res 50:D801–D807, 2022. [Citations]
The above work describes, among other topics, the role and function of the GGDC 3.0 compared to our TYGS, LPSN and DSMZ single-gene phylogeny servers.

Since the GGDC 3.0 builds on the GGDC 2.1, it would be good scientific practice to cite the previous GGDC 2.1 paper as well:

Meier-Kolthoff, J.P., Auch, A.F., Klenk, H.-P., Göker, M. Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics 14:60, 2013. [Citations]

If you have used the GGDC's subspecies concept in your upcoming publication(s), please also cite:

Meier-Kolthoff, J.P., Hahnke, R.L., Petersen, J., Scheuner, C., Michael, V., Fiebig, A., Rohde, C., Rohde, M., Fartmann, B., Goodwin, L.A., Chertkov, O., Reddy, T., Pati, A., Ivanova, N.N., Markowitz, V., Kyrpides, N.C., Woyke, T., Göker, M., Klenk, H.-P. Complete genome sequence of DSM 30083^T, the type strain (U5/41^T) of Escherichia coli, and a proposal for delineating subspecies in microbial taxonomy. Standards in Genomic Sciences 10:2, 2014. [Citations]

If you have used the GGDC's reported differences in genomic G+C content, e.g. for species delineation, please also cite:

Meier-Kolthoff, J.P., Göker, M., Klenk, H.-P. Taxonomic use of DNA G+C content and DNA-DNA hybridization in the genomic age. Int J Syst Evol Microbiol 64:352-356, 2014. [Citations]

Please note: Many more relevant studies about the GGDC, the underlying GBDP method and various applications are found further below under Important publications.

Scientific Background

☞ Traditional vs. in silico species delineation

The pragmatic species concept for Bacteria and Archaea is ultimately based on DNA-DNA hybridization (DDH). While this technique enables the taxonomist, in principle, to obtain an estimate of the overall similarity between the genomes of two strains, it is tedious and cannot easily be made reproducible between different labs. Furthermore, it cannot be used to incrementally build up a comparative database. Recent technological progress in the area of genome sequencing calls for bioinformatics methods that replace wet-lab DDH with in silico genome-to-genome comparison.

This web service offers state-of-the-art methods for inferring whole-genome distances that mimic DDH well. These distance functions can also cope with heavily reduced genomes and repetitive sequence regions. Some of them are also very robust against missing fractions of genomic information (due to incomplete genome sequencing). Our digitally derived genome-to-genome distances show a better correlation with 16S rRNA gene sequence distances than DDH values do. Thus, this web service can be used for genome-based species delineation and genome-based subspecies delineation. Moreover, the GGDC reports the difference in G+C content, which can also be reliably used for species delineation.

Once you have obtained complete or incomplete assembled genome sequences, the use is easy: upload your sequence files in our distance calculation form and let our server calculate intergenomic distances for you. These are converted into similarity values analogous to DDH and sent to you via e-mail to support your decision about the relatedness of your novel strain to known type strains.
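To illustrate the G+C difference mentioned above, the following minimal Python sketch computes the genomic G+C content of two assemblies and their absolute difference in percentage points. It is not the GGDC's own implementation; the function names `gc_content` and `gc_difference`, the representation of a genome as a list of contig strings, and the choice to ignore ambiguous bases are all assumptions made for this example.

```python
from typing import Iterable


def gc_content(contigs: Iterable[str]) -> float:
    """Return the G+C content (in percent) over all contigs of one genome.

    Assumption for this sketch: only unambiguous bases (A, C, G, T) are
    counted; ambiguity codes such as N are ignored.
    """
    gc = 0
    at = 0
    for contig in contigs:
        seq = contig.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / total


def gc_difference(genome_a: Iterable[str], genome_b: Iterable[str]) -> float:
    """Return the absolute G+C difference in percentage points."""
    return abs(gc_content(genome_a) - gc_content(genome_b))


# Toy data only; real assemblies would be read from FASTA files.
genome_a = ["ATGCGC", "GGCC"]
genome_b = ["ATATGC", "AATT"]
print(f"G+C difference: {gc_difference(genome_a, genome_b):.2f}")
```

Because the value is computed over all contigs at once, the same function can be applied to complete and to draft (multi-contig) assemblies.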

☞ Relationship between the GGDC and ANI

The GGDC has been developed entirely independently of the ANI ("average nucleotide identity") concept and is in no way based on it. Indeed, the core of the GGDC, the GBDP program for calculating intergenomic distances, was published before the first paper on ANI. GBDP applies several corrections that are not found in ANI programs such as JSpecies, and in contrast to them GBDP does not split the sequences into sections of an arbitrary length of about 1000 bp. In the studies listed below, the GGDC yielded higher correlations with wet-lab DDH (without mimicking its pitfalls) than the ANI software, and as of version 2.0 the GGDC uses statistical models that considerably improve on the linear models used by the ANI software and earlier versions of the GGDC. A practical advantage of the GGDC over ANI is that the GGDC operates on the same scale as wet-lab DDH values, which makes comparisons much easier. Of course, it has always been easy to calculate "average nucleotide identities" with the GGDC, too. See the GGDC FAQ for details.

In one section of his acceptance speech for the Bergey Award 2014, Hans-Peter Klenk explained the advantages of the GGDC for microbial species delineation over alternatives such as the ANI software (JSpecies etc.).

Important publications about GGDC, GBDP and their diverse applications

☞ The rationale of the distance calculation and its relation to DDH values

The foundations are detailed in these papers (by using the GGDC you agree to cite at least one of them).

Meier-Kolthoff, J.P., Auch, A.F., Klenk, H.-P., Göker, M. Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics 14:60, 2013. [Citations]
Auch, A.F., Klenk, H.-P., Göker, M. Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs. Standards in Genomic Sciences 2:142-148, 2010. [Citations]
Auch, A.F., Von Jan, M., Klenk, H.-P., Göker, M. Digital DNA-DNA hybridization for microbial species delineation by means of genome-to-genome sequence comparison. Standards in Genomic Sciences 2:117-134, 2010. [Citations]

See also the official press release of the DSMZ (in German) and our presentation given at the 3rd joint conference of the DGHM and the VAAM.

☞ Delineation of prokaryotic subspecies

The delineation of subspecies with the GGDC has been introduced in the following study (by using the GGDC subspecies report you agree to cite this paper):

Meier-Kolthoff, J.P., Hahnke, R.L., Petersen, J., Scheuner, C., Michael, V., Fiebig, A., Rohde, C., Rohde, M., Fartmann, B., Goodwin, L.A., Chertkov, O., Reddy, T., Pati, A., Ivanova, N.N., Markowitz, V., Kyrpides, N.C., Woyke, T., Göker, M., Klenk, H.-P. Complete genome sequence of DSM 30083^T, the type strain (U5/41^T) of Escherichia coli, and a proposal for delineating subspecies in microbial taxonomy. Standards in Genomic Sciences 10:2, 2014. [Citations]

☞ The taxonomic use of DNA G+C content

The taxonomic use of DNA G+C content as calculated from genome-sequence data and its relation to DDH was investigated here (by using the GGDC G+C difference report you agree to cite this paper):

Meier-Kolthoff, J.P., Göker, M., Klenk, H.-P. Taxonomic use of DNA G+C content and DNA-DNA hybridization in the genomic age. Int J Syst Evol Microbiol 64:352-356, 2014. [Citations]

☞ When is a DDH experiment mandatory?

The question of when a DDH experiment should be mandatory in microbial taxonomy, given a certain 16S rRNA gene sequence similarity threshold, was revisited here:

Meier-Kolthoff, J.P., Göker, M., Spröer, C., Klenk, H.-P. When should a DDH experiment be mandatory in microbial taxonomy? Archives of Microbiology 195:413-418, 2013. [free local copy] [Citations]

☞ The Genome BLAST Distance Phylogeny method (GBDP)

The GBDP procedure has previously been introduced and updated in the following studies:

Henz, S.R., Huson, D.H., Auch, A.F., Nieselt-Struwe, K., Schuster, S.C. Whole-genome prokaryotic phylogeny. Bioinformatics 21:2329-2335, 2005. [Citations]
Meier-Kolthoff, J.P., Auch, A.F., Klenk, H.-P., Göker, M. Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics 14:60, 2013. [Citations]
Meier-Kolthoff, J.P., Auch, A.F., Klenk, H.-P., Göker, M. Highly parallelized inference of large genome-based phylogenies. Concurr Comput Pr Exper (Special Issue) 26:1715-1729, 2014. [Citations]
Meier-Kolthoff, J.P., Auch, A.F., Klenk, H.-P., Göker, M. GBDP on the grid: a genome-based approach for species delimitation adjusted for an automated and highly parallel processing of large data sets. pp. 83-102 In: Schulz J, Hermann S (eds) Hochleistungsrechnen Baden-Württemb. – Ausgewählte Aktivitäten im bwGRiD 2012. KIT Scientific Publishing, Karlsruhe, 2014. [Citations]
Auch, A.F., Henz, S.R., Holland, B., Göker, M. Genome blast distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences. BMC Bioinformatics 7:350, 2006. [Citations]
Auch, A.F., Henz, S.R., Göker, M. Phylogenies from whole genomes - Methodological update within a distance-based framework. German conference on Bioinformatics, Tübingen 2006. Published online via tobias-lib.

☞ Scientific studies making use of GBDP

The suitability of GBDP for phylogenetic reconstruction is exemplified by a number of studies. Here is a selection:

Hördt A., García-López M., Meier-Kolthoff J.P., Schleuning M., Weinhold L.M., Tindall B.J., Gronow S., Kyrpides N.C., Woyke T., and Göker M. Analysis of 1,000+ Type-Strain Genomes Substantially Improves Taxonomic Classification of Alphaproteobacteria. Front Microbiol. 11:468, 2020.
Dedysh S.N., Henke P., Ivanova A.A., Kulichevskaya I.S., Philippov D.A., Meier-Kolthoff J.P., Göker M., Huang S., and Overmann J. 100-year-old enigma solved: identification, genomic characterization and biogeography of the yet uncultured Planctomyces bekefii. Environ Microbiol. 22:198–211, 2020.
Strepis N., Naranjo H.D., Meier-Kolthoff J.P., Göker M., Shapiro N., Kyrpides N., Klenk H.-P., Schaap P.J., Stams A.J.M., and Sousa D.Z. Genome-guided analysis allows the identification of novel physiological traits in Trichococcus species. BMC Genomics. 21:1–13, 2020.
García-López M., Meier-Kolthoff J.P., Tindall B.J., Gronow S., Woyke T., Kyrpides N.C., Hahnke, R. L., Göker M. Analysis of 1,000 type-strain genomes improves taxonomic classification of Bacteroidetes. Front Microbiol. 10:2083, 2019.
Thorell K., Meier-Kolthoff J.P., Sjöling Å., Martín-Rodríguez A.J. Whole-Genome sequencing redefines Shewanella taxonomy. Front Microbiol. 10:1861, 2019.
Orata F.D., Meier-Kolthoff J.P., Sauvageau D., Stein L.Y. Phylogenomic analysis of the gammaproteobacterial methanotrophs (order Methylococcales) calls for the reclassification of members at the genus and species Levels. Front Microbiol. 9:3162, 2018.
Nouioui I., Carro L., García-López M., Meier-Kolthoff J.P., Woyke T., Kyrpides N.C., Pukall R., Klenk H.P., Goodfellow M. and Göker M. Genome-based taxonomic classification of the phylum Actinobacteria. Front Microbiol 9:1-119, 2018.
Hahnke, R. L., Meier-Kolthoff, J. P., García-López, M., Mukherjee, S., Huntemann, M, Ivanova, N. N., Woyke, T., Kyrpides, N. C., Klenk, H.-P., Göker, M. Genome-Based Taxonomic Classification of Bacteroidetes. Front Microbiol 7:2013, 2017.
Mukherjee, S., Seshadri, R., Varghese, N. J., Eloe-Fadrosh, E. A., Meier-Kolthoff, J. P., Göker, M., Coates, R. C., Hadjithomas, M., Pavlopoulos, G. A., Paez-Espino, D., Yoshikuni, Y., Visel, A., Whitman, W. B., Garrity, G. M., Eisen, J. A., Hugenholtz, P., Pati, A., Ivanova, N. N., Woyke, T., Klenk, H.-P., Kyrpides, N. C. 1,003 Reference Genomes of Bacterial and Archaeal Isolates Expand Coverage of the Tree of Life. Nat Biotechnol 35:676-683, 2017.
Riley, R., Haridas, S., Wolfe, K.H., Lopes, M.R., Hittinger, C.T., Göker, M., Salamov, A., Wisecaver, J., Long, T.M., Aerts, A.L., Choi, C., Clum, A., Coughlan, A.Y., Deshpande, S., Alexander, P., Hanson, S.J., Klenk, H.-P., Labutti, K., Lapidus, A., Lindquist, E., Lipzen, A., Meier-Kolthoff, J.P., Ohm, R.A., Otillar, R.P., Pangilinan, J., Rokas, A., Rosa, C.A., Scheuner, C., Sibirny, A., Slot, J.C., et al. Comparative genomics of biotechnologically important yeasts. Proc Natl Acad Sci 113:9882-9887, 2016.
Lagkouvardos, I., Pukall, R., Abt, B., Foesel, B.U., Meier-Kolthoff, J.P., Kumar, N., Bresciani, A., Martínez, I., Just, S., Ziegler, C., Brugiroux, S., Garzetti, D., Wenning, M., Bui, T.P.N., Wang, J., Hugenholtz, F., Plugge, C.M., Peterson, D.A., Hornef, M.W., Baines, J.F., Smidt, H., Walter, J., Kristiansen, K., Nielsen, H.B., Haller, D., Overmann, J., Stecher, B., Clavel, T. The Mouse Intestinal Bacterial Collection (miBC) provides host-specific insight into cultured diversity and functional potential of the gut microbiota. Nat Microbiol 1:16131, 2016.
Peeters, C., Meier-Kolthoff, J.P., Verheyde, B., De Brandt, E., Cooper, V.S., Vandamme, P. Phylogenomic Study of Burkholderia glathei-like organisms, proposal of 13 novel Burkholderia species and emended descriptions of Burkholderia sordidicola, Burkholderia zhejiangensis, and Burkholderia grimmiae. Front Microbiol 7:1-19, 2016.
Garrido-Sanz, D., Meier-Kolthoff, J.P., Göker, M., Martin, M., Rivilla, R., Redondo-Nieto, M. Genomic and genetic diversity within the Pseudomonas fluorescens complex. PLoS ONE 11:e0150183, 2016.
Liu, Y., Lai, Q., Göker, M., Meier-Kolthoff, J.P., Wang, M., Sun, Y., Wang, L., Shao, Z. Genomic insights into the taxonomic status of the Bacillus cereus group. Scientific Reports 5:14082, 2015.
Patil, K.R., McHardy, A.C. Alignment-free genome tree inference by learning group-specific distance metrics. Genome Biol Evol 5:1470-84, 2013. [Citations]
