TY - JOUR
T1 - Defining the Rhizobium leguminosarum Species Complex
AU - Young, J. Peter W.
AU - Moeskjær, Sara
AU - Afonin, Alexey
AU - Rahi, Praveen
AU - Maluk, Marta
AU - James, Euan K.
AU - Cavassim, Maria Izabel A.
AU - Rashid, M. Harun-Or
AU - Aserse, Aregu Amsalu
AU - Perry, Benjamin J.
AU - Wang, En Tao
AU - Velázquez, Encarna
AU - Andronov, Evgeny E.
AU - Tampakaki, Anastasia
AU - Flores Félix, José David
AU - Rivas González, Raúl
AU - Youseif, Sameh H.
AU - Lepetit, Marc
AU - Boivin, Stéphane
AU - Jorrin, Beatriz
AU - Kenicer, Gregory J.
AU - Peix, Álvaro
AU - Hynes, Michael F.
AU - Ramírez-Bahena, Martha Helena
AU - Gulati, Arvind
AU - Tian, Chang-Fu
N1 - Funding: Funding for genome sequencing and analysis was received from Innovation Fund Denmark (4105-00007A, led by S. U. Andersen) to J.P.W.Y.; European Community FP7 ‘Legumes for the Agriculture of Tomorrow’ (LEGATO, FP7-613551) to M.L. and J.P.W.Y.; Agence Nationale de la Recherche (GrasP) to M.L.; Rural and Environment Science and Analytical Services (RESAS, Scotland) and ‘Transition paths to sustainable legume based systems in Europe’ (TRUE, EC Horizon 2020, 727973 led by P. Iannetta) to M.M.; Genomia Fund to E.K.J.; RSF 17-76-30016 to A.A.; RSF 19-16-00081 to E.E.A.; VA2I/463AC06 and CLU-2018-04 to R.R.G.; SERB-DST, India (YSS/2015/000149) to P.R.; a University of Otago Research Grant to B.J.P.; CSIR, India (BSC0117) and CSIR-HRDG, India (21(1023)/16/EMR-II) to A.G. The majority of the genomes used in this study were sequenced by MicrobesNG (http://www.microbesng.uk) which was supported by the BBSRC (grant number BB/L024209/1).
PY - 2021/1/18
Y1 - 2021/1/18
AB - Bacteria currently included in Rhizobium leguminosarum are too diverse to be considered a single species, so we can refer to this as a species complex (the Rlc). We have found 429 publicly available genome sequences that fall within the Rlc and these show that the Rlc is a distinct entity, well separated from other species in the genus. Its sister taxon is R. anhuiense. We constructed a phylogeny based on concatenated sequences of 120 universal (core) genes, and calculated pairwise average nucleotide identity (ANI) between all genomes. From these analyses, we concluded that the Rlc includes 18 distinct genospecies, plus 7 unique strains that are not placed in these genospecies. Each genospecies is separated by a distinct gap in ANI values, usually at approximately 96% ANI, implying that it is a 'natural' unit. Five of the genospecies include the type strains of named species: R. laguerreae, R. sophorae, R. ruizarguesonis, "R. indicum" and R. leguminosarum itself. The 16S ribosomal RNA sequence is remarkably diverse within the Rlc, but does not distinguish the genospecies. Partial sequences of housekeeping genes, which have frequently been used to characterize isolate collections, can mostly be assigned unambiguously to a genospecies, but alleles within a genospecies do not always form a clade, so single genes are not a reliable guide to the true phylogeny of the strains. We conclude that access to a large number of genome sequences is a powerful tool for characterizing the diversity of bacteria, and that taxonomic conclusions should be based on all available genome sequences, not just those of type strains.
KW - DNA, Bacterial/genetics
KW - Genome, Bacterial
KW - Phylogeny
KW - Rhizobium leguminosarum/classification
KW - Sequence Analysis, DNA
KW - Average nucleotide identity
KW - Genospecies
KW - Bacterial taxonomy
KW - Rhizobium
KW - Core genes
KW - Species complex
KW - Housekeeping genes
KW - Speciation
UR - http://www.scopus.com/inward/record.url?scp=85099913187&partnerID=8YFLogxK
U2 - 10.3390/genes12010111
DO - 10.3390/genes12010111
M3 - Article
C2 - 33477547
SN - 2073-4425
VL - 12
JO - Genes
JF - Genes
IS - 1
M1 - 111
ER -