The serum metabolome contains a plethora of biomarkers and causative agents of various diseases, some of which are produced endogenously and some of which are taken up from the environment [1]. The origins of some compounds are known, including metabolites that are highly heritable [2,3] and those that are influenced by the gut microbiome [4], by lifestyle choices such as smoking [5], or by diet [6]. However, the key determinants of most metabolites are still poorly understood. Here we measured the levels of 1,251 metabolites in serum samples from a unique and deeply phenotyped healthy human cohort of 491 individuals. We applied machine-learning algorithms to predict metabolite levels in held-out individuals on the basis of host genetics, gut microbiome, clinical parameters, diet, lifestyle and anthropometric measurements, and obtained statistically significant predictions for more than 76% of the profiled metabolites. Diet and microbiome had the strongest predictive power: each explained hundreds of metabolites, in some cases accounting for more than 50% of the observed variance. We further validated the microbiome-related predictions by showing a high replication rate in two geographically independent cohorts [7,8] that were not available to us when we trained the algorithms. We used feature attribution analysis [9] to reveal specific dietary and bacterial interactions. We further demonstrate that some of these interactions might be causal, as some metabolites that we predicted to be positively associated with bread were found to increase following a bread intervention in a randomized clinical trial. Overall, our results reveal potential determinants of more than 800 metabolites, paving the way towards a mechanistic understanding of alterations in metabolites under different conditions and towards the design of interventions for manipulating the levels of circulating metabolites.
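To make the notion of "explained variance in held-out individuals" concrete, the following is a minimal sketch and not the pipeline used in the study. It assumes a plain ridge regression as a stand-in model (the abstract does not name the algorithm), purely synthetic data, and hypothetical names such as `out_of_fold_r2`. Each individual is predicted only by a model that never saw that individual during fitting, and the pooled predictions are scored with R².

```python
import numpy as np


def out_of_fold_r2(X, y, n_folds=5, ridge_lambda=1.0, seed=0):
    """R^2 of predictions made only on held-out individuals (K-fold CV).

    Ridge regression is an illustrative stand-in, not the study's model.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    folds = np.array_split(rng.permutation(n), n_folds)
    preds = np.empty(n)

    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        X_tr, y_tr = X[train_mask], y[train_mask]

        # Centre with training statistics only, so nothing leaks from the test fold.
        x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
        Xc = X_tr - x_mean

        # Closed-form ridge solution: (Xc'Xc + lambda*I) w = Xc'(y - mean).
        A = Xc.T @ Xc + ridge_lambda * np.eye(p)
        w = np.linalg.solve(A, Xc.T @ (y_tr - y_mean))

        preds[test_idx] = (X[test_idx] - x_mean) @ w + y_mean

    ss_res = np.sum((y - preds) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic demonstration (assumed data, not from the cohort):
# 491 "individuals", 20 features standing in for, e.g., dietary variables.
rng = np.random.default_rng(42)
X = rng.normal(size=(491, 20))
true_w = rng.normal(size=20)

# A "metabolite" driven by the features plus noise.
y_signal = X @ true_w + rng.normal(size=491)
# A "metabolite" unrelated to the features.
y_noise = rng.normal(size=491)

r2_signal = out_of_fold_r2(X, y_signal)
r2_noise = out_of_fold_r2(X, y_noise)
print(f"driven by features: R^2 = {r2_signal:.2f}")
print(f"unrelated to features: R^2 = {r2_noise:.2f}")
```

Because the score is computed out of fold, a metabolite unrelated to the features yields an R² near or below zero rather than an optimistic in-sample fit. Deciding which predictions count as statistically significant would require an additional step, such as a permutation test, which this sketch omits.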
- Machine learning