Navigating the development challenges in creating complex data systems

Sören Dittmer (Lead / Corresponding author), Michael Roberts (Lead / Corresponding author), Julian Gilbey, Ander Biguri, Jacobus Preller, James H. F. Rudd, John A. D. Aston, Carola Bibiane Schönlieb

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)


Data science systems (DSSs) are a fundamental tool in many areas of research and are now being developed by people with a myriad of backgrounds. This is coupled with a crisis in the reproducibility of such DSSs, despite the wide availability of powerful tools for data science and machine learning over the past decade. We believe that perverse incentives and a lack of widespread software engineering skills are among the many causes of this crisis and analyse why software engineering and building large complex systems is, in general, hard. Based on these insights, we identify how software engineering addresses those difficulties and how one might apply and generalize software engineering methods to make DSSs more fit for purpose. We advocate two key development philosophies: one should incrementally grow—not plan then build—DSSs, and one should use two types of feedback loop during development—one that tests the code’s correctness and another that evaluates the code’s efficacy.

Original language: English
Pages (from-to): 681-686
Number of pages: 6
Journal: Nature Machine Intelligence
Early online date: 1 Jun 2023
Publication status: Published - 1 Jun 2023


Keywords

  • Applied mathematics
  • Software

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
  • Artificial Intelligence


