Description
A significant challenge in digital forensics is the lack of a framework for common language and knowledge. This creates barriers to communication, collaboration and knowledge sharing amongst stakeholders. Methods for creating a comprehensive set of common terms on a topic include Natural Language Processing (NLP) and Generative Artificial Intelligence (GenAI) algorithms. The efficiency of these algorithms depends on the coverage, quality and quantity of the training corpus. As far as we know, no such corpus is readily available for training these algorithms.
This is a digital forensics practice and research corpus, validated by practitioners working in this domain. The corpus is ready for training new generations of NLP and GenAI algorithms. The associated paper also presents a systematic method of sharing a training corpus, where the data structure, such as folder and file names, makes it convenient to interact with the data programmatically.
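As an illustration of what programmatic interaction through folder and file names can look like, here is a minimal Python sketch. It assumes the corpus has been downloaded and unpacked locally as folders of plain-text `.txt` files. The actual layout and naming scheme are defined in the associated paper and are not reproduced here, so the `.txt` extension and the helper names `index_corpus` and `read_documents` are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def index_corpus(root):
    """Group text files under `root` by the name of their parent folder.

    Assumption: documents are stored as .txt files; adjust the glob
    pattern to match the real corpus layout.
    """
    index = defaultdict(list)
    for path in sorted(Path(root).rglob("*.txt")):
        index[path.parent.name].append(path)
    return dict(index)


def read_documents(index):
    """Yield (folder_name, file_stem, text) for every indexed file."""
    for folder, paths in index.items():
        for path in paths:
            # errors="replace" keeps the loop running on stray bytes.
            text = path.read_text(encoding="utf-8", errors="replace")
            yield folder, path.stem, text
```

With such an index, the folder name can act as a label for each document and the file stem as its identifier when preparing input for an NLP or GenAI pipeline.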
Date made available | 24 May 2024 |
---|---|
Publisher | University of Dundee |
Temporal coverage | 1999 - 2021 |
Date of data production | 2022 |
Data Monitor categories
- Digital Forensics Corpus
- Natural Language Processing
- NLP
- Generative Artificial Intelligence
- GenAI
Projects
- 1 Active
- Leverhulme Research Centre for Forensic Science (LRCFS)
  Nic Daeid, N. (Investigator)
  1/07/16 → 30/06/26
  Project: Research