Time-constrained Multi-layer Corpus Creation

Katarzyna Budzynska, Martin Pereira-Fariña, Dominic De Franco, Rory Duthie, Núria Franco-Guillén, Annette Hautli-Janisz, Mathilde Janier, Marcin Koszowy, Luana Marinho, Elena Musi, Alison Pease, Brian Plüss, Chris Reed, Jacky Visser

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

The paper proposes a new, complex method of corpus creation for settings where only a short, bounded period of time is available for the annotation process. One important consequence of this constraint is that it leaves no time for the traditional corpus evaluation technique of Inter-Annotator Agreement (IAA). We therefore designed, tested and improved a multi-layer annotation process in which each subsequent layer aims to replace IAA with an alternative method that still allows a high-quality corpus to be created.

We built our method on two approaches to corpus creation: iterative enhancement (IE), which aims to improve the annotation over several iterations by using automatic techniques to find inconsistencies in the manual annotation; and agile corpus creation (ACC), which replaces the traditional, linear-phase approach with a cyclic, iterative, small-step process. The layers in our approach can be viewed as such iterative cycles, each aimed at improving the result of the annotation. However, our process is also adapted to handle time constraints and the annotation of a complex linguistic phenomenon, dialogical argumentation, for which (semi-)automatic methods such as IE cannot be applied successfully. Moreover, the full multi-layer annotation process was iterated three times, which allowed us not only to improve the corpus, as in ACC, but also to improve the annotation process itself.
Original language: English
Title of host publication: Proceedings of the 16th ArgDiaP Conference
Subtitle of host publication: Argumentation and Corpus Linguistics
Publisher: ARGDIAP
Pages: 31-36
Number of pages: 5
Publication status: Published, 2018

