Abstract
The paper proposes a new complex method of corpus creation under the constraint of a bounded, short period of time available for the annotation process. One important consequence of such a constraint is that it leaves no time for the traditional techniques of corpus evaluation based on Inter-Annotator Agreement (IAA). Therefore, we designed, tested and improved a multi-layer annotation process, with each subsequent layer aiming to replace IAA with an alternative method allowing for the creation of a high-quality corpus.
We built our method on two approaches to corpus creation: iterative enhancement (IE), which aims to improve the annotation over several iterations using automatic techniques to look for inconsistencies in the manual annotation, and agile corpus creation (ACC), which replaces the traditional, linear-phase approach with a cyclic and iterative small-step process. The layers in our approach can be viewed as such iterative cycles, which aim to improve the result of the annotation. However, our process is also adapted to handle the time constraint and the annotation of complex linguistic phenomena (dialogical argumentation), where (semi-)automatic methods such as IE cannot be successfully applied. Moreover, the full multi-layer annotation process was iterated three times, which allowed us not only to improve the corpus, as in ACC, but also to improve the annotation process itself.
Original language | English |
---|---|
Title of host publication | Proceedings of the 16th ArgDiaP Conference |
Subtitle of host publication | Argumentation and Corpus Linguistics |
Publisher | ARGDIAP |
Pages | 31-36 |
Number of pages | 5 |
Publication status | Published - 2018 |