Intertextual correspondence for integrating corpora

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)
151 Downloads (Pure)

Abstract

We present intertextual correspondence (ITC) as an integrative technique for combining annotated text corpora. The topical correspondence between different texts can be exploited to establish new annotation connections between existing corpora. Although the general idea should not be restricted to one particular theoretical framework, we explain how the annotation of intertextual correspondence works for two corpora annotated with argumentative notions on the basis of Inference Anchoring Theory. The annotated corpora we take as examples are topically and temporally related: the first corpus comprises television debates leading up to the 2016 presidential elections in the United States; the second corpus consists of commentary on and discussion of those debates on the social media platform Reddit. The integrative combination enriches the existing corpora in terms of argumentative density, conceived of as the number of inference, conflict and rephrase relations relative to the word count of the (sub-)corpus. ITC also affects the global properties of the corpus, such as the most divisive issue. Moreover, the ability to extend existing corpora whilst maintaining the level of internal cohesion is beneficial to the use of the integrated corpus as a resource for text and argument mining based on machine learning.

Original language: English
Title of host publication: LREC 2018, Eleventh International Conference on Language Resources and Evaluation
Editors: Hitoshi Isahara, Bente Maegaard, Stelios Piperidis, Christopher Cieri, Thierry Declerck, Koiti Hasida, Helene Mazo, Khalid Choukri, Sara Goggi, Joseph Mariani, Asuncion Moreno, Nicoletta Calzolari, Jan Odijk, Takenobu Tokunaga
Publisher: European Language Resources Association
Pages: 3511-3517
Number of pages: 7
ISBN (Electronic): 9791095546009
Publication status: Published - 2018
Event: 11th International Conference on Language Resources and Evaluation, LREC 2018 - Miyazaki, Japan
Duration: 7 May 2018 to 12 May 2018

Conference

Conference: 11th International Conference on Language Resources and Evaluation, LREC 2018
Country/Territory: Japan
City: Miyazaki
Period: 7/05/18 to 12/05/18

Keywords

  • Argument
  • Corpus
  • Debate
  • Dialogue
  • Intertextuality
  • Reddit
  • US presidential elections

ASJC Scopus subject areas

  • Linguistics and Language
  • Education
  • Library and Information Sciences
  • Language and Linguistics
