Intertextual correspondence for integrating corpora

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We present intertextual correspondence (ITC) as an integrative technique for combining annotated text corpora. The topical correspondence between different texts can be exploited to establish new annotation connections between existing corpora. Although the general idea should not be restricted to one particular theoretical framework, we explain how the annotation of intertextual correspondence works for two corpora annotated with argumentative notions on the basis of Inference Anchoring Theory. The annotated corpora we take as examples are topically and temporally related: the first corpus comprises television debates leading up to the 2016 presidential elections in the United States; the second corpus consists of commentary on and discussion of those debates on the social media platform Reddit. The integrative combination enriches the existing corpora in terms of argumentative density, conceived of as the number of inference, conflict and rephrase relations relative to the word count of the (sub-)corpus. ITC also affects the global properties of the corpus, such as the most divisive issue. Moreover, the ability to extend existing corpora whilst maintaining the level of internal cohesion is beneficial to the use of the integrated corpus as a resource for text and argument mining based on machine learning.
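
The definition of argumentative density in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `CorpusCounts` structure, the function names, and the toy counts are hypothetical and are not taken from the paper. The sketch also assumes that intertextual links add relations across corpora without adding words.

```python
from dataclasses import dataclass


@dataclass
class CorpusCounts:
    """Relation and word counts for one (sub-)corpus (hypothetical structure)."""
    inferences: int
    conflicts: int
    rephrases: int
    words: int


def argumentative_density(c: CorpusCounts) -> float:
    """Inference, conflict and rephrase relations relative to the word count."""
    if c.words <= 0:
        raise ValueError("word count must be positive")
    return (c.inferences + c.conflicts + c.rephrases) / c.words


def integrate(a: CorpusCounts, b: CorpusCounts,
              cross_inferences: int = 0,
              cross_conflicts: int = 0,
              cross_rephrases: int = 0) -> CorpusCounts:
    """Combine two corpora and add cross-corpus relations.

    Assumption: intertextual links contribute relations but no extra words.
    """
    return CorpusCounts(
        inferences=a.inferences + b.inferences + cross_inferences,
        conflicts=a.conflicts + b.conflicts + cross_conflicts,
        rephrases=a.rephrases + b.rephrases + cross_rephrases,
        words=a.words + b.words,
    )


# Toy numbers, invented purely for illustration.
debates = CorpusCounts(inferences=40, conflicts=10, rephrases=10, words=1000)
reddit = CorpusCounts(inferences=20, conflicts=15, rephrases=5, words=1000)

plain_union = integrate(debates, reddit)
with_itc = integrate(debates, reddit,
                     cross_inferences=10, cross_conflicts=5, cross_rephrases=5)

print(argumentative_density(plain_union))
print(argumentative_density(with_itc))
```

Under these assumptions, any cross-corpus relation raises the density of the integrated corpus above that of the plain union, because the numerator grows while the word count stays fixed.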

Original language: English
Title of host publication: LREC 2018, Eleventh International Conference on Language Resources and Evaluation
Editors: Hitoshi Isahara, Bente Maegaard, Stelios Piperidis, Christopher Cieri, Thierry Declerck, Koiti Hasida, Helene Mazo, Khalid Choukri, Sara Goggi, Joseph Mariani, Asuncion Moreno, Nicoletta Calzolari, Jan Odijk, Takenobu Tokunaga
Publisher: European Language Resources Association
Pages: 3511-3517
Number of pages: 7
ISBN (Electronic): 9791095546009
Publication status: Published - 2018
Event: 11th International Conference on Language Resources and Evaluation, LREC 2018 - Miyazaki, Japan
Duration: 7 May 2018 to 12 May 2018

Conference

Conference: 11th International Conference on Language Resources and Evaluation, LREC 2018
Country: Japan
City: Miyazaki
Period: 7/05/18 to 12/05/18

Keywords

  • Argument
  • Corpus
  • Debate
  • Dialogue
  • Intertextuality
  • Reddit
  • US presidential elections

Cite this

Visser, J., Duthie, R., Lawrence, J., & Reed, C. (2018). Intertextual correspondence for integrating corpora. In H. Isahara, B. Maegaard, S. Piperidis, C. Cieri, T. Declerck, K. Hasida, H. Mazo, K. Choukri, S. Goggi, J. Mariani, A. Moreno, N. Calzolari, J. Odijk, ... T. Tokunaga (Eds.), LREC 2018, Eleventh International Conference on Language Resources and Evaluation (pp. 3511-3517). European Language Resources Association.
@inproceedings{263caae9e13940929958b0a5f8dfc32b,
title = "Intertextual correspondence for integrating corpora",
keywords = "Argument, Corpus, Debate, Dialogue, Intertextuality, Reddit, US presidential elections",
author = "Jacky Visser and Rory Duthie and John Lawrence and Chris Reed",
note = "This research was supported by the Engineering and Physical Sciences Research Council in the UK under grants EP/M506497/1 and EP/N014871/1.",
year = "2018",
language = "English",
pages = "3511--3517",
editor = "Hitoshi Isahara and Bente Maegaard and Stelios Piperidis and Christopher Cieri and Thierry Declerck and Koiti Hasida and Helene Mazo and Khalid Choukri and Sara Goggi and Joseph Mariani and Asuncion Moreno and Nicoletta Calzolari and Jan Odijk and Takenobu Tokunaga",
booktitle = "LREC 2018, Eleventh International Conference on Language Resources and Evaluation",
publisher = "European Language Resources Association",

}

TY - GEN
T1 - Intertextual correspondence for integrating corpora
AU - Visser, Jacky
AU - Duthie, Rory
AU - Lawrence, John
AU - Reed, Chris
N1 - This research was supported by the Engineering and Physical Sciences Research Council in the UK under grants EP/M506497/1 and EP/N014871/1.
PY - 2018
Y1 - 2018
KW - Argument
KW - Corpus
KW - Debate
KW - Dialogue
KW - Intertextuality
KW - Reddit
KW - US presidential elections
UR - http://www.scopus.com/inward/record.url?scp=85059899839&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85059899839
SP - 3511
EP - 3517
BT - LREC 2018, Eleventh International Conference on Language Resources and Evaluation
A2 - Isahara, Hitoshi
A2 - Maegaard, Bente
A2 - Piperidis, Stelios
A2 - Cieri, Christopher
A2 - Declerck, Thierry
A2 - Hasida, Koiti
A2 - Mazo, Helene
A2 - Choukri, Khalid
A2 - Goggi, Sara
A2 - Mariani, Joseph
A2 - Moreno, Asuncion
A2 - Calzolari, Nicoletta
A2 - Odijk, Jan
A2 - Tokunaga, Takenobu
PB - European Language Resources Association
ER -
