TY - JOUR
T1 - Conceptualization in reference production
T2 - Probabilistic modeling and experimental testing
AU - van Gompel, Roger P. G.
AU - van Deemter, Kees
AU - Gatt, Albert
AU - Snoeren, Rick
AU - Krahmer, Emiel J.
PY - 2019/4
Y1 - 2019/4
N2 - In psycholinguistics, there has been relatively little work investigating conceptualization, that is, how speakers decide which concepts to express. This contrasts with work in natural language generation (NLG), a subfield of artificial intelligence, where much research has explored content determination during the generation of referring expressions. Existing NLG algorithms for conceptualization during reference production do not fully explain previous psycholinguistic results, so we developed new models that we tested in three language production experiments. In our experiments, participants described target objects to another participant. In Experiment 1, either size, color, or both distinguished the target from all distractor objects; in Experiment 2, either color, type, or both color and type distinguished it from all distractors; in Experiment 3, color, size, or the border around the object distinguished the target. We tested how well the different models fit the distribution of description types (e.g., "small candle," "gray candle," "small gray candle") that participants produced. Across these experiments, the probabilistic referential overspecification model (PRO) provided the best fit. In this model, speakers first choose a property that rules out all distractors. If there is more than one such property, then they probabilistically choose one on the basis of a preference for that property. Next, they sometimes add another property, with the probability again determined by its preference and speakers' eagerness to overspecify.
AB - In psycholinguistics, there has been relatively little work investigating conceptualization, that is, how speakers decide which concepts to express. This contrasts with work in natural language generation (NLG), a subfield of artificial intelligence, where much research has explored content determination during the generation of referring expressions. Existing NLG algorithms for conceptualization during reference production do not fully explain previous psycholinguistic results, so we developed new models that we tested in three language production experiments. In our experiments, participants described target objects to another participant. In Experiment 1, either size, color, or both distinguished the target from all distractor objects; in Experiment 2, either color, type, or both color and type distinguished it from all distractors; in Experiment 3, color, size, or the border around the object distinguished the target. We tested how well the different models fit the distribution of description types (e.g., "small candle," "gray candle," "small gray candle") that participants produced. Across these experiments, the probabilistic referential overspecification model (PRO) provided the best fit. In this model, speakers first choose a property that rules out all distractors. If there is more than one such property, then they probabilistically choose one on the basis of a preference for that property. Next, they sometimes add another property, with the probability again determined by its preference and speakers' eagerness to overspecify.
KW - Computational models
KW - Conceptualization
KW - Overspecification
KW - Reference production
KW - Referring expressions
UR - http://www.scopus.com/inward/record.url?scp=85063327318&partnerID=8YFLogxK
U2 - 10.1037/rev0000138
DO - 10.1037/rev0000138
M3 - Article
C2 - 30907620
AN - SCOPUS:85063327318
VL - 126
SP - 345
EP - 373
JO - Psychological Review
JF - Psychological Review
SN - 0033-295X
IS - 3
ER -