Automatic generation of naturalistic child-adult interaction data

Yevgen Matusevych, Afra Alishahi, Paul Vogt

Research output: Contribution to journal / Article / Peer-review

Abstract / Description of output

The input to a cognitively plausible model of language acquisition must have the same information components and statistical properties as child-directed speech. Collections of child-directed utterances exist (e.g., CHILDES), but a realistic representation of their visual and semantic context is not available. We propose three quantitative measures for analyzing the statistical properties of a manually annotated sample of child-adult interaction videos, and compare these properties with those of scene representations automatically generated from the same child-directed utterances, showing that the two datasets differ significantly. To address this problem, we propose an interaction-based framework for generating utterances and scenes based on co-occurrence frequencies collected from the annotated videos, and show that the resulting interaction-based dataset is comparable to naturalistic data. We use an existing model of cross-situational word learning as a case study for comparing the different datasets, and show that only the interaction-based data preserve the complexity of the learning task.
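The abstract does not spell out how co-occurrence frequencies are turned into utterances and scenes. Purely as an illustration, the Python sketch below shows one way such a generator could work; it is not the authors' framework. It assumes (hypothetically) that the annotations have been reduced to (object, word) co-occurrence counts, that a scene is a set of distinct objects drawn in proportion to how often each object was annotated, and that an utterance is a bag of words drawn from the counts pooled over the scene's objects. The names `TOY_COUNTS`, `sample_scene`, `sample_utterance` and `generate_episode`, and the toy counts themselves, are invented for this example and are not data from the study.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical input format: (object, word) -> number of annotated episodes in
# which the word was uttered while the object was present in the scene.
CoCounts = Dict[Tuple[str, str], int]

# Illustrative toy counts only; these are NOT figures from the annotated videos.
TOY_COUNTS: CoCounts = {
    ("ball", "ball"): 8, ("ball", "look"): 3,
    ("dog", "doggy"): 6, ("dog", "look"): 2,
    ("cup", "cup"): 4, ("cup", "drink"): 5,
}


def object_frequencies(co_counts: CoCounts) -> Counter:
    """Marginal count of each object, summed over all words it co-occurs with."""
    freq: Counter = Counter()
    for (obj, _word), n in co_counts.items():
        freq[obj] += n
    return freq


def sample_scene(co_counts: CoCounts, scene_size: int, rng: random.Random) -> List[str]:
    """Draw up to scene_size distinct objects, each draw weighted by object frequency."""
    pool = {obj: n for obj, n in object_frequencies(co_counts).items() if n > 0}
    scene: List[str] = []
    while pool and len(scene) < scene_size:
        objs = list(pool)
        obj = rng.choices(objs, weights=[pool[o] for o in objs], k=1)[0]
        scene.append(obj)
        del pool[obj]  # sample without replacement: objects in a scene are distinct
    return scene


def sample_utterance(co_counts: CoCounts, scene: List[str], length: int,
                     rng: random.Random) -> List[str]:
    """Draw a bag of words (with replacement) from counts pooled over the scene's objects."""
    pooled: Counter = Counter()
    for (obj, word), n in co_counts.items():
        if obj in scene and n > 0:
            pooled[word] += n
    if not pooled:
        return []  # no word was ever annotated with these objects
    words = list(pooled)
    return rng.choices(words, weights=[pooled[w] for w in words], k=length)


def generate_episode(co_counts: CoCounts, scene_size: int = 3, utterance_length: int = 4,
                     seed: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Return one (utterance, scene) pair; a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    scene = sample_scene(co_counts, scene_size, rng)
    utterance = sample_utterance(co_counts, scene, utterance_length, rng)
    return utterance, scene


if __name__ == "__main__":
    utterance, scene = generate_episode(TOY_COUNTS, scene_size=2, seed=0)
    print("utterance:", utterance, "scene:", scene)
```

In this sketch the generated utterance is an unordered bag of words, and a scene can contain objects that no word in the utterance refers to, which is the kind of referential uncertainty a cross-situational learner has to resolve across episodes. A fuller generator would presumably condition on richer annotation (e.g., actions or attention), which the abstract does not describe.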
Original language: English
Pages (from-to): 2996–3001
Number of pages: 6
Journal: Proceedings of the Annual Meeting of the Cognitive Science Society
Publication status: Published - 2013
Event: 35th Annual Meeting of the Cognitive Science Society - Berlin, Germany
Duration: 31 Jul 2013 to 3 Aug 2013
