Multimodal Egocentric Analysis of Focused Interactions

Sophia Bano, Tamas Suveges, Jianguo Zhang, Stephen McKenna (Lead / Corresponding author)

Research output: Contribution to journal › Article

3 Citations (Scopus)
132 Downloads (Pure)

Abstract

Continuous detection of social interactions from wearable sensor data streams has a range of potential applications in domains including health and social care, security, and assistive technology. We contribute an annotated, multimodal dataset capturing such interactions using video, audio, GPS and inertial sensing. We present methods for automatic detection and temporal segmentation of focused interactions using support vector machines and recurrent neural networks with features extracted from both audio and video streams. Focused interaction occurs when co-present individuals, having a mutual focus of attention, interact by first establishing face-to-face engagement and direct conversation. We describe an evaluation protocol including framewise, extended framewise and event-based measures, and provide empirical evidence that fusion of visual face track scores with audio voice activity scores provides an effective combination. The methods, contributed dataset and protocol together provide a benchmark for future research on this problem. The dataset is available at https://doi.org/10.15132/10000134.
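The score fusion and temporal segmentation described in the abstract can be illustrated with a minimal late-fusion baseline. This sketch is an assumption for illustration only: the weighted average, threshold, and function names below are hypothetical and stand in for the paper's actual SVM/RNN pipeline.

```python
import numpy as np

def fuse_scores(face_scores, voice_scores, w_face=0.5):
    """Late fusion: weighted average of per-frame visual face-track
    scores and audio voice-activity scores (weights are illustrative)."""
    face = np.asarray(face_scores, dtype=float)
    voice = np.asarray(voice_scores, dtype=float)
    return w_face * face + (1.0 - w_face) * voice

def segment_events(fused, threshold=0.5):
    """Threshold the fused score stream and return (start, end) frame
    index pairs (end exclusive) for contiguous runs classified as
    focused interaction -- the event-based view of the output."""
    active = fused >= threshold
    events, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(active)))
    return events

# Toy per-frame scores (hypothetical values, not from the dataset).
face = [0.9, 0.8, 0.2, 0.1, 0.7, 0.9]
voice = [0.7, 0.9, 0.1, 0.3, 0.8, 0.8]
fused = fuse_scores(face, voice)
events = segment_events(fused, threshold=0.5)
```

Framewise measures would score each frame's thresholded label against ground truth, while event-based measures would compare the recovered (start, end) intervals to annotated interaction events.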
Original language: English
Pages (from-to): 37493-37505
Number of pages: 13
Journal: IEEE Access
Volume: 6
Early online date: 25 Jun 2018
Publication status: Published - 25 Jun 2018


Keywords

  • Social interaction
  • egocentric sensing
  • multimodal analysis
  • temporal segmentation
