Finding Time Together

Detection and Classification of Focused Interaction in Egocentric Video

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)
68 Downloads (Pure)

Abstract

Focused interaction occurs when co-present individuals, having mutual focus of attention, interact by establishing face-to-face engagement and direct conversation. Face-to-face engagement is often not maintained throughout the entirety of a focused interaction. In this paper, we present an online method for automatic classification of unconstrained egocentric (first-person perspective) videos into segments having no focused interaction, focused interaction when the camera wearer is stationary, and focused interaction when the camera wearer is moving. We extract features from both audio and video data streams and perform temporal segmentation using support vector machines with linear and non-linear kernels. We provide empirical evidence that fusion of visual face track scores, camera motion profile and audio voice activity scores is an effective combination for focused interaction classification.
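
As a rough illustration of the fusion-and-classify idea described in the abstract, the sketch below concatenates per-frame face-track, camera-motion and voice-activity scores into a feature vector and trains a support vector machine to assign one of the three interaction labels. This is not the authors' code: the data are synthetic, the scikit-learn pipeline and the RBF kernel choice are assumptions, and the paper's actual features, windowing and evaluation protocol are not reproduced.

# A minimal sketch, assuming scikit-learn; synthetic per-frame scores stand in
# for the paper's visual face track, camera motion and voice activity features.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 600  # synthetic "frames"

# Columns: [face-track score, camera-motion magnitude, voice-activity score].
X = rng.random((n, 3))

# Synthetic labels: 0 = no focused interaction,
# 1 = focused interaction with the wearer stationary,
# 2 = focused interaction with the wearer moving.
y = ((X[:, 0] > 0.5) & (X[:, 2] > 0.5)).astype(int)
y[(y == 1) & (X[:, 1] > 0.5)] = 2

# The paper compares linear and non-linear SVM kernels; an RBF kernel is shown.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X[:400], y[:400])
print("held-out accuracy:", clf.score(X[400:], y[400:]))
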
Original language: English
Title of host publication: 2017 IEEE International Conference on Computer Vision Workshop (ICCVW)
Publisher: IEEE
Pages: 2322-2330
Number of pages: 9
ISBN (Electronic): 9781538610343
ISBN (Print): 9781538610350
DOI: https://doi.org/10.1109/ICCVW.2017.274
Publication status: Published - 23 Jan 2018
Event: IEEE International Conference on Computer Vision Workshops - Venice Convention Centre, Venice, Italy
Duration: 22 Oct 2017 - 29 Oct 2017
http://iccv2017.thecvf.com/

Publication series

Name: Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017
Volume: 2018-January

Conference

Conference: IEEE International Conference on Computer Vision Workshops
Abbreviated title: ICCV 2017
Country: Italy
City: Venice
Period: 22/10/17 - 29/10/17
Internet address: http://iccv2017.thecvf.com/

Keywords

  • Cameras
  • Face
  • Feature extraction
  • Tracking
  • Visualization
  • Legged locomotion

Cite this

Bano, S., Zhang, J., & McKenna, S. (2018). Finding Time Together: Detection and Classification of Focused Interaction in Egocentric Video. In 2017 IEEE International Conference on Computer Vision Workshop (ICCVW) (pp. 2322-2330). (Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017; Vol. 2018-January). IEEE. https://doi.org/10.1109/ICCVW.2017.274
Bano, Sophia; Zhang, Jianguo; McKenna, Stephen. / Finding Time Together: Detection and Classification of Focused Interaction in Egocentric Video. 2017 IEEE International Conference on Computer Vision Workshop (ICCVW). IEEE, 2018. pp. 2322-2330 (Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017).
@inproceedings{b418ae4e50cc49e7b09325362aa8ec7a,
title = "Finding Time Together: Detection and Classification of Focused Interaction in Egocentric Video",
abstract = "Focused interaction occurs when co-present individuals, having mutual focus of attention, interact by establishing face-to-face engagement and direct conversation. Face-to-face engagement is often not maintained throughout the entirety of a focused interaction. In this paper, we present an online method for automatic classification of unconstrained egocentric (first-person perspective) videos into segments having no focused interaction, focused interaction when the camera wearer is stationary, and focused interaction when the camera wearer is moving. We extract features from both audio and video data streams and perform temporal segmentation using support vector machines with linear and non-linear kernels. We provide empirical evidence that fusion of visual face track scores, camera motion profile and audio voice activity scores is an effective combination for focused interaction classification.",
keywords = "Cameras, Face, Feature extraction, Tracking, Visualization, Legged locomotion",
author = "Sophia Bano and Jianguo Zhang and Stephen McKenna",
note = "This work is supported by the UK Engineering and Physical Sciences Research Council (EPSRC) under grant EP/N014278/1: ACE-LP: Augmenting Communication using Environmental Data to drive Language Prediction. The authors are grateful to Annalu Waller (University of Dundee), the ACE-LP team and CVIP members (University of Dundee) for useful discussions and assistance with dataset collection.",
year = "2018",
month = "1",
day = "23",
doi = "10.1109/ICCVW.2017.274",
language = "English",
isbn = "9781538610350",
series = "Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017",
publisher = "IEEE",
pages = "2322--2330",
booktitle = "2017 IEEE International Conference on Computer Vision Workshop (ICCVW)",

}

Bano, S, Zhang, J & McKenna, S 2018, Finding Time Together: Detection and Classification of Focused Interaction in Egocentric Video. in 2017 IEEE International Conference on Computer Vision Workshop (ICCVW). Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017, vol. 2018-January, IEEE, pp. 2322-2330, IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22/10/17. https://doi.org/10.1109/ICCVW.2017.274

Finding Time Together: Detection and Classification of Focused Interaction in Egocentric Video. / Bano, Sophia; Zhang, Jianguo; McKenna, Stephen.

2017 IEEE International Conference on Computer Vision Workshop (ICCVW). IEEE, 2018. p. 2322-2330 (Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017; Vol. 2018-January).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Finding Time Together

T2 - Detection and Classification of Focused Interaction in Egocentric Video

AU - Bano, Sophia

AU - Zhang, Jianguo

AU - McKenna, Stephen

N1 - This work is supported by the UK Engineering and Physical Sciences Research Council (EPSRC) under grant EP/N014278/1: ACE-LP: Augmenting Communication using Environmental Data to drive Language Prediction. The authors are grateful to Annalu Waller (University of Dundee), the ACE-LP team and CVIP members (University of Dundee) for useful discussions and assistance with dataset collection.

PY - 2018/1/23

Y1 - 2018/1/23

N2 - Focused interaction occurs when co-present individuals, having mutual focus of attention, interact by establishing face-to-face engagement and direct conversation. Face-to-face engagement is often not maintained throughout the entirety of a focused interaction. In this paper, we present an online method for automatic classification of unconstrained egocentric (first-person perspective) videos into segments having no focused interaction, focused interaction when the camera wearer is stationary, and focused interaction when the camera wearer is moving. We extract features from both audio and video data streams and perform temporal segmentation using support vector machines with linear and non-linear kernels. We provide empirical evidence that fusion of visual face track scores, camera motion profile and audio voice activity scores is an effective combination for focused interaction classification.

AB - Focused interaction occurs when co-present individuals, having mutual focus of attention, interact by establishing face-to-face engagement and direct conversation. Face-to-face engagement is often not maintained throughout the entirety of a focused interaction. In this paper, we present an online method for automatic classification of unconstrained egocentric (first-person perspective) videos into segments having no focused interaction, focused interaction when the camera wearer is stationary, and focused interaction when the camera wearer is moving. We extract features from both audio and video data streams and perform temporal segmentation using support vector machines with linear and non-linear kernels. We provide empirical evidence that fusion of visual face track scores, camera motion profile and audio voice activity scores is an effective combination for focused interaction classification.

KW - Cameras

KW - Face

KW - Feature extraction

KW - Tracking

KW - Visualization

KW - Legged locomotion

UR - http://www.scopus.com/inward/record.url?scp=85046168242&partnerID=8YFLogxK

U2 - 10.1109/ICCVW.2017.274

DO - 10.1109/ICCVW.2017.274

M3 - Conference contribution

SN - 9781538610350

T3 - Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017

SP - 2322

EP - 2330

BT - 2017 IEEE International Conference on Computer Vision Workshop (ICCVW)

PB - IEEE

ER -

Bano S, Zhang J, McKenna S. Finding Time Together: Detection and Classification of Focused Interaction in Egocentric Video. In 2017 IEEE International Conference on Computer Vision Workshop (ICCVW). IEEE. 2018. p. 2322-2330. (Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017). https://doi.org/10.1109/ICCVW.2017.274