Deep Bilinear Learning for RGB-D Action Recognition

Jian-Fang Hu, Wei-Shi Zheng (Lead / Corresponding author), Jiahui Pan, Jianhuang Lai, Jianguo Zhang

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

21 Citations (Scopus)


In this paper, we explore modality-temporal mutual information for RGB-D action recognition. To learn time-varying information and multi-modal features jointly, we propose a novel deep bilinear learning framework. Its core component is the bilinear block, which consists of two linear pooling layers that pool the input cube features along the modality direction and the temporal direction, respectively. To capture rich modality-temporal information and facilitate deep bilinear learning, we also introduce a new action feature, the modality-temporal cube, which represents an RGB-D action as a tensor so that it can be characterized from a comprehensive perspective. We test our method extensively on two public datasets under four different evaluation settings, and the results show that it outperforms state-of-the-art approaches.
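The abstract does not give the layer equations, so the following is only a minimal sketch of the idea of pooling a cube along two axes, under stated assumptions: the modality-temporal cube is stored as `cube[modality][time][feature]`, and each "linear pooling layer" is a weight matrix that linearly mixes slices along one axis. The names `modality_pool`, `temporal_pool` and `bilinear_block` and the weight shapes are hypothetical, not taken from the paper; learning, nonlinearities and any other components of the authors' blocks are omitted.

```python
from typing import List

# Hypothetical layout: cube[modality][time][feature].
Matrix = List[List[float]]
Cube = List[List[List[float]]]


def modality_pool(cube: Cube, w_mod: Matrix) -> Cube:
    """Linear pooling along the modality axis (assumed form).

    w_mod has shape (M_out, M_in):
    out[i][t][d] = sum_m w_mod[i][m] * cube[m][t][d]
    """
    m_in, t_len, d_len = len(cube), len(cube[0]), len(cube[0][0])
    return [
        [
            [sum(w_mod[i][m] * cube[m][t][d] for m in range(m_in)) for d in range(d_len)]
            for t in range(t_len)
        ]
        for i in range(len(w_mod))
    ]


def temporal_pool(cube: Cube, w_time: Matrix) -> Cube:
    """Linear pooling along the temporal axis (assumed form).

    w_time has shape (T_out, T_in):
    out[m][j][d] = sum_t w_time[j][t] * cube[m][t][d]
    """
    t_in, d_len = len(cube[0]), len(cube[0][0])
    return [
        [
            [sum(w_time[j][t] * cube[m][t][d] for t in range(t_in)) for d in range(d_len)]
            for j in range(len(w_time))
        ]
        for m in range(len(cube))
    ]


def bilinear_block(cube: Cube, w_mod: Matrix, w_time: Matrix) -> Cube:
    """Hypothetical bilinear block: modality pooling followed by temporal pooling.

    Both steps are linear and act on different axes, so their order does not
    affect the result, and the output is linear in w_mod and in w_time
    separately (i.e. bilinear in the pair of weights).
    """
    return temporal_pool(modality_pool(cube, w_mod), w_time)


# Toy cube: 2 modalities (e.g. RGB and depth), 3 time steps, 2 features each.
toy_cube = [
    [[1, 2], [3, 4], [5, 6]],     # modality 0
    [[7, 8], [9, 10], [11, 12]],  # modality 1
]
average_modalities = [[0.5, 0.5]]                  # 2 modalities -> 1
first_and_last = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # 3 time steps -> 2
print(bilinear_block(toy_cube, average_modalities, first_and_last))
# prints [[[4.0, 5.0], [8.0, 9.0]]]
```

Factoring the pooling into one matrix per axis keeps the parameter count at `M_out*M_in + T_out*T_in` rather than one weight per (modality, time) pair for every output cell, which is one plausible reason to separate the two directions; whether the paper's blocks use exactly this parameterization is not stated in the abstract.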

Original language: English
Title of host publication: ECCV 2018
Subtitle of host publication: Computer Vision - ECCV 2018
Editors: Vittorio Ferrari, Cristian Sminchisescu, Martial Hebert, Yair Weiss
Place of Publication: Switzerland
Number of pages: 17
ISBN (Electronic): 9783030012342
ISBN (Print): 9783030012335
Publication status: Published - 2018
Event: European Conference on Computer Vision 2018 - Munich, Germany
Duration: 8 Sept 2018 to 14 Sept 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11211 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: European Conference on Computer Vision 2018
Abbreviated title: ECCV 2018

Keywords
  • Cube
  • Deep bilinear
  • Feature learning
  • RGB-D action

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)


