Abstract
Human action recognition is an important research area in computer vision. Manipulation action recognition, which involves complex human-object interactions, is a challenging problem in this area. In kitchen scenarios especially, occlusions among objects and human body parts, together with the transformations that objects undergo, increase the difficulty of action recognition.
Previous methods for manipulation action recognition often rely on high-level representations such as object detection and human body part detection, which require expensive object and human annotations. Moreover, object transformation information has not been considered comprehensively.
This thesis proposes a method for manipulation action recognition based on generating and mining superpixel groups in videos. The method requires no manual annotation of objects or human body parts. We develop a new mid-level representation, superpixel groups, which captures object parts, human body parts and object transformation information in manipulation actions, and from which a hierarchical structure can be built. We also introduce a participant-based mining algorithm that combines discriminativity and representativity to retrieve discriminative patterns for each action; this is more effective than mining methods that use discriminativity alone.
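To illustrate the idea of combining discriminativity with representativity, the sketch below scores candidate patterns for one action class. This is a minimal toy illustration, not the thesis's actual algorithm: the function name `mine_patterns`, the linear combination with weight `alpha`, and the set-of-pattern-ids video encoding are all assumptions made for exposition.

```python
def mine_patterns(videos, target, alpha=0.5, top_k=2):
    """Rank candidate patterns for one action class.

    videos: list of (action_label, set_of_pattern_ids) pairs,
            one per video (a toy stand-in for superpixel-group patterns).
    target: the action class to mine patterns for.
    alpha:  weight trading off discriminativity vs. representativity.
    """
    target_videos = [pats for lbl, pats in videos if lbl == target]
    other_videos = [pats for lbl, pats in videos if lbl != target]
    candidates = set().union(*(pats for _, pats in videos))

    scored = []
    for pat in candidates:
        in_target = sum(pat in pats for pats in target_videos)
        in_other = sum(pat in pats for pats in other_videos)
        # Discriminativity: share of the pattern's occurrences
        # that fall in the target class.
        total = in_target + in_other
        disc = in_target / total if total else 0.0
        # Representativity: fraction of target-class videos
        # that contain the pattern.
        rep = in_target / len(target_videos) if target_videos else 0.0
        scored.append((pat, alpha * disc + (1 - alpha) * rep))

    scored.sort(key=lambda t: (-t[1], t[0]))
    return scored[:top_k]


# Toy example: pattern "a" occurs only (and always) in "cut" videos,
# "b" occurs everywhere, "c" occurs in one "cut" video.
videos = [("cut", {"a", "b"}), ("cut", {"a", "b", "c"}),
          ("stir", {"b"}), ("stir", {"b"})]
print(mine_patterns(videos, "cut"))  # "a" ranks first with score 1.0
```

A purely discriminative criterion would rank the rare pattern "c" as highly as the ubiquitous "b"; the representativity term is what promotes patterns that also cover most videos of the target action.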
We evaluate the proposed method on two challenging manipulation action datasets. It achieves state-of-the-art frame-wise accuracy on the 10-class classification problem of the 50 Salads dataset, and obtains results comparable to methods that use additional object detection and human skin detection in the "Actions for Cooking Eggs" dataset contest.
Date of Award: 2019
Supervisors: Stephen McKenna (Supervisor) & Jianguo Zhang (Supervisor)