1 Aalborg U Robotics, The Faculty of Humanities, Aalborg University
2 Robotics, Vision and Machine Intelligence (RVMI), The Faculty of Engineering and Science, Aalborg University
3 Sektion København, The Technical Faculty of IT and Design, Aalborg University
4 Aalborg University Copenhagen, The Faculty of Humanities, Aalborg University
5 Robotics and Automation, The Faculty of Engineering and Science, Aalborg University
6 Department of Mechanical and Manufacturing Engineering, The Faculty of Engineering and Science, Aalborg University
7 The Faculty of Engineering and Science (ENG), Aalborg University
8 unknown
In this paper we focus on the joint problem of tracking humans and recognizing human actions in scenarios such as a kitchen or a setting where a robot cooperates with a human, e.g., on a manufacturing task. In these scenarios, the human interacts with objects directly, either physically by using or manipulating them or by, e.g., pointing at them, as in “Give me that…”. Recognizing these types of human actions is difficult because (a) they ought to be recognized independently of scene parameters such as viewing direction and (b) the actions are parametric, where the parameters are either object-dependent or, as in the case of a pointing direction, convey important information. One common way to achieve recognition is to use 3D human body tracking followed by action recognition based on the captured tracking data. For the kind of scenarios considered here, we argue that 3D body tracking and action recognition should be seen as an intertwined problem that is primed by the objects to which the actions are applied. In this paper, we look at human body tracking and action recognition from an object-driven perspective. Instead of the space of human body poses, we consider the space of object affordances, i.e., the space of possible actions that can be applied to a given object. This way, 3D body tracking reduces to action tracking in the object-primed (and context-primed) parameter space of the object affordances, which replaces the high-dimensional joint space with a low-dimensional action space. In our approach, we use parametric hidden Markov models to represent parametric movements, and particle filtering to track in the space of action parameters. We demonstrate the effectiveness of the approach on synthetic and real image sequences using single-arm actions of the human upper body that involve objects.
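The idea of tracking in a low-dimensional action-parameter space, rather than in the full joint space, can be illustrated with a minimal particle filter sketch. This is not the authors' implementation. It assumes a two-dimensional action parameter `theta` (for example, a pointing target on a table plane) and replaces the parametric hidden Markov model with a hypothetical linear stand-in, `hand_model`, in which the hand moves from the origin to `theta` as the action progresses. All names, noise levels, and ranges are illustrative assumptions.

```python
import math
import random


def hand_model(theta, phase):
    """Hypothetical stand-in for a parametric movement model.

    The hand moves linearly from the origin to the target `theta`
    as `phase` runs from 0 to 1. A parametric HMM would take this role.
    """
    return (phase * theta[0], phase * theta[1])


def track_action_parameter(observations, n_particles=500,
                           obs_sigma=0.05, jitter=0.01, seed=0):
    """Estimate the 2D action parameter from observed hand positions."""
    rng = random.Random(seed)
    # Particles live in the action-parameter space, not in joint space.
    particles = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
                 for _ in range(n_particles)]
    n_steps = len(observations)
    for k, obs in enumerate(observations, start=1):
        phase = k / n_steps
        # Weight each particle by a Gaussian observation likelihood.
        weights = []
        for theta in particles:
            px, py = hand_model(theta, phase)
            d2 = (px - obs[0]) ** 2 + (py - obs[1]) ** 2
            weights.append(math.exp(-0.5 * d2 / obs_sigma ** 2))
        if sum(weights) == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0] * n_particles
        # Resample in proportion to the weights, then diffuse slightly
        # so that the particle set keeps some diversity.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        particles = [(x + rng.gauss(0.0, jitter), y + rng.gauss(0.0, jitter))
                     for x, y in particles]
    mean_x = sum(p[0] for p in particles) / n_particles
    mean_y = sum(p[1] for p in particles) / n_particles
    return (mean_x, mean_y)
```

Because each particle is a candidate action parameter rather than a full body pose, a few hundred particles suffice in this toy setting; the same filter over dozens of joint angles would need far more.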
Computer Vision and Image Understanding, 2013, Vol 117, Issue 7