Abstract:
The analysis of surveillance video is one of the driving forces behind behavior recognition in video. In surveillance video, humans are monitored and their actions are classified as either normal or a deviation from the norm. Local spatio-temporal features have gained attention as effective descriptors for action recognition in video, but the use of texture as a local descriptor remains relatively unexplored. This paper presents a human action recognition approach that fuses appearance, motion, and texture into a local descriptor for the bag-of-features model. Experiments were conducted on the recorded UTP dataset using the proposed descriptor. The fused descriptor achieved an average accuracy of 85.92%, compared with 75.06% for the combination of shape and motion descriptors, an improvement of 10.86 percentage points. The results show that the proposed descriptor outperforms the combination of appearance and motion as a local descriptor of an interest point.
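
As a rough illustration of the pipeline summarized above, the following sketch fuses three per-interest-point descriptors and builds a bag-of-features histogram for one video. It is not the authors' implementation: the abstract does not state the fusion rule, so concatenation after L2 normalization is an assumption, as are the function names (`fuse_descriptors`, `bag_of_features`), the descriptor dimensions (72, 90, 59), the codebook size, and the random data standing in for real features.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm so no single cue dominates the fusion."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def fuse_descriptors(appearance, motion, texture):
    """Concatenate the three cues per interest point (assumed fusion rule)."""
    return np.hstack([l2_normalize(appearance),
                      l2_normalize(motion),
                      l2_normalize(texture)])

def bag_of_features(descriptors, codebook):
    """Assign each descriptor to its nearest codeword; return a normalized histogram."""
    # Squared Euclidean distances, shape (n_points, n_codewords)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()

# Hypothetical data: 50 interest points with placeholder descriptor sizes.
rng = np.random.default_rng(0)
n_points = 50
appearance = rng.random((n_points, 72))
motion = rng.random((n_points, 90))
texture = rng.random((n_points, 59))

fused = fuse_descriptors(appearance, motion, texture)

# Placeholder codebook: 8 fused descriptors picked at random.
# In practice a codebook is usually learned by clustering training descriptors.
codebook = fused[rng.choice(n_points, size=8, replace=False)]

video_histogram = bag_of_features(fused, codebook)
```

The resulting `video_histogram` is the fixed-length representation that a classifier would consume; dropping the `texture` argument from the concatenation gives the two-cue baseline against which the fused descriptor is compared.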