Semantic-probabilistic network for video context recognition in video systems

Никита Владиславович Коваленко, Светлана Григорьевна Антощук

Abstract


In recent years, the use of video surveillance has grown considerably.

Due to the increasing scale and complexity of such systems, manual maintenance becomes impossible, which raises the problem of developing automated intelligent surveillance systems. One of the most important tasks solved by surveillance systems is human behavior analysis and recognition, which has many applications, from patient state monitoring in medical establishments to suspicious behavior detection and crime prevention. Analysis shows that graphical probabilistic models such as Bayesian networks are widely used and are a highly effective approach to human behavior recognition.

However, the lack of strict data formalization and structuring makes building a Bayesian network for complex human behavior recognition highly difficult. To overcome this limitation, we suggest introducing a domain ontology: a hierarchical decomposition of video contents in terms of scenarios, situations, object roles, and states. These are derived from low-level features, which are computed from annotated ground-truth video data using a set of computer vision methods. The ontology then serves as the basis for Bayesian network structure learning.
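As an illustration only, the following minimal Python sketch shows one way an ontology of this kind could narrow the search space for structure learning. The level ordering, the rule that edges run only from one level to the level directly below it, and all variable names are assumptions made for this example; they are not taken from the proposed framework.

```python
from itertools import permutations

# Ontology levels, most abstract first (assumed ordering for this sketch).
LEVELS = ["scenario", "situation", "role", "state", "feature"]


def allowed_edges(variables):
    """Return the directed edges permitted by the ontology.

    `variables` maps a variable name to its ontology level. An edge
    parent -> child is allowed only when the child sits exactly one
    level below the parent (an illustrative assumption).
    """
    rank = {level: i for i, level in enumerate(LEVELS)}
    return sorted(
        (parent, child)
        for parent, child in permutations(variables, 2)
        if rank[variables[child]] - rank[variables[parent]] == 1
    )


# Hypothetical variables for a surveillance scene.
variables = {
    "fight": "scenario",
    "approach": "situation",
    "aggressor": "role",
    "running": "state",
    "speed": "feature",
    "bbox_overlap": "feature",
}

edges = allowed_edges(variables)
unconstrained = len(variables) * (len(variables) - 1)
print(f"{len(edges)} of {unconstrained} possible directed edges remain")
```

A structure learning procedure would then score only the candidate parent sets drawn from `edges`, rather than every possible pair of variables.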

The performance of the proposed framework was evaluated on the HMDB and CAVIAR datasets, and we observed an increase in human behavior recognition efficiency compared to other approaches.


Keywords


human behavior; probabilistic models; Bayesian network; ontology; semantic models


Copyright (c) 2014 Никита Владиславович Коваленко, Светлана Григорьевна Антощук

This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN (print) 1729-3774, ISSN (on-line) 1729-4061