Ivan M. Tolstoy
Saint-Petersburg State University of Aerospace Instrumentation (SUAI), Graduate Student, 67, Bolshaya Morskaya, Saint-Petersburg, 190000, Russia, tel.: +7(812)328-33-37
Anton I. Saveliev
PhD in Technical Sciences, Saint-Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS), Senior Research Scientist, 39, 14 line V.O., Saint-Petersburg, 199178, Russia, tel.: +7(812)328-34-11
Aleksandr V. Denisov
SPIIRAS, Junior Research Scientist, 39, 14 line V.O., Saint-Petersburg, 199178, Russia, tel.: +7(812)328-04-21
Received 17 September 2018
Abstract
In this paper we present the development of a software interface for gesture recognition and classification that executes computer commands in real time. For its implementation, a comparative study of three classifiers was carried out: the Viola-Jones method and the convolutional neural networks MobileNets and Faster R-CNN. Testing showed that the classifier best suited to the gesture recognition task is the one based on the Faster R-CNN architecture, with an average accuracy of 90%, whereas the comparable MobileNets network achieves 85% accuracy and the Viola-Jones algorithm only 31%.
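The Viola-Jones detector in the comparison scans an image with Haar-like rectangle features, which are made cheap to evaluate by precomputing an integral image (summed-area table). As a minimal illustration of that mechanism (not the paper's implementation), the sketch below computes an integral image with NumPy and evaluates one two-rectangle feature; the window coordinates are arbitrary example values.

```python
import numpy as np

def integral_image(img):
    # Summed-area table: ii[r, c] = sum of img[0:r+1, 0:c+1]
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1+1, c0:c1+1] from the table in O(1) time
    s = ii[r1, c1]
    if r0 > 0:
        s -= ii[r0 - 1, c1]
    if c0 > 0:
        s -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        s += ii[r0 - 1, c0 - 1]
    return s

img = np.arange(16).reshape(4, 4)  # toy 4x4 "image"
ii = integral_image(img)

# A two-rectangle Haar feature over the full window:
# left half minus right half (responds to vertical edges)
feat = rect_sum(ii, 0, 0, 3, 1) - rect_sum(ii, 0, 2, 3, 3)
```

A trained cascade then thresholds thousands of such features per window; in practice this is what OpenCV's `CascadeClassifier` does internally.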
Key words
Artificial neural networks, convolutional neural networks, Viola-Jones method, gesture recognition, object detection.
Acknowledgements
The research was carried out with the support of the Federal Agency for Scientific Organizations (project no. AAAA-A16-116033110095-0).
DOI
https://doi.org/10.31776/RTCJ.6404
Bibliographic description
Tolstoy, I., Saveliev, A. and Denisov, A. (2018). Development of gesture interface for user interaction with robotic devices. Robotics and Technical Cybernetics, 4(21), pp.24-35.
UDC identifier:
004.932.2