Ivan M. Tolstoy
Saint-Petersburg State University of Aerospace Instrumentation (SUAI), Graduate Student, 67, Bolshaya Morskaya, Saint-Petersburg, 190000, Russia, tel.: +7(812)328-33-37
Anton I. Saveliev
PhD in Technical Sciences, Saint-Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS), Senior Research Scientist, 39, 14 line V.O., Saint-Petersburg, 199178, Russia, tel.: +7(812)328-34-11
Aleksandr V. Denisov
SPIIRAS, Junior Research Scientist, 39, 14 line V.O., Saint-Petersburg, 199178, Russia, tel.: +7(812)328-04-21
Received 17 September 2018
Abstract
In this paper we present the development of a software interface for gesture recognition and classification that executes computer commands in real time. For its implementation, a comparative study of three classifiers was carried out: the Viola-Jones method and the convolutional neural networks MobileNets and Faster R-CNN. Testing showed that the classifier best suited to the gesture recognition task is the one based on the Faster R-CNN architecture, with an average accuracy of 90%, whereas the comparable MobileNets network achieves 85% accuracy and the Viola-Jones algorithm only 31%.
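The Viola-Jones detector in the comparison scans an image with Haar-like rectangle features, which are made cheap to evaluate by precomputing an integral image (summed-area table). As a minimal illustration of that mechanism (not the paper's implementation), the sketch below computes an integral image with NumPy and evaluates one two-rectangle feature; the window coordinates are arbitrary example values.

```python
import numpy as np

def integral_image(img):
    # Summed-area table: ii[r, c] = sum of img[0:r+1, 0:c+1]
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1+1, c0:c1+1] from the table in O(1) time
    s = ii[r1, c1]
    if r0 > 0:
        s -= ii[r0 - 1, c1]
    if c0 > 0:
        s -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        s += ii[r0 - 1, c0 - 1]
    return s

img = np.arange(16).reshape(4, 4)  # toy 4x4 "image"
ii = integral_image(img)

# A two-rectangle Haar feature over the full window:
# left half minus right half (responds to vertical edges)
feat = rect_sum(ii, 0, 0, 3, 1) - rect_sum(ii, 0, 2, 3, 3)
```

A trained cascade then thresholds thousands of such features per window; in practice this is what OpenCV's `CascadeClassifier` does internally.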
Key words
Artificial neural networks, convolutional neural networks, Viola-Jones method, gesture recognition, object detection.
Acknowledgements
The research was carried out with the support of the Federal Agency for Scientific Organizations (project no. AAAA-A16-116033110095-0).
DOI
https://doi.org/10.31776/RTCJ.6404
Bibliographic description
Tolstoy, I., Saveliev, A. and Denisov, A. (2018). Development of gesture interface for user interaction with robotic devices. Robotics and Technical Cybernetics, 4(21), pp.24-35.
UDC identifier:
004.932.2