A3NET: FAST END-TO-END OBJECT DETECTOR ON NEURAL NETWORK FOR SCENES WITH ARBITRARY SIZE

A.A. Alexeev
Saint-Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University), Postgraduate Student, 49, Kronverksky pr, Saint-Petersburg, 197101, Russia

Yu.N. Matveev
Doctor of Technical Science, ITMO University, Head of the Chair of Speech Information Systems, 49, Kronverksky pr, Saint-Petersburg, 197101, Russia

G.A. Kukharev
Doctor of Technical Science, West Pomeranian University of Technology in Szczecin, Professor, 17, al. Piastów, 70-310, Szczecin, Poland

Received 30 July 2018

Abstract
This paper presents a new object detector based on a convolutional network with NiN-type (Network in Network) convolution kernels. Detection here means simultaneous localization and recognition of objects in a scene.
The detector works on scenes of arbitrary size. The network is trained by supervised learning on 100x100 pixel frames. The proposed method is computationally efficient: processing an HD image on a single CPU core takes 300 ms. As the paper shows, the highly repetitive structure of the network's operations makes it suitable for parallel stream processing on a GPU, with an estimated execution time of less than 10 ms. The method is robust to slight overlap between objects and to moderately low image quality of the detected objects. It is an end-to-end learning model that outputs bounding rectangles and object classes for the whole image. The object detection algorithm is evaluated on an open Russian database of images captured by dashboard cameras. A similar approach can be used to detect and evaluate other types of objects, such as human faces. The method is not limited to a single object type; combinations of objects can be detected simultaneously. The detector was validated using the authors' own A3Net framework, without third-party neural network software.
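
The link between training on fixed-size frames and detecting on scenes of arbitrary size can be illustrated with a minimal NumPy sketch of a NiN-style block (a spatial convolution followed by 1x1 convolutions). This is not the A3Net architecture: the kernel size, channel counts, stride, random weights, and the function names conv2d_valid and nin_block are all assumptions chosen for illustration, and small arrays stand in for the 100x100 training frames and HD scenes.

```python
import numpy as np


def conv2d_valid(x, w, stride=1):
    """Valid cross-correlation. x: (H, W, C_in), w: (k, k, C_in, C_out)."""
    k = w.shape[0]
    out_h = (x.shape[0] - k) // stride + 1
    out_w = (x.shape[1] - k) // stride + 1
    out = np.empty((out_h, out_w, w.shape[3]))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k, :]
            # Contract the patch with the kernel over (k, k, C_in).
            out[i, j] = np.tensordot(patch, w, axes=3)
    return out


def nin_block(x, w_spatial, w_pointwise, stride=1):
    """NiN-style block: k x k convolution, then 1x1 convolutions, each with ReLU."""
    y = np.maximum(conv2d_valid(x, w_spatial, stride), 0.0)
    for w in w_pointwise:
        # A 1x1 convolution is a matrix product over channels at every pixel.
        y = np.maximum(y @ w, 0.0)
    return y


# Hypothetical weights (random, untrained); all shapes are illustrative assumptions.
rng = np.random.default_rng(0)
w_spatial = rng.standard_normal((5, 5, 3, 8)) * 0.1
w_pointwise = [rng.standard_normal((8, 8)) * 0.1,
               rng.standard_normal((8, 4)) * 0.1]

frame = rng.standard_normal((20, 20, 3))  # stands in for a training frame
scene = rng.standard_normal((30, 44, 3))  # stands in for a larger scene
scene[:20, :20] = frame                   # place the frame in the scene's corner

frame_out = nin_block(frame, w_spatial, w_pointwise, stride=2)
scene_out = nin_block(scene, w_spatial, w_pointwise, stride=2)
print(frame_out.shape, scene_out.shape)
```

Because the block contains only convolutions, the same weights accept any input size, and the output map grows with the scene. The responses over the embedded frame match those computed on the frame alone, which is the property that allows a network trained on small frames to be swept over a whole image in one pass.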

Key words
Object detection, region proposal, CNN, NiN.

https://doi.org/10.31776/RTCJ.6305

Bibliographic description
Alexeev, A., Matveev, Y. and Kukharev, G. (2018). A3Net: fast end-to-end object detector on neural network for scenes with arbitrary size. Robotics and Technical Cybernetics, 3(20), pp.43-52.

UDC identifier:
004
