Improving a model of object recognition in images based on a convolutional neural network
DOI: https://doi.org/10.15587/1729-4061.2021.233786
Keywords: image processing, object recognition, convolutional neural networks, unmanned aerial vehicle
Abstract
This paper considers a model of object recognition in images based on convolutional neural networks; the efficiency of training the deep layers of such networks has been studied. Determining the optimal characteristics of a neural network is objectively difficult, which gives rise to the problem of overfitting. Eliminating overfitting by selecting only the optimal number of epochs is insufficient, since it does not provide high accuracy.
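By way of illustration, a common way to detect the onset of overfitting, rather than fixing the epoch count in advance, is to monitor a validation metric and stop once it no longer improves. The following is a minimal, generic sketch in Python; the `EarlyStopping` class, its thresholds, and the synthetic losses are illustrative assumptions, not the authors' procedure.

```python
# Minimal early-stopping sketch (illustrative, not the authors' procedure).
# Training stops once validation loss has failed to improve for `patience`
# consecutive epochs, instead of relying on a fixed epoch count alone.

class EarlyStopping:
    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum change that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    # Synthetic losses: steady improvement, then a plateau (onset of overfitting).
    losses = [0.90, 0.70, 0.55, 0.50, 0.51, 0.52, 0.53, 0.54]
    stopper = EarlyStopping(patience=3)
    for epoch, loss in enumerate(losses, start=1):
        if stopper.step(loss):
            print(f"Stopped at epoch {epoch}; best val loss {stopper.best:.2f}")
            break
```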
Requirements for the image set used for model training and verification have been defined; these requirements are best met by the INRIA image set (France).
GoogLeNet (USA) was found to be a pre-trained model capable of recognizing objects in images, but its recognition reliability is insufficient, so the effectiveness of object recognition in images needs to be improved. It is advisable to use the GoogLeNet architecture to build a specialized model that, by changing parameters and retraining some of the layers, allows objects in images to be recognized more reliably.
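A minimal sketch of this transfer-learning idea, assuming the torchvision implementation of GoogLeNet; the choice of framework, `num_classes`, and which layers to unfreeze are assumptions for illustration, not the authors' exact configuration.

```python
# Hedged sketch: adapt a pre-trained GoogLeNet by freezing most layers and
# retraining only the deepest Inception block plus a new classifier head.
# Framework (PyTorch/torchvision), num_classes, and the unfreezing choice
# are assumptions for illustration.
import torch.nn as nn
from torchvision import models

num_classes = 2  # e.g. "object" vs. "background"; assumed for the example

model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)

# Freeze the pre-trained feature extractor.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze the deepest Inception block so its weights can be fine-tuned.
for param in model.inception5b.parameters():
    param.requires_grad = True

# Replace the final classifier; the new layer is trainable by default.
model.fc = nn.Linear(model.fc.in_features, num_classes)
```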
Ten models were trained by varying the following parameters: the learning rate, the number of epochs, the optimization algorithm, the learning-rate decay policy, the gamma or power coefficient, and the pre-trained base model.
A convolutional neural network has been developed to improve the precision and efficiency of object recognition in images. The optimal training parameters were determined: a learning rate of 0.000025, 100 epochs, a power coefficient of 0.25, etc. A 3 % increase in precision was obtained, which confirms the proper choice of the network architecture and the selection of its parameters; this allows the network to be used for practical tasks of object recognition in images.
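For illustration, the reported settings can be reproduced with a polynomial learning-rate decay, interpreting the power coefficient as the exponent of a Caffe-style "poly" schedule; this interpretation, the Adam optimizer, the toy model, and the dummy data below are assumptions of the sketch, not the authors' confirmed setup.

```python
# Illustrative sketch of the reported settings: learning rate 0.000025,
# 100 epochs, power coefficient 0.25, here interpreted as the exponent of
# a Caffe-style "poly" learning-rate schedule. Optimizer, model, and data
# are stand-ins.
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(10, 2)  # stand-in for the actual network
optimizer = torch.optim.Adam(model.parameters(), lr=0.000025)

max_epochs, power = 100, 0.25
# lr(epoch) = base_lr * (1 - epoch / max_epochs) ** power
scheduler = LambdaLR(optimizer, lr_lambda=lambda e: (1 - e / max_epochs) ** power)

x, y = torch.randn(8, 10), torch.randn(8, 2)  # dummy training batch
loss_fn = torch.nn.MSELoss()

for epoch in range(max_epochs):
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    scheduler.step()
```

With a power of 0.25, the rate stays near its base value for most of training and falls off sharply only in the final epochs.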
License
Copyright (c) 2021 Bogdan Knysh, Yaroslav Kulyk
This work is licensed under a Creative Commons Attribution 4.0 International License.
The terms and conditions for the transfer of copyright (identification of authorship) are set out in the License Agreement. In particular, the authors retain authorship of their manuscript and grant the journal the right of first publication of the work under the terms of the Creative Commons CC BY license. At the same time, they may conclude additional agreements on their own for the non-exclusive distribution of the work in the form in which it was published by this journal, provided that a link to the first publication of the article in this journal is preserved.
A license agreement is a document in which the author warrants that he or she owns all copyright for the work (manuscript, article, etc.).
By signing the License Agreement with TECHNOLOGY CENTER PC, the authors retain all rights to the further use of their work, provided that they link to our edition in which the work was published.
Under the terms of the License Agreement, the Publisher TECHNOLOGY CENTER PC does not take away the authors' copyrights; it receives permission from the authors to use and disseminate the publication through the world's scientific resources (its own electronic resources, scientometric databases, repositories, libraries, etc.).
In the absence of a signed License Agreement, or in the absence in that agreement of identifiers allowing the author's identity to be established, the editors have no right to work with the manuscript.
It is important to remember that there is another type of agreement between authors and publishers, under which copyright is transferred from the authors to the publisher. In that case, the authors lose ownership of their work and may not use it in any way.