Exploring the efficiency of the combined application of connection pruning and source data preprocessing when training a multilayer perceptron
DOI: https://doi.org/10.15587/1729-4061.2020.200819

Keywords: multilayer perceptron, neural network, pruning, regularization, learning curve, weight coefficients

Abstract
A conventional scheme for operating neural networks has, until recently, been to fix the architecture of a neural network and then train it. However, recent research in this field has revealed that neural networks set up and configured in this way exhibit considerable redundancy. An additional step was therefore introduced: eliminating this redundancy by pruning connections in the neural network's architecture. Among the many approaches to eliminating redundancy, the most promising is the combined application of several methods, when their cumulative effect exceeds the sum of the effects of employing each of them separately. We have performed an experimental study of the effectiveness of combining iterative pruning with pre-processing (pre-distortion) of input data for the task of recognizing handwritten digits with a multilayer perceptron. It is shown that input data pre-processing regularizes the training procedure, thereby preventing overfitting. The combined application of iterative pruning and input pre-distortion yielded a smaller recognition error for handwritten digits, 1.22 %, than pruning alone (which decreased the error from 1.89 % to 1.81 %) or pre-distortion alone (which decreased the error from 1.89 % to 1.52 %). In addition, regularization by pre-distortion makes it possible to obtain a monotonically increasing number of pruned connections while keeping the error at 1.45 %. Learning curves obtained for the same task but starting from different initial conditions take different values both during training and at its end, which indicates the multi-extremal character of the quality function, that is, of the recognition accuracy. The practical implication of the study is our proposal to train a neural network several times and select the best result.
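As a concrete illustration of the combined scheme described above, the sketch below trains a small MLP on MNIST with random affine pre-distortions of the input images and then alternates fine-tuning with magnitude-based connection pruning. This is a minimal sketch, not the authors' exact setup: the layer sizes, distortion parameters, pruning fractions, and epoch counts are illustrative assumptions, and PyTorch's l1_unstructured pruning stands in for the iterative pruning procedure studied in the paper.

```python
# Minimal sketch (assumed hyperparameters, not the authors' configuration):
# input pre-distortions + iterative magnitude pruning for an MNIST MLP.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Pre-distortions: small random affine perturbations of each training
# image act as a regularizer, in the spirit of Simard et al. (2003).
train_tf = transforms.Compose([
    transforms.RandomAffine(degrees=10, translate=(0.08, 0.08), scale=(0.9, 1.1)),
    transforms.ToTensor(),
])
test_tf = transforms.ToTensor()

train_ds = datasets.MNIST("data", train=True, download=True, transform=train_tf)
test_ds = datasets.MNIST("data", train=False, download=True, transform=test_tf)
train_dl = DataLoader(train_ds, batch_size=128, shuffle=True)
test_dl = DataLoader(test_ds, batch_size=1000)

# A small multilayer perceptron; the 784-300-100-10 sizes are illustrative.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 300), nn.ReLU(),
    nn.Linear(300, 100), nn.ReLU(),
    nn.Linear(100, 10),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_epochs(n):
    model.train()
    for _ in range(n):
        for x, y in train_dl:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

def test_error():
    model.eval()
    wrong = 0
    with torch.no_grad():
        for x, y in test_dl:
            wrong += (model(x).argmax(1) != y).sum().item()
    return 100.0 * wrong / len(test_ds)

linear_layers = [m for m in model if isinstance(m, nn.Linear)]

# Iterative pruning: alternate short retraining phases with removal of the
# smallest-magnitude weights; pruned connections stay masked at zero.
train_epochs(5)
for step in range(5):
    for layer in linear_layers:
        # Prune 20 % of the still-unpruned weights in each layer.
        prune.l1_unstructured(layer, name="weight", amount=0.2)
    train_epochs(2)  # fine-tune the surviving connections
    print(f"pruning step {step + 1}: test error {test_error():.2f} %")
```

Because the quality function (recognition accuracy) is multi-extremal, the whole procedure would in practice be repeated from several random initializations (e.g., different torch.manual_seed values), keeping the network with the lowest test error, as the abstract proposes.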
License
Copyright (c) 2020 Oleg Galchonkov, Alexander Nevrev, Maria Glava, Mykola Babych
This work is licensed under a Creative Commons Attribution 4.0 International License.