Development of a thematic and neural network model for data learning
DOI: https://doi.org/10.15587/1729-4061.2022.263421
Keywords: multilayer neural network, LDA model, deep learning, backpropagation
Abstract
Research in the field of semantic text analysis begins with the study of the structure of natural language. The Kazakh language is agglutinative, which makes it distinctive and requires careful study. The object of this study is text in the Kazakh language. Existing approaches to the semantic analysis of Kazakh-language text do not combine topic modeling with neural network training. The purpose of this study is to assess, through neural network training, the quality of a topic model based on the LDA (Latent Dirichlet Allocation) method with Gibbs sampling. The LDA model determines the semantic probability of the keywords of a single document and assigns them a rating score. The neural network was built on the LSTM architecture, which is widely used and has proven itself in NLP (Natural Language Processing) tasks. The training results show how well the text was learned and how accurately the semantic analysis of the Kazakh-language text was performed. The system, developed on the basis of the LDA model and neural network training, groups the detected keywords into separate topics. Overall, the experiments showed that deep neural networks yield the expected assessment of LDA model quality when processing the Kazakh language. The developed neural network model contributes to assessing the semantic accuracy of the analyzed Kazakh-language text. The results obtained can be applied in text data processing systems, for example, when checking whether the content of submitted texts (abstracts, term papers, theses, and other works) matches their stated topic.
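To illustrate the pipeline described above, the following is a minimal sketch, not the authors' implementation. It assumes the Python `lda` package (which implements LDA with collapsed Gibbs sampling) and Keras for the LSTM; the toy corpus, topic count, and hyperparameters are placeholders rather than the configuration reported in the paper.

```python
# Minimal sketch of the described pipeline: LDA with Gibbs sampling to extract
# topic keywords, then an LSTM trained on the LDA topic assignments.
# The corpus, topic count, and hyperparameters below are illustrative placeholders.
import numpy as np
import lda  # pip install lda -- LDA via collapsed Gibbs sampling
from sklearn.feature_extraction.text import CountVectorizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Stand-in corpus: in practice, preprocessed (tokenized, normalized) Kazakh texts.
docs = [
    "mektep bala sabak kitap mugalim",
    "auyl egin zher sharuashylyk onim",
    "kompyuter zhuie derek model zhad",
]

# 1. Topic model: document-term counts -> LDA fitted with the Gibbs sampler.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs).toarray()
topic_model = lda.LDA(n_topics=3, n_iter=500, random_state=1)
topic_model.fit(X)

# Top keywords per topic; the topic-word probability serves as the rating score.
vocab = np.array(vectorizer.get_feature_names_out())
for k, word_dist in enumerate(topic_model.topic_word_):
    print(f"topic {k}:", vocab[np.argsort(word_dist)][:-4:-1])

# 2. LSTM network trained to reproduce each document's dominant LDA topic;
#    its accuracy gives an indirect estimate of topic-model quality.
labels = topic_model.doc_topic_.argmax(axis=1)
tokenizer = Tokenizer()
tokenizer.fit_on_texts(docs)
sequences = pad_sequences(tokenizer.texts_to_sequences(docs), maxlen=20)

net = Sequential([
    Embedding(input_dim=len(tokenizer.word_index) + 1, output_dim=32),
    LSTM(32),
    Dense(3, activation="softmax"),
])
net.compile(optimizer="adam",  # Adam optimizer, weights updated via backpropagation
            loss="sparse_categorical_crossentropy", metrics=["accuracy"])
net.fit(sequences, labels, epochs=10, verbose=0)
```

In the study itself, it is the LSTM's training and validation accuracy on such topic labels that allows the quality of the LDA model to be judged; the sketch above only shows the shape of that coupling on placeholder data.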
License
Copyright (c) 2022 Akerke Akanova, Nazira Ospanova, Saltanat Sharipova, Gulalem Mauina, Zhanat Abdugulova
This work is licensed under a Creative Commons Attribution 4.0 International License.
The terms of copyright transfer (identification of authorship) are set out in the License Agreement. In particular, the authors retain authorship of their manuscript and grant the journal the right of first publication under the terms of the Creative Commons CC BY license. They may also enter into separate, additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal, provided that a link to the first publication of the article in this journal is preserved.
A license agreement is a document in which the author warrants that they own the copyright to the work (manuscript, article, etc.).
By signing the License Agreement with TECHNOLOGY CENTER PC, the authors retain all rights to further use of their work, provided that they reference the edition in which the work was published.
Under the terms of the License Agreement, the publisher TECHNOLOGY CENTER PC does not take away the authors' copyright; it only receives permission from the authors to use and disseminate the publication through the world's scientific resources (its own electronic resources, scientometric databases, repositories, libraries, etc.).
In the absence of a signed License Agreement, or if the agreement lacks identifiers allowing the author's identity to be established, the editors have no right to work with the manuscript.
It is important to note that another type of agreement also exists between authors and publishers, in which copyright is transferred from the authors to the publisher. In that case, the authors lose ownership of their work and may not use it in any way.