Development of a thematic and neural network model for data learning

Authors

DOI:

https://doi.org/10.15587/1729-4061.2022.263421

Keywords:

multilayer neural network, LDA model, deep learning, backpropagation

Abstract

Research in semantic text analysis begins with the study of the structure of natural language. Kazakh is an agglutinative language and therefore requires careful study. The object of this study is text in the Kazakh language. Existing approaches to the semantic analysis of Kazakh text do not combine topic modeling with neural network training. The purpose of this study is to assess the quality of a topic model based on LDA (Latent Dirichlet Allocation) with Gibbs sampling by training a neural network. The LDA model estimates the semantic probability of the keywords of a single document and assigns each keyword a rating score. The neural network was built on LSTM, a widely used architecture that has performed well in NLP (Natural Language Processing) tasks. The training results show how well the network learned from the text and how the semantic analysis of the Kazakh text performed. The system, built on the LDA model and neural network training, groups the detected keywords into separate topics. Overall, the experimental results showed that deep neural networks give the expected results when assessing the quality of the LDA model for Kazakh-language processing. The developed neural network model helps assess the semantic accuracy of Kazakh texts. The results can be applied in text-processing systems, for example, to check whether the content of a submitted text (abstract, term paper, thesis, or other work) matches its stated topic.
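As an illustration of the topic-modeling step described in the abstract, the following is a minimal sketch of collapsed Gibbs sampling for LDA in plain Python. It is not the authors' implementation: the function names `lda_gibbs` and `top_keywords`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy corpus (English placeholder tokens standing in for preprocessed Kazakh words) are all assumptions made for the example.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * V for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional p(z = k | all other assignments).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta) / (topic_total[k] + V * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled topic.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed estimates: phi[k][v] = p(word v | topic k),
    # theta[d][k] = p(topic k | document d).
    phi = [[(topic_word[k][v] + beta) / (topic_total[k] + V * beta)
            for v in range(V)] for k in range(n_topics)]
    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
              for k in range(n_topics)] for d, doc in enumerate(docs)]
    return vocab, phi, theta

def top_keywords(vocab, phi_row, n=3):
    """Return the n highest-probability (word, score) pairs for one topic."""
    return sorted(zip(vocab, phi_row), key=lambda p: p[1], reverse=True)[:n]

# Hypothetical toy corpus; real input would be preprocessed Kazakh tokens.
docs = [
    ["wheat", "harvest", "field", "wheat", "soil"],
    ["network", "server", "data", "network", "protocol"],
    ["soil", "field", "harvest", "wheat"],
    ["data", "server", "protocol", "network"],
]
vocab, phi, theta = lda_gibbs(docs, n_topics=2)
for k, row in enumerate(phi):
    print(f"topic {k}:", top_keywords(vocab, row))
```

Each row of `phi` gives the per-topic keyword scores, and each row of `theta` gives a document's topic mixture. In a pipeline like the one the abstract outlines, such outputs could serve as inputs or targets for the neural network, though the exact coupling used by the authors is not specified here.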

Author Biographies

Akerke Akanova, S. Seifullin Kazakh Agro Technical University

PhD, Senior Lecturer

Department of Computer Engineering and Software

Nazira Ospanova, Toraighyrov University

PhD, Associate Professor, Head of Department

Department of Information Technology

Saltanat Sharipova, S. Seifullin Kazakh Agro Technical University

Master of Science in Informatics

Department of Computer Engineering and Software

Gulalem Mauina, S. Seifullin Kazakh Agro Technical University

Master of Engineering and Technology

Department of Information Systems

Zhanat Abdugulova, L. N. Gumilyov Eurasian National University

Candidate of Economic Sciences, Associate Professor

Department of Systems Analysis and Management


Published

2022-08-31

How to Cite

Akanova, A., Ospanova, N., Sharipova, S., Mauina, G., & Abdugulova, Z. (2022). Development of a thematic and neural network model for data learning. Eastern-European Journal of Enterprise Technologies, 4(2(118)), 40–50. https://doi.org/10.15587/1729-4061.2022.263421