Development of a thematic and neural network model for data learning
DOI: https://doi.org/10.15587/1729-4061.2022.263421
Keywords: multilayer neural network, LDA model, deep learning, backpropagation
Abstract
Research in the field of semantic text analysis begins with the study of the structure of natural language. The Kazakh language is agglutinative, which makes it distinctive and requires careful study. The object of this study is text in the Kazakh language. Existing approaches to the semantic analysis of Kazakh-language text do not combine topic modeling with neural network training. The purpose of this study is to assess, through neural network training, the quality of a topic model based on the LDA (Latent Dirichlet Allocation) method with Gibbs sampling. The LDA model determines the semantic probability of the keywords of a single document and assigns them a rating score. The neural network was built on the LSTM architecture, which is widely used and has proven itself in NLP (Natural Language Processing) tasks. The training results show how well the text was learned and how accurately the semantic analysis of the Kazakh-language text was performed. The system, developed on the basis of the LDA model and neural network training, groups the detected keywords into separate topics. Overall, the experiments showed that deep neural networks yield the expected assessment of LDA model quality when processing the Kazakh language. The developed neural network model contributes to assessing the semantic accuracy of the analyzed Kazakh-language text. The results obtained can be applied in text data processing systems, for example, when checking whether the content of submitted texts (abstracts, term papers, theses, and other works) matches their stated topic.
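To illustrate the pipeline described above, the following is a minimal sketch, not the authors' implementation. It assumes the Python `lda` package (which implements LDA with collapsed Gibbs sampling) and Keras for the LSTM; the toy corpus, topic count, and hyperparameters are placeholders rather than the configuration reported in the paper.

```python
# Minimal sketch of the described pipeline: LDA with Gibbs sampling to extract
# topic keywords, then an LSTM trained on the LDA topic assignments.
# The corpus, topic count, and hyperparameters below are illustrative placeholders.
import numpy as np
import lda  # pip install lda -- LDA via collapsed Gibbs sampling
from sklearn.feature_extraction.text import CountVectorizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Stand-in corpus: in practice, preprocessed (tokenized, normalized) Kazakh texts.
docs = [
    "mektep bala sabak kitap mugalim",
    "auyl egin zher sharuashylyk onim",
    "kompyuter zhuie derek model zhad",
]

# 1. Topic model: document-term counts -> LDA fitted with the Gibbs sampler.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs).toarray()
topic_model = lda.LDA(n_topics=3, n_iter=500, random_state=1)
topic_model.fit(X)

# Top keywords per topic; the topic-word probability serves as the rating score.
vocab = np.array(vectorizer.get_feature_names_out())
for k, word_dist in enumerate(topic_model.topic_word_):
    print(f"topic {k}:", vocab[np.argsort(word_dist)][:-4:-1])

# 2. LSTM network trained to reproduce each document's dominant LDA topic;
#    its accuracy gives an indirect estimate of topic-model quality.
labels = topic_model.doc_topic_.argmax(axis=1)
tokenizer = Tokenizer()
tokenizer.fit_on_texts(docs)
sequences = pad_sequences(tokenizer.texts_to_sequences(docs), maxlen=20)

net = Sequential([
    Embedding(input_dim=len(tokenizer.word_index) + 1, output_dim=32),
    LSTM(32),
    Dense(3, activation="softmax"),
])
net.compile(optimizer="adam",  # Adam optimizer, weights updated via backpropagation
            loss="sparse_categorical_crossentropy", metrics=["accuracy"])
net.fit(sequences, labels, epochs=10, verbose=0)
```

In the study itself, it is the LSTM's training and validation accuracy on such topic labels that allows the quality of the LDA model to be judged; the sketch above only shows the shape of that coupling on placeholder data.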
License
Copyright (c) 2022 Akerke Akanova, Nazira Ospanova, Saltanat Sharipova, Gulalem Mauina, Zhanat Abdugulova
This work is licensed under a Creative Commons Attribution 4.0 International License.
The terms of copyright transfer (identification of authorship) are set out in the License Agreement. In particular, the authors retain authorship of their manuscript and grant the journal the right of first publication under the terms of the Creative Commons CC BY license. They may also enter into separate, additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal, provided that a link to the first publication of the article in this journal is preserved.
A license agreement is a document in which the author warrants that they own the copyright to the work (manuscript, article, etc.).
By signing the License Agreement with TECHNOLOGY CENTER PC, the authors retain all rights to further use of their work, provided that they reference the edition in which the work was published.
Under the terms of the License Agreement, the publisher TECHNOLOGY CENTER PC does not take away the authors' copyright; it only receives permission from the authors to use and disseminate the publication through the world's scientific resources (its own electronic resources, scientometric databases, repositories, libraries, etc.).
In the absence of a signed License Agreement, or if the agreement lacks identifiers allowing the author's identity to be established, the editors have no right to work with the manuscript.
It is important to note that another type of agreement also exists between authors and publishers, in which copyright is transferred from the authors to the publisher. In that case, the authors lose ownership of their work and may not use it in any way.