Development of a multi-agent system for solving domain dictionary construction problem

Vadym Yaremenko; Oleksandr Syrotiuk

doi:10.15587/2706-5448.2020.208400

Authors

Vadym Yaremenko, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute», 37, Peremohy ave., Kyiv, 03056, Ukraine https://orcid.org/0000-0001-8557-6938
Oleksandr Syrotiuk, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute», 37, Peremohy ave., Kyiv, 03056, Ukraine https://orcid.org/0000-0002-4531-6290

Keywords:

TF-IDF, RAKE, TextRank, Word2Vec, Schulze method, text data, frequency analysis, parallel computing, multi-agent system

Abstract

The object of research is the use of multi-agent systems for text data mining. The need for this study arises from the growing amount of textual information generated worldwide. Accordingly, methods for processing this information need to be developed and studied, along with ways to use the results of that processing, since such methods cannot exist in isolation from practice. At the same time, multi-agent systems (MAS), in which agents are endowed with some form of intelligence, are being developed, and such systems scale easily. The use of MAS for text analysis is a promising area.

The following text data analysis methods were used in this study: TF-IDF, RAKE, TextRank, and Word2Vec neural network models. The algorithms were run and their results were compared. A corpus of documents (10–12 texts, 5732–12331 words) from the subject areas of physics and biology was used as the test set. Based on the results of the study, one method was selected, and the MAS for solving the problem was built on top of it. Additionally, the Schulze method (in single-winner and multi-winner variants) was used for voting. The resulting system was further studied with respect to its accuracy and speed of operation, as well as the influence of system parameters on its operation.
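The abstract does not spell out how Schulze voting combines the agents' opinions. The following is purely an illustration, under these assumptions: each agent submits a ranked list of candidate terms (most preferred first), and a term missing from a ballot ranks below every listed term. The function name `schulze_ranking` and the ballot format are hypothetical and are not taken from the paper. The sketch counts pairwise preferences, computes strongest-path strengths with a Floyd–Warshall-style pass, and orders candidates by how many rivals each one beats. Taking the first k entries of the result is a simple way to obtain several terms, but it is not the multi-winner Schulze (Schulze STV) variant mentioned above.

```python
def schulze_ranking(candidates, ballots):
    """Order candidates by the Schulze method (single-winner ordering).

    candidates: list of distinct hashable items (e.g. candidate terms).
    ballots: list of rankings, each ordered from most to least preferred.
    Assumption (hypothetical convention): an item missing from a ballot
    is ranked below every item listed on that ballot.
    """
    n = len(candidates)
    idx = {c: i for i, c in enumerate(candidates)}

    # d[i][j]: number of ballots that prefer candidate i to candidate j
    d = [[0] * n for _ in range(n)]
    for ballot in ballots:
        rank = {c: r for r, c in enumerate(ballot)}
        for a in candidates:
            for b in candidates:
                if a != b and rank.get(a, len(ballot)) < rank.get(b, len(ballot)):
                    d[idx[a]][idx[b]] += 1

    # p[i][j]: strength of the strongest (widest) path from i to j,
    # initialised with the pairwise victories only
    p = [[d[i][j] if d[i][j] > d[j][i] else 0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if i == k:
                continue
            for j in range(n):
                if j != i and j != k:
                    p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))

    # i beats j when p[i][j] > p[j][i]; this relation is transitive, so
    # sorting by the number of beaten rivals gives the Schulze order
    # (sorted() is stable, so exact ties keep the input order)
    wins = {c: sum(1 for j in range(n) if p[idx[c]][j] > p[j][idx[c]]) for c in candidates}
    return sorted(candidates, key=lambda c: -wins[c])


if __name__ == "__main__":
    # three hypothetical agents ranking the same candidate terms
    ballots = [
        ["photon", "energy", "result"],
        ["energy", "photon", "result"],
        ["photon", "result", "energy"],
    ]
    print(schulze_ranking(["photon", "energy", "result"], ballots))
    # ['photon', 'energy', 'result']
```

Building the strongest paths from pairwise victories, rather than counting first places, means one agent's idiosyncratic top choice cannot outvote a term that most agents consistently rank high.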

It has been found that TF-IDF-based analysis is useful for finding terms in documents with weak context. The resulting system achieves an accuracy of 75 % (3 out of every 4 words proposed by the system are terms). The maximum operating time on the test cases is 2–3 seconds, achieved through parallel computation and a modification of the Schulze method. The results obtained in this paper are heuristic (ontology is a rather vague concept) and require further elaboration by experts in the relevant fields. Nevertheless, within this experiment the results are positive.
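The following is a minimal sketch of TF-IDF term ranking in which each document is handled by its own worker, loosely standing in for an agent. Several choices here are assumptions made for illustration:

- the regex tokenizer;
- the weighting variant (tf = count / document length, idf = ln(N / df), with no smoothing);
- the function names `tokenize`, `top_terms` and `extract_terms`;
- the use of a thread pool.

The paper's own weighting and parallelization scheme may differ. A term that occurs in every document gets idf = 0, which is why a word like "the" never surfaces.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, deliberately simple tokenizer: lowercase ASCII words only
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def top_terms(tokens, df, n_docs, k):
    """One 'agent': score a single document's terms by TF-IDF, return the top k."""
    counts = Counter(tokens)
    total = len(tokens)
    scores = {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
    # highest score first; alphabetical order breaks ties deterministically
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def extract_terms(texts, k=5, workers=4):
    docs_tokens = [tokenize(t) for t in texts]
    # document frequency: in how many documents each term appears
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    n_docs = len(docs_tokens)
    # each document is scored independently, so the work splits cleanly;
    # pool.map returns results in the same order as the input documents
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda toks: top_terms(toks, df, n_docs, k), docs_tokens))


if __name__ == "__main__":
    texts = [
        "the photon energy of the photon",
        "the cell membrane of the cell",
        "the electron photon",
    ]
    print(extract_terms(texts, k=2))
    # [['energy', 'photon'], ['cell', 'membrane'], ['electron', 'photon']]
```

Threads keep the sketch short and avoid pickling the lambda. However, Python threads do not speed up CPU-bound scoring because of the GIL. For real parallelism, the usual swap is `concurrent.futures.ProcessPoolExecutor` with a module-level worker function.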

Author Biographies

Vadym Yaremenko, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute», 37, Peremohy ave., Kyiv, Ukraine, 03056

Postgraduate Student, Assistant

Department of System Design

Oleksandr Syrotiuk, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute», 37, Peremohy ave., Kyiv, Ukraine, 03056

Department of System Design

References

Mikolov, T., Le, Q. V., Sutskever, I. (2013). Exploiting Similarities among Languages for Machine Translation. ArXiv. Available at: https://arxiv.org/abs/1309.4168
Wu, H. C., Luk, R. W. P., Wong, K. F., Kwok, K. L. (2008). Interpreting TF-IDF term weights as making relevance decisions. ACM Transactions on Information Systems, 26 (3), 1–37. doi: http://doi.org/10.1145/1361684.1361686
Aref, M.M. (2003). A multi-agent system for natural language understanding. IEMC '03 Proceedings. Managing Technologically Driven Organizations: The Human Side of Innovation and Change (IEEE Cat. No.03CH37502), 36–40. doi: http://doi.org/10.1109/kimas.2003.1245018
Fum, D., Guida, G., Tasso, C. (1988). A distributed multi-agent architecture for natural language processing. Proceedings of the 12th conference on Computational linguistics, 812–814. doi: http://doi.org/10.3115/991719.991801
Mihalcea, R., Tarau, P. (2004). TextRank: Bringing Order into Text. Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 404–411.
Rose, S. R., Engel, D., Cramer, N., Cowley, W. (2010). Automatic keyword extraction from individual documents. Text Mining. doi: http://doi.org/10.1002/9780470689646.ch1
Twardowski, B., Ryzko, D. (2014). Multi-agent Architecture for Real-Time Big Data Processing. 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 3, 333–337. doi: http://doi.org/10.1109/wi-iat.2014.185
Kiran, M., Murphy, P., Monga, I., Dugan, J., Baveja, S. S. (2015). Lambda architecture for cost-effective batch and speed big data processing. 2015 IEEE International Conference on Big Data (Big Data), 2785–2792. doi: http://doi.org/10.1109/bigdata.2015.7364082
Singh, K., Behera, R., Mantri, J. (2019). Big Data Ecosystem: Review on Architectural Evolution. Advances in Intelligent Systems and Computing, 335–345. doi: http://doi.org/10.1007/978-981-13-1498-8_30
Schulze, M. (2018). The Schulze Method of Voting. ArXiv. Available at: https://arxiv.org/abs/1804.02973
Amdahl, G. (2007). Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities, Reprinted from the AFIPS Conference Proceedings, Vol. 30 (Atlantic City, N. J., Apr. 18–20). IEEE Solid-State Circuits Newsletter, 12, 19–20. doi: http://doi.org/10.1109/n-ssc.2007.4785615
