Method of automated construction of explanatory dictionary of subject area
DOI:
https://doi.org/10.15587/2312-8372.2015.40895
Keywords:
dictionary, term, subject area, synonym, group name
Abstract
The article presents a method for the automated construction of an explanatory dictionary based on processing a collection of texts from a specific subject area.
A technique for selecting and grouping source texts is proposed, based on inter-document and intra-document clustering, which allows significant terms to be retained in the dictionary.
A procedure is developed for selecting terms (individual words and phrases) from documents, based on calculating the frequency of their occurrence in the text.
A technique for finding synonyms and definitions, and for using other dictionaries, is proposed.
A formula is given for estimating the time spent on the various stages of compiling the dictionary.
Experimental results are given that confirm the effectiveness of the proposed method of constructing a subject area dictionary.
The proposed method of automatically compiling a subject area dictionary can be used at the requirements definition stage for software products in information systems and artificial intelligence systems.
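The frequency-based selection of terms mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `candidate_terms`, the `min_count` threshold, the `max_len` phrase length, and the simple letter-based tokenization are all assumptions made for illustration, and the clustering, synonym, and definition steps are not reproduced.

```python
import re
from collections import Counter


def candidate_terms(text, min_count=2, max_len=2):
    """Return words and adjacent-word phrases seen at least min_count times.

    Hypothetical helper: the threshold and phrase length are illustrative
    assumptions, not values taken from the article.
    """
    # Lowercase and keep runs of letters only (works for non-Latin scripts too).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter()
    # Count every run of 1..max_len adjacent words as a candidate term.
    # Note: phrases are allowed to span sentence boundaries in this sketch.
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    # Keep only candidates that occur often enough.
    return {term: c for term, c in counts.items() if c >= min_count}


sample = ("Subject area dictionary. The subject area defines terms. "
          "A dictionary lists terms.")
terms = candidate_terms(sample)
```

In this sketch both single words and the phrase "subject area" pass the threshold; a practical implementation would likely also filter stop words and normalize word forms before counting.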
License
Copyright (c) 2016 Алексей Борисович Кунгурцев, Яна Владимировна Поточняк, Дмитрий Александрович Силяев
This work is licensed under a Creative Commons Attribution 4.0 International License.
The assignment of copyright and the conditions of its transfer (identification of authorship) are set out in the License Agreement. In particular, the authors retain the right of authorship of their manuscript and grant the journal the right of first publication of the work under the terms of the Creative Commons CC BY license. They also have the right to enter independently into additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal, provided that a reference to the first publication of the article in this journal is preserved.