Method of determining keywords for English texts based on DKPro Core
DOI:
https://doi.org/10.15587/2312-8372.2015.37274
Keywords:
method, keywords, English, linguistic package, DKPro Core, syntactic analysis
Abstract
Approaches to keyword search in text, which fall into two categories, linguistic and statistical, are considered. Linguistic methods rely on the meaning of words, in particular on ontologies and semantic information about words. Unfortunately, these methods are resource-intensive in the early stages: developing ontologies, for example, is a very time-consuming process.
A new method for determining keywords is proposed, based on finding connections between the word forms of an English text using the tools of the DKPro Core package. The method, illustrated with examples of analysis, is aimed at efficient processing of text documents: indexing, abstracting, clustering and classification.
Theoretical and experimental studies show that the developed method finds more of the keywords specified by the author of the text than comparable methods. In addition, without additional filters, the proposed method reduces the number of stop words among the ten most important (key) words by a factor of at least 5. The results can be used to improve the accuracy of a site's content analysis and raise the site's position in search results.
Unlike existing methods, the proposed method of determining keywords uses additional information about the complex relationships between the members of an English sentence. The popular linguistic package DKPro Core was selected to implement the text analyzer. Experimental studies confirm the theoretical substantiation of the method and demonstrate its quality advantages over known analogues.
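The abstract does not state how connections between word forms are turned into a keyword ranking. As a rough illustration only, the sketch below assumes that a word's importance is the number of syntactic links it takes part in. The edge list is written by hand for one example sentence and stands in for the output of a dependency parser such as those wrapped by DKPro Core; the names `EDGES` and `rank_by_connections` are hypothetical and do not come from the paper.

```python
from collections import Counter

# Hypothetical (head, dependent) pairs for the sentence
# "The method finds keywords in English text".
# Written by hand for illustration; a real pipeline would take
# them from a dependency parser.
EDGES = [
    ("finds", "method"),
    ("method", "The"),
    ("finds", "keywords"),
    ("keywords", "text"),
    ("text", "in"),
    ("text", "English"),
]


def rank_by_connections(edges):
    """Rank word forms by how many syntactic links they take part in.

    Assumption: importance equals the degree of the word in the
    dependency graph. The paper's actual scoring may differ.
    """
    degree = Counter()
    for head, dependent in edges:
        degree[head.lower()] += 1
        degree[dependent.lower()] += 1
    # Sort by descending degree, then alphabetically for stable output.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for word, links in rank_by_connections(EDGES):
        print(word, links)
```

Under this assumption, function words such as "the" and "in" attach to a single head and so tend to sink in the ranking without a separate stop-word filter, which is in the spirit of the stop-word reduction the abstract reports.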
License
Copyright (c) 2016 Олег Володимирович Бісікало, Олександр Вікторович Яхимович
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright assignment and the conditions for its transfer (identification of authorship) are set out in the License Agreement. In particular, the authors retain authorship of their manuscript and grant the journal the right of first publication of the work under the terms of the Creative Commons CC BY license. They may also enter into separate additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal, provided that a link to the first publication of the article in this journal is preserved.