Method of determining keywords for English texts based on DKPro Core

Authors

Oleh Volodymyrovych Bisikalo, Oleksandr Viktorovych Yakhymovych

DOI:

https://doi.org/10.15587/2312-8372.2015.37274

Keywords:

method, keywords, English, linguistic package, DKPro Core, syntactic analysis

Abstract

Approaches to searching for keywords in text are considered; they fall into two categories, linguistic and statistical. Linguistic methods are based on the meaning of words, in particular on ontologies and semantic information about words. Unfortunately, these methods are resource-intensive in their early stages: developing ontologies, for example, is a very time-consuming process.
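Purely frequency-based (statistical) ranking tends to push function words to the top of the list. The Python snippet below is a generic illustration of such a baseline, assuming simple lowercase tokenization; it is not one of the specific analogues compared in the paper, and the `frequency_keywords` helper is hypothetical.

```python
import re
from collections import Counter


def frequency_keywords(text: str, top_n: int = 10) -> list[str]:
    """Rank words by raw frequency: a simple statistical baseline (hypothetical helper)."""
    # Lowercase and keep only alphabetic runs as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # most_common orders ties by first occurrence in the text.
    return [word for word, _ in counts.most_common(top_n)]


print(frequency_keywords("The method finds the keywords of the text.", 3))
# → ['the', 'method', 'finds']
```

Without a stop-word filter, the article "the" ranks first simply because it occurs most often.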

A new method is proposed for determining keywords based on finding connections between the word forms of an English text, using the tools provided by the DKPro Core package. The method, illustrated with examples of analysis, is aimed at the efficient processing of text documents: indexing, abstracting, clustering and classification.
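The abstract does not state the exact scoring rule, so the following is only an illustrative sketch of the general idea of ranking words by their syntactic connections. It assumes that a dependency parse (for instance, one produced by a parser component run through DKPro Core) has already been converted into (governor, dependent, relation) triples of lemmas. The `Dependency` type, the `link_based_keywords` function and the "count governed dependents" rule are hypothetical and are not the authors' method.

```python
from collections import Counter
from typing import NamedTuple


class Dependency(NamedTuple):
    governor: str   # lemma of the head word
    dependent: str  # lemma of the dependent word
    relation: str   # relation label, e.g. Universal Dependencies "nsubj"


def link_based_keywords(deps: list[Dependency], top_n: int = 10) -> list[str]:
    """Rank lemmas by how many dependents they govern (hypothetical scoring rule).

    Function words such as articles and prepositions are usually attached
    as dependents of content words, so they rarely act as governors and
    therefore score low without any stop-word list.
    """
    scores = Counter(d.governor for d in deps)
    # Ties keep the order in which governors first appear.
    return [lemma for lemma, _ in scores.most_common(top_n)]


# Hand-written parse of "The method finds the keywords of the text."
parse = [
    Dependency("method", "the", "det"),
    Dependency("find", "method", "nsubj"),
    Dependency("keyword", "the", "det"),
    Dependency("find", "keyword", "obj"),
    Dependency("text", "of", "case"),
    Dependency("text", "the", "det"),
    Dependency("keyword", "text", "nmod"),
    Dependency("find", ".", "punct"),
]
print(link_based_keywords(parse))
# → ['find', 'keyword', 'text', 'method']
```

On the same sentence as a plain frequency count, "the" and "of" drop out of the ranking entirely because they never govern another word.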

Theoretical and experimental studies showed that the developed method finds more of the keywords specified by the author of a text than comparable methods do. In addition, without any additional filters, the proposed method reduces the number of stop words among the top ten most important (key) words by a factor of at least 5. The results can be used to improve the accuracy of website content analysis and to raise a site's position in search results.
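The two evaluation criteria mentioned above (how many author-specified keywords a method recovers, and how many stop words appear among its top ten words) can be computed as in the sketch below. The function names, the use of the same `top_n` cut-off for author keywords, and the contents of the stop-word set are assumptions; the abstract does not state how the authors defined them.

```python
def evaluate(ranked: list[str], author_keywords: set[str],
             stop_words: set[str], top_n: int = 10) -> tuple[int, int]:
    """Return (author keywords found, stop words present) in the top_n ranked words."""
    top = ranked[:top_n]
    hits = sum(1 for word in top if word in author_keywords)
    stops = sum(1 for word in top if word in stop_words)
    return hits, stops


def reduction_factor(baseline_stops: int, method_stops: int) -> float:
    """How many times fewer stop words the method places in its top list than a baseline."""
    if method_stops == 0:
        return float("inf")  # the method put no stop words in its top list at all
    return baseline_stops / method_stops


print(evaluate(["the", "keyword", "of", "method", "and"],
               {"keyword", "method"}, {"the", "of", "and"}))
# → (2, 3)
```

A claim of "at least 5 times fewer stop words" corresponds to `reduction_factor(...) >= 5`, for example 5 stop words in a baseline's top ten against 1 in the method's.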

Unlike existing methods, the proposed method determines keywords using additional information about the complex relationships between the members of an English sentence. The popular linguistic package DKPro Core was selected for the functional implementation of the text analyzer. Experimental studies confirmed the theoretical substantiation of the method and demonstrated its qualitative advantages over known analogues.

Author Biographies

Oleh Volodymyrovych Bisikalo, Vinnytsia National Technical University, Khmelnytsky Shosse 95, Vinnytsia, Ukraine, 21000

Doctor of Technical Sciences, Professor, Director of the Institute of Automation, Electronics and Computer Systems

Department of Automation and Information Measuring Devices

Oleksandr Viktorovych Yakhymovych, Vinnytsia National Technical University, Khmelnytsky Shosse 95, Vinnytsia, Ukraine, 21000

Department of Automation and Information Measuring Devices

References

  1. Ershov, Yu. S. (2014). Vydelenie kliuchevyh slov v russkoiazychnyh tekstah. Molodezhnyi nauchno-tehnicheskii vestnik. M.: FGBOU VPO "MGTU im. N. E. Baumana". Available: http://sntbul.bmstu.ru/file/out/730754. Last accessed 21.01.2015.
  2. Andreev, A. M., Berezkin, D. V., Siuzev, V. V., Shabanov, V. I. (2003). Modeli i metody avtomaticheskoi klassifikatsii tekstovyh dokumentov. Vestnik MGTU im. N. E. Baumana. Ser. Priborostroenie, № 4. Available: http://vestnikprib.bmstu.ru/articles/397/html/files/assets/basic-html/page1.html. Last accessed 21.01.2015.
  3. Joachims, T. (1998). Text categorization with Support Vector Machines: Learning with many relevant features. Machine Learning: ECML-98 Lecture Notes in Computer Science, Vol. 1398, 137–142. doi:10.1007/bfb0026683
  4. Jensen, R. (2000). A Rough Set-Aided System for Sorting WWW Bookmarks. The University of Edinburgh. Available: http://users.aber.ac.uk/rkj/research/mscthesis.pdf. Last accessed 21.01.2015.
  5. Larkey, L. S., Croft, W. B. (1996). Combining classifiers in text categorization. Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’96. ACM Press, 289–297. doi:10.1145/243199.243276
  6. Scott, S., Matwin, S. (1998). Text Classification Using WordNet Hypernyms. University of Ottawa. Available: http://www.aclweb.org/anthology/W98-0706. Last accessed 21.01.2015.
  7. Darkulova, K. N., Ergeshova, G. (2014). Neobhodimost' vydeleniia kliuchevyh slov dlia sviortyvaniia teksta. VI Mezhdunarodnaia studencheskaia elektronnaia nauchnaia konferentsiia «Studencheskii nauchnyi forum» 15 fevralia – 31 marta 2014 goda. Lingvisticheskii analiz nauchnogo teksta. Yuzhno-Kazahstanskii gosudarstvennyi universitet im. Muhtara Auezova Shymkent. Available: http://www.scienceforum.ru/2014/476/70. Last accessed 21.01.2015.
  8. Bisikalo, O. V. (2013). Kontseptualna model systemy obraznoho analizu i syntezu pryrodno-movnykh konstruktsii. Matematychni mashyny i systemy, № 2, 184–187. ISSN 1028-9763.
  9. Bisikalo, O. V. (2013). Formalni metody obraznoho analizu ta syntezu pryrodno-movnykh konstruktsii. Vinnytsia: VNTU, 316. ISBN 978-966-641-528-1.
  10. Natural Language Processing: Integration of Automatic and Manual Analysis. (2014). Technischen Universität Darmstadt. Available: http://tuprints.ulb.tu-darmstadt.de/4151/1/rec-thesis-final.pdf. Last accessed 21.01.2015.
  11. Gurevych, I., Muhlhauser, M., Muller, Ch., Steimle, J., Weimer, M., Zesch, T. (2007, February 9). Darmstadt Knowledge Processing Repository Based on UIMA. Available: https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2007/gldv-uima-ukp.pdf. Last accessed 21.01.2015.
  12. Burgareli, L. A. (2009, Jul.-Dec.). Variability management in software product lines using adaptive object and reflection. Journal of Aerospace Technology and Management, V. 1, № 2. Available: http://www.jatm.com.br/papers/vol1_n2/JATMv1n2_thesis_abstracts.pdf. Last accessed 21.01.2015.
  13. Address by President of the Russian Federation. Available: http://eng.kremlin.ru/transcripts/6402. Last accessed 21.01.2015.
  14. Address by President of the Russian Federation. Available: http://eng.kremlin.ru/news/6889. Last accessed 21.01.2015.

Published

2015-01-29

How to Cite

Bisikalo, O. V., & Yakhymovych, O. V. (2015). Method of determining keywords for English texts based on DKPro Core. Technology Audit and Production Reserves, 1(2(21)), 26–30. https://doi.org/10.15587/2312-8372.2015.37274

Section

Information Technologies: Original Research