Indexing text documents for the problem of advanced keyword-based information search
DOI: https://doi.org/10.15587/1729-4061.2014.20332
Keywords: full-text search, intelligent systems, indexing, morphological analysis, automation of library activities
Abstract
The paper considers automatic indexing of full-text documents as a way to make information search more intelligent. The main objective of the study is to develop a full-text keyword search model that takes into account the morphological features of Russian, together with algorithms for indexing and full-text search. The system was implemented as a web application in the PHP programming language, with the MySQL DBMS serving as the relational database for the full-text index. For morphological analysis, a normalizer daemon was developed that runs as a TCP server and incorporates the Dialing morphological analyzer. The resulting system has several useful properties: it can serve several users simultaneously, handle large indices, and maintain an optimal ratio of selectivity to sensitivity during search.
The research results can be used by analytical linguists, specialists in the automation of library activities, and other specialists and experts who build automated library information systems, automatic abstracting systems, and similar tools. Using the software and applications described above made it possible to develop an effective system for indexing full-text documents and searching them by keyword.
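The abstract describes indexing in which word forms are reduced to a normal form before being stored, so that a keyword query matches any inflection of the same word. The following is a minimal Python sketch of that general idea, not the authors' implementation. It assumes an in-memory inverted index in place of the MySQL tables, and a tiny hypothetical lemma table (`LEMMAS`) standing in for the Dialing analyzer. The names `normalize`, `tokenize` and `InvertedIndex`, the AND semantics, and the frequency-based ranking are all illustrative choices.

```python
import re
from collections import defaultdict

# HYPOTHETICAL stand-in for a morphological analyzer such as Dialing:
# a tiny hand-written table mapping Russian word forms to their lemmas.
# Forms not listed here are treated as already normalized.
LEMMAS = {
    "книги": "книга", "книгу": "книга", "книгах": "книга",
    "библиотеки": "библиотека", "библиотеке": "библиотека",
    "поиска": "поиск", "поиском": "поиск",
}

# \w matches Unicode letters (including Cyrillic) for str patterns in Python 3.
TOKEN_RE = re.compile(r"\w+")


def normalize(word: str) -> str:
    """Lower-case a token and reduce it to its lemma via the lookup table."""
    w = word.lower()
    return LEMMAS.get(w, w)


def tokenize(text: str) -> list[str]:
    """Split text into word tokens and normalize each one."""
    return [normalize(t) for t in TOKEN_RE.findall(text)]


class InvertedIndex:
    """In-memory inverted index: lemma -> {document id: occurrence count}."""

    def __init__(self) -> None:
        self.postings: dict[str, dict[int, int]] = defaultdict(dict)

    def add(self, doc_id: int, text: str) -> None:
        """Index a document under the normalized forms of its words."""
        for lemma in tokenize(text):
            counts = self.postings[lemma]
            counts[doc_id] = counts.get(doc_id, 0) + 1

    def search(self, query: str) -> list[int]:
        """Return ids of documents containing every query lemma, best first."""
        lemmas = set(tokenize(query))
        if not lemmas:
            return []
        # AND semantics: intersect the posting lists of all query lemmas.
        # .get() avoids creating empty entries in the defaultdict.
        docs = set.intersection(*(set(self.postings.get(lem, {})) for lem in lemmas))
        # Rank by total occurrences of the query lemmas; break ties by id.
        return sorted(docs, key=lambda d: (-sum(self.postings[lem][d] for lem in lemmas), d))


index = InvertedIndex()
index.add(1, "Поиск книги в библиотеке")
index.add(2, "Книгу можно найти поиском; книги в библиотеке")
index.add(3, "Каталог библиотеки")

print(index.search("книга библиотека"))    # [2, 1]
print(index.search("книгах"))              # [2, 1]
print(index.search("каталог библиотеки"))  # [3]
```

Documents and queries pass through the same `normalize` step, so the query form "книгах" retrieves documents that contain "книги" or "книгу". In the system described in the abstract, this step would be delegated to the Dialing-based TCP daemon, and the postings would presumably be stored in MySQL rather than in a Python dict. The sketch does not attempt to model how the authors balance selectivity and sensitivity.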
License
Copyright (c) 2014 Наталья Владимировна Борисова, Зоя Анатольевна Кочуева
This work is licensed under a Creative Commons Attribution 4.0 International License.
The consolidation of copyright (identification of authorship) and the conditions for its transfer are set out in the License Agreement. In particular, the authors retain authorship of their manuscript and grant the journal the right of first publication of the work under the terms of the Creative Commons CC BY license. At the same time, they may independently enter into additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal, provided that a link to the article's first publication in this journal is preserved.
A license agreement is a document in which the author warrants that he/she owns all copyright to the work (manuscript, article, etc.).
Authors who sign the License Agreement with TECHNOLOGY CENTER PC retain all rights to further use of their work, provided that they link to the edition of our journal in which the work was published.
Under the terms of the License Agreement, the Publisher TECHNOLOGY CENTER PC does not take away the authors' copyright. Instead, it receives the authors' permission to use and disseminate the publication through the world's scientific resources (its own electronic resources, scientometric databases, repositories, libraries, etc.).
If there is no signed License Agreement, or if the agreement lacks identifiers that establish the author's identity, the editors have no right to work with the manuscript.
It is important to remember that there is another type of agreement between authors and publishers, in which copyright is transferred from the authors to the publisher. In that case, the authors lose ownership of their work and may not use it in any way.