The comparator identification method for dynamic filling the thesaurus of operational search activities
DOI:
https://doi.org/10.15587/1729-4061.2014.24812
Keywords:
information and criminalistic system, object-oriented thesaurus, comparator identification method, automatic word-list descriptorization
Abstract
Using a thesaurus considerably improves the recall and precision of the relevant information returned by the integrated information and criminalistic systems used in operational search activities. The difficulty in designing an object-oriented criminal thesaurus lies in the need to change it constantly and adapt it to new domains. This paper proposes a method for the automatic dynamic filling of this kind of thesaurus. Building on existing thesaurus development standards, we propose a semantic linguistic processor for the automatic selection of information terms and a comparator identification method for the descriptorization of the extracted notions. The functions of understanding a connected text and of key terms, the conceptual-semantic predicate, and the descriptorization predicate introduced in the paper make it possible to divide the set of terms into mutually exclusive semantic equivalence classes that correspond to semantically similar concepts. The method thus automatically detects the descriptor dictionary entries of a dynamically changing thesaurus adapted to the textual information arrays used in operational search activities.
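The partitioning step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the predicate `same_contexts`, the toy `CONTEXTS` data, and the rule of taking the first term of each class as its descriptor are all assumptions made for illustration. The sketch only shows how a pairwise predicate over terms can be closed into mutually exclusive equivalence classes (here with a union-find structure).

```python
from itertools import combinations


def equivalence_classes(terms, same_concept):
    """Split terms into disjoint classes using a pairwise predicate.

    same_concept(a, b) is a hypothetical stand-in for a descriptorization
    predicate. Union-find takes its transitive closure, so every term
    ends up in exactly one class.
    """
    terms = list(dict.fromkeys(terms))  # drop duplicates, keep order
    parent = {t: t for t in terms}

    def find(t):
        # Follow parent links to the class representative (with path halving).
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(terms, 2):
        if same_concept(a, b):
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two classes

    classes = {}
    for t in terms:
        classes.setdefault(find(t), []).append(t)
    return list(classes.values())


# Invented toy data: each term mapped to the contexts it occurs in.
CONTEXTS = {
    "car": {"drive", "steal", "park"},
    "automobile": {"drive", "steal", "park"},
    "vehicle": {"drive", "park", "steal"},
    "knife": {"cut", "stab"},
    "blade": {"stab", "cut"},
    "witness": {"testify"},
}


def same_contexts(a, b):
    # Assumed predicate: two terms are equivalent if their contexts coincide.
    return CONTEXTS[a] == CONTEXTS[b]


classes = equivalence_classes(CONTEXTS, same_contexts)
# Assumed convention: the first term of each class serves as its descriptor.
descriptors = {cls[0]: cls for cls in classes}
print(descriptors)
```

Any predicate that is reflexive, symmetric, and transitive yields the same partition without the closure step; union-find is used here only so that the sketch still returns disjoint classes when the supplied predicate is not strictly transitive.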
The method is implemented as an automatic thesaurus in the information retrieval subsystem of integrated information and criminalistic systems. An experimental study of the subsystem's output showed fairly high recall and precision.
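For reference, recall and precision in the standard information retrieval sense can be computed as in the sketch below. The document identifiers are invented for illustration and are not the data of the experimental study; the function name `precision_recall` is likewise an assumption.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    Empty denominators are reported as 0.0 by convention in this sketch.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: four documents returned, three actually relevant.
retrieved_docs = {"d1", "d2", "d3", "d4"}
relevant_docs = {"d2", "d3", "d5"}

precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```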
License
Copyright (c) 2014 Нина Феликсовна Хайрова, Дмитрий Юрьевич Узлов, Светлана Валентиновна Петрасова
This work is licensed under a Creative Commons Attribution 4.0 International License.