The comparator identification method for dynamic filling the thesaurus of operational search activities

Authors

  • Нина Феликсовна Хайрова National Technical University “Kharkiv Polytechnic Institute” 21, Frunze str., Kharkiv, Ukraine, 61002 https://orcid.org/0000-0002-9826-0286
  • Дмитрий Юрьевич Узлов National Technical University “Kharkiv Polytechnic Institute” 21, Frunze str., Kharkiv, Ukraine, 61002 https://orcid.org/0000-0003-2886-7776
  • Светлана Валентиновна Петрасова National Technical University “Kharkiv Polytechnic Institute” 21, Frunze str., Kharkiv, Ukraine, 61002 https://orcid.org/0000-0001-6011-135X

DOI:

https://doi.org/10.15587/1729-4061.2014.24812

Keywords:

information and criminalistic system, object-oriented thesaurus, comparator identification method, automatic word-list descriptorization

Abstract

Using a thesaurus considerably improves the recall and precision of the relevant information returned by the integrated information and criminalistic systems used in operational search activities. The main difficulty in designing an object-oriented criminal thesaurus is that it must be changed constantly and adapted to new domains. This paper proposes a method for the automatic dynamic filling of this kind of thesaurus. Building on existing thesaurus development standards, we propose a semantic linguistic processor for the automatic selection of information terms and a comparator identification method for the descriptorization of the extracted notions. The functions introduced in the paper (understanding of a connected text, key terms, a conceptual-semantic predicate, and a descriptorization predicate) make it possible to divide the set of terms into mutually exclusive semantic equivalence classes that correspond to semantically similar concepts. The method automatically detects the descriptive dictionary entries of a dynamically changing thesaurus adapted to the textual information arrays used in operational search activities.
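
The division of a term set into mutually exclusive equivalence classes can be illustrated with a minimal Python sketch. It does not reproduce the predicates defined in the paper: the pairwise test `same_concept` and the toy synonym pairs are assumptions made for illustration only, and the grouping uses a standard union-find structure so that the resulting classes are disjoint even when the pairwise test is not transitive by itself.

```python
from itertools import combinations

# Hypothetical toy data: pairs of terms treated as naming the same concept.
# These pairs are an assumption for illustration, not taken from the paper.
SYNONYM_PAIRS = {
    frozenset(("car", "automobile")),
    frozenset(("automobile", "vehicle")),
    frozenset(("theft", "larceny")),
}


def same_concept(a, b):
    """Hypothetical stand-in for a pairwise descriptorization test."""
    return frozenset((a, b)) in SYNONYM_PAIRS


def partition_by_predicate(terms, predicate):
    """Split terms into disjoint classes: the transitive closure of predicate."""
    terms = list(dict.fromkeys(terms))  # drop duplicates, keep order
    parent = {t: t for t in terms}

    def find(t):
        # Follow parent links to the class representative (with path halving).
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(terms, 2):
        if predicate(a, b):
            parent[find(a)] = find(b)  # merge the two classes

    classes = {}
    for t in terms:
        classes.setdefault(find(t), set()).add(t)
    return list(classes.values())


terms = ["car", "automobile", "vehicle", "theft", "larceny", "witness"]
classes = partition_by_predicate(terms, same_concept)
for cls in classes:
    print(sorted(cls))
```

In this sketch every term ends up in exactly one class, and a term with no equivalent (here "witness") forms a class of its own.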

The method is implemented as the automatic thesaurus of an information retrieval subsystem of integrated information and criminalistic systems. An experimental study of the subsystem showed fairly high recall and precision of its output.
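
For reference, recall and precision of a retrieval result can be computed as in the following Python sketch. The document identifiers are invented for illustration and are not results from the paper; the function name `precision_recall` is likewise an assumption.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: four documents returned, three documents actually relevant.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d3", "d5"]
p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four returned documents are relevant, and two of the three relevant documents were found.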

Author Biographies

Нина Феликсовна Хайрова, National Technical University “Kharkiv Polytechnic Institute” 21, Frunze str., Kharkiv, Ukraine, 61002

Doctor of Computer Linguistics, Professor

Department of Intelligent Computer Systems

Дмитрий Юрьевич Узлов, National Technical University “Kharkiv Polytechnic Institute” 21, Frunze str., Kharkiv, Ukraine, 61002

Applicant

Department of Intelligent Computer Systems

Светлана Валентиновна Петрасова, National Technical University “Kharkiv Polytechnic Institute” 21, Frunze str., Kharkiv, Ukraine, 61002

Postgraduate

Department of Intelligent Computer Systems

Published

2014-06-25

How to Cite

Хайрова, Н. Ф., Узлов, Д. Ю., & Петрасова, С. В. (2014). The comparator identification method for dynamic filling the thesaurus of operational search activities. Eastern-European Journal of Enterprise Technologies, 3(2(69)), 4–8. https://doi.org/10.15587/1729-4061.2014.24812