Evaluation of pertinence of linguistic descriptors in information retrieval systems

Authors

  • Лариса Эрнестовна Чалая, Kharkiv National University of Radioelectronics, Lenina 14, Kharkov, Ukraine, 61166
  • Юлия Юрьевна Харитонова, Kharkiv National University of Radioelectronics, Lenina 14, Kharkov, Ukraine, 61166 https://orcid.org/0000-0001-8089-577X

DOI:

https://doi.org/10.15587/1729-4061.2015.37450

Keywords:

descriptor, acronym, mining, electronic text, classification, semantic information

Abstract

The paper considers the possibility of using acronyms as linguistic descriptors for classifying electronic texts. The proposed approach is implemented as a two-stage procedure.

In the first stage, acronyms are extracted from several text documents in the field under consideration, and specialized acronym dictionaries are prepared. The search results are filtered to remove impertinent acronym/definition pairs, and the letters of each acronym are then aligned with the words of its definition.
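The first stage can be illustrated with a minimal sketch. It assumes, purely for illustration, that definitions appear in the text immediately before a parenthesized upper-case acronym and that alignment means matching acronym letters to the initials of the definition words. The pattern, the stop-word list, and the function names `align` and `extract_pairs` are hypothetical and are not taken from the paper.

```python
import re

# Assumed pattern: up to eight words followed by an upper-case acronym in parentheses.
PAIR_PATTERN = re.compile(r"((?:[A-Za-z-]+\s+){1,8})\(([A-Z]{2,8})\)")

# Assumed list of short words that do not contribute a letter to the acronym.
STOP_WORDS = {"of", "the", "and", "for", "in", "on", "to", "a", "an"}


def align(acronym, words):
    """Return the shortest trailing run of words whose initials spell the acronym, or None."""
    letters = acronym.lower()
    for start in range(len(words) - 1, -1, -1):
        candidate = words[start:]
        initials = "".join(
            w[0].lower() for w in candidate if w.lower() not in STOP_WORDS
        )
        if initials == letters:
            return candidate
    return None


def extract_pairs(text):
    """Collect (acronym, definition) pairs and drop those that cannot be aligned."""
    pairs = []
    for match in PAIR_PATTERN.finditer(text):
        words = match.group(1).split()
        acronym = match.group(2)
        definition = align(acronym, words)
        if definition is not None:
            pairs.append((acronym, " ".join(definition)))
    return pairs


sample = (
    "Methods of information retrieval (IR) and natural language processing (NLP) "
    "are used. See the annex (ABC)."
)
pairs = extract_pairs(sample)
```

In this sample, the pair for "ABC" is discarded because no run of preceding words aligns with its letters, which mirrors the filtering of impertinent pairs described above.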

In the second stage, the modified DeMT metric is applied to determine the pertinence of an acronym's definition in the document under analysis, using the adapted dictionary created in the first stage. The modified metric, which takes context into account, can be successfully adapted to the problem of evaluating the pertinence of linguistic descriptors.
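The idea of context-based selection in the second stage can be sketched as follows. This is not the DeMT metric, whose formula is not given here. As a stand-in, the sketch uses a plain Dice coefficient between the words of the analyzed document and a set of context words stored with each dictionary definition. The dictionary layout and the names `dice` and `rank_definitions` are assumptions made for illustration.

```python
def dice(a, b):
    """Dice coefficient between two sets of words (a stand-in, not the DeMT metric)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def rank_definitions(document_words, candidates):
    """Rank candidate definitions by the overlap of their context words with the document."""
    doc = {w.lower() for w in document_words}
    scored = [
        (dice(doc, {w.lower() for w in context}), definition)
        for definition, context in candidates.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical dictionary entry for the acronym "IR": definition -> context words.
candidates = {
    "information retrieval": {"search", "query", "document", "index"},
    "infrared": {"sensor", "light", "wavelength", "camera"},
}
document = "the query returns each document from the index".split()
ranked = rank_definitions(document, candidates)
best_definition = ranked[0][1]
```

The definition with the highest score is taken as the most pertinent one for the document; any real implementation would substitute the paper's metric for `dice`.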

Author Biographies

Лариса Эрнестовна Чалая, Kharkiv National University of Radioelectronics, Lenina 14, Kharkov, Ukraine, 61166

Associate Professor, Candidate of Technical Sciences

Department of Artificial Intelligence

Юлия Юрьевна Харитонова, Kharkiv National University of Radioelectronics, Lenina 14, Kharkov, Ukraine, 61166

PhD student

Department of Artificial Intelligence

Published

2015-02-25

How to Cite

Чалая, Л. Э., & Харитонова, Ю. Ю. (2015). Evaluation of pertinence of linguistic descriptors in information retrieval systems. Eastern-European Journal of Enterprise Technologies, 1(9(73)), 46–53. https://doi.org/10.15587/1729-4061.2015.37450

Section

Information and controlling system