Evaluation of pertinence of linguistic descriptors in information retrieval systems
DOI: https://doi.org/10.15587/1729-4061.2015.37450
Keywords: descriptor, acronym, mining, electronic text, classification, semantic information
Abstract
The paper considers the possibility of using acronyms as linguistic descriptors for classifying electronic texts. The proposed approach is implemented as a two-step procedure.
In the first stage, acronyms are extracted from several text documents in the field under consideration, and specialized acronym dictionaries are prepared. The search results are filtered to remove impertinent acronym/definition pairs, and then the letters of each acronym are aligned with the words of its definition.
In the second stage, the modified metric DeMT is applied to determine the pertinence of an acronym's definition in the document under analysis, using the adapted dictionary created in the first stage. The modified metric, which takes contexts into account, can be successfully adapted to the problem of evaluating the pertinence of linguistic descriptors.
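The letter-to-word alignment in the first stage can be illustrated with a minimal sketch. The abstract does not specify the alignment algorithm, so the greedy first-letter rule, the function name `align_acronym`, and the small stopword list below are all assumptions made for illustration only.

```python
# Hypothetical helper: greedy alignment of acronym letters with definition words.
# Assumption: each letter matches the first letter of one word, and short
# function words (stopwords) may be skipped. This is not the paper's algorithm.
STOPWORDS = frozenset({"of", "the", "and", "for", "in", "to", "a"})


def align_acronym(acronym, definition, stopwords=STOPWORDS):
    """Return a list of (letter, word) pairs, or None if the pair does not align."""
    letters = [c for c in acronym if c.isalpha()]
    pairs = []
    i = 0  # index of the next acronym letter to match
    for word in definition.split():
        if i < len(letters) and word[0].lower() == letters[i].lower():
            pairs.append((letters[i], word))
            i += 1
        elif word.lower() in stopwords:
            continue  # unmatched function words are tolerated
        else:
            return None  # a content word with no matching letter
    # every letter must have been consumed for the pair to be kept
    return pairs if i == len(letters) else None


if __name__ == "__main__":
    print(align_acronym("ACM", "Association for Computing Machinery"))
    print(align_acronym("ACM", "Association of Chess Players"))
```

Pairs for which the function returns `None` would be the ones discarded as impertinent under this simplified rule; a real dictionary builder would likely also need to handle multi-letter matches and hyphenated words.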
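The idea of choosing a definition by its fit with the document context can be sketched as follows. The formula of DeMT is not given in the abstract, so this sketch substitutes a plain Dice coefficient over word sets as a stand-in; the function names and the structure of the `candidates` dictionary (definition mapped to typical context words) are hypothetical.

```python
import re


def dice(a, b):
    """Dice coefficient between two sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def rank_definitions(document_text, candidates):
    """Rank candidate definitions of an acronym by context overlap.

    candidates maps each definition to a set of context words taken from
    the specialized dictionary (an assumed structure, not the paper's).
    Returns (score, definition) tuples, best first.
    """
    doc_words = {w.lower() for w in re.findall(r"[A-Za-z]+", document_text)}
    scored = [(dice(doc_words, {c.lower() for c in ctx}), definition)
              for definition, ctx in candidates.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    text = "The patient received treatment at the clinic"
    candidates = {
        "Physical Therapy": {"patient", "clinic", "treatment"},
        "Pacific Time": {"clock", "zone", "hour"},
    }
    for score, definition in rank_definitions(text, candidates):
        print(f"{score:.3f}  {definition}")
```

The highest-scoring definition is treated as the most pertinent one for the document; any other set-similarity or co-occurrence measure could be swapped in for `dice` without changing the surrounding logic.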
License
Copyright (c) 2015 Лариса Эрнестовна Чалая, Юлия Юрьевна Харитонова
This work is licensed under a Creative Commons Attribution 4.0 International License.