Implantation of indexing optimization technology for highly specialized terms based on Metaphone phonetical algorithm
Keywords: fuzzy match, phonetic rule, phonetic algorithm, Metaphone, Ukrainian surname
Abstract
When databases are compiled, for example to meet the needs of healthcare institutions, a common problem arises in entering and further processing the names and surnames of doctors and patients, which are highly specialized in both pronunciation and spelling. The reason is that personal names and surnames are not unique, their notation does not follow any rules of phonetics, and their length may differ from one language to another. With the advent of the Internet, this situation has become critical and can result, for instance, in multiple copies of an e-mail being sent to the same address. The problem can be addressed with phonetic algorithms for comparing words (Daitch-Mokotoff, SoundEx, NYSIIS, Polyphon, and Metaphone), as well as with the Levenshtein and Jaro algorithms and Q-gram-based algorithms, which make it possible to compute distances between words. The most widespread among them are the SoundEx and Metaphone algorithms, which index words by their sound, taking the rules of pronunciation into account. Using the Metaphone algorithm, an attempt has been made to optimize phonetic search for fuzzy-matching tasks, for example data deduplication in various databases and registries, in order to reduce the number of errors caused by incorrectly entered surnames. An analysis of the most common surnames reveals that some of them are of Ukrainian or Russian origin. At the same time, the rules by which names are pronounced and written in Ukrainian differ radically from those built into the basic algorithms for English, and quite significantly from those for the Russian language. That is why a phonetic algorithm should first of all take into account the peculiarities of how Ukrainian surnames are formed, which is of special relevance now.
The paper reports the results of an experiment on generating phonetic indexes, as well as the performance gain obtained when the generated indexes are used. A method for adapting the search to other domains and to several related languages is presented separately, using the search for medical preparations as an example.
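The idea of indexing words by their sound and then using the index for fuzzy matching can be illustrated with a minimal sketch. It is not the adapted Metaphone algorithm described in the paper: as a stand-in it uses classic American Soundex, which is designed for English, and it assumes that surnames are given in Latin transliteration. The function names `soundex`, `build_index`, and `lookup` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict

# Classic American Soundex consonant groups (a stand-in, NOT the paper's
# Ukrainian-adapted Metaphone rules).
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex key of a Latin-script name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (key + "000")[:4]


def build_index(surnames):
    """Group surnames by phonetic key (assumed Latin transliteration)."""
    index = defaultdict(list)
    for surname in surnames:
        index[soundex(surname)].append(surname)
    return index


def lookup(index, query):
    """Return all indexed surnames that share the query's phonetic key."""
    return index.get(soundex(query), [])


if __name__ == "__main__":
    idx = build_index(["Shevchenko", "Kovalenko", "Shevchenco"])
    print(lookup(idx, "Kovalenco"))
```

Because candidates are retrieved by key rather than by pairwise comparison against every record, a misspelled query such as "Kovalenco" is matched with a single dictionary lookup. A production variant for Ukrainian surnames would replace `soundex` with language-specific phonetic rules.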
Copyright (c) 2019 Volodymyr Buriachok, Matin Hadzhyiev, Volodymyr Sokolov, Pavlo Skladannyi, Lidiia Kuzmenko
This work is licensed under a Creative Commons Attribution 4.0 International License.