A method for clustering messages using an archiving transformation

Олексій Олександрович Сірий

doi:10.15587/2313-8416.2015.44364

Authors

Олексій Олександрович Сірий, Kyiv National University of Ukraine «Kyiv Polytechnic Institute», 37 Peremohy Ave., Kyiv, Ukraine, 03056

DOI:

https://doi.org/10.15587/2313-8416.2015.44364

Keywords:

archiving, entropy, text recognition, spam, phishing, LZ77, Huffman algorithm

Abstract

This article presents a method for determining the characteristics of texts and classifying them by means of archiving (compression). Using the direct relationship between entropy and compression with the LZ77 and Huffman algorithms, the method extracts features of a text that make it possible to determine its language, style, and authorship, and to cluster data sets by topic.
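The idea in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. It assumes Python's standard `zlib` module as the compressor (its DEFLATE format combines LZ77 matching with Huffman coding) and uses a common compression-based heuristic: a text is assigned to the reference corpus that lets the compressor encode it most cheaply. The function names `compressed_size`, `bits_per_byte`, `extra_cost`, and `classify` are hypothetical and chosen only for this illustration.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Size in bytes of the DEFLATE (LZ77 + Huffman) output for data."""
    return len(zlib.compress(data, 9))


def bits_per_byte(text: str) -> float:
    """Rough entropy estimate: compressed bits per input byte.

    Includes the fixed zlib header/checksum overhead, so it is only
    meaningful for texts that are not very short.
    """
    data = text.encode("utf-8")
    if not data:
        raise ValueError("text must not be empty")
    return 8 * compressed_size(data) / len(data)


def extra_cost(reference: str, sample: str) -> int:
    """Extra compressed bytes needed when sample is appended to reference.

    A small value means the compressor found much of the sample already
    present in the reference, i.e. the two texts are similar.
    """
    ref = reference.encode("utf-8")
    both = ref + sample.encode("utf-8")
    return compressed_size(both) - compressed_size(ref)


def classify(sample: str, references: Mapping[str, str]) -> str:
    """Return the label of the reference corpus with the lowest extra cost."""
    return min(references, key=lambda label: extra_cost(references[label], sample))


if __name__ == "__main__":
    # Toy reference corpora (assumed data, for illustration only).
    corpora = {
        "english": "the quick brown fox jumps over the lazy dog. " * 5,
        "ukrainian": "швидка руда лисиця стрибає через ледачого пса. " * 5,
    }
    print(classify("the quick brown fox jumps over the lazy dog.", corpora))
```

In practice the reference corpora would be much larger and representative of each language, author, or topic. Because `zlib` only looks back 32 KiB, the part of a reference that can influence the result is limited to its final 32 KiB.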

Author Biography

Олексій Олександрович Сірий, Kyiv National University of Ukraine «Kyiv Polytechnic Institute», 37 Peremohy Ave., Kyiv, Ukraine, 03056

Department of Information Security

Institute of Physics and Technology

Published

2015-06-21

Issue

Vol. 6 No. 2(11) (2015)

Section

Technical Sciences

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
