Message clusterization method based on archive transformation

Олексій Олександрович Сірий

doi:10.15587/2313-8416.2015.44364

Authors

Олексій Олександрович Сірий, Kyiv National University of Ukraine «Kyiv Polytechnic Institute», 37 Peremogy ave, Kyiv, 03056, Ukraine

Keywords:

archiving, entropy, text recognition, spam, phishing, LZ77, Huffman algorithm

Abstract

This article presents a method for identifying the parameters of a text and classifying texts by means of archiving (compression). The method relies on the direct relationship between entropy and compression with the LZ77 and Huffman algorithms: the compressed size is used to identify characteristics of a text, which in turn help to determine its language, style, and authorship, and to cluster data files by topical relevance.
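The idea in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes that `zlib` (DEFLATE, which combines LZ77 with Huffman coding) is an acceptable stand-in for the archiver, and it uses the normalized compression distance as the similarity measure, which is a substitute chosen for illustration and may differ from the measure used in the article. The function names `compressed_size`, `bits_per_byte`, `ncd`, and `nearest_label` are hypothetical.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after DEFLATE (LZ77 + Huffman) compression."""
    return len(zlib.compress(data, 9))


def bits_per_byte(data: bytes) -> float:
    """Rough entropy estimate: compressed bits spent per input byte.

    Includes the fixed zlib header overhead, so it is only meaningful
    for inputs that are not very short.
    """
    return 8 * compressed_size(data) / len(data)


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance (assumed measure, not from the article).

    Values near 0 mean the two inputs share much structure;
    values near 1 mean they share little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def nearest_label(message: bytes, references: Mapping[str, bytes]) -> str:
    """Assign `message` to the reference text it compresses best with."""
    return min(references, key=lambda label: ncd(message, references[label]))
```

With reference texts per language, author, or topic, `nearest_label` assigns a message to the closest reference. A full clustering step would compute `ncd` for every pair of messages and pass the resulting distance matrix to a standard clustering algorithm. Short messages give noisy estimates because the fixed compressor overhead dominates.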

Author Biography

Олексій Олександрович Сірий, Kyiv National University of Ukraine «Kyiv Polytechnic Institute», 37 Peremogy ave, Kyiv, 03056, Ukraine

Department of Information Security

Institute of Physics and Technology

Published

2015-06-21

Issue

Vol. 6 No. 2(11) (2015)

Section

Technical Sciences

License

This work is licensed under a Creative Commons Attribution 4.0 International License.