Message clusterization method based on archive transformation
DOI:
https://doi.org/10.15587/2313-8416.2015.44364
Keywords:
archiving, entropy, text recognition, spam, phishing, LZ77, Huffman algorithm
Abstract
This article presents a method for identifying the parameters of a text and classifying texts by means of archiving (data compression). Using the direct relationship between entropy and compression with the LZ77 and Huffman algorithms, the method identifies characteristics of a text that help to determine its language, style, and authorship, and to cluster data files by topical relevance.
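The idea of the abstract can be illustrated with a minimal sketch. It is not the author's implementation; it assumes Python's standard zlib module, whose DEFLATE format combines LZ77 matching with Huffman coding, as a stand-in for the archiver. The function names (compressed_size, bits_per_byte, extra_cost, classify) and the sample texts are hypothetical and chosen only for illustration. The sketch estimates entropy from the compressed size and assigns a message to the reference corpus after which it compresses most cheaply.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the DEFLATE-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def bits_per_byte(text: str) -> float:
    """Rough entropy estimate: compressed bits per input byte.

    Assumes a non-empty text; short inputs are dominated by format overhead.
    """
    raw = text.encode("utf-8")
    return 8 * len(zlib.compress(raw, 9)) / len(raw)


def extra_cost(reference: str, sample: str) -> int:
    """Extra compressed bytes needed for sample when appended to reference.

    A small value means the compressor could reuse patterns it already
    found in the reference, i.e. the two texts are statistically similar.
    """
    return compressed_size(reference + sample) - compressed_size(reference)


def classify(sample: str, corpora: dict) -> str:
    """Return the label of the reference corpus with the lowest extra cost."""
    return min(corpora, key=lambda label: extra_cost(corpora[label], sample))


if __name__ == "__main__":
    # Hypothetical toy corpora; real use would need much larger references.
    corpora = {
        "spam": "Win a free prize now! Click here to claim your free prize. "
                "Limited offer, click now! " * 3,
        "work": "Please find attached the meeting agenda for Monday. "
                "The project report is due on Friday. " * 3,
    }
    print(classify("Click here to claim your free prize now!", corpora))
```

The same extra_cost value could serve as a distance for clustering files by topic. Note that DEFLATE looks back at most 32 KiB, so with this stand-in only the tail of a long reference corpus influences the result.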
License
Copyright (c) 2015 Олексій Олександрович Сірий
This work is licensed under a Creative Commons Attribution 4.0 International License.