Text image compression based on statistical analysis and classification of the vertical line elements

Authors

  • Владимир Георгиевич Иванов (Vladimir G. Ivanov), Yaroslav Mudryi National Law University, Pushkin Street, 77, Kharkov, 61024, Ukraine
  • Юрий Вячеславович Ломоносов (Yuri V. Lomonosov), Yaroslav Mudryi National Law University, Pushkin Street, 77, Kharkov, 61024, Ukraine
  • Михаил Григорьевич Любарский (Mikhail G. Lubarskiy), Yaroslav Mudryi National Law University, Pushkin Street, 77, Kharkov, 61024, Ukraine

DOI:

https://doi.org/10.15587/1729-4061.2014.26298

Keywords:

text image compression, vertical line elements, statistical analysis, classification

Abstract

A new method for compressing images of text data is presented. Its main processing element is the vertical line element rather than the connected symbol of the text image. The proposed probability model describes quite accurately the possible distortions of vertical line elements caused by printing and scanning noise. Based on this probability model and on statistical analysis methods, the minimal most plausible set of undistorted elements is found within the entire set of vertical line elements under study. For each vertical element, the probability that it is a distortion of an element of this undistorted set is then computed. The vertical line elements of the text image are classified by a probabilistic assessment of whether the classified elements belong to a single center. The final result of the method is a dictionary of the connected symbols of the text image, in which each class is represented by its most probable image, together with a map of where the connected symbols are located in the plane of the image. The method achieves a relatively high compression ratio with good quality of the reconstructed image. Compared with JB2, the dedicated text image compression algorithm of the DjVu format and currently the best of its kind, the proposed algorithm has an advantage of about 37% in compression ratio when processing a text page image at a resolution of 300 dpi.
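The classification idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: their probability model and their search for the minimal most plausible set are not reproduced here. The sketch assumes (hypothetically) that each vertical line element is a fixed-height binary pixel column, that printing and scanning noise flips each pixel independently with probability `p`, and it replaces the statistical search with a greedy, frequency-based heuristic. The names `select_centers`, `posteriors`, `encode`, `decode`, `p` and `max_dist` are illustrative only.

```python
import math
from collections import Counter

# ASSUMPTION: a "vertical line element" is modeled as a tuple of 0/1 pixels,
# i.e. one pixel column cut from a binarized text row of fixed height.


def hamming(x, c):
    """Number of pixels in which two equal-height columns differ."""
    return sum(a != b for a, b in zip(x, c))


def log_likelihood(x, c, p):
    """log P(x | c) under an ASSUMED noise model: each pixel of the undistorted
    column c is flipped independently with probability p (0 < p < 0.5)."""
    d = hamming(x, c)
    return d * math.log(p) + (len(x) - d) * math.log(1 - p)


def select_centers(elements, max_dist):
    """Greedy stand-in for the 'minimal most plausible set' of undistorted
    elements: take the most frequent remaining column as a center, then drop
    every column within max_dist of it, until nothing is left."""
    remaining = Counter(elements)
    centers = []
    while remaining:
        center, _ = remaining.most_common(1)[0]
        centers.append(center)
        for e in list(remaining):
            if hamming(e, center) <= max_dist:
                del remaining[e]
    return centers


def posteriors(x, centers, p):
    """P(x is a distortion of centers[k]) for each k, assuming equal priors."""
    logs = [log_likelihood(x, c, p) for c in centers]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # subtract max for stability
    total = sum(weights)
    return [w / total for w in weights]


def encode(elements, p=0.05, max_dist=1):
    """Return (dictionary of center columns, center index for every column)."""
    centers = select_centers(elements, max_dist)
    index_map = []
    for x in elements:
        post = posteriors(x, centers, p)
        # assign each column to its most probable undistorted center
        index_map.append(max(range(len(centers)), key=post.__getitem__))
    return centers, index_map


def decode(centers, index_map):
    """Rebuild the column sequence from the dictionary and the index map."""
    return [centers[i] for i in index_map]


if __name__ == "__main__":
    columns = [(0, 1, 1, 0), (0, 1, 1, 0), (0, 1, 1, 1), (1, 0, 0, 0), (1, 0, 0, 0)]
    centers, index_map = encode(columns)
    print(centers)    # [(0, 1, 1, 0), (1, 0, 0, 0)]
    print(index_map)  # [0, 0, 0, 1, 1]
```

In this toy input the noisy column (0, 1, 1, 1) is explained as a distortion of (0, 1, 1, 0), so only two dictionary entries and one small index per column need to be stored. Decoding replaces the noisy column with its more probable undistorted center, which is what makes such a scheme lossy.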

Author Biographies

Владимир Георгиевич Иванов (Vladimir G. Ivanov), Yaroslav Mudryi National Law University, Pushkin Street, 77, Kharkov, 61024, Ukraine

Doctor of Technical Sciences, Professor, Head of Department

Department of Computer Science and Engineering

Юрий Вячеславович Ломоносов (Yuri V. Lomonosov), Yaroslav Mudryi National Law University, Pushkin Street, 77, Kharkov, 61024, Ukraine

Candidate of Engineering Sciences, Associate Professor

Department of Computer Science

Михаил Григорьевич Любарский (Mikhail G. Lubarskiy), Yaroslav Mudryi National Law University, Pushkin Street, 77, Kharkov, 61024, Ukraine

Doctor of Physical and Mathematical Sciences, Professor

Department of Computer Science


Published

2014-07-24

How to Cite

Иванов, В. Г., Ломоносов, Ю. В., & Любарский, М. Г. (2014). Text image compression based on statistical analysis and classification of the vertical line elements. Eastern-European Journal of Enterprise Technologies, 4(2(70)), 4–15. https://doi.org/10.15587/1729-4061.2014.26298