Content-based image retrieval method in a multidimensional data model at big data scale

Authors

Stanislav Danylenko, Kyrylo Smelyakov

DOI:

https://doi.org/10.30837/2522-9818.2025.4.018

Keywords:

similarity search; big data; search algorithms; data structures; content-based image retrieval; parallel computing; high-performance computing; search efficiency; algorithm optimization.

Abstract

The subject of the study is the method and algorithms for content-based image retrieval within the Multidimensional Cube (MDC) model. The goal is to develop a search method based on image descriptor vectors, together with an algorithm that implements this method for MDC in both sequential and parallel versions. The research tasks are: defining requirements for the search method; analyzing the structure of the MDC model and defining the approach to the search method; developing search methods and algorithms for scenarios where the model is stored in RAM or in a relational database; integrating parallel computing into the algorithm; analyzing alternative models based on multidimensional trees, graphs, hashing, inverted indexing, quantization, and inverted multi-index structures; and developing evaluation metrics and conducting experiments that compare the efficiency of the MDC-based method with the alternative search models.

Methodology: analytical and comparative methods for evaluating search algorithms, modeling, and experimental verification were applied. Thread-level parallelism and hardware optimization methods were used, along with a comparative analysis of model efficiency (KD-tree, Locality-Sensitive Hashing, Hierarchical Navigable Small World, Inverted File with Flat Compression, Inverted Multi-Index). Statistical methods were used to assess the results in terms of recall, search time, and model construction time. Experiments were conducted with both web-sourced and synthetic image descriptors, and load testing was performed to evaluate the model’s throughput.

Results: a new search method and the Wave-Search Algorithm that implements it were developed. The parallel version of the algorithm achieves up to a 3x speedup. For top-10 and top-100 queries on a dataset of 1 million descriptors, MDC shows the best overall performance among the compared models across these metrics and remains stable under load.

Conclusions: the proposed search method and its implementation (the Wave-Search Algorithm) efficiently utilize the structure of the MDC model for search tasks, outperform the alternative search models in terms of effectiveness, demonstrate robustness under load, and have significant potential for further development, including the use of hardware acceleration.
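The evaluation setup named in the abstract (top-k queries over descriptor vectors, thread-level parallelism, and the recall metric) can be illustrated with a minimal sketch. This is not the Wave-Search Algorithm and does not model the MDC structure; it is an exact brute-force baseline of the kind against which recall is usually measured. The function names `top_k_parallel` and `recall_at_k`, the Euclidean distance, and the chunk-per-thread partitioning are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq
import math
import random


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_chunk(query, descriptors, start, end, k):
    """Exact k nearest (distance, index) pairs within descriptors[start:end]."""
    return heapq.nsmallest(
        k, ((euclidean(query, descriptors[i]), i) for i in range(start, end))
    )


def top_k_parallel(query, descriptors, k, workers=4):
    """Split the descriptors into chunks, search each chunk in its own thread,
    then merge the partial top-k lists into the global top-k (hypothetical helper)."""
    n = len(descriptors)
    step = max(1, math.ceil(n / workers))
    ranges = [(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(
            lambda r: top_k_chunk(query, descriptors, r[0], r[1], k), ranges
        )
        merged = heapq.nsmallest(k, (pair for part in partial for pair in part))
    return [index for _, index in merged]


def recall_at_k(found, exact):
    """Fraction of the exact top-k neighbours that the evaluated search returned."""
    if not exact:
        return 0.0
    return len(set(found) & set(exact)) / len(exact)


# Synthetic descriptors stand in for real image descriptors (assumption).
random.seed(0)
data = [[random.random() for _ in range(8)] for _ in range(1000)]
query = data[42]
exact = top_k_parallel(query, data, k=10, workers=1)   # sequential reference
found = top_k_parallel(query, data, k=10, workers=4)   # chunked, multi-threaded
print(recall_at_k(found, exact))
```

Because every chunk returns its own exact top-k, merging the partial lists yields the same result as the sequential scan, so recall against the reference is 1.0 here; an approximate index would be scored the same way against this baseline. In CPython the global interpreter lock prevents pure-Python threads from giving a real speedup, so the sketch shows only the partition-and-merge structure, not the performance reported in the paper.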

Author Biographies

Stanislav Danylenko, Kharkiv National University of Radio Electronics

PhD Student, Software Engineering Department

Kyrylo Smelyakov, Kharkiv National University of Radio Electronics

Doctor of Sciences (Engineering), Professor, Head of the Software Engineering Department

Published

2025-12-28

How to Cite

Danylenko, S., & Smelyakov, K. (2025). Content-based image retrieval method in a multidimensional data model at big data scale. INNOVATIVE TECHNOLOGIES AND SCIENTIFIC SOLUTIONS FOR INDUSTRIES, (4(34)), 18–31. https://doi.org/10.30837/2522-9818.2025.4.018