Content-based image retrieval method in a multidimensional data model at big data scale
DOI:
https://doi.org/10.30837/2522-9818.2025.4.018
Keywords:
similarity search; big data; search algorithms; data structures; content-based image retrieval; parallel computing; high-performance computing; search efficiency; algorithm optimization.
Abstract
The subject of the study is the method and algorithms for content-based image retrieval within the Multidimensional Cube (MDC) model. The goal is to develop a search method based on image descriptor vectors, together with an algorithm that implements this method for MDC in both sequential and parallel versions. The research tasks include: defining requirements for the search method; analyzing the structure of the MDC model and defining the approach to the search method; developing search methods and algorithms for scenarios where the model is stored in RAM or in a relational database; integrating parallel computing into the algorithm; analyzing alternative models based on multidimensional trees, graphs, hashing, inverted indexing, quantization, and inverted multi-index structures; developing evaluation metrics and conducting experiments to compare the efficiency of the MDC-based method with alternative search models.
Methodology: analytical and comparative methods for search algorithm evaluation, modeling, and experimental verification were applied. Thread-level parallelism and hardware optimization methods were used, along with a comparative analysis of model efficiency (KD-tree, Locality-Sensitive Hashing, Hierarchical Navigable Small World, Inverted File with Flat Compression, Inverted Multi-Index). Statistical methods were employed to assess results using recall, search time, and model construction time as metrics. Experiments were conducted with both web-sourced and synthetic image descriptors, and load testing was used to evaluate the model's throughput.
Results: a new search method and the Wave-Search Algorithm that implements it were developed. The parallel version of the algorithm achieves up to a 3x speedup. For top-10 and top-100 queries on a dataset of 1 million descriptors, MDC shows the best overall performance among the compared models on these metrics, as well as strong stability under load.
Conclusions: the proposed search method and its implementation (the Wave-Search Algorithm) efficiently utilize the structure of the MDC model for search tasks, outperform the alternative search models in terms of effectiveness, demonstrate robustness under load, and have significant potential for further development, including the use of hardware acceleration.
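The recall metric and the thread-level partitioning mentioned in the abstract can be illustrated with a minimal sketch. It is not the Wave-Search Algorithm and does not use the MDC model, neither of which is specified in this abstract: it is a plain exact linear scan over descriptor vectors, split into chunks that are scanned in separate threads and then merged. All function names (`top_k_scan`, `top_k_parallel`, `recall_at_k`), the Euclidean distance, and the chunking scheme are assumptions made for illustration only.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_scan(query, items, k):
    """Exact top-k over (id, vector) pairs by a full linear scan.

    Returns a list of (distance, id) tuples sorted by ascending distance.
    """
    return heapq.nsmallest(k, ((euclidean(query, v), i) for i, v in items))


def top_k_parallel(query, items, k, workers=4):
    """Hypothetical partitioned search: scan each chunk in its own thread,
    then merge the per-chunk candidates into a global top-k."""
    chunk = max(1, math.ceil(len(items) / workers))
    parts = [items[s:s + chunk] for s in range(0, len(items), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(lambda p: top_k_scan(query, p, k), parts)
        merged = [hit for hits in partial for hit in hits]
    return heapq.nsmallest(k, merged)


def recall_at_k(found_ids, true_ids):
    """Fraction of the true top-k neighbours present in the returned result."""
    true_set = set(true_ids)
    return len(true_set & set(found_ids)) / len(true_set)


if __name__ == "__main__":
    # Synthetic 2-dimensional descriptors, used only to exercise the sketch.
    items = [(i, [float(i), float(i % 7)]) for i in range(100)]
    query = [10.2, 3.1]
    exact = top_k_scan(query, items, 5)
    found = top_k_parallel(query, items, 5, workers=3)
    print(recall_at_k([i for _, i in found], [i for _, i in exact]))
```

Because every chunk returns its own k best candidates, the merged result equals the exact top-k, so recall stays at 1.0 here; an approximate index would typically trade some recall for lower search time. In standard CPython builds the global interpreter lock limits the speedup of pure-Python threads, so the sketch shows the structure of a partitioned search rather than its performance.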
License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.