Modern approaches to data storage: comparison of relational and cloud data warehouses using ETL and ELT methods
DOI:
https://doi.org/10.31498/2225-6733.48.2024.310669
Keywords:
data warehouse, relational data warehouse, Data Lake, Polyglot Persistence, Apache Iceberg, Apache Parquet
Abstract
The paper analyses the use of relational and cloud data warehouses and the ETL and ELT methods of data integration. A comparative analysis of these approaches, with their advantages and disadvantages, is provided. A central relational data warehouse is proposed that provides a single version of truth (SVOT): it standardises and structures data, avoids discrepancies, and gives all users of an organisation access to the same information. Three methodological approaches to implementing a data warehouse are analysed: top-down, which suits strategic planning; bottom-up, which delivers results faster; and the middle approach, which combines both methods to balance the two. Cloud data warehouses are described; they use cloud technologies to provide scalability, availability and fault tolerance, which is important for companies with very large amounts of data. Compared with relational storage, cloud storage is more flexible, scalable and efficient, offering higher speed and lower infrastructure costs. The advantages and disadvantages of ETL and ELT are analysed: ETL transforms data before it is loaded into the warehouse, which makes it easier to maintain data confidentiality, whereas ELT performs the transformation after loading, which allows more flexible data processing directly in the warehouse. Cloud storage architectures are described: massively parallel processing, hybrid, lambda, and multi-structured architectures, which provide high performance and flexibility in data processing. Data storage technologies are also described: Data Lake, Polyglot Persistence, Apache Iceberg, Apache Parquet, and columnar databases, which provide efficient storage and processing of large amounts of data.
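The contrast between ETL and ELT summarised above can be illustrated with a minimal sketch. It is not the paper's implementation: an in-memory SQLite database stands in for the warehouse, and the sample rows, table names (`sales_etl`, `sales_raw`, `sales_elt`) and the `mask_email` helper are hypothetical, chosen only to show where the transformation step runs in each method.

```python
import sqlite3

# Hypothetical source rows: (customer_email, amount_as_text).
SOURCE_ROWS = [
    ("alice@example.com", "10.50"),
    ("bob@example.org", "4.25"),
    ("carol@example.com", "3.00"),
]


def mask_email(email: str) -> str:
    """Keep only the domain, so the raw address never reaches the warehouse."""
    return "***@" + email.split("@", 1)[1]


def run_etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in application code, then load only the cleaned rows."""
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL)")
    cleaned = [(mask_email(email), float(amount)) for email, amount in SOURCE_ROWS]
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)


def run_elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows unchanged, then transform with SQL in the warehouse."""
    conn.execute("CREATE TABLE sales_raw (customer_email TEXT, amount_text TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", SOURCE_ROWS)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT '***@' || substr(customer_email, instr(customer_email, '@') + 1) AS customer,
               CAST(amount_text AS REAL) AS amount
        FROM sales_raw
        """
    )


def totals(conn: sqlite3.Connection, table: str) -> list:
    """Total amount per masked customer; `table` must be a trusted table name."""
    query = f"SELECT customer, SUM(amount) FROM {table} GROUP BY customer ORDER BY customer"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    run_etl(warehouse)
    run_elt(warehouse)
    print(totals(warehouse, "sales_etl"))
    print(totals(warehouse, "sales_elt"))
```

In this sketch both paths produce the same aggregated result. The difference is that the ELT path leaves the unmasked `sales_raw` table in the warehouse, which reflects the abstract's point that ETL makes confidentiality easier to maintain, while ELT keeps the raw data available for further transformations.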
License
The journal «Reporter of the Priazovskyi State Technical University. Section: Technical sciences» is published under the CC BY license (Attribution License).
This license allows for the distribution, editing, modification, and use of the work as a basis for derivative works, even for commercial purposes, provided that proper attribution is given. It is the most flexible of all available licenses and is recommended for maximum dissemination and use of non-restricted materials.
Authors who publish in this journal agree to the following terms:
1. Authors retain the copyright of their work and grant the journal the right of first publication under the terms of the Creative Commons Attribution License (CC BY). This license allows others to freely distribute the published work, provided that proper attribution is given to the original authors and the first publication of the work in this journal is acknowledged.
2. Authors are allowed to enter into separate, additional agreements for non-exclusive distribution of the work in the same form as published in this journal (e.g., depositing it in an institutional repository or including it in a monograph), provided that a reference to the first publication in this journal is maintained.







