Development of a standardized approach for evaluating business insights in stream processing systems based on technical metrics
DOI: https://doi.org/10.15587/2706-5448.2025.325717
Keywords: benchmarking, distributed systems, performance measurement, SLO (service level objectives), real-time processing
Abstract
The object of research is the benchmarking process of stream processing frameworks, specifically evaluating the impact of Service Level Objectives (SLOs) in real-time data processing systems.
One of the most problematic aspects is the lack of standardization in SLO definitions, which leads to inconsistencies between technical performance indicators (latency, throughput) and business objectives. Additionally, existing benchmarking methodologies primarily assess technical metrics without considering their business relevance.
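To make the standardization problem concrete, the sketch below shows one possible way to record an SLO so that the technical indicator, its aggregation rule, its threshold and the business objective it serves are stated together. It is a minimal illustration under stated assumptions, not the authors' method: the `SLO` class, its field names, the nearest-rank percentile rule, and the example goals and thresholds are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical SLO record: one technical metric bound to one business goal."""
    business_goal: str      # business-facing description (illustrative)
    metric: str             # technical indicator, e.g. "latency_s" or "throughput_ops"
    percentile: float       # percentile of per-window samples to judge, e.g. 95
    target: float           # threshold in the metric's unit
    higher_is_better: bool  # True for throughput, False for latency


def nearest_rank_percentile(samples, p):
    """Nearest-rank percentile of a non-empty sample list, for 0 < p <= 100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate(slo, samples):
    """Return (observed value, met?) for one SLO over per-window samples."""
    if slo.higher_is_better:
        # Judge the low tail: a p95 throughput SLO means about 95% of windows
        # should reach the target, so inspect the 5th percentile.
        observed = nearest_rank_percentile(samples, 100 - slo.percentile)
        return observed, observed >= slo.target
    observed = nearest_rank_percentile(samples, slo.percentile)
    return observed, observed <= slo.target


if __name__ == "__main__":
    latency_slo = SLO("alerts reach analysts within 10 s", "latency_s", 95, 10.0, False)
    throughput_slo = SLO("order feed keeps up with peak traffic", "throughput_ops", 95, 50.0, True)
    print(evaluate(latency_slo, [6.2, 7.0, 8.5, 9.1, 12.4]))         # (12.4, False)
    print(evaluate(throughput_slo, [72.0, 80.0, 65.0, 90.0, 55.0]))  # (55.0, True)
```

Keeping `business_goal` next to the threshold is the point of this structure: a report can state which business objective a latency or throughput violation affects, instead of listing raw technical metrics alone.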
In the course of the study, experimental methods were used to analyze the relationship between latency and throughput under varying load conditions. A series of experiments was conducted on a Kafka Streams-based stream processing setup, varying workload parameters and resource constraints.
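Both indicators studied in the experiments can be derived from per-event timestamps. The sketch below assumes that each processed event carries the time it was produced and the time its result was emitted (the `Record` type and its fields are hypothetical, not taken from the study's harness). Latency is the difference of the two timestamps; throughput is the number of results emitted in a fixed measurement window divided by the window length.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Record:
    """Hypothetical per-event measurement captured by a benchmark harness."""
    event_time_s: float   # when the input event was produced
    output_time_s: float  # when the processed result was emitted


def latencies(records):
    """End-to-end latency of each event, in seconds."""
    return [r.output_time_s - r.event_time_s for r in records]


def throughput(records, start_s, end_s):
    """Results emitted in the half-open window [start_s, end_s), per second."""
    if end_s <= start_s:
        raise ValueError("window must have positive length")
    completed = sum(1 for r in records if start_s <= r.output_time_s < end_s)
    return completed / (end_s - start_s)


if __name__ == "__main__":
    run = [Record(0.0, 0.5), Record(0.25, 1.0), Record(0.5, 1.5), Record(1.0, 2.5)]
    print(latencies(run))             # [0.5, 0.75, 1.0, 1.5]
    print(mean(latencies(run)))       # 0.9375
    print(throughput(run, 0.0, 2.0))  # 1.5
```

Repeating such a measurement for each configured event rate and resource limit yields the latency/throughput pairs whose relationship is analyzed; using half-open windows means each emitted result is counted in exactly one window when windows are laid end to end.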
The results demonstrate a nonlinear relationship between latency and throughput: increasing event rates can either improve or degrade performance, depending on resource constraints and the Kafka Streams commit interval settings. Under stable conditions, latency decreases from 21 s to 6.2 s while throughput increases from 0.6 ops/sec to 72 ops/sec. When computational bottlenecks are introduced, latency spikes to 349 s and throughput drops to 32 ops/sec, indicating clear performance degradation. Conversely, distributed processing reduces latency to 11 s and increases throughput to 169.9 ops/sec. Thus, while higher loads generally improve throughput, excessive processing delays can unexpectedly reduce it because of resource contention.
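The reported figures can be checked against a combined latency-and-throughput objective to show why the two metrics have to be judged together. In the sketch below the measurements are the ones stated in the abstract, while the SLO thresholds (at most 10 s latency, at least 50 ops/sec) are hypothetical values chosen only for illustration, and the scenario labels are shorthand for the conditions described in the text.

```python
# Figures from the abstract: (latency in s, throughput in ops/sec).
REPORTED = {
    "stable, initial": (21.0, 0.6),
    "stable, final": (6.2, 72.0),
    "computational bottleneck": (349.0, 32.0),
    "distributed processing": (11.0, 169.9),
}

# Hypothetical business SLO: both conditions must hold at the same time.
MAX_LATENCY_S = 10.0
MIN_THROUGHPUT_OPS = 50.0


def slo_verdict(latency_s, throughput_ops):
    """Return the list of SLO parts that a measurement violates."""
    violations = []
    if latency_s > MAX_LATENCY_S:
        violations.append("latency")
    if throughput_ops < MIN_THROUGHPUT_OPS:
        violations.append("throughput")
    return violations


for name, (lat, thr) in REPORTED.items():
    broken = slo_verdict(lat, thr)
    status = "met" if not broken else "violated: " + ", ".join(broken)
    print(f"{name:26s} {status}")
```

Under this illustrative 10 s target, distributed processing has the highest throughput yet still breaches the latency objective, while the final stable configuration meets both; a throughput-only ranking would order these two the other way.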
These insights provide a foundation for dynamic SLO adjustments to optimize real-time data processing efficiency. The presented approach helps to avoid generalized and inefficient methods for measuring the performance of stream processing frameworks.
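One way to read "dynamic SLO adjustment" is to let an internal latency target follow recent measurements while never exceeding a fixed business limit. The class below is a minimal sketch of such a policy, assuming a rolling window of per-window latencies, a nearest-rank p95 statistic, a multiplicative headroom factor, and fixed floor and ceiling values; all of these names and defaults are hypothetical and are not taken from the study.

```python
import math
from collections import deque


class DynamicLatencyTarget:
    """Hypothetical adaptive latency target for an internal SLO.

    The target is the p95 of recent per-window latencies times a headroom
    factor, clamped between a floor and a fixed business ceiling, so it can
    tighten when the pipeline runs fast but never loosens past the limit the
    business accepts.
    """

    def __init__(self, history=10, headroom=1.2, floor_s=1.0, ceiling_s=30.0):
        self.samples = deque(maxlen=history)  # oldest samples drop out automatically
        self.headroom = headroom
        self.floor_s = floor_s
        self.ceiling_s = ceiling_s

    def observe(self, latency_s):
        """Record the latency measured for one window."""
        self.samples.append(latency_s)

    def target(self):
        """Current latency target in seconds."""
        if not self.samples:
            return self.ceiling_s  # no data yet: fall back to the business limit
        ordered = sorted(self.samples)
        rank = max(1, math.ceil(95 * len(ordered) / 100))  # nearest-rank p95
        p95 = ordered[rank - 1]
        return min(self.ceiling_s, max(self.floor_s, p95 * self.headroom))


if __name__ == "__main__":
    policy = DynamicLatencyTarget()
    for latency in [6.2, 6.0, 6.5, 6.1]:
        policy.observe(latency)
    print(policy.target())  # about 7.8 s: p95 of 6.5 s with 20% headroom
    policy.observe(349.0)   # a bottleneck window
    print(policy.target())  # 30.0: clamped to the business ceiling
```

Because the target follows measurements, a real policy would still need to alert whenever observed latency exceeds the ceiling; the clamp only prevents the internal target from drifting past the business limit.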
License
Copyright (c) 2025 Artem Bashtovyi, Andrii Fechan

This work is licensed under a Creative Commons Attribution 4.0 International License.
The consolidation and conditions for the transfer of copyright (identification of authorship) are set out in the License Agreement. In particular, the authors retain authorship of their manuscript and grant the journal the right of first publication of this work under the terms of the Creative Commons CC BY license. At the same time, the authors may independently enter into additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal, provided that a link to the first publication of the article in this journal is preserved.



