Development of a standardized approach for evaluating business insights in stream processing systems based on technical metrics

Authors

Artem Bashtovyi, Andrii Fechan

DOI:

https://doi.org/10.15587/2706-5448.2025.325717

Keywords:

benchmarking, distributed systems, performance measurement, SLO (service level objectives), real-time processing

Abstract

The object of research is the benchmarking of stream processing frameworks, specifically the evaluation of the impact of Service Level Objectives (SLOs) in real-time data processing systems.

One of the most problematic aspects is the lack of standardization in SLO definitions, which leads to inconsistencies between technical performance indicators (latency, throughput) and business objectives. In addition, existing benchmarking methodologies assess mainly technical metrics without considering their business relevance.

In the course of the study, experimental methods were used to analyze the relationship between latency and throughput under varying load conditions. A series of experiments was conducted with a Kafka Streams-based stream processing setup, in which workload parameters and resource constraints were modified.
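As a rough illustration of the two metrics being compared, the sketch below derives mean end-to-end latency and throughput from per-event timestamps. It is a minimal sketch under stated assumptions, not the measurement code used in the study: the `EventRecord` type, the `summarize` function, and the sample timestamps are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class EventRecord:
    """Hypothetical per-event measurement (not taken from the paper)."""
    produced_at: float   # seconds, when the event entered the input topic
    processed_at: float  # seconds, when the processed result was emitted


def summarize(records):
    """Return (mean latency in s, throughput in ops/sec) for one run."""
    if not records:
        raise ValueError("at least one record is required")
    latencies = [r.processed_at - r.produced_at for r in records]
    # The run spans from the first produced event to the last processed one.
    start = min(r.produced_at for r in records)
    end = max(r.processed_at for r in records)
    duration = end - start
    throughput = len(records) / duration if duration > 0 else float("inf")
    return mean(latencies), throughput


# Made-up sample run: three events, each processed 2 s after production.
records = [EventRecord(0.0, 2.0), EventRecord(1.0, 3.0), EventRecord(2.0, 4.0)]
latency, throughput = summarize(records)
print(f"mean latency: {latency} s, throughput: {throughput} ops/sec")
```

Taking the mean is only one possible aggregation. A percentile such as p95 may be more appropriate when an SLO is stated in terms of tail latency.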

The results demonstrate a nonlinear relationship between latency and throughput: increasing the event rate can either improve or degrade performance, depending on resource constraints and the Kafka Streams commit interval setting. Under stable conditions, latency decreases from 21 s to 6.2 s while throughput increases from 0.6 ops/sec to 72 ops/sec. When computational bottlenecks are introduced, latency spikes to 349 s and throughput drops to 32 ops/sec, indicating performance degradation. Conversely, distributed processing reduces latency to 11 s and increases throughput to 169.9 ops/sec. Although higher loads generally improve throughput, excessive processing delays can unexpectedly reduce it because of resource contention.
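To show how such figures could be read against a business-level objective, the sketch below checks the latency and throughput values reported in this abstract against a single SLO. The thresholds (15 s and 50 ops/sec), the `Slo` class, and the scenario labels are assumptions made for illustration only; the abstract does not state concrete SLO values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical SLO: an upper bound on latency, a lower bound on throughput."""
    max_latency_s: float
    min_throughput_ops: float

    def is_met(self, latency_s, throughput_ops):
        return (latency_s <= self.max_latency_s
                and throughput_ops >= self.min_throughput_ops)


# (latency in s, throughput in ops/sec) as reported in the abstract.
# The scenario labels are illustrative names, not the authors' terminology.
REPORTED = {
    "stable_start": (21.0, 0.6),
    "stable_end": (6.2, 72.0),
    "bottleneck": (349.0, 32.0),
    "distributed": (11.0, 169.9),
}

# Assumed thresholds, chosen only to make the example concrete.
slo = Slo(max_latency_s=15.0, min_throughput_ops=50.0)

results = {name: slo.is_met(lat, thr) for name, (lat, thr) in REPORTED.items()}
for name, ok in results.items():
    print(f"{name}: {'meets' if ok else 'violates'} the SLO")
```

With these assumed thresholds, only the second stable measurement and the distributed setup satisfy the objective. Different thresholds would classify the same measurements differently, which is why the SLO definition itself matters.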

These insights provide a foundation for dynamic SLO adjustments to optimize real-time data processing efficiency. The presented approach helps to avoid generalized and inefficient methods for measuring the performance of stream processing frameworks.

Author Biographies

Artem Bashtovyi, Lviv Polytechnic National University

PhD Student, Assistant

Department of Software

Andrii Fechan, Lviv Polytechnic National University

Doctor of Technical Sciences, Professor

Department of Software

Published

2025-03-31

How to Cite

Bashtovyi, A., & Fechan, A. (2025). Development of a standardized approach for evaluating business insights in stream processing systems based on technical metrics. Technology Audit and Production Reserves, 2(2(82)), 15–20. https://doi.org/10.15587/2706-5448.2025.325717

Section

Information Technologies