Development of an adaptive data provenance control method for resource-aware real-time stream processing

Authors

DOI:

https://doi.org/10.15587/2706-5448.2026.362294

Keywords:

stream processing, data provenance, Apache Flink, distributed systems

Abstract

The object of research is the process of managing data provenance in real-time stream processing systems. The subject of research is an adaptive data provenance method for resource-aware stream processing in Apache Flink, which changes provenance granularity at runtime without affecting business results.

The problem addressed is how to maintain explainable provenance for anomaly detection or auditing while avoiding system resource overload from always-on, fine-grained data provenance.

The main result is a provenance level controller driven by live CPU and heap memory usage metrics. Its scientific novelty lies in a three-level logic (detailed, summary, none) that adjusts the provenance level at runtime without interrupting the pipeline. Performance was evaluated on 5000 simulated IoT sensor workloads with 60-second processing windows and a target rate of 15000 records per second, in runs of 3, 15, and 30 minutes.
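To give a sense of the scale of this evaluation, the stated parameters can be turned into record counts. The short sketch below is only illustrative arithmetic: it assumes that the 5000 workloads correspond to 5000 sensors emitting at an equal rate and that the target rate is sustained for the whole run, neither of which is stated explicitly in the abstract.

```python
# Workload parameters taken from the abstract.
TARGET_RATE = 15_000          # target records per second
WINDOW_SECONDS = 60           # processing window length
RUN_MINUTES = (3, 15, 30)     # evaluated run durations

# Assumption (not stated in the abstract): 5000 sensors sharing the load evenly.
SENSORS = 5_000

# Records falling into one 60-second window at the target rate.
records_per_window = TARGET_RATE * WINDOW_SECONDS

# Average emission rate of one sensor under the even-load assumption.
per_sensor_rate = TARGET_RATE / SENSORS

# Total records per run, assuming the target rate holds for the full duration.
records_per_run = {m: TARGET_RATE * m * 60 for m in RUN_MINUTES}

print(records_per_window)
print(per_sensor_rate)
print(records_per_run)
```

Under these assumptions, each window holds 900000 records and the longest run processes 27 million records, which is the volume that always-on, fine-grained provenance would have to annotate.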

The results indicate that the proposed method maintains throughput close to the no-provenance baseline while significantly reducing CPU cost compared to full provenance. For equal workloads, adaptive runs consume roughly 34 times the baseline CPU, whereas full provenance requires about 69 times the baseline CPU, because the controller demotes provenance under pressure and restores detailed mode only once resources have recovered. Heap usage is a weaker control signal because it reflects only part of the total memory and can change when the runtime periodically frees unused memory.
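The demote-under-pressure and restore-on-recovery behaviour described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name, the threshold values, the one-level-per-sample stepping, and the requirement of several consecutive calm samples before promotion are all hypothetical choices made for illustration, and the metrics are passed in as plain fractions rather than read from Apache Flink.

```python
from dataclasses import dataclass
from enum import IntEnum


class ProvenanceLevel(IntEnum):
    NONE = 0
    SUMMARY = 1
    DETAILED = 2


@dataclass
class ProvenanceController:
    # All thresholds below are hypothetical; the abstract does not give values.
    cpu_high: float = 0.85     # demote when CPU usage reaches this fraction
    cpu_low: float = 0.60      # CPU must fall to this fraction to count as calm
    heap_high: float = 0.90    # demote when heap usage reaches this fraction
    heap_low: float = 0.70     # heap must fall to this fraction to count as calm
    recovery_samples: int = 3  # consecutive calm samples needed before promoting
    level: ProvenanceLevel = ProvenanceLevel.DETAILED
    _calm_streak: int = 0

    def update(self, cpu: float, heap: float) -> ProvenanceLevel:
        """Feed one metrics sample (fractions in [0, 1]) and return the level."""
        under_pressure = cpu >= self.cpu_high or heap >= self.heap_high
        recovered = cpu <= self.cpu_low and heap <= self.heap_low

        if under_pressure:
            # Demote one step immediately and forget any recovery progress.
            self._calm_streak = 0
            if self.level > ProvenanceLevel.NONE:
                self.level = ProvenanceLevel(self.level - 1)
        elif recovered:
            # Promote one step only after a sustained calm period.
            self._calm_streak += 1
            if (self._calm_streak >= self.recovery_samples
                    and self.level < ProvenanceLevel.DETAILED):
                self.level = ProvenanceLevel(self.level + 1)
                self._calm_streak = 0
        else:
            # Between the low and high thresholds: hold the current level.
            self._calm_streak = 0
        return self.level


controller = ProvenanceController()
for cpu, heap in [(0.95, 0.5), (0.95, 0.5), (0.5, 0.5), (0.5, 0.5), (0.5, 0.5)]:
    print(controller.update(cpu, heap).name)
```

The gap between the high and low thresholds, together with the calm-sample count, keeps the level from oscillating when a metric hovers near a single cut-off. Since heap usage is reported as the weaker signal, a real controller might weight it less than CPU; the sketch treats both symmetrically for simplicity.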

The proposed method is most useful during anomalies, incidents, or audits where selective, on-demand provenance is needed. This adaptability makes it highly suitable for IoT gateways, edge devices, and regulated industries operating under strict resource constraints.

Author Biographies

Roman Moravskyi, Lviv Polytechnic National University

PhD Student, Assistant

Department of Software

Yevheniya Levus, Lviv Polytechnic National University

Candidate of Technical Sciences, Associate Professor

Department of Software

Published

2026-05-29

How to Cite

Moravskyi, R., & Levus, Y. (2026). Development of an adaptive data provenance control method for resource-aware real-time stream processing. Technology Audit and Production Reserves, 3(2(89)), 60–65. https://doi.org/10.15587/2706-5448.2026.362294

Section

Information Technologies