Development of an adaptive data provenance control method for resource-aware real-time stream processing
DOI:
https://doi.org/10.15587/2706-5448.2026.362294
Keywords:
stream processing, data provenance, Apache Flink, distributed systems
Abstract
The object of research is the process of managing data provenance in real-time stream processing systems. The subject of research is an adaptive data provenance method for resource-aware stream processing in Apache Flink, which changes provenance granularity at runtime without affecting business results.
The problem addressed is how to maintain explainable provenance for anomaly detection or auditing while avoiding system resource overload from always-on, fine-grained data provenance.
The main result is a provenance level controller driven by live CPU and heap memory usage metrics. Its scientific novelty lies in a three-level logic (detailed, summary, none) that adjusts the provenance level dynamically without interrupting the pipeline. Performance was evaluated on 5000 simulated IoT sensor workloads with 60-second processing windows and a target rate of 15000 records per second, in runs of 3, 15, and 30 minutes.
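To give a sense of scale, the stated workload parameters imply the record volumes computed below. This is a back-of-the-envelope sketch that assumes the 15000 records per second target is met exactly and sustained for the whole run; the constant and function names are illustrative and do not come from the paper.

```python
# Workload figures taken from the abstract.
TARGET_RATE = 15_000        # records per second (target rate)
WINDOW_SECONDS = 60         # processing window length
RUN_MINUTES = (3, 15, 30)   # evaluated run durations


def records_per_window(rate: int = TARGET_RATE, window: int = WINDOW_SECONDS) -> int:
    """Records that fall into one processing window at a steady rate."""
    return rate * window


def records_per_run(minutes: int, rate: int = TARGET_RATE) -> int:
    """Total records ingested over a run of the given length at a steady rate."""
    return rate * minutes * 60


if __name__ == "__main__":
    print("records per window:", records_per_window())
    for minutes in RUN_MINUTES:
        print(f"{minutes}-minute run:", records_per_run(minutes))
```

Under these assumptions a single window holds 900000 records, and the longest run ingests 27 million records, which is the volume that always-on fine-grained provenance would have to annotate.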
The results indicate that the proposed method keeps throughput close to the no-provenance baseline while significantly reducing CPU cost compared to full provenance. For equal workloads, adaptive runs consume roughly 3–4 times the baseline CPU, whereas full provenance requires about 6–9 times the baseline CPU, because the controller demotes provenance under pressure and restores detailed mode only once resources have recovered. Heap usage is a weaker control signal because it reflects only part of the total memory and can change when the runtime periodically frees unused memory.
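The demote-under-pressure, restore-on-recovery behaviour can be illustrated with a small hysteresis controller. The following is a minimal sketch, not the authors' implementation: the threshold values, the class and method names, the one-level-per-update stepping, and the rule that either metric alone can trigger a demotion are all assumptions made for illustration. It is written in plain Python rather than against the Apache Flink API.

```python
from enum import IntEnum


class Level(IntEnum):
    """The three provenance levels named in the abstract."""
    NONE = 0
    SUMMARY = 1
    DETAILED = 2


class ProvenanceController:
    """Hypothetical controller; all thresholds are assumed, not from the paper."""

    def __init__(self, cpu_high=0.85, cpu_low=0.60, heap_high=0.90, heap_low=0.70):
        self.cpu_high = cpu_high      # demote when CPU utilisation reaches this
        self.cpu_low = cpu_low        # promote only when CPU falls to this or below
        self.heap_high = heap_high    # demote when heap utilisation reaches this
        self.heap_low = heap_low      # promote only when heap falls to this or below
        self.level = Level.DETAILED   # start with full detail

    def update(self, cpu: float, heap: float) -> Level:
        """Take one metrics sample (fractions in [0, 1]) and return the new level."""
        under_pressure = cpu >= self.cpu_high or heap >= self.heap_high
        recovered = cpu <= self.cpu_low and heap <= self.heap_low
        if under_pressure and self.level > Level.NONE:
            self.level = Level(self.level - 1)   # step down one level
        elif recovered and self.level < Level.DETAILED:
            self.level = Level(self.level + 1)   # step back up one level
        # Between the low and high thresholds the level is held unchanged.
        return self.level


if __name__ == "__main__":
    controller = ProvenanceController()
    for cpu, heap in [(0.95, 0.50), (0.95, 0.50), (0.70, 0.50), (0.50, 0.50), (0.50, 0.50)]:
        print(cpu, heap, controller.update(cpu, heap).name)
```

The gap between the high and low thresholds is what keeps the level from oscillating when a metric hovers near a single cut-off. Because heap usage drops abruptly after garbage collection, a real controller would likely smooth or down-weight that signal, in line with the observation that heap is the weaker of the two inputs.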
The proposed method is most useful during anomalies, incidents, or audits where selective, on-demand provenance is needed. This adaptability makes it highly suitable for IoT gateways, edge devices, and regulated industries operating under strict resource constraints.
License
Copyright (c) 2026 Roman Moravskyi, Yevheniya Levus

This work is licensed under a Creative Commons Attribution 4.0 International License.
The consolidation and conditions for the transfer of copyright (identification of authorship) are set out in the License Agreement. In particular, the authors retain the right of authorship of their manuscript and transfer the right of first publication of this work to the journal under the terms of the Creative Commons CC BY license. They also have the right to enter into separate additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal, provided that a link to the first publication of the article in this journal is preserved.



