Hybrid imputation of biomedical data by using transformers and autoencoders for assessing human biological age

Authors

Volodymyr Slipchenko, Liubov Poliahushko, Oleksandr Volkov, Vladyslav Shatylo

DOI:

https://doi.org/10.15587/1729-4061.2025.340325

Keywords:

data imputation, composite architecture, deep learning, functional age, PhenoAge, NHANES

Abstract

This study investigates the restoration of missing biomedical and social data for assessing human biological age. The principal challenge is the high share of missing values in such datasets: in NHANES, for example, up to 40% of values are missing. This hampers accurate health prediction and reduces the effectiveness of preventive interventions.
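
To make the missing-data rate concrete, the minimal sketch below measures the share of missing values per feature and over a whole table. It assumes records are held as Python lists with `None` marking a missing measurement; the function names and the toy values are illustrative assumptions, not code or data from the study or from NHANES.

```python
from typing import Optional, Sequence

# Each record is a row of biomarker values; None marks a missing measurement.
Row = Sequence[Optional[float]]


def missing_fraction_per_column(rows: Sequence[Row]) -> list[float]:
    """Return the share of missing (None) entries in each column."""
    if not rows:
        return []
    n_cols = len(rows[0])
    counts = [0] * n_cols
    for row in rows:
        if len(row) != n_cols:
            raise ValueError("all rows must have the same number of columns")
        for j, value in enumerate(row):
            if value is None:
                counts[j] += 1
    return [c / len(rows) for c in counts]


def overall_missing_fraction(rows: Sequence[Row]) -> float:
    """Return the share of missing entries over the whole table."""
    total = sum(len(row) for row in rows)
    missing = sum(1 for row in rows for value in row if value is None)
    return missing / total if total else 0.0


if __name__ == "__main__":
    # Toy table: 4 records x 3 hypothetical biomarkers (not NHANES data).
    table = [
        [5.1, None, 140.0],
        [4.8, 0.9, None],
        [None, 1.2, 135.0],
        [5.0, None, None],
    ]
    print(missing_fraction_per_column(table))  # [0.25, 0.5, 0.5]
    print(overall_missing_fraction(table))     # 5 of 12 cells are missing
```

A per-column profile matters because an overall rate such as "up to 40%" can hide individual features that are missing far more often than others.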

To address this issue, deep learning methods, specifically autoencoders and transformers, were employed. The autoencoder imputed quickly (37.4 s, MAE = 7.54) but with lower accuracy. The transformer achieved the highest accuracy (246.3 s, MAE = 1.10) but required substantial computational resources and showed a risk of overfitting.
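
Autoencoder and transformer imputers are commonly trained and scored by hiding a share of the *observed* cells and measuring reconstruction error only on those hidden cells. The sketch below shows that masking protocol in plain Python, with column-mean imputation standing in for a learned model. The function names, the synthetic data, and the 10 to 50% sweep are illustrative assumptions, not the study's code or settings.

```python
import random
from typing import Callable, Optional

Table = list[list[Optional[float]]]
Filled = list[list[float]]


def mask_observed(rows: Table, rate: float, seed: int = 0) -> tuple[Table, list[tuple[int, int]]]:
    """Hide a random `rate` share of the observed cells.

    Returns a masked copy of `rows` and the (row, column) positions that were hidden.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    observed = [(i, j) for i, row in enumerate(rows) for j, v in enumerate(row) if v is not None]
    rng = random.Random(seed)
    hidden = rng.sample(observed, round(rate * len(observed)))
    masked = [list(row) for row in rows]  # copy so the ground truth stays intact
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def mean_impute(rows: Table) -> Filled:
    """Baseline imputer: replace each missing cell with its column mean over observed cells."""
    n_cols = len(rows[0]) if rows else 0
    means = []
    for j in range(n_cols):
        col = [row[j] for row in rows if row[j] is not None]
        means.append(sum(col) / len(col) if col else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def masked_mae(rows: Table, imputer: Callable[[Table], Filled], rate: float, seed: int = 0) -> float:
    """MAE of `imputer`, computed only on cells that were observed and then artificially hidden."""
    masked, hidden = mask_observed(rows, rate, seed)
    if not hidden:
        raise ValueError("rate too small: no cells were hidden")
    filled = imputer(masked)
    errors = [abs(filled[i][j] - rows[i][j]) for i, j in hidden]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    gen = random.Random(42)
    # Synthetic stand-in for a biomarker table: 200 records x 4 features given as (mean, sd).
    specs = [(5.0, 0.5), (1.0, 0.2), (140.0, 3.0), (90.0, 5.0)]
    data = [[gen.gauss(mu, sd) for mu, sd in specs] for _ in range(200)]
    # Sensitivity-style sweep: the error as progressively more cells are hidden.
    for rate in (0.1, 0.3, 0.5):
        print(f"rate={rate:.1f}  MAE={masked_mae(data, mean_impute, rate, seed=1):.3f}")
```

Any imputer with the same signature, for example a wrapper around a trained autoencoder or transformer, can replace `mean_impute`; the column-mean version plays the role of the mean-value baseline that the abstract compares against.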

A hybrid architecture is proposed that combines the advantages of both approaches. On the NHANES dataset (55,081 records, 84 biomarkers), the model achieved an optimal balance between speed and accuracy (54.2 s, MAE = 5.26) and remained stable with up to 50% missing data. Compared with mean-value imputation, the accuracy of biological age estimation improved by 25%. The coefficient of determination reached 0.9875 and the root mean squared error was 35.9, confirming strong consistency of the restored values. Sensitivity analysis showed stable accuracy up to 55% missing data, beyond which performance degraded.
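
The error measures quoted above follow standard definitions: MAE is the mean of |y - ŷ|, RMSE is the square root of the mean of (y - ŷ)², and R² = 1 - SS_res / SS_tot. The plain-Python sketch below implements them. The helper `relative_improvement` is a hypothetical addition that assumes "improved by 25%" means a relative reduction in error versus the baseline, which the abstract does not state explicitly.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    """Validate inputs and return their common length."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of |y - y_hat|."""
    n = _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: square root of the average squared residual."""
    n = _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    n = _check(y_true, y_pred)
    mean_t = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Relative error reduction versus a baseline (assumed meaning of 'improved by X%')."""
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.0, 2.0, 3.0, 5.0]
    print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))  # 0.25 0.5 0.8
    print(relative_improvement(8.0, 6.0))  # 0.25
```

Because RMSE squares each residual, it is never smaller than MAE on the same data and grows faster when a few cells carry large errors, so the two figures are most informative when read together.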

A distinctive feature of the hybrid approach is that it combines high accuracy with moderate computational cost, which makes the model suitable for medical information systems that work with incomplete datasets. Practical applications include preventive medicine, monitoring of biological aging, and identification of risk groups.
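
The balance claim can be checked against the three reported (time, MAE) pairs with a simple Pareto-dominance test, in which lower time and lower MAE are both better. The sketch below uses only the figures quoted in this abstract; the class and function names are hypothetical, and because it compares single reported runs it says nothing about run-to-run variance or hardware.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    seconds: float  # reported imputation time
    mae: float      # reported imputation MAE


# Figures as reported in the abstract.
RESULTS = [
    Result("autoencoder", 37.4, 7.54),
    Result("transformer", 246.3, 1.10),
    Result("hybrid", 54.2, 5.26),
]


def dominates(a: Result, b: Result) -> bool:
    """True if `a` is no worse than `b` on both time and MAE and strictly better on at least one."""
    no_worse = a.seconds <= b.seconds and a.mae <= b.mae
    better = a.seconds < b.seconds or a.mae < b.mae
    return no_worse and better


def pareto_front(results: list[Result]) -> list[str]:
    """Names of results that no other result dominates."""
    return [r.name for r in results if not any(dominates(o, r) for o in results if o is not r)]


if __name__ == "__main__":
    print(pareto_front(RESULTS))  # ['autoencoder', 'transformer', 'hybrid']
    ae, tr, hy = RESULTS
    print(f"hybrid vs transformer: {tr.seconds / hy.seconds:.1f}x faster")
    print(f"hybrid vs autoencoder: {(ae.mae - hy.mae) / ae.mae:.0%} lower MAE")
```

All three configurations lie on the front, so the hybrid does not dominate either model; its case rests on sitting between them, about 4.5 times faster than the transformer and with about 30% lower MAE than the autoencoder.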

In the Ukrainian context, the model could strengthen biomedical research and digital healthcare and also serve as a foundation for bioinformatics and life-expectancy studies.

Author Biographies

Volodymyr Slipchenko, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

Doctor of Technical Sciences, Professor

Department of Digital Technologies in Energy

Liubov Poliahushko, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

PhD, Associate Professor

Department of Digital Technologies in Energy

Oleksandr Volkov, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

Department of Digital Technologies in Energy

Vladyslav Shatylo, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

Department of Digital Technologies in Energy

Published

2025-10-30

How to Cite

Slipchenko, V., Poliahushko, L., Volkov, O., & Shatylo, V. (2025). Hybrid imputation of biomedical data by using transformers and autoencoders for assessing human biological age. Eastern-European Journal of Enterprise Technologies, 5(4 (137)), 31–40. https://doi.org/10.15587/1729-4061.2025.340325

Issue

Vol. 5 No. 4 (137) (2025)

Section

Mathematics and Cybernetics - applied aspects