Development of a parameter-efficient method for biomedical image synthesis by substituting text conditioning with pathology foundation model embeddings in latent diffusion

Sergii Kuzmin; Oleh Berezsky

doi:10.15587/2706-5448.2026.355663

Authors

Sergii Kuzmin Lviv Polytechnic National University, Ukraine https://orcid.org/0009-0001-7182-2883
Oleh Berezsky West Ukrainian National University, Ukraine https://orcid.org/0000-0001-9931-4154

Keywords:

latent diffusion models, pathology foundation models, histopathology image synthesis, medical image generation

Abstract

The object of research is the process of synthesizing histopathological image patches conditioned on embeddings from a pathology foundation model. A key problem is that existing diffusion-based synthesis approaches either rely on text conditioning via CLIP encoders, which lack morphological understanding, or require full retraining of the base generative model, which demands substantial computational resources.

The research used parameter-efficient adaptation of a pretrained latent diffusion model. Low-rank adaptation (LoRA) of the U-Net attention layers was combined with a trainable MLP projector that maps embeddings from the pathology foundation model UNI2-h into the conditioning space of the cross-attention mechanism. An ablation study of 12 configurations varied the adapter rank, the number of conditioning tokens, and the projector architecture.
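
The conditioning pathway described above can be illustrated with a short NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the embedding width (1536), context width (768), token count (4), hidden width, GELU activation, the 320-wide key projection, and the rank/alpha values are all hypothetical choices. The projector turns one patch embedding into a short sequence of context tokens that take the place of text-encoder tokens, and the LoRA wrapper adds a trainable low-rank update to a frozen attention projection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only (not taken from the paper).
EMBED_DIM = 1536  # width of one foundation-model patch embedding (assumed)
CTX_DIM = 768     # cross-attention context width of the base U-Net (assumed)
N_TOKENS = 4      # conditioning tokens produced per embedding (assumed)
HIDDEN = 1024     # projector hidden width (assumed)


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


class MLPProjector:
    """Maps one embedding vector to n_tokens context tokens of width ctx_dim."""

    def __init__(self, embed_dim, hidden, n_tokens, ctx_dim):
        self.n_tokens, self.ctx_dim = n_tokens, ctx_dim
        self.w1 = rng.normal(0.0, 0.02, (embed_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.02, (hidden, n_tokens * ctx_dim))
        self.b2 = np.zeros(n_tokens * ctx_dim)

    def __call__(self, e):
        # e: (batch, embed_dim) -> (batch, n_tokens, ctx_dim)
        h = gelu(e @ self.w1 + self.b1)
        return (h @ self.w2 + self.b2).reshape(-1, self.n_tokens, self.ctx_dim)


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, w_frozen, rank, alpha):
        d_out, d_in = w_frozen.shape
        self.w = w_frozen                             # frozen base weight
        self.a = rng.normal(0.0, 0.01, (rank, d_in))  # trainable, random init
        self.b = np.zeros((d_out, rank))              # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (..., d_in) -> (..., d_out)
        return x @ self.w.T + self.scale * ((x @ self.a.T) @ self.b.T)

    def trainable_params(self):
        return self.a.size + self.b.size


projector = MLPProjector(EMBED_DIM, HIDDEN, N_TOKENS, CTX_DIM)
# Hypothetical 320-wide key projection of one U-Net cross-attention block.
to_k = LoRALinear(rng.normal(0.0, 0.02, (320, CTX_DIM)), rank=8, alpha=8)

embeddings = rng.normal(size=(2, EMBED_DIM))  # stand-ins for two patch embeddings
context = projector(embeddings)               # replaces text-encoder tokens
keys = to_k(context)                          # keys seen by cross-attention
print(context.shape, keys.shape)              # (2, 4, 768) (2, 4, 320)
```

Because B starts at zero, the adapted layer reproduces the frozen layer exactly at initialization, so training would begin from the base model's behavior; only A, B, and the projector weights would receive gradient updates.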

The results confirm that pathology foundation model embeddings can effectively replace text conditioning for histopathology image synthesis in a parameter-efficient regime. The optimal configuration achieved an FID of 77.59 on the validation set and 84.17 on the test set while training only 5.53 million parameters, or 0.64% of the base model's parameters. This is attributed to two characteristic features of the proposed method: pathology foundation model embeddings provide morphologically richer conditioning than CLIP-based text representations, and low-rank adaptation confines the trainable space to the conditioning pathway.
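
The parameter budget can be checked with simple arithmetic. LoRA adds rank × (d_in + d_out) weights to each adapted linear layer, and the two reported figures (5.53 million trainable, 0.64%) together imply the approximate size of the base model. The helper names are illustrative, and the layer shapes in the last two lines are hypothetical; they do not reproduce the paper's configuration.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights LoRA adds to one d_in -> d_out layer: A (rank x d_in) + B (d_out x rank)."""
    return rank * (d_in + d_out)


def mlp_params(dims: list[int]) -> int:
    """Weights plus biases of a fully connected stack with the given layer widths."""
    return sum((d_in + 1) * d_out for d_in, d_out in zip(dims[:-1], dims[1:]))


# Reported in the abstract: 5.53M trainable parameters = 0.64% of the base model.
trainable = 5.53e6
share = 0.0064

implied_base = trainable / share
# 0.64% is a rounded value, so the true share lies between 0.635% and 0.645%.
low, high = trainable / 0.00645, trainable / 0.00635
print(f"implied base size ~ {implied_base / 1e6:.0f}M (range {low / 1e6:.0f}-{high / 1e6:.0f}M)")
# implied base size ~ 864M (range 857-871M)

# Hypothetical shapes, not the paper's configuration:
print(lora_params(768, 320, rank=8))      # 8704
print(mlp_params([1536, 1024, 4 * 768]))  # 4722688
```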

This makes it possible to generate histopathology images without text annotations and without full retraining of the model, using approximately 12 GB of GPU memory. Compared with the previous text-conditioned approach on the same dataset, which yielded class-wise FID values between 113 and 138, the embedding-conditioned method achieves substantially lower FID, that is, higher generation quality, while maintaining parameter efficiency.
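
FID, the metric behind all the figures quoted above (77.59, 84.17, and the 113 to 138 range), is the Fréchet distance between Gaussians fitted to feature vectors of real and generated images: FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)). Lower values mean the generated distribution is closer to the real one. The following is a minimal NumPy sketch that assumes features have already been extracted (the original FID definition uses Inception-v3 pooling features; the extractor used in this study is not specified here). The function names are illustrative.

```python
import numpy as np


def _sqrtm_psd(m):
    # Square root of a symmetric PSD matrix via eigendecomposition;
    # tiny negative eigenvalues from round-off are clipped to zero.
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)) for two Gaussians."""
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    # (S1 S2)^(1/2) has the same trace as (S1^(1/2) S2 S1^(1/2))^(1/2),
    # and the latter argument is symmetric PSD, so eigh applies.
    cross = _sqrtm_psd(s1_half @ sigma2 @ s1_half)
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(cross))


def fid_from_features(feats_real, feats_fake):
    """FID from two (n_samples, feature_dim) arrays of pre-extracted features."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    s1 = np.atleast_2d(np.cov(feats_real, rowvar=False))
    s2 = np.atleast_2d(np.cov(feats_fake, rowvar=False))
    return frechet_distance(mu1, s1, mu2, s2)


# 1-D check: N(0, 1) vs N(1, 4) gives 1 + (1 + 4 - 2 * 2) = 2.
print(frechet_distance(np.array([0.0]), np.array([[1.0]]), np.array([1.0]), np.array([[4.0]])))  # 2.0
```

Computing the trace through the symmetric form avoids a general non-symmetric matrix square root, which keeps the sketch dependent on NumPy alone.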

Author Biographies

Sergii Kuzmin, Lviv Polytechnic National University

PhD Student

Department of Automated Control Systems

Oleh Berezsky, West Ukrainian National University

Doctor of Technical Sciences, Professor

Department of Computer Engineering

References

Litjens, G., Bandi, P., Ehteshami Bejnordi, B., Geessink, O., Balkenhol, M., Bult, P. et al. (2018). 1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset. GigaScience, 7 (6). https://doi.org/10.1093/gigascience/giy065
Walsh, E., Orsi, N. M. (2024). The current troubled state of the global pathology workforce: a concise review. Diagnostic Pathology, 19 (1). https://doi.org/10.1186/s13000-024-01590-2
Guan, H., Yap, P.-T., Bozoki, A., Liu, M. (2024). Federated learning for medical image analysis: A survey. Pattern Recognition, 151, 110424. https://doi.org/10.1016/j.patcog.2024.110424
Zhang, Y., Kang, B., Hooi, B., Yan, S., Feng, J. (2023). Deep Long-Tailed Learning: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45 (9), 10795–10816. https://doi.org/10.1109/tpami.2023.3268118
Campanella, G., Hanna, M. G., Geneslaw, L., Miraflor, A., Werneck Krauss Silva, V., Busam, K. J. et al. (2019). Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine, 25 (8), 1301–1309. https://doi.org/10.1038/s41591-019-0508-1
Jose, L., Liu, S., Russo, C., Nadort, A., Di Ieva, A. (2021). Generative Adversarial Networks in Digital Pathology and Histopathological Image Processing: A Review. Journal of Pathology Informatics, 12 (1), 43. https://doi.org/10.4103/jpi.jpi_103_20
Saad, M. M., O’Reilly, R., Rehmani, M. H. (2024). A survey on training challenges in generative adversarial networks for biomedical image analysis. Artificial Intelligence Review, 57 (2). https://doi.org/10.1007/s10462-023-10624-y
Dhariwal, P., Nichol, A. (2021). Diffusion models beat GANs on image synthesis. arXiv:2105.05233. https://doi.org/10.48550/arXiv.2105.05233
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B. (2022). High-Resolution Image Synthesis with Latent Diffusion Models. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10674–10685. https://doi.org/10.1109/cvpr52688.2022.01042
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S. et al. (2021). Learning transferable visual models from natural language supervision. arXiv:2103.00020. https://doi.org/10.48550/arXiv.2103.00020
Chen, R. J., Ding, T., Lu, M. Y., Williamson, D. F. K., Jaume, G., Song, A. H. et al. (2024). Towards a general-purpose foundation model for computational pathology. Nature Medicine, 30, 850–862. https://doi.org/10.1038/s41591-024-02857-3
Yellapragada, S., Graikos, A., Prasanna, P., Kurc, T., Saltz, J., Samaras, D. (2024). PathLDM: Text conditioned Latent Diffusion Model for Histopathology. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 5170–5179. https://doi.org/10.1109/wacv57701.2024.00510
Graikos, A., Yellapragada, S., Le, M.-Q., Kapse, S., Prasanna, P., Saltz, J., Samaras, D. (2024). Learned Representation-Guided Diffusion Models for Large-Image Generation. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 8532–8542. https://doi.org/10.1109/cvpr52733.2024.00815
Boada, J. C., Umer, R. M., Marr, C. (2025). CytoDiff: AI-Driven Cytomorphology Image Synthesis for Medical Diagnostics. 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 1136–1144. https://doi.org/10.1109/iccvw69036.2025.00122
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S. et al. (2022). LoRA: low-rank adaptation of large language models. arXiv:2106.09685. https://doi.org/10.48550/arXiv.2106.09685
Ho, J., Jain, A., Abbeel, P. (2020). Denoising diffusion probabilistic models. Proceedings of the 34th International Conference on Neural Information Processing Systems, 6840–6851. https://doi.org/10.48550/arXiv.2006.11239
Yellapragada, S., Graikos, A., Triaridis, K., Prasanna, P., Gupta, R., Saltz, J., Samaras, D. (2025). ZoomLDM: Latent Diffusion Model for multi-scale image generation. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 23453–23463. https://doi.org/10.1109/cvpr52734.2025.02184
Mao, Y., Li, H., Pang, W., Papanastasiou, G., Yang, G., Wang, C. (2024). SeLoRA: self-expanding low-rank adaptation of latent diffusion model for medical image synthesis. arXiv:2408.07196. https://doi.org/10.48550/arXiv.2408.07196
Berezsky, O., Melnyk, G., Liashchynskyi, P., Pitsun, O. (2025). Biomedical Image Datasets. In: Babichev, S., Lytvynenko, V. (Eds.). Lecture Notes on Data Engineering and Communications Technologies, vol. 244. Cham: Springer, 61–82. https://doi.org/10.1007/978-3-031-88483-2_3
Berezsky, O., Liashchynskyi, P., Melnyk, G., Dombrovskyi, M., Berezkyi, M. (2024). Synthesis of biomedical images based on generative intelligence tools. Proceedings of the 7th International Conference on Informatics & Data-Driven Medicine (IDDM 2024). Birmingham. CEUR Workshop Proceedings, 3892, 349–362. Available at: https://ceur-ws.org/Vol-3892/paper23.pdf
Berezsky, O., Liashchynskyi, P., Pitsun, O., Izonin, I. (2024). Synthesis of Convolutional Neural Network architectures for biomedical image classification. Biomedical Signal Processing and Control, 95, 106325. https://doi.org/10.1016/j.bspc.2024.106325
Berezsky, O., Liashchynskyi, P., Pitsun, O., Melnyk, G. (2024). Method and Software Tool for Generating Artificial Databases of Biomedical Images Based on Deep Neural Networks. 6th International Conference on Informatics & Data-Driven Medicine. Bratislava. https://doi.org/10.48550/arXiv.2405.16119
Kuzmin, S., Berezsky, O. (2025). Analysis of diffusion models and biomedical image generation tools. Computer Systems and Information Technologies, 2, 8–19. https://doi.org/10.31891/csit-2025-2-1
Zhu, C., Chen, W., Peng, T., Wang, Y., Jin, M. (2022). Hard Sample Aware Noise Robust Learning for Histopathology Image Classification. IEEE Transactions on Medical Imaging, 41 (4), 881–894. https://doi.org/10.1109/tmi.2021.3125459
Ho, J., Salimans, T. (2022). Classifier-free diffusion guidance. arXiv:2207.12598. https://doi.org/10.48550/arXiv.2207.12598
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. 31st Conference on Neural Information Processing Systems (NIPS 2017). Long Beach. https://doi.org/10.48550/arXiv.1706.08500
Bińkowski, M., Sutherland, D. J., Arbel, M., Gretton, A. (2018). Demystifying MMD GANs. International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.1801.01401
