Development and research of the efficiency of a website parsing information system using the Selenide framework

DOI:

https://doi.org/10.31498/2225-6733.49.1.2024.321179

Keywords:

web parsing, web parser, website, web application, information search system, data, Selenium, Selenide

Abstract

The article studies methods for automating data collection from websites using web parsing technologies. It describes the main advantages of parsing over manual data collection and classifies existing parsers according to their capabilities, limitations and use in real projects. Popular commercial and free parsers, including Import.io, Webhose.io, Dexi.io, Scrapinghub, ParseHub, Visual Scraper, Spinn3r, 80legs, Scraper and OutWit Hub, are analyzed in detail to determine their advantages and disadvantages in various usage scenarios. Particular attention is paid to comparing the Selenium and Selenide frameworks, both widely used to automate interaction with web browsers. The authors conclude that Selenide is the more suitable choice because of its simpler syntax, its ability to handle dynamic content and its support for intelligent waiting.

The article then presents a custom Selenide-based parser designed for small and medium-sized enterprises with limited budgets. The system is built on a modern technology stack: Java 11, Python, PostgreSQL, Angular 12, Docker, Gradle, Kafka and Node.js. The program architecture, the interaction between modules and the relational database model for storing the collected data are described in detail. The proposed approach allows the parser to be configured for different types of sites, provides fast collection and processing of information, and offers flexible configuration of sampling parameters. Support for containerization simplifies deploying and maintaining the application. The results can be used to build effective information search systems and to automate routine data collection, which is especially important for companies seeking to optimize their business processes and reduce costs.
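The "intelligent waiting" credited to Selenide means that a check such as `$(".price").shouldBe(visible)` is not evaluated only once: Selenide keeps re-evaluating the condition until it holds or the configured timeout (`Configuration.timeout`) expires, which is what makes it robust against content loaded dynamically by JavaScript. The minimal sketch below reproduces that polling idea in plain Java so it can run without a browser. It is not the authors' implementation or Selenide's internal code; the class `PollingWait`, the method `until`, the simulated "price" element and the chosen delays are all hypothetical.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Hypothetical illustration of a polling ("smart") wait, similar in spirit to
 * what Selenide does for shouldBe(...): re-check a condition until it succeeds
 * or a timeout elapses. Names here are illustrative, not Selenide API.
 */
public final class PollingWait {

    private PollingWait() {}

    /**
     * Calls {@code probe} repeatedly until it returns a non-empty Optional,
     * sleeping {@code pollInterval} between attempts. Throws
     * IllegalStateException if {@code timeout} elapses first.
     */
    public static <T> T until(Supplier<Optional<T>> probe, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            Optional<T> result = probe.get();
            if (result.isPresent()) {
                return result.get();
            }
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("Condition not met within " + timeout.toMillis() + " ms");
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        // Simulated dynamic page: the hypothetical "price" element
        // only becomes available about 300 ms after the page is loaded.
        long loadedAt = System.nanoTime();
        Supplier<Optional<String>> priceElement = () ->
                System.nanoTime() - loadedAt >= Duration.ofMillis(300).toNanos()
                        ? Optional.of("19.99")
                        : Optional.empty();

        String price = until(priceElement, Duration.ofSeconds(4), Duration.ofMillis(50));
        System.out.println("Parsed price: " + price);
    }
}
```

A single immediate lookup would fail on such a page, whereas the polling wait succeeds once the element appears and fails with a clear error only after the whole timeout has passed. In a real Selenide-based parser this loop is not written by hand; the library's conditions and `Configuration.timeout` provide it.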

Author Biographies

A. Serhiienko, State Higher Education Institution «Priazovskyi State Technical University», Dnipro

PhD (Engineering), associate professor

O. Balalaieva, State Higher Education Institution «Priazovskyi State Technical University», Dnipro

PhD (Engineering), associate professor

I. Garkusha, Dnipro University of Technology, Dnipro

PhD (Engineering), associate professor

D. Platonov, State Higher Education Institution «Priazovskyi State Technical University», Dnipro

Master's student

Published

2024-12-26

How to Cite

Serhiienko, A., Balalaieva, O., Garkusha, I., & Platonov, D. (2024). Development and research of the efficiency of a website parsing information system using the Selenide framework. Reporter of the Priazovskyi State Technical University. Section: Technical Sciences, 1(49), 16–28. https://doi.org/10.31498/2225-6733.49.1.2024.321179