An overview of current issues in automatic text summarization of natural language using artificial intelligence methods
DOI: https://doi.org/10.15587/2706-5448.2024.309472

Keywords: automatic abstracting, natural language processing, artificial intelligence, generative models, neural networks, deep learning

Abstract
The object of the research is the task of automatic abstracting (summarization) of natural language texts. The importance of this task stems from the problem of producing abstracts that adequately reflect the content of the source text and highlight its key information. Solving it requires models capable of deep contextual analysis of the text, which complicates the abstracting process.
Results are presented that demonstrate the effectiveness of generative models based on neural networks, semantic text analysis methods, and deep learning for the automatic creation of abstracts. The models examined produced abstracts with a high level of adequacy and informativeness. GPT (Generative Pre-trained Transformer) generates text that reads as if written by a human, which makes it well suited to automatic summary generation.
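The article does not include an implementation, but the generative approach it describes can be illustrated with a minimal sketch using the Hugging Face transformers library; here facebook/bart-large-cnn is an assumed stand-in for a GPT-style generative summarizer, since the article does not name a specific checkpoint:

from transformers import pipeline

# Abstractive (generative) summarization sketch: a pretrained
# sequence-to-sequence model reads the source text and generates
# a new, condensed abstract rather than copying sentences verbatim.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

text = (
    "Automatic abstracting reduces the time needed to analyze large volumes "
    "of textual information. Generative neural models read the source text "
    "and produce a short abstract that preserves its key content."
)

# max_length / min_length bound the length of the generated summary.
result = summarizer(text, max_length=60, min_length=15, do_sample=False)
print(result[0]["summary_text"])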
For example, the GPT model generates condensed summaries from input text, while the BERT model is used for summarizing texts in many areas, including search engines and natural language processing. This makes it possible to produce short but informative abstracts that retain the essential content of the original, and such high-quality abstracts can be applied to summarizing web pages, emails, social media posts, and other content. Compared with traditional abstracting methods, artificial intelligence offers greater accuracy and informativeness and the ability to process large volumes of text more efficiently, which eases access to information and improves productivity in text processing.
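The BERT-based extractive approach can be sketched with the bert-extractive-summarizer package cited in the references (pip install bert-extractive-summarizer); the input text and parameter values here are illustrative, not taken from the article's experiments:

from summarizer import Summarizer

# Extractive summarization sketch: BERT sentence embeddings are used to
# rank the sentences of the source text, and only the most representative
# sentences are kept verbatim in the summary.
body = (
    "Search engines, email clients, and social media tools all benefit from "
    "short but informative abstracts. Extractive summarizers rank the "
    "sentences of the source text and keep only the most representative ones. "
    "BERT embeddings let the ranking take sentence meaning into account."
)

model = Summarizer()                    # loads a pretrained BERT model
summary = model(body, num_sentences=2)  # keep the two most central sentences
print(summary)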
Automatic abstracting of texts using artificial intelligence models significantly reduces the time required to analyze large volumes of textual information. This is especially important in today's information environment, where the amount of available data is constantly growing. These models promote efficient use of resources and increase overall productivity across a variety of fields, including scientific research, education, business, and media.
References
- Pustejovsky, J., Stubbs, A. (2012). Natural Language Annotation for Machine Learning. Sebastopol: O'Reilly Media, 343.
- Natural language processing. Available at: https://en.wikipedia.org/wiki/Natural_language_processing
- Luhn, H. P. (1958). The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development, 2 (2), 159–165. https://doi.org/10.1147/rd.22.0159
- Guarino, N., Masolo, C., Vetere, G. (1999). OntoSeek: Content-Based Access to the Web. IEEE Intelligent Systems, 14 (3), 70–80.
- Lin, C.-Y., Hovy, E. H. (2000). The Automated acquisition of topic signatures for text summarization. Proceedings of COLING-00. Saarbrücken, 495–501. https://doi.org/10.3115/990820.990892
- Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41 (6), 391–407. https://doi.org/10.1002/(sici)1097-4571(199009)41:6<391::aid-asi1>3.0.co;2-9
- Gong, Y., Liu, X. (2001). Generic text summarization using relevance measure and latent semantic analysis. Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 19–25. https://doi.org/10.1145/383952.383955
- PageRank. Available at: https://en.wikipedia.org/wiki/PageRank
- Kupiec, J., Pedersen, J., Chen, F. (1995). A trainable document summarizer. Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval – SIGIR ’95. ACM, 68–73. https://doi.org/10.1145/215206.215333
- Ouyang, Y., Li, W., Li, S., Lu, Q. (2011). Applying regression models to query-focused multi-document summarization. Information Processing & Management, 47 (2), 227–237. https://doi.org/10.1016/j.ipm.2010.03.005
- Wong, K.-F., Wu, M., Li, W. (2008). Extractive summarization using supervised and semi-supervised learning. Proceedings of the 22nd International Conference on Computational Linguistics – COLING ’08, 985–992. https://doi.org/10.3115/1599081.1599205
- Zhou, L., Hovy, E. (2003). A web-trained extraction summarization system. Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology – NAACL ’03, 205–211. https://doi.org/10.3115/1073445.1073482
- Conroy, J. M., O'Leary, D. P. (2001). Text summarization via hidden Markov models. Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 406–407. https://doi.org/10.1145/383952.384042
- Shen, D., Sun, J.-T., Li, H., Yang, Q., Chen, Z. (2007). Document Summarization Using Conditional Random Fields. IJCAI, 7, 2862–2867.
- SummarizeBot. Available at: https://www.summarizebot.com/about.html
- SMMRY. Available at: https://smmry.com/about
- Generative Pre-trained Transformer. Available at: https://openai.com/chatgpt
- Devlin, J., Chang, M.-W. (2018). Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing. Available at: https://research.google/blog/open-sourcing-bert-state-of-the-art-pre-training-for-natural-language-processing/
- TextTeaser. Available at: https://pypi.org/project/textteaser/
- Answers to Frequently Asked Questions about NLTK (2022). Available at: https://github.com/nltk/nltk/wiki/FAQ
- Gensim. Available at: https://radimrehurek.com/gensim/intro.html#what-is-gensim
- SUMY. Available at: https://github.com/miso-belica/sumy
- Bert-Extractive-Summarizer. Available at: https://github.com/dmmiller612/bert-extractive-summarizer
- Ukrainska pravda. Available at: https://www.pravda.com.ua/
- Pymorphy2 0.9.1. Available at: https://pypi.org/project/pymorphy2/
- Support vector machine. Available at: https://en.wikipedia.org/wiki/Support_vector_machine
- Tag Cloud. Available at: https://en.wikipedia.org/wiki/Tag_cloud
License
Copyright (c) 2024 Oleksii Kuznietsov, Gennadiy Kyselov
This work is licensed under a Creative Commons Attribution 4.0 International License.
The assignment and conditions for the transfer of copyright (identification of authorship) are set out in the License Agreement. In particular, the authors retain authorship of their manuscript and grant the journal the right of first publication of the work under the terms of the Creative Commons CC BY license. They also retain the right to enter into separate, additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal, provided that a link to the first publication of the article in this journal is preserved.