An overview of current issues in automatic text summarization of natural language using artificial intelligence methods

Authors

Oleksii Kuznietsov, Gennadiy Kyselov

DOI:

https://doi.org/10.15587/2706-5448.2024.309472

Keywords:

automatic abstracting, natural language processing, artificial intelligence, generative models, neural networks, deep learning

Abstract

The object of the research is the task of automatic summarization (abstracting) of natural language texts. Its relevance stems from the problem of producing summaries that adequately reflect the content of the original text and highlight its key information. The task requires models capable of deep contextual analysis, which complicates the summarization process.

Results are presented that demonstrate the effectiveness of neural-network-based generative models, semantic text analysis methods, and deep learning for the automatic creation of summaries. The models produced summaries with a high level of adequacy and informativeness. GPT (Generative Pre-trained Transformer) generates human-like text, which makes it well suited to automatic summary generation.
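
As an illustration of how such a generative model can be used for summarization, the minimal sketch below sends a document to a GPT-style model through the OpenAI chat API and asks for a short abstract. It is not the authors' implementation; the model name, prompt wording, and length limit are assumptions, and an API key is required.

# pip install openai
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

def summarize(text: str, max_words: int = 100) -> str:
    """Ask a GPT-style model for a concise abstract of the input text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; any chat-capable GPT model can be substituted
        messages=[
            {"role": "system", "content": "You write concise, faithful abstracts."},
            {"role": "user",
             "content": f"Summarize the following text in at most {max_words} words:\n\n{text}"},
        ],
        temperature=0.2,  # low randomness keeps the summary close to the source
    )
    return response.choices[0].message.content.strip()

A low sampling temperature is chosen so that the generated abstract stays close to the source text rather than introducing new content.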

For example, the GPT model generates condensed summaries from input text, while the BERT model is applied to text summarization in many areas, including search engines and other natural language processing tasks. This makes it possible to produce short yet informative summaries that retain the essential content of the original and can be used to summarize web pages, emails, social media posts, and other content. Compared with traditional summarization methods, artificial intelligence offers greater accuracy and informativeness and can process large volumes of text more efficiently, which facilitates access to information and improves productivity in text processing.
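
A BERT-based extractive counterpart can be sketched as follows. This assumes the open-source bert-extractive-summarizer package rather than the authors' own pipeline; it uses BERT sentence embeddings to select the most representative sentences from the input text.

# pip install bert-extractive-summarizer
from summarizer import Summarizer  # BERT-based extractive summarizer

text = (
    "Automatic summarization reduces a long document to its most informative sentences. "
    "Extractive methods score and select sentences that already exist in the original text. "
    "Abstractive methods instead generate new sentences that paraphrase the source."
)

model = Summarizer()                    # loads a pretrained BERT model on first use
summary = model(text, num_sentences=2)  # keep the two most representative sentences
print(summary)

Extractive selection of this kind guarantees that every sentence in the summary appears verbatim in the source, which is useful when factual fidelity matters more than fluency.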

Automatic text summarization with artificial intelligence models significantly reduces the time required to analyze large volumes of textual information. This is especially important in today's information environment, where the amount of available data is constantly growing. The use of these models promotes efficient use of resources and increases overall productivity in a variety of fields, including scientific research, education, business, and media.

Author Biographies

Oleksii Kuznietsov, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»

PhD Student

Department of System Design

Gennadiy Kyselov, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»

PhD, Senior Researcher, Associate Professor

Department of System Design

Published

2024-07-31

How to Cite

Kuznietsov, O., & Kyselov, G. (2024). An overview of current issues in automatic text summarization of natural language using artificial intelligence methods. Technology Audit and Production Reserves, 4(2(78)), 12–19. https://doi.org/10.15587/2706-5448.2024.309472

Issue

Section

Information Technologies