Large language model (LLM) comparison between GPT-3 and PaLM-2 to produce Indonesian cultural content

Authors

DOI:

https://doi.org/10.15587/1729-4061.2024.309972

Keywords:

large language model, generative artificial intelligence, GPT-3, PaLM-2, BERTScore Evaluation

Abstract

Large language models can help compile content with a cultural theme. However, any information they generate needs to be checked for factual accuracy. Although many studies compare the capabilities of large language models, few directly compare their performance in producing Indonesian cultural content. This research compares the correctness of the information generated by large language models when creating Indonesian cultural content, using the expert judgment method, and compares their fine-tuning capabilities, evaluated using BERTScore. The evaluation method was successfully applied, and the results show that, in this case, PaLM-2 included less misinformation while GPT-3 excelled in fine-tuning. Combining expert judgment with BERTScore makes it possible to evaluate large language models and to obtain additional valid training data to correct deficiencies. PaLM-2 produced more valid content, scoring 27 points against 8 points for GPT-3. In fine-tuning on the new dataset, GPT-3 learned the dataset more quickly and cheaply, taking 50 minutes at a cost of IDR 27,000, while PaLM-2 took 2 hours 10 minutes at a cost of IDR 1,377,204. In the evaluation on the training dataset, the GPT-3 Tuned Model was superior, with an average across all scores of 0.85205, while the PaLM-2 Tuned Model averaged 0.78942. In this case, the GPT-3 Tuned Model is superior by about 8 % in relative terms. In practice, this method can be used when the assessment is descriptive and requires direct assessment by experts.
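The comparison figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, and it assumes the 8 % margin is the relative difference between the two average BERTScore values, which the abstract does not state explicitly.

```python
# Figures as reported in the abstract. All names here are illustrative,
# not taken from the authors' code.
GPT3_AVG_SCORE = 0.85205      # average BERTScore, GPT-3 Tuned Model
PALM2_AVG_SCORE = 0.78942     # average BERTScore, PaLM-2 Tuned Model
GPT3_MINUTES = 50             # fine-tuning time, GPT-3
PALM2_MINUTES = 2 * 60 + 10   # fine-tuning time, PaLM-2 (2 h 10 min)
GPT3_COST_IDR = 27_000        # fine-tuning cost, GPT-3
PALM2_COST_IDR = 1_377_204    # fine-tuning cost, PaLM-2


def relative_gain_pct(better: float, baseline: float) -> float:
    """Percentage by which `better` exceeds `baseline` (assumed reading of '8 %')."""
    return (better - baseline) / baseline * 100


def summarize() -> dict:
    """Derive the comparison ratios from the reported figures."""
    return {
        "absolute_gap": GPT3_AVG_SCORE - PALM2_AVG_SCORE,
        "relative_gain_pct": relative_gain_pct(GPT3_AVG_SCORE, PALM2_AVG_SCORE),
        "time_ratio": PALM2_MINUTES / GPT3_MINUTES,
        "cost_ratio": PALM2_COST_IDR / GPT3_COST_IDR,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.4f}")
```

Under this reading, the absolute gap between the averages is about 0.063, the relative gain is roughly 7.9 % (consistent with the rounded 8 %), and PaLM-2 fine-tuning took 2.6 times as long and cost about 51 times as much as GPT-3 fine-tuning.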

Supporting Agency

  • The authors thank the Palembang City Culture Service for serving as a data source for this research.

Author Biographies

Deni Erlansyah, Universitas Bina Darma

Data Science Interdisciplinary Research Center

Amirul Mukminin, Jambi University

Doctor of Educational Leadership and Policy Studies (Language Policy), Professor

Department of English Language Education

Dedek Julian, Universitas Bina Darma

Data Science Interdisciplinary Research Center

Edi Surya Negara, Universitas Bina Darma

Data Science Interdisciplinary Research Center

Ferdi Aditya, Universitas Bina Darma

Data Science Interdisciplinary Research Center

Rezki Syaputra, Universitas Bina Darma

Data Science Interdisciplinary Research Center


Published

2024-08-30

How to Cite

Erlansyah, D., Mukminin, A., Julian, D., Negara, E. S., Aditya, F., & Syaputra, R. (2024). Large language model (LLM) comparison between GPT-3 and PaLM-2 to produce Indonesian cultural content. Eastern-European Journal of Enterprise Technologies, 4(2 (130)), 19–29. https://doi.org/10.15587/1729-4061.2024.309972