Large language model (LLM) comparison between GPT-3 and PaLM-2 to produce Indonesian cultural content

Authors

Deni Erlansyah Universitas Bina Darma, Indonesia https://orcid.org/0009-0005-5021-0174
Amirul Mukminin Jambi University, Indonesia https://orcid.org/0000-0002-6806-1315
Dedek Julian Universitas Bina Darma, Indonesia https://orcid.org/0009-0004-3582-8788
Edi Surya Negara Universitas Bina Darma, Indonesia https://orcid.org/0000-0002-6387-2813
Ferdi Aditya Universitas Bina Darma, Indonesia https://orcid.org/0009-0002-6185-0220
Rezki Syaputra Universitas Bina Darma, Indonesia https://orcid.org/0009-0006-4018-6913

DOI:

https://doi.org/10.15587/1729-4061.2024.309972

Keywords:

large language model, generative artificial intelligence, GPT-3, PaLM-2, BERTScore Evaluation

Abstract

Large language models (LLMs) can help compile content on cultural themes. However, any information they generate must be evaluated for factual accuracy. Although many studies compare the capabilities of large language models, few directly compare their performance in producing Indonesian cultural content. This research compares the correctness of the information that GPT-3 and PaLM-2 generate when creating Indonesian cultural content, assessed by expert judgment, and their fine-tuning capabilities, evaluated using BERTScore. The evaluation method was applied successfully, and the results show that, in this case, PaLM-2 included less misinformation while GPT-3 excelled in fine-tuning. Combining expert judgment with BERTScore makes it possible to evaluate large language models and to obtain additional valid training data for correcting deficiencies. PaLM-2 produced more valid content, scoring 27 points against 8 points for GPT-3. In fine-tuning on a new dataset, GPT-3 learned the dataset more quickly, taking 50 minutes at a cost of IDR 27,000, while PaLM-2 took 2 hours 10 minutes at a cost of IDR 1,377,204. In the evaluation on the training dataset, the GPT-3 tuned model achieved an average overall score of 0.85205, while the PaLM-2 tuned model achieved 0.78942. In this case, the GPT-3 tuned model is superior by about 8%. In practice, this method can be used when the assessment is descriptive and requires direct assessment by experts.
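As a rough illustration of the figures above, the following Python sketch shows the greedy-matching idea behind BERTScore and recomputes the gaps between the two tuned models from the reported numbers. It is not the authors' evaluation code. The two-dimensional token vectors and the function names (cosine, greedy_match_f1, relative_gap) are hypothetical; in real BERTScore the vectors come from a pretrained BERT-family model, and optional steps such as idf weighting and baseline rescaling are omitted here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_f1(candidate, reference):
    """BERTScore-style precision, recall and F1 from token vectors.

    Both arguments are lists of token vectors supplied by the caller
    (an assumption of this sketch; no language model is loaded here).
    """
    # Precision: match each candidate token to its most similar reference token.
    precision = sum(
        max(cosine(c, r) for r in reference) for c in candidate
    ) / len(candidate)
    # Recall: match each reference token to its most similar candidate token.
    recall = sum(
        max(cosine(r, c) for c in candidate) for r in reference
    ) / len(reference)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def relative_gap(a, b):
    """How much larger a is than b, as a fraction of b."""
    return (a - b) / b


# Hypothetical toy embeddings, for illustration only.
reference_tokens = [[1.0, 0.0], [0.0, 1.0]]
candidate_tokens = [[1.0, 0.0], [1.0, 1.0]]
p, r, f1 = greedy_match_f1(candidate_tokens, reference_tokens)

# Figures reported in the abstract.
gpt3_score, palm2_score = 0.85205, 0.78942
gpt3_minutes, palm2_minutes = 50, 2 * 60 + 10
gpt3_cost_idr, palm2_cost_idr = 27_000, 1_377_204

score_gap = relative_gap(gpt3_score, palm2_score)
time_ratio = palm2_minutes / gpt3_minutes
cost_ratio = palm2_cost_idr / gpt3_cost_idr

print(f"toy BERTScore-style F1: {f1:.4f}")
print(f"GPT-3 tuned model score advantage: {score_gap:.1%}")
print(f"PaLM-2 / GPT-3 tuning time ratio: {time_ratio:.1f}")
print(f"PaLM-2 / GPT-3 tuning cost ratio: {cost_ratio:.1f}")
```

Under these assumptions, the relative score gap works out to about 7.9%, which is consistent with the rounded 8% quoted in the abstract, and PaLM-2 tuning took 2.6 times as long and cost roughly 51 times as much as GPT-3 tuning.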

Supporting Agency

The authors thank the Palembang City Culture Service for providing the data used in this research.

Author Biographies

Deni Erlansyah, Universitas Bina Darma

Data Science Interdisciplinary Research Center

Amirul Mukminin, Jambi University

Doctor of Educational Leadership and Policy Studies (Language Policy), Professor

Department of English Language Education

Dedek Julian, Universitas Bina Darma

Data Science Interdisciplinary Research Center

Edi Surya Negara, Universitas Bina Darma

Data Science Interdisciplinary Research Center

Ferdi Aditya, Universitas Bina Darma

Data Science Interdisciplinary Research Center

Rezki Syaputra, Universitas Bina Darma

Data Science Interdisciplinary Research Center

Published

2024-08-30

How to Cite

Erlansyah, D., Mukminin, A., Julian, D., Negara, E. S., Aditya, F., & Syaputra, R. (2024). Large language model (LLM) comparison between GPT-3 and PaLM-2 to produce Indonesian cultural content. Eastern-European Journal of Enterprise Technologies, 4(2 (130)), 19–29. https://doi.org/10.15587/1729-4061.2024.309972

Issue

Vol. 4 No. 2 (130) (2024): Information technology. Industry control systems

Section

Information technology

License

Copyright (c) 2024 Deni Erlansyah, Amirul Mukminin, Dedek Julian, Edi Surya Negara, Ferdi Aditya, Rezki Syaputra

This work is licensed under a Creative Commons Attribution 4.0 International License.
