Development of a fake news detection tool for Vietnamese based on deep learning techniques

Trung Hung Vo; Thi Le Thuyen Phan; Khanh Chi Ninh

doi:10.15587/1729-4061.2022.265317

Authors

Trung Hung Vo University of Technology and Education - The University of Danang, Viet Nam
Thi Le Thuyen Phan FPT University, Viet Nam https://orcid.org/0000-0002-3445-8760
Khanh Chi Ninh The University of Danang, Viet Nam https://orcid.org/0000-0001-8129-6453

DOI:

https://doi.org/10.15587/1729-4061.2022.265317

Keywords:

fake news detection, natural language processing, deep learning, CNN, RNN

Abstract

With the development of the Internet, social networks and different communication channels, people can get information quickly and easily. However, in addition to real and useful news, we also receive false and unreal information. The problem of fake news has become a difficult and unresolved issue. For languages with few users, such as Vietnamese, the research on fake news detection is still very limited and has not received much attention.

In this paper, we present research results on building a tool to support fake news detection for Vietnamese. Our idea is to apply text classification techniques to fake news detection. We have built a database of 4 groups of 2 topics about politics (fake news and real news) and about Covid-19 (fake news and real news). Then use deep learning techniques CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network) to create the corresponding models. When there is new news that needs to be verified, we just need to apply the classification to see which of the four groups they label into to decide whether it is fake news or not. The tool was able to detect fake news quickly and easily with a correct rate of about 85 %. This result will be improved when getting a larger training data set and adjusting the parameters for the machine learning model. These results make an important contribution to the research on detecting fake news for Vietnamese and can be applied to other languages. In the future, besides using classification techniques (based on content analysis), we can combine many other methods such as checking the source, verifying the author's information, checking the distribution process to improve the quality of fake news detection.

Supporting Agency

This research was funded by the Ministry of Education and Training (Vietnam) through the project code B2022-DNA-17.

Author Biographies

Trung Hung Vo, University of Technology and Education - The University of Danang

Doctor of Computer Science, Professor, Vice-Rector

Department of Digital Technology

Thi Le Thuyen Phan, FPT University

Doctor of Computer Science, IT Lecturer

Department of Information Technology

Khanh Chi Ninh, The University of Danang

PhD Student, Lecturer of VKU

Department of Information Technology

Vietnam-Korea University of Information and Communication Technology (VKU)

References

Watson, A. (2022). Trust in media worldwide 2021. Statista. Available at: https://www.statista.com/statistics/683336/media-trust-worldwide/
Fallis, D. (2015). What Is Disinformation? Library Trends, 63 (3), 401–426. doi: https://doi.org/10.1353/lib.2015.0014
Wardle, C., Derakhshan, H. (2017). Information disorder: Toward an interdisciplinary framework for research and policy making. Council of Europe, 109.
Nguyen, D. Q., Tuan Nguyen, A. (2020). PhoBERT: Pre-trained language models for Vietnamese. Findings of the Association for Computational Linguistics: EMNLP 2020. doi: https://doi.org/10.18653/v1/2020.findings-emnlp.92
Le, D.-T., Vu, X.-S., To, N.-D., Nguyen, H.-Q., Nguyen, T.-T., Le, L. et. al. (2020). ReINTEL: A multimodal data challenge for responsible information identification on social network sites. arXiv. doi: https://doi.org/10.48550/arXiv.2012.08895
Molina, M. D., Sundar, S. S., Le, T., Lee, D. (2019). “Fake News” Is Not Simply False Information: A Concept Explication and Taxonomy of Online Content. American Behavioral Scientist, 65 (2), 180–212. doi: https://doi.org/10.1177/0002764219878224
Miller, T., Howe, P., Sonenberg, L. (2017). Explainable AI: Beware of inmates running the asylum or: How I learnt to stop worrying and love the social and behavioural sciences. arXiv. doi: https://doi.org/10.48550/arXiv.1712.00547
Chadwick, A., Stanyer, J. (2021). Deception as a Bridging Concept in the Study of Disinformation, Misinformation, and Misperceptions: Toward a Holistic Framework. Communication Theory, 32 (1), 1–24. doi: https://doi.org/10.1093/ct/qtab019
Zhou, X., Wu, J., Zafarani, R. (2020). SAFE: Similarity-Aware Multi-modal Fake News Detection. Lecture Notes in Computer Science, 354–367. doi: https://doi.org/10.1007/978-3-030-47436-2_27
Zhou, X., Zafarani, R. (2019). Network-based Fake News Detection. ACM SIGKDD Explorations Newsletter, 21 (2), 48–60. doi: https://doi.org/10.1145/3373464.3373473
Kollias, D., Zafeiriou, S. (2021). Exploiting Multi-CNN Features in CNN-RNN Based Dimensional Emotion Recognition on the OMG in-the-Wild Dataset. IEEE Transactions on Affective Computing, 12 (3), 595–606. doi: https://doi.org/10.1109/taffc.2020.3014171
Elhadad, M. K., Li, K. F., Gebali, F. (2019). A Novel Approach for Selecting Hybrid Features from Online News Textual Metadata for Fake News Detection. Lecture Notes in Networks and Systems, 914–925. doi: https://doi.org/10.1007/978-3-030-33509-0_86
Keeling, R., Chhatwal, R., Huber-Fliflet, N., Zhang, J., Wei, F., Zhao, H. et. al. (2019). Empirical Comparisons of CNN with Other Learning Algorithms for Text Classification in Legal Document Review. 2019 IEEE International Conference on Big Data (Big Data). doi: https://doi.org/10.1109/bigdata47090.2019.9006248
Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). doi: https://doi.org/10.3115/v1/d14-1181
Yu, Y., Si, X., Hu, C., Zhang, J. (2019). A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. Neural Computation, 31 (7), 1235–1270. doi: https://doi.org/10.1162/neco_a_01199
Ketkar, N. (2017). Introduction to Keras. Deep Learning with Python, 97–111. doi: https://doi.org/10.1007/978-1-4842-2766-4_7

Development of a fake news detection tool for Vietnamese based on deep learning techniques

Authors

DOI:

Keywords:

Abstract

Supporting Agency

Author Biographies

Trung Hung Vo, University of Technology and Education - The University of Danang

Thi Le Thuyen Phan, FPT University

Khanh Chi Ninh, The University of Danang

References

Downloads

Published

How to Cite

Issue

Section

License

Language

Information

Make a Submission

Developed By