A study on deep learning for Vietnamese text classification
DOI: https://doi.org/10.54939/1859-1043.j.mst.95.2024.85-94
Keywords: Deep learning; Text classification; LSTM; CNN.
Abstract
Text categorization automatically assigns text passages or documents to predetermined categories or subjects. Although a wide array of techniques has been applied to English text classification, research on Vietnamese text classification remains scarce. This paper introduces a novel approach that uses Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models with a deep network structure for Vietnamese text classification. Experiments on two Vietnamese news corpora show a substantial improvement in classification accuracy when these deep learning techniques are applied. The study thereby demonstrates the efficacy of deeper LSTM and CNN architectures for Vietnamese text classification and offers insights for researchers and practitioners working on text categorization in the Vietnamese language.