Speech translation for Unwritten language using intermediate representation: Experiment for Viet-Muong language pair

Pham Van Dong; Do Thi Ngoc Diep; Mac Dang Khoa; Vu Thi Hai Ha

DOI: https://doi.org/10.54939/1859-1043.j.mst.CSCE6.2022.65-76

Authors

Pham Van Dong (corresponding author), Hanoi University of Mining and Geology
Do Thi Ngoc Diep, Hanoi University of Science and Technology
Mac Dang Khoa, Big Data Research Institute, VinGroup
Vu Thi Hai Ha, Institute of Linguistics, Vietnam Academy of Social Sciences

Keywords:

Machine translation; Speech synthesis; Minority languages; Vietnamese; Muong dialects; Unwritten languages; Multilingual speech synthesis.

Abstract

This paper studies a method for automatically translating text in one language (L1) into speech in an unwritten language (L2). Normally, written text serves as the bridge between a translation module, which converts L1 text into L2 text, and a synthesis module, which generates L2 speech from that text. When L2 has no writing system, an intermediate representation must take the place of L2 text. This paper proposes a phoneme representation for that role, because phonemes are closely tied to the spoken form of a language. The proposed method is applied to the Vietnamese-Muong language pair: Vietnamese text is translated into two Muong dialects, Mường Bi (Hòa Bình) and Mường Tân Sơn (Phú Thọ), neither of which has a writing system. The paper also proposes a phoneme set for each of these dialects and uses them in the experiments. Evaluation shows fairly high translation quality for both dialects (for Mường Bi, fluency 4.63/5.0 and adequacy 4.56/5.0) and fairly good synthesized speech quality for both dialects (for Mường Bi, MOS 4.47/5.0 and intelligibility 93.55%). The results also indicate that the proposed system is promising for other unwritten languages.
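To make the role of the intermediate representation concrete, here is a minimal sketch of the cascade described above. It assumes that each module can be treated as a plain function. A translation module maps L1 text to an L2 phoneme sequence, and a synthesis module maps that sequence to audio. All names (`PhonemeBridgeST`, `toy_translate`, `toy_synthesize`, `TOY_LEXICON`, `TOY_INVENTORY`) are hypothetical. The symbols `P1` to `P5` are placeholders, not the Muong phoneme sets proposed in the paper. The toy functions stand in for trained translation and synthesis models.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

Phonemes = List[str]    # intermediate representation replacing L2 orthography
Waveform = List[float]  # audio samples


@dataclass
class PhonemeBridgeST:
    """Hypothetical cascade: L1 text -> L2 phoneme sequence -> L2 speech."""
    translate: Callable[[str], Phonemes]        # translation module: L1 text -> L2 phonemes
    synthesize: Callable[[Phonemes], Waveform]  # synthesis module: L2 phonemes -> audio
    inventory: FrozenSet[str]                   # phoneme set defined for one L2 dialect

    def __call__(self, text: str) -> Waveform:
        phonemes = self.translate(text)
        # Both modules must share the same phoneme set, since it is the only
        # interface between them when L2 has no writing system.
        unknown = [p for p in phonemes if p not in self.inventory]
        if unknown:
            raise ValueError(f"phonemes outside the dialect inventory: {unknown}")
        return self.synthesize(phonemes)


# Toy stand-ins: the symbols below are placeholders, NOT real Muong phonemes.
TOY_LEXICON = {"xin": ["P1", "P2"], "chào": ["P3", "P4", "P5"]}
TOY_INVENTORY = frozenset({"P1", "P2", "P3", "P4", "P5", "|"})


def toy_translate(text: str) -> Phonemes:
    """Word-by-word lookup into placeholder phonemes, with '|' between words."""
    out: Phonemes = []
    for word in text.lower().split():
        out.extend(TOY_LEXICON[word])
        out.append("|")  # word boundary marker
    return out[:-1]      # drop the trailing boundary (no-op for empty input)


def toy_synthesize(phonemes: Phonemes, samples_per_phoneme: int = 4) -> Waveform:
    """Emit a fixed-length silent segment per phoneme to illustrate the interface only."""
    return [0.0] * (samples_per_phoneme * len(phonemes))


pipeline = PhonemeBridgeST(toy_translate, toy_synthesize, TOY_INVENTORY)
audio = pipeline("Xin chào")
print(len(audio))  # 24: six symbols (five phonemes plus one boundary) x 4 samples
```

In this sketch the per-dialect phoneme inventory serves as the contract between the two modules. The translation model's output vocabulary and the synthesizer's input vocabulary must be the same set, which is why the sketch rejects any symbol outside it. Supporting the second dialect would mean swapping in a different inventory and retrained modules, not a different pipeline.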

ISSN: 1859-1043