Improving Myanmar Image Caption Generation Using NASNetLarge and Bi-directional LSTM

San Pa Pa Aung, Win Pa Pa, Tin Lay Nwe

2023 IEEE Conference on Computer Applications (ICCA)

Abstract
The main objective of this paper is to improve automatic Myanmar image captioning by learning the contents of images with a NASNetLarge and Bi-LSTM model. Describing the contents of an image is a complex task for a machine without human intervention, and Computer Vision and Natural Language Processing techniques are widely used to tackle this problem. This paper proposes a deep learning-based Myanmar image captioning system that uses the NASNetLarge CNN feature extraction model as an encoder and a deep Recurrent Neural Network (RNN) with Bi-directional Long Short-Term Memory (LSTM) as a decoder. For corpus construction, we created and annotated a Myanmar image caption corpus (over 40k Myanmar sentences) based on the Flickr8k dataset. Furthermore, two segmentation levels, word-level and syllable-level, are studied in the text preprocessing step. The proposed Bi-directional LSTM model is compared with LSTM and GRU models as well as the baseline model. Experiments on the updated dataset show that all of our models achieve higher BLEU scores with syllable segmentation than with word segmentation for Myanmar image captioning. The NASNetLarge with Bi-directional LSTM model using syllable segmentation achieved the highest BLEU-4 score of 40.05%, which is 12.5% higher than the word-segmentation result in this work and 15.67% higher in BLEU-4 than our previous work.
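The abstract describes a merge-style encoder-decoder: NASNetLarge extracts a fixed image feature vector, a Bi-directional LSTM encodes the caption prefix, and the two are combined to predict the next token. The following Keras sketch is one plausible reading of that pipeline, not the authors' exact implementation; the vocabulary size, maximum caption length, layer widths, and the additive merge are assumptions the abstract does not specify.

from tensorflow.keras.applications.nasnet import NASNetLarge
from tensorflow.keras.layers import (Input, Dense, Dropout, Embedding,
                                     LSTM, Bidirectional, add)
from tensorflow.keras.models import Model

# Encoder: NASNetLarge (331x331 RGB inputs) with global average pooling
# yields a 4032-dim feature vector per image; weights stay frozen and
# features are typically precomputed once per image.
cnn = NASNetLarge(weights="imagenet", include_top=False, pooling="avg")
cnn.trainable = False

VOCAB_SIZE = 8000   # assumed syllable vocabulary size
MAX_LEN = 40        # assumed maximum caption length in tokens
EMBED_DIM = 256     # assumed embedding width
UNITS = 256         # assumed hidden width

# Image branch: project precomputed CNN features into the decoder space.
img_in = Input(shape=(4032,))
img_vec = Dense(UNITS, activation="relu")(Dropout(0.5)(img_in))

# Text branch: Bi-LSTM over the partial caption (token ids).
seq_in = Input(shape=(MAX_LEN,))
seq_vec = Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(seq_in)
seq_vec = Bidirectional(LSTM(UNITS // 2))(seq_vec)   # concat -> UNITS dims

# Merge the two branches and predict a distribution over the next token.
merged = Dense(UNITS, activation="relu")(add([img_vec, seq_vec]))
out = Dense(VOCAB_SIZE, activation="softmax")(merged)

model = Model(inputs=[img_in, seq_in], outputs=out)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

For the syllable-level preprocessing that the abstract compares against word segmentation, a minimal rule-based breaker is sketched below. The paper does not name its segmentation tool, so this simplified regex (break before a Myanmar consonant unless it is stacked with virama U+1039 or killed by asat U+103A) is only an illustration.

import re

# Hypothetical simplified Myanmar syllable breaker: insert a space before
# each syllable-initial consonant (U+1000-U+1021).
_SYL = re.compile(r"(?<!\u1039)([\u1000-\u1021])(?![\u103A\u1039])")

def syllable_segment(text: str) -> str:
    return _SYL.sub(r" \1", text).strip()

# e.g. syllable_segment("မြန်မာ") -> "မြန် မာ" (two syllables)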
Keywords
NASNetLarge, Recurrent Neural Network, Long Short-Term Memory, Gated Recurrent Unit