BLEU is the best-known automatic metric for assessing the performance of machine translation systems. However, BLEU cannot indicate which parts of an NMT system's output are good or bad. This paper describes an approach to automatically evaluating NMT systems by linguistic test points. The approach evaluates each linguistic test point individually, which BLEU cannot do, and provides intuitive insight into the strengths and weaknesses of NMT systems in handling various important linguistic phenomena. The evaluation used 58 linguistic test points consisting of 630 sentences. We evaluated two bidirectional English/Korean NMT systems. The BLEU scores of the English-to-Korean NMT systems were 0.0898 and 0.2081, and their automatic evaluation scores by linguistic test points were 58.35% and 77.31%, respectively. The BLEU scores of the Korean-to-English NMT systems were 0.3939 and 0.4512, and their automatic evaluation scores by linguistic test points were 33.10% and 40.47%, respectively. These results show that automatic evaluation by linguistic test points ranks the systems consistently with BLEU. The test-point evaluation further shows that both the English-to-Korean and Korean-to-English NMT systems are strong at translating polysemous words but weak at translating style and sentences with complex syntactic structures.
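The pass-rate percentages above suggest a per-test-point scoring scheme. A minimal sketch of such a scheme is given below; the matching rule (substring containment against a list of acceptable target-side phrases per test sentence) is an assumption for illustration, not the paper's exact method, and all names here are hypothetical.

```python
# Hypothetical sketch of evaluation by linguistic test points.
# Assumption: each test sentence carries a list of acceptable target-side
# phrases, and a translation passes if any acceptable phrase occurs in it.

def passes(translation: str, acceptable_phrases: list) -> bool:
    """A test sentence passes if any acceptable phrase occurs in the output."""
    return any(phrase in translation for phrase in acceptable_phrases)

def pass_rate(outputs: list, test_points: list) -> float:
    """Fraction of test sentences whose output matches an acceptable phrase."""
    hits = sum(passes(o, tp) for o, tp in zip(outputs, test_points))
    return hits / len(outputs)

# Toy example: two test sentences for a polysemy test point ("bank").
outputs = ["the bank of the river", "he went to the bank"]
test_points = [["bank of the river"], ["riverbank"]]
rate = pass_rate(outputs, test_points)  # first passes, second fails -> 0.5
```

Reporting `pass_rate` separately for each of the 58 test points would yield the kind of per-phenomenon breakdown (e.g. polysemy vs. complex syntax) that a single corpus-level BLEU score cannot provide.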