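An English Grammatical Error Correction System Based on Deep Learning and Hybrid N-grams
Abstract
More than half of all English users are non-native speakers, and for them a fast, effective way to check their own writing for grammatical errors is important. Natural Language Processing has long been an important topic in computer science, and Grammatical Error Correction is one of its main research problems. In recent years a variety of grammatical error correction solutions have been proposed, each with its own strengths and weaknesses. This thesis combines deep learning with hybrid N-grams to propose another solution to the grammatical error correction problem. The solution consists of three neural networks: (1) a hybrid N-gram semantic classifier, (2) a hybrid N-gram converter, and (3) a hybrid N-gram inverse converter. The system first determines whether the input English sentence contains a hybrid N-gram, then checks for and corrects grammatical errors, and finally inverse-converts the hybrid N-gram and reassembles it into an English sentence. Through these three stages, hybrid N-grams are used to check English grammar. This thesis uses two datasets, StringNet and CoNLL2013, to verify the effectiveness of the proposed method. For each of the three neural networks, different network structures and data preprocessing methods are compared and analyzed.

Keywords: English grammar correction, deep learning, hybrid N-gram, English grammar checking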

A Grammatical Error Correction System based on the Integration of Deep Learning and Hybrid N-grams
Abstract
More than half of all English users are non-native speakers. For these users, a quick and effective way to check their writing for grammatical errors is important. Natural Language Processing has long been an important topic in computer science, and Grammatical Error Correction is one of its main research problems. Over the past few years, different approaches to grammatical error correction have been proposed, each with its own advantages and disadvantages. This thesis combines deep learning with hybrid N-grams to propose an alternative solution to the problem of grammatical error correction. The solution consists of three neural networks: (1) a hybrid N-gram semantic classifier, (2) a hybrid N-gram converter, and (3) a hybrid N-gram inverse converter. The system first determines whether an input English sentence contains a hybrid N-gram, then checks for and corrects grammatical errors, and finally transforms the corrected hybrid N-gram back into the corresponding English sentence. In this three-stage way, hybrid N-grams are used to check English grammar. Finally, this thesis uses the StringNet and CoNLL2013 data sets to verify the performance of the proposed method. For each of the three neural networks, the effects of different network structures and data pre-processing methods are compared and analyzed.

Keywords: grammatical error correction, deep learning, hybrid n-gram, grammatical error detection
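
The three-stage flow described in the abstract (classify, convert, inverse-convert) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the three stages are neural networks in the thesis, while the functions below are hand-written toy stand-ins. The names ThreeStagePipeline, toy_classify, toy_convert, and toy_invert, the "[V-ing]" slot symbol, and the single "look forward to" pattern are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Tokens = List[str]
Span = Tuple[int, int]  # half-open token range [start, end)

SLOT = "[V-ing]"  # hypothetical slot symbol, not taken from the thesis


@dataclass
class ThreeStagePipeline:
    """Wires three stages together; each stage is a plain callable here."""
    classify: Callable[[Tokens], Optional[Span]]  # stage 1: find a hybrid n-gram span
    convert: Callable[[Tokens], Tokens]           # stage 2: tokens -> corrected hybrid n-gram
    invert: Callable[[Tokens, Tokens], Tokens]    # stage 3: hybrid n-gram -> surface tokens

    def correct(self, sentence: str) -> str:
        tokens = sentence.split()
        span = self.classify(tokens)
        if span is None:
            # No hybrid n-gram detected: return the sentence unchanged.
            return sentence
        start, end = span
        segment = tokens[start:end]
        hybrid = self.convert(segment)
        fixed = self.invert(hybrid, segment)
        # Reassemble the sentence around the corrected segment.
        return " ".join(tokens[:start] + fixed + tokens[end:])


def toy_classify(tokens: Tokens) -> Optional[Span]:
    """Toy stand-in: match 'look forward to' followed by one more token."""
    head = ["look", "forward", "to"]
    for i in range(len(tokens) - len(head)):
        if tokens[i:i + len(head)] == head:
            return (i, i + len(head) + 1)
    return None


def toy_convert(segment: Tokens) -> Tokens:
    """Toy stand-in: replace the final word with a part-of-speech slot."""
    return segment[:-1] + [SLOT]


def toy_invert(hybrid: Tokens, segment: Tokens) -> Tokens:
    """Toy stand-in: fill the slot using the original word (naive '-ing' suffixing)."""
    out = []
    for h, s in zip(hybrid, segment):
        if h == SLOT:
            out.append(s if s.endswith("ing") else s + "ing")
        else:
            out.append(h)
    return out


pipeline = ThreeStagePipeline(toy_classify, toy_convert, toy_invert)
print(pipeline.correct("I look forward to meet you"))
# prints: I look forward to meeting you
```

Keeping each stage behind a plain callable means a trained model could be swapped in for any toy function without changing the surrounding control flow, which mirrors the per-network comparison the abstract describes.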