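Approaches to fighting spam range from blacklist matching, content filtering, and IP address blocking to the latest intelligent defense engines. Anti-spam technology keeps evolving, yet few techniques can eliminate spam completely. This thesis proposes a hierarchical quadratic-neuron-based neural tree (QUANT) that combines the advantages of decision trees and neural networks. Using neurons with quadratic junctions, it can find relationships among data in high-dimensional space. Besides effectively retaining the features of old data, it can also absorb new variants of mail, achieving partial incremental learning. A mail filtering system built this way can effectively block existing spam and also adapt to the challenges posed by the characteristics of new types of mail.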
A Neural Tree with Partial Incremental Learning Capability and Its Application in Spam Filtering
People have been struggling with spam for more than 10 years. E-mail’s ubiquity and no-cost ease of use encourage “bombing,” “flaming,” and other forms of abuse. E-mail messages that bear embedded or attached viruses, or ill-behaved or malevolent executables, can wreak havoc on computers. The standard spam-filtering techniques include blacklisting, IP tracing, and content filtering. The trouble is that none of these traditional techniques works particularly well. In this thesis, a new approach to constructing a neural tree with partial incremental learning capability is presented.
The proposed neural tree, called a quadratic-neuron-based neural tree (QUANT), is a tree-structured neural network composed of neurons with quadratic neural-type junctions for pattern classification. The QUANT integrates the advantages of decision trees and neural networks. Via a batch-mode training algorithm, the QUANT grows a neural tree containing quadratic neurons in its nodes. These quadratic neurons recursively partition the feature space into hyper-ellipsoidal sub-regions. The QUANT has a partial incremental learning capability: when new training data are introduced to a trained QUANT, it does not need to reconstruct a new neural tree to accommodate them.
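As a rough illustration of how a quadratic neuron can carve out a hyper-ellipsoidal sub-region, the following minimal sketch may help. It assumes an axis-aligned ellipsoid described by a center and per-dimension radii, with a sigmoid output; the class name `QuadraticNeuron`, its fields, and the `route` helper are hypothetical and are not taken from the thesis, whose exact neuron formulation and training algorithm are not given in this abstract.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QuadraticNeuron:
    """Hypothetical axis-aligned quadratic neuron (assumed form, not the thesis's)."""
    center: Sequence[float]  # assumed ellipsoid center, one value per feature
    radii: Sequence[float]   # assumed semi-axis lengths, all strictly positive

    def net(self, x: Sequence[float]) -> float:
        # 1 - sum(((x_i - c_i) / r_i)^2): positive inside the ellipsoid,
        # zero on its surface, negative outside.
        return 1.0 - sum(
            ((xi - ci) / ri) ** 2
            for xi, ci, ri in zip(x, self.center, self.radii)
        )

    def activate(self, x: Sequence[float]) -> float:
        # Sigmoid of the net input. net() never exceeds 1, so exp() cannot overflow.
        ez = math.exp(self.net(x))
        return ez / (1.0 + ez)

    def contains(self, x: Sequence[float]) -> bool:
        # The point falls in this neuron's sub-region when the net input is non-negative.
        return self.net(x) >= 0.0


def route(neurons: Sequence[QuadraticNeuron], x: Sequence[float]) -> int:
    """Return the index of the first neuron whose region contains x, or -1 if none does."""
    for i, neuron in enumerate(neurons):
        if neuron.contains(x):
            return i
    return -1
```

In a tree built from such nodes, a sample would be passed to the child associated with the neuron that claims it, and the same test would be repeated at that child. The routing rule shown here (first matching neuron) is only one plausible choice.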
To demonstrate the performance of the proposed QUANT, a spam filter design was tested. The spam filter is able to learn new types of spam while retaining the characteristics of existing mail. It can therefore both block existing spam and adapt to new spam.
Keywords: neural tree, decision tree, incremental learning, pattern recognition