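Comparison of Gene Expression Analysis Methods and Their Application to Survival Time Prediction of Cancer Patients Under Psychological Influence
Abstract
Distinguishing cancer tissue from normal tissue gives physicians a basis for the subsequent treatment of cancer patients. Cancer tissue can be identified by image recognition of the tissue, by pathological examination of tissue sections, or by finding statistically significant differentially expressed genes in the tissue's gene expression dataset. The first medical application of this dissertation is the clustering and classification of cancer and normal tissues in five datasets (two lung tissue datasets, two pancreatic tissue datasets, and one leukemia dataset), using K-means for clustering and a perceptron for classification. From each dataset we selected 400 genes differentially expressed between cancer and normal tissues and used them to cluster and classify the tissues. The mean classification accuracy of the perceptron (99.6%) was higher than the mean clustering accuracy of K-means (91.7%). Replacing the Euclidean distance with Shannon entropy raised the mean accuracy of K-means clustering of cancer and normal tissues from 91.7% to 93.4%.

In the second medical application, we sought scientific evidence that psychological state affects health, including the survival time of cancer patients. The effect of the psychological factor of loneliness on human survival has been established on epidemiological grounds, but genome-based studies have not yet been developed. We are the first to use psychology-associated genes to show that psychological factors affect the survival time of patients with various cancers.

We used statistical methods to identify psychology-associated genes differentially expressed between the high-loneliness and low-loneliness groups, and then applied Cox proportional hazards regression to these genes to show that they influence the survival time of cancer patients. Cancer patients with high risk scores had shorter mean survival times than those with low risk scores. We then validated the result with Kaplan-Meier survival curves in three brain cancer cohorts (n = 77, 85, and 191). The survival curves of high-risk and low-risk brain cancer patients separated clearly, and in all three cohorts the p-value comparing high-risk with low-risk patients was below 0.0001.

We further validated the signature in bone cancer, lung cancer, ovarian cancer, and leukemia patients, and finally in lymphoma patients. The results show that the loneliness-associated genes predict survival well across many kinds of cancer, especially bone cancer. Using gene expression databases with survival times for various cancers, this dissertation is the first to show at the gene level that loneliness-associated genes are associated with the survival time of patients with various cancers.

Five statistical methods were used to identify genes differentially expressed between cancer and normal tissues. Clustering the tissues of the five datasets with the 400 differential genes selected by Student's t-test gave a mean accuracy of 95.3%, versus 93.2% for AUC. In the study of psychological effects on cancer survival, the same five methods were used to identify genes differentially expressed between the high-loneliness and low-loneliness groups for survival prediction; the AUC method gave the highest mean hazard ratio (3.875) across the eight cancer cohorts.

Keywords: gene expression, statistically significant difference, K-means clustering, loneliness, Cox proportional hazards regression, Kaplan-Meier survival curve, hazard ratio, statistical analysis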
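The gene-selection step described above (scoring each gene for a cancer versus normal difference and keeping the top 400) can be sketched with two of the five statistics, Student's t-test and AUC. This is a minimal illustration on synthetic data, not the dissertation's code: the function names, the choice to rank by |t| and by |AUC - 0.5| so that both up- and down-regulated genes count, and the tie handling (ties count 1/2) are all assumptions.

```python
import numpy as np


def t_statistics(expr, labels):
    """Student's two-sample t-statistic (pooled variance) for every gene.

    expr: (n_genes, n_samples) expression matrix.
    labels: length-n_samples array, 1 = cancer, 0 = normal.
    """
    a, b = expr[:, labels == 1], expr[:, labels == 0]
    n1, n2 = a.shape[1], b.shape[1]
    pooled_var = ((n1 - 1) * a.var(axis=1, ddof=1)
                  + (n2 - 1) * b.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))


def auc_scores(expr, labels):
    """Per-gene AUC = P(cancer value > normal value), ties counted as 1/2."""
    a, b = expr[:, labels == 1], expr[:, labels == 0]
    # Compare every cancer sample with every normal sample, gene by gene.
    greater = (a[:, :, None] > b[:, None, :]).sum(axis=(1, 2))
    ties = (a[:, :, None] == b[:, None, :]).sum(axis=(1, 2))
    return (greater + 0.5 * ties) / (a.shape[1] * b.shape[1])


def top_k_genes(scores, k=400):
    """Indices of the k genes with the largest scores, best first."""
    return np.argsort(-scores)[:k]


# Synthetic example (hypothetical data): 1000 genes, 20 cancer + 20 normal
# samples, with genes 0-4 shifted upward in the cancer samples.
rng = np.random.default_rng(0)
labels = np.array([1] * 20 + [0] * 20)
expr = rng.normal(size=(1000, 40))
expr[:5, labels == 1] += 4.0

# Rank by |t| and by the distance of AUC from 0.5 (chance level).
by_t = top_k_genes(np.abs(t_statistics(expr, labels)))
by_auc = top_k_genes(np.abs(auc_scores(expr, labels) - 0.5))
```

The AUC here is the Mann-Whitney U statistic divided by the number of cancer/normal pairs, so it needs no ROC curve construction; the Wilcoxon, Chernoff bound, and relative entropy scores would slot into `top_k_genes` the same way.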


 

 

A Comparison of Gene Expression Profile Analysis Methods and Their Application to Identifying Loneliness-Associated Genes for Survival Prediction in Cancer Patients
Abstract
Clustering cancer and normal tissues gives doctors a diagnostic basis for the subsequent treatment of cancer patients. Cancer and normal tissues can be distinguished by medical image recognition, by histopathological examination of surgical biopsies, or by identifying statistically differential genes in gene expression profiles. In the first medical application of this dissertation, we clustered cancer and normal tissues with K-means and classified them with a perceptron model in five datasets: two of lung tissue, two of pancreatic tissue, and one of leukemia. From each dataset we selected 400 genes differentially expressed between cancer and normal tissues, then clustered and classified the tissues using these 400 genes. The mean accuracy of the perceptron classifier (99.6%) was higher than that of K-means clustering (91.7%). Moreover, replacing the Euclidean distance with Shannon entropy raised the mean accuracy of K-means clustering from 91.7% to 93.4%.

In the second medical application, we sought scientific evidence that psychological factors influence health, including the survival time of cancer patients. The influence of loneliness on human survival has been established epidemiologically, but genome-based research on it has not yet been developed. We applied statistical methods to identify loneliness-associated genes differentially expressed between the high-loneliness and low-loneliness groups. Using these genes, we applied Cox proportional hazards regression to show that loneliness influences the survival time of patients with different kinds of cancer. Cancer patients with high risk scores had shorter mean survival times than those with low risk scores.
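The first application compares unsupervised K-means clustering (k = 2) with a supervised perceptron on the selected genes. The sketch below shows both on synthetic data under stated assumptions: plain Lloyd's K-means with squared Euclidean distance and farthest-point initialisation (an assumed design choice, not taken from the dissertation), a classic Rosenblatt perceptron, and clustering accuracy scored by the better of the two cluster-to-label mappings. The Shannon-entropy variant is not reproduced because the abstract does not specify how entropy replaces the distance; all function names are hypothetical.

```python
import numpy as np


def kmeans(x, k=2, n_iter=100, seed=0):
    """Lloyd's K-means with squared Euclidean distance.

    x: (n_samples, n_features). Returns a cluster index per sample.
    Centers start from farthest-point initialisation (an assumption).
    """
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    while len(centers) < k:
        # Next center: the sample farthest from its nearest existing center.
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        new_centers = np.array([x[assign == j].mean(axis=0) if np.any(assign == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign


def clustering_accuracy(assign, labels):
    """Accuracy for k = 2 after picking the better cluster-to-label mapping."""
    acc = np.mean(assign == labels)
    return max(acc, 1 - acc)


def perceptron(x, y, epochs=50):
    """Rosenblatt perceptron for labels y in {0, 1}. Returns (weights, bias)."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(x, y):
            pred = 1 if xi @ w + b > 0 else 0
            if pred != yi:
                step = yi - pred  # +1 or -1: move the boundary toward xi's class
                w += step * xi
                b += step
                errors += 1
        if errors == 0:  # every training sample classified correctly
            break
    return w, b


# Hypothetical data: 40 samples x 50 selected genes, cancer samples shifted.
rng = np.random.default_rng(1)
labels = np.array([1] * 20 + [0] * 20)
x = rng.normal(size=(40, 50))
x[labels == 1] += 3.0

acc_kmeans = clustering_accuracy(kmeans(x), labels)
w, b = perceptron(x, labels)
acc_perceptron = np.mean((x @ w + b > 0).astype(int) == labels)
```

The perceptron sees the labels while K-means does not, which is one plausible reason the abstract reports higher accuracy for classification than for clustering.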
We then validated the loneliness-associated gene signature with Kaplan-Meier survival curves in three independent brain cancer cohorts (n = 77, 85, and 191). In all three cohorts, the survival curves of the high-risk and low-risk groups were clearly separated, with hazard ratio (HR) > 1 and log-rank p-value < 0.0001. We further tested the signature in bone cancer, lung cancer, ovarian cancer, and leukemia cohorts, and finally in a lymphoma cohort. The loneliness-associated genes predicted survival well across different kinds of cancer, especially bone cancer. Our study thus provides the first indication, based on genome transcription, that the psychological factor of loneliness influences survival time in patients with different kinds of cancer.

In both applications we used five statistical methods: Student's t-test, the area under the Receiver Operating Characteristic (ROC) curve (AUC), the Wilcoxon test, the Chernoff bound, and relative entropy. In the first application they identified genes significantly differentially expressed between cancer and normal tissues; in the second they identified loneliness-associated genes for predicting the survival of cancer patients. In clustering cancer and normal tissues, the genes selected by Student's t-test gave a mean accuracy of 95.3%, versus 93.2% for AUC. In the survival analysis, identifying the loneliness-associated genes with AUC gave the highest mean hazard ratio (3.875) across the eight cancer cohorts.
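The validation pipeline described above (risk score from a Cox model, split into high- and low-risk groups, Kaplan-Meier curves, log-rank test) can be sketched in NumPy. This is a minimal illustration, not the dissertation's implementation: the Cox coefficients are assumed to be already fitted (fitting is not shown), the median cut-off for "high risk" is an assumption, and the function names are hypothetical.

```python
import math

import numpy as np


def risk_groups(expr, beta):
    """Cox linear predictor (risk score) per patient and a high-risk flag.

    expr: (n_patients, n_genes); beta: Cox coefficients, assumed already fitted.
    Patients above the median score are labelled high risk (an assumption).
    """
    score = expr @ beta
    return score, score > np.median(score)


def kaplan_meier(time, event):
    """Kaplan-Meier estimate. event: 1 = death observed, 0 = censored.

    Returns the distinct death times and S(t) just after each of them.
    """
    time, event = np.asarray(time, float), np.asarray(event, int)
    death_times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in death_times:
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & (event == 1))
        s *= 1.0 - deaths / at_risk  # product-limit step
        surv.append(s)
    return death_times, np.array(surv)


def logrank_test(time, event, group):
    """Two-group log-rank test. group: boolean array (True = high risk).

    Returns (chi-square statistic with 1 df, p-value).
    """
    time, event = np.asarray(time, float), np.asarray(event, int)
    group = np.asarray(group, bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        died = (time == t) & (event == 1)
        d, d1 = died.sum(), (died & group).sum()
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

In practice `risk_groups` supplies the `group` flag for `logrank_test`, and `kaplan_meier` is called once per group to draw the two curves.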
Key words: statistically significant genes, gene expression profiles, K-means, Shannon entropy, loneliness, Cox proportional hazards regression, cancer cohorts, Kaplan-Meier survival curve, hazard ratio, Receiver Operating Characteristic (ROC), Student’s t-test, Wilcoxon test