

KNN method

(1) Table data

Consider the following table of data, groupData (92×3).

step 1: Split the data into training and test sets
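The split code for this step is not shown. A minimal sketch follows, assuming a holdout split with cvpartition like the one used for groupData later in these notes. The synthetic table and the 0.35 holdout fraction are assumptions for illustration, not the original data.

```matlab
% Hypothetical stand-in for groupData: 92 rows with variables x, y, group.
rng(0)                                   % make the random split reproducible
x = 10*rand(92,1);
y = 10*rand(92,1);
group = categorical(randi(3,92,1));      % three made-up classes
groupData = table(x,y,group);

% Hold out 35% of the rows for testing, stratified by class.
cvpt = cvpartition(groupData.group,"HoldOut",0.35);
dataTrain = groupData(training(cvpt),:);
dataTest  = groupData(test(cvpt),:);

% Every row lands in exactly one of the two sets.
disp(height(dataTrain) + height(dataTest))   % 92
```

training(cvpt) and test(cvpt) return logical index vectors, so the same partition object can be reused to split any array with the same number of rows.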



step 2: Train the model

Construct a k-nearest neighbor classifier named 'mdl' where k = 3. Train the model using 'dataTrain' with the response variable 'group'.

mdl = fitcknn(dataTrain,"group","NumNeighbors",3); 

step 3: Test the model

Predict the groups for the data in 'dataTest'. Assign the result to a variable named 'pg3'.

pg3 = predict(mdl,dataTest); 

step 4: Visualize the classification model


Use 'plotGroup' to plot the predicted groups for 'dataTest' with the marker "o". Use hold on and hold off to add this plot to the existing figure.

hold on

plotGroup(dataTest,pg3,"o")

hold off

step 5: Validate the model

One way to check whether the model classifies correctly is the loss function, which returns the misclassification rate.

Notice the point near (7,1) which has been misclassified. The true class (x marker) is different from the predicted class (o marker). What proportion of points have been misclassified? 

Calculate the loss for the test data 'dataTest', and assign it to a variable named 'err3'. 

err3 = loss(mdl,dataTest) 

step 6: Modify the model

Try adjusting the number of nearest neighbors to improve the model.

mdl.NumNeighbors = 10;
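To see how the choice of k affects the test error, one option is to loop over several values and record the loss for each. The sketch below uses synthetic two-class data; the data and the names kValues, errK, and mdlK are assumptions for illustration.

```matlab
% Synthetic two-class data (hypothetical, not the groupData table).
rng(1)
n = 200;
x = [randn(n/2,1); randn(n/2,1)+2];
y = [randn(n/2,1); randn(n/2,1)+2];
group = categorical([ones(n/2,1); 2*ones(n/2,1)]);
data = table(x,y,group);

cvpt = cvpartition(data.group,"HoldOut",0.3);
dataTrain = data(training(cvpt),:);
dataTest  = data(test(cvpt),:);

% Fit one kNN model per candidate k and record its test loss.
kValues = [1 3 5 10 20];
errK = zeros(size(kValues));
for i = 1:numel(kValues)
    mdlK = fitcknn(dataTrain,"group","NumNeighbors",kValues(i));
    errK(i) = loss(mdlK,dataTest);
end
disp(table(kValues',errK','VariableNames',{'k','testError'}))
```

Because the test set is small, differences between neighboring k values can be noise; cross-validation would give a steadier comparison.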

step 7: Re-test and visualize the model

pg10 = predict(mdl,dataTest);

hold on

plotGroup(dataTest,pg10,"o")

hold off

(2) Array data

This is similar to the table case, but the data is now handled as arrays.

step 1: Split the data into training and test sets
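The split code for the array case is not shown either. A minimal sketch, assuming a numeric predictor matrix and a label vector: the names coords and groups and the 0.3 holdout fraction are hypothetical, while coordTrain, groupTrain, coordTest, and groupTest match the names used in the steps that follow.

```matlab
% Hypothetical array data: 100-by-2 predictors and a numeric label vector.
rng(2)
coords = [randn(50,2); randn(50,2)+3];
groups = [ones(50,1); 2*ones(50,1)];

% Partition on the labels, then index predictors and labels with the same masks.
cvpt = cvpartition(groups,"HoldOut",0.3);
coordTrain = coords(training(cvpt),:);
groupTrain = groups(training(cvpt));
coordTest  = coords(test(cvpt),:);
groupTest  = groups(test(cvpt));

disp(size(coordTrain,1) + size(coordTest,1))   % 100
```

With arrays the response is passed separately, which is why fitcknn and loss take groupTrain and groupTest as extra arguments here.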

step 2: Train the model

mdl = fitcknn(coordTrain,groupTrain,"NumNeighbors",3);

step 3: Test the model

predGroups = predict(mdl,coordTest);

step 4: Validate the model

mdlLoss = loss(mdl,coordTest,groupTest)

Decision Trees


This code loads and formats the data.

heartData = readtable("heartDataNum.txt");

heartData.HeartDisease = categorical(heartData.HeartDisease);

This code partitions the data into training and test sets.

pt = cvpartition(heartData.HeartDisease,"HoldOut",0.3);

hdTrain = heartData(training(pt),:);

hdTest = heartData(test(pt),:);

Create a classification tree model named mdl using the training data 'hdTrain'

 mdl = fitctree(hdTrain,"HeartDisease");

Prune mdl to pruning level 3. Level 0 is the full, unpruned tree; higher levels remove more branches.

mdl = prune(mdl,"Level",3)


errTrain = resubLoss(mdl);

errTest = loss(mdl,hdTest);

disp("Training Error: " + errTrain)

disp("Test Error: " + errTest)
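To get a feel for how the pruning level trades tree size against training error, one can prune the same tree at every available level. This sketch uses the fisheriris sample data that ships with Statistics and Machine Learning Toolbox rather than the heart data; the names irisTree, maxLevel, and pruned are assumptions for illustration.

```matlab
load fisheriris                          % sample data: meas (150-by-4), species
irisTree = fitctree(meas,species);

% PruneList holds a pruning level per node; its maximum leaves only the root.
maxLevel = max(irisTree.PruneList);
for level = 0:maxLevel
    pruned = prune(irisTree,"Level",level);
    fprintf("level %d: %d nodes, training error %.3f\n", ...
        level, pruned.NumNodes, resubLoss(pruned));
end
```

Comparing these training errors with the test error from loss on held-out data is one way to choose a level that does not overfit.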

Naïve Bayes


If we assume the data set comes from an underlying distribution, we can treat the data as a statistical sample. This can reduce the influence of outliers on the model.

A naïve Bayes classifier assumes the independence of the predictors within each class. This classifier is a good choice for relatively simple problems.
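No code is given for this classifier. A minimal sketch with fitcnb on synthetic data is shown below; the data set and the names mdlNB and errNB are assumptions for illustration. By default fitcnb models each numeric predictor within each class with a normal distribution.

```matlab
% Synthetic two-class data (hypothetical).
rng(3)
n = 150;
x = [randn(n,1); randn(n,1)+3];
y = [randn(n,1); randn(n,1)+3];
group = categorical([repmat("a",n,1); repmat("b",n,1)]);
data = table(x,y,group);

cvpt = cvpartition(data.group,"HoldOut",0.3);
dataTrain = data(training(cvpt),:);
dataTest  = data(test(cvpt),:);

% Fit a naive Bayes classifier and compute its test loss.
mdlNB = fitcnb(dataTrain,"group");
errNB = loss(mdlNB,dataTest)

% One fitted distribution per predictor (here: normal for both x and y).
disp(mdlNB.DistributionNames)
```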

Discriminant Analysis

Discriminant analysis, whose best-known form is linear discriminant analysis (LDA), is a statistical method that builds a discriminant model from samples of known class and uses it to classify samples of unknown class.

Similar to naive Bayes, discriminant analysis works by assuming that the observations in each prediction class can be modeled with a normal probability distribution. However, the predictors are not assumed to be independent of each other. Hence, a multivariate normal distribution is fitted to each class.

Linear Discriminant Analysis

By default, the covariance for each response class is assumed to be the same. This results in linear boundaries between classes.

daModel = fitcdiscr(dataTrain,"ResponseVarName")

Quadratic Discriminant Analysis

Removing the assumption of equal covariances results in a quadratic boundary between classes. Use the "DiscrimType" option to do this: 

daModel = fitcdiscr(dataTrain,"ResponseVarName","DiscrimType","quadratic")
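The two variants can be compared on data whose classes have clearly different covariances. This is a sketch on synthetic data; the data and the names mdlLin, mdlQuad, errLin, and errQuad are assumptions for illustration.

```matlab
% Two synthetic classes with unequal spread (hypothetical data).
rng(4)
n = 200;
coords = [0.5*randn(n,2); 2*randn(n,2)+1];     % tight cluster, then wide cluster
x = coords(:,1);
y = coords(:,2);
group = categorical([repmat("tight",n,1); repmat("wide",n,1)]);
data = table(x,y,group);

cvpt = cvpartition(data.group,"HoldOut",0.3);
dataTrain = data(training(cvpt),:);
dataTest  = data(test(cvpt),:);

% Linear: one shared covariance. Quadratic: one covariance per class.
mdlLin  = fitcdiscr(dataTrain,"group");
mdlQuad = fitcdiscr(dataTrain,"group","DiscrimType","quadratic");

errLin  = loss(mdlLin,dataTest)
errQuad = loss(mdlQuad,dataTest)
```

When the class covariances really differ, the quadratic model would be expected to do at least as well, at the cost of estimating more parameters.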

Support Vector Machines

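A Support Vector Machine (SVM) algorithm classifies data by finding the "best" hyperplane that separates the data points of one class from those of the other. The best hyperplane is the one with the largest margin between the two classes.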

Step 1: Process the data

The code below loads the data and uses cvpartition to split it into training and test sets.

load groups

cvpt = cvpartition(groupData.group,"Holdout",0.35);

dataTrain = groupData(training(cvpt),:);

dataTest = groupData(test(cvpt),:);

Step 2: Train and validate the model

mdlSVM = fitcsvm(dataTrain,"group","KernelFunction","gaussian");

errSVM = loss(mdlSVM,dataTest)

Step 3: Visualize the data

predGroups = predict(mdlSVM,dataTest);


plotGroup(dataTest,dataTest.group,"x")

hold on

plotGroup(dataTest,predGroups,"o")

hold off

The helper function plotGroup is defined as follows:

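function plotGroup(data,grp,mkr)

 % Plot the x-y points in data, colored by the grouping variable grp, using marker mkr

 colors = colororder;

 p = gscatter(data.x,data.y,grp,colors([1 2 4],:),mkr,9);

 % Format plot: thicker marker outlines and fixed axis limits

 [p.LineWidth] = deal(1.5);

 xlim([-0.5 10.5])

 ylim([-0.5 10.5])

end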



Step 1: Process the data

load points


cvpt = cvpartition(points.group,"Holdout",0.38);

trainPoints = points(training(cvpt),:);

testPoints = points(test(cvpt),:);


Step 2: Train the model, validate it, and plot the results

mdl = fitcsvm(trainPoints,"group","KernelFunction","polynomial");

mdlLoss = loss(mdl,testPoints)

predGroups = predict(mdl,testPoints);


hold on


hold off


Multiclass Support Vector Machine Models

You can perform multiclass SVM classification by creating an error-correcting output codes (ECOC) classifier.


Suppose there are three classes in the data. By default, the ECOC model reduces the problem to multiple binary classifiers using the one-vs-one design.

The combination of the resulting classifiers is used for prediction.
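With the one-vs-one design, K classes give K(K-1)/2 binary learners, so three classes give three. The sketch below checks this on synthetic three-class data; the data and the name mdlECOC are assumptions for illustration.

```matlab
% Synthetic three-class data (hypothetical).
rng(5)
n = 60;
x = [randn(n,1); randn(n,1)+4; randn(n,1)+8];
y = [randn(n,1); randn(n,1)+4; randn(n,1)];
group = categorical([repmat("A",n,1); repmat("B",n,1); repmat("C",n,1)]);
data = table(x,y,group);

% Default coding design is one-vs-one with SVM binary learners.
mdlECOC = fitcecoc(data,"group");

disp(numel(mdlECOC.BinaryLearners))   % 3 = K*(K-1)/2 for K = 3
disp(mdlECOC.CodingMatrix)            % rows: classes, columns: binary learners
```

In the coding matrix, each column marks one class as positive (1), one as negative (-1), and ignores the third (0) for that learner.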

Creating a multiclass SVM model is a two-step process.

Given the following data set, we want to fit a multiclass SVM model:

load groups


cvpt = cvpartition(groupData.group,"Holdout",0.35);

dataTrain = groupData(training(cvpt),:);

dataTest = groupData(test(cvpt),:);


Create a multiclass SVM classifier named 'mdlMSVM' using the training data 'dataTrain' and the response variable 'group'. Calculate the classification loss for the test data 'dataTest', and name it 'errMSVM'.

mdlMSVM = fitcecoc(dataTrain,"group");

errMSVM = loss(mdlMSVM,dataTest)

predGroups = predict(mdlMSVM,dataTest);


plotGroup(dataTest,dataTest.group,"x")

hold on

plotGroup(dataTest,predGroups,"o")

hold off

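The classification in the figure above is not very successful: the loss is about 0.1588. To change the settings of the underlying SVM learners, create a template learner by passing property name-value pairs to the 'templateSVM' function.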

t = templateSVM("PropertyName",PropertyValue);

Then provide the template learner to the 'fitcecoc' function as the value of the "Learners" property.

mdl = fitcecoc(tableData,"Response", "Learners",t);


Create 'mdlMSVM' so that it uses the value "polynomial" for the "KernelFunction" property. Calculate the loss errMSVM for the new model.

template = templateSVM("KernelFunction","polynomial");

mdlMSVM = fitcecoc(dataTrain,"group","Learners",template);

errMSVM = loss(mdlMSVM,dataTest)


The table below contains data digitized from breast cancer images; "diagnosis" is the response variable.

step 1: Read and split the data

bcDiag = readtable("breastCancerData.txt");

bcDiag.diagnosis = categorical(bcDiag.diagnosis);

pt = cvpartition(bcDiag.diagnosis,"HoldOut",0.3);

bcTrain = bcDiag(training(pt),:);

bcTest = bcDiag(test(pt),:);

step 2: Train the model and compute the misclassification loss

mdl = fitcknn(bcTrain,"diagnosis","NumNeighbors",5);

errRate = loss(mdl,bcTest)

step 3: Compute the false negatives and plot a confusion chart

A false negative is a case diagnosed as malignant (M) that the model predicts as benign (B).

p = predict(mdl,bcTest);

falseNeg = mean((bcTest.diagnosis == "M") & (p == "B"))
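The step title mentions a confusion chart, but the plotting code is not shown. One way to produce it is sketched below with a handful of made-up labels standing in for bcTest.diagnosis and p; the names trueLabels, predLabels, C, and falseNegFrac are assumptions for illustration.

```matlab
% Made-up true and predicted labels (hypothetical stand-ins).
trueLabels = categorical(["M";"M";"B";"B";"M";"B";"B";"M"]);
predLabels = categorical(["M";"B";"B";"B";"M";"B";"M";"M"]);

% Confusion matrix with explicit class order B, M: rows = true, columns = predicted.
C = confusionmat(trueLabels,predLabels,"Order",categorical(["B";"M"]));
disp(C)

% C(2,1) counts true M predicted as B, i.e. the false negatives.
falseNegFrac = C(2,1)/numel(trueLabels);
disp(falseNegFrac)                      % 0.125, same as the mean(...) formula

% Plot the same information as a chart.
confusionchart(trueLabels,predLabels);
```

Applied to the real data, the call would be confusionchart(bcTest.diagnosis,p). Note that falseNeg as defined above is a fraction of all test cases, not of the malignant cases only.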


