首先分清聚類和分類的區(qū)別:
分類——監(jiān)督學(xué)習(xí)算法,需要給定訓(xùn)練數(shù)據(jù)
聚類——無監(jiān)督學(xué)習(xí)算法,無訓(xùn)練數(shù)據(jù)。
聚類分為 層次方法和非層次方法:
層次方法——最后形成一棵tree,每個(gè)node或者有k個(gè)分支,或者是葉子節(jié)點(diǎn)。( 過程似huffman tree)
非層次方法——是一個(gè)迭代過程,直至滿足某個(gè)閥值退出。(主要包括k-mean 和 EM算法)
k-mean算法的步驟:(每個(gè)樣本只能屬于一個(gè)聚類)
1)隨機(jī)選出k個(gè)centroid(質(zhì)心)
2)將每個(gè)樣本分配給與之距離最近的centroid
3)重新計(jì)算centroid
4)重復(fù)2) 3)直至centroid所對應(yīng)的set不變
EM算法:(每個(gè)樣本可以屬于不同的聚類,但概率不同)
理論基礎(chǔ)—計(jì)算極大似然估計(jì),需要求似然函數(shù)的極值。輸入是樣本,但這個(gè)樣本并不是完整數(shù)據(jù),它只是觀測數(shù)據(jù)。
完整數(shù)據(jù)(Z)包括觀測數(shù)據(jù)(X)和未知數(shù)據(jù)(Y)。
log似然函數(shù):
? ( θ; Z ) = log p( Z|
θ
)
=
log p( X,Y|
θ
)-----(1)
它的步驟跟EM所代表的單詞有關(guān),
<wbr><wbr><wbr><wbr>E——expectation,<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>已知: 觀測數(shù)據(jù)X,<span style="word-wrap:normal; word-break:normal; line-height:17px; font-size:11pt">參數(shù)θ的當(dāng)前值</span><span style="word-wrap:normal; word-break:normal; line-height:17px; font-size:11pt">θt</span><br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><span style="word-wrap:normal; word-break:normal; line-height:17px; font-size:11pt">未知:Y<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>對公式(1)求Y的期望,</wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></span>得表達(dá)式Q(<span style="word-wrap:normal; word-break:normal; line-height:17px; font-size:11pt">θ,</span><span style="word-wrap:normal; word-break:normal; line-height:17px; font-size:11pt">θt</span>), Q中不含有變量Y。<br><wbr><wbr><wbr><wbr>M——maximization,對E步得到的Q(<span style="word-wrap:normal; word-break:normal; line-height:17px; font-size:11pt">θ,</span><span style="word-wrap:normal; word-break:normal; line-height:17px; font-size:11pt">θt</span>)求極大值,<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><span style="word-wrap:normal; word-break:normal; line-height:17px; font-size:11pt">即θt+1 =</span>argmax Q(<span style="word-wrap:normal; word-break:normal; line-height:17px; font-size:11pt">θ,</span><span style="word-wrap:normal; word-break:normal; line-height:17px; font-size:11pt">θt )</span><br><wbr><wbr><wbr><wbr>每次參數(shù)更新會(huì)增加非完整似然函數(shù)值<br><wbr><wbr><wbr><wbr>重復(fù)E、M兩步,直至似然函數(shù)收斂到局部極大值。<br><br> EM會(huì)收斂到局部極值,但不保證收斂到全局最優(yōu);對初值很敏感,需要一個(gè)好的、快速的初始化過程。<br></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr>
<wbr></wbr>
<wbr></wbr>
k-mean聚類算法如下:
1.<wbr><wbr><wbr><wbr><wbr><wbr>從數(shù)據(jù)點(diǎn)中,隨機(jī)選取k個(gè)數(shù)據(jù)中心作為初始的聚類中心。例如k=3,則選擇3個(gè)數(shù)據(jù)點(diǎn)</wbr></wbr></wbr></wbr></wbr></wbr>
2.<wbr><wbr><wbr><wbr><wbr><wbr>分別計(jì)算每一個(gè)點(diǎn)到k個(gè)中心點(diǎn)的距離(本文計(jì)算的是歐式距離),如果當(dāng)前計(jì)算的數(shù)據(jù)點(diǎn)離第i個(gè)(i=1,2,…,k)中心點(diǎn)最近,則把當(dāng)前點(diǎn)歸到第i類.</wbr></wbr></wbr></wbr></wbr></wbr>
3.<wbr><wbr><wbr><wbr><wbr><wbr>重新計(jì)算k個(gè)聚類中心點(diǎn)。計(jì)算方式如下,如果第i類有n個(gè)數(shù)據(jù)點(diǎn),則第i類新的中心為:</wbr></wbr></wbr></wbr></wbr></wbr>
<wbr><a target="_blank" style="text-decoration:none; color:rgb(27,113,155)"></a><br><a target="_blank" style="text-decoration:none; color:rgb(27,113,155)"><img src="http://s3.sinaimg.cn/bmiddle/48b8276ftc1ef9044f7c3&690" name="image_operate_65101339134613281" alt="分類與聚類" title="分類與聚類" style="margin:0px; padding:0px; border:0px; list-style:none"></a><br><br><br></wbr>
<wbr><wbr><wbr><wbr>4.如果新的聚類中心跟上一次的聚類中心比較變化小于某值算法結(jié)束,否則轉(zhuǎn)到第二步。</wbr></wbr></wbr></wbr>
聚類結(jié)果如下:
<wbr></wbr>
代碼如下:http://download.csdn.net/source/3374443
load gaussdata; %由于我先前生成好了,直接load進(jìn)來
maxiter = 50;<wbr><wbr>%設(shè)定最大頻數(shù)</wbr></wbr>
iter = 1;
num = size(X,2);<wbr>%num為數(shù)據(jù)點(diǎn)個(gè)數(shù)</wbr>
index = randperm(num); %產(chǎn)生1到num個(gè)數(shù)字的一個(gè)隨機(jī)排列
center = X(:,index(1:3)); %選擇出3個(gè)初始的聚類中心
?nter = X(:,1:3);
old = center;<wbr><wbr><wbr><wbr><wbr><wbr>%記錄舊的聚類中心</wbr></wbr></wbr></wbr></wbr></wbr>
<wbr></wbr>
hold on ;
plot(X(1,:),X(2,:),'g.');<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>%繪制數(shù)據(jù)點(diǎn)</wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr>
plot(center(1,:),center(2,:),'ro'); %繪制數(shù)據(jù)中心
title(num2str(iter));<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>%顯示迭代步數(shù)</wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr>
hold off;
xnum = size(X,2);
cdim = size(center,2);
while iter<=maxiter
<wbr><wbr>clf;</wbr></wbr>
<wbr><wbr>hold on;</wbr></wbr>
<wbr><wbr>5-39行計(jì)算每一個(gè)點(diǎn)到k個(gè)中心的距離,一個(gè)很神奇的技巧,自己想吧,呵呵</wbr></wbr>
<wbr><wbr>sumX = sum(X.^2,1);</wbr></wbr>
<wbr><wbr>sumC = sum(center.^2,1);</wbr></wbr>
<wbr><wbr>XY = (2*X'*center)';</wbr></wbr>
<wbr><wbr>distance = repmat(sumX,cdim,1)+repmat(sumC',1,xnum)-XY;</wbr></wbr>
<wbr><wbr>[v,idx] = min(distance,[],1);<wbr><wbr>%求出數(shù)據(jù)點(diǎn)到哪一個(gè)中心的距離最近</wbr></wbr></wbr></wbr>
<wbr><wbr>Y = idx;<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>%對數(shù)據(jù)點(diǎn)進(jìn)行分類</wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr>
<wbr><wbr>idx1 = find(idx==1);</wbr></wbr>
<wbr><wbr>idx2 = find(idx==2);</wbr></wbr>
<wbr><wbr>idx3 = find(idx==3);</wbr></wbr>
<wbr><wbr>%下面三行計(jì)算出新的聚類中心</wbr></wbr>
<wbr><wbr>center(:,1) = mean(X(:,idx1),2);</wbr></wbr>
<wbr><wbr>center(:,2) = mean(X(:,idx2),2);</wbr></wbr>
<wbr><wbr>center(:,3) = mean(X(:,idx3),2);</wbr></wbr>
<wbr><wbr>title(num2str(iter));</wbr></wbr>
<wbr><wbr>plot_data(X,Y,center);</wbr></wbr>
<wbr><wbr>hold off;</wbr></wbr>
<wbr><wbr>pause(0.1);</wbr></wbr>
<wbr><wbr>error = sum((center(:,1)-old(:,1)).^2,1);<wbr><wbr><wbr>%計(jì)算迭代中止條件</wbr></wbr></wbr></wbr></wbr>
<wbr><wbr>if error<0.000001</wbr></wbr>
<wbr><wbr><wbr><wbr><wbr>break;</wbr></wbr></wbr></wbr></wbr>
<wbr><wbr>end</wbr></wbr>
<wbr><wbr>old = center;</wbr></wbr>
<wbr><wbr>iter = iter+1;</wbr></wbr>
end
更多文章、技術(shù)交流、商務(wù)合作、聯(lián)系博主
微信掃碼或搜索:z360901061

微信掃一掃加我為好友
QQ號聯(lián)系: 360901061
您的支持是博主寫作最大的動(dòng)力,如果您喜歡我的文章,感覺我的文章對您有幫助,請用微信掃描下面二維碼支持博主2元、5元、10元、20元等您想捐的金額吧,狠狠點(diǎn)擊下面給點(diǎn)支持吧,站長非常感激您!手機(jī)微信長按不能支付解決辦法:請將微信支付二維碼保存到相冊,切換到微信,然后點(diǎn)擊微信右上角掃一掃功能,選擇支付二維碼完成支付。
【本文對您有幫助就好】元
