
/* Writing the title down first, so that it keeps reminding me */

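A decision tree is an easy-to-understand classification algorithm that can be viewed as a set of if-then rules. Its main advantages are that the model is readable and that classification is fast, with no need for many rounds of iterative training. Decision tree learning usually involves three steps: feature selection, tree generation, and tree pruning. The commonly used algorithms are ID3, C4.5, and CART.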

1. Decision tree model

A decision tree is a tree-structured classification model made up of nodes and directed edges. Nodes are either internal nodes or leaf nodes: an internal node represents a feature (attribute), and a leaf node represents a class.

Classification starts at the root node, tests one feature of the instance at each internal node, and descends step by step until it reaches a leaf node.
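
The descent from root to leaf can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the `Node` class, its fields, and the small hand-built loan tree in `exampleTree` are hypothetical names chosen for this example, not the classes used in the listings later in this post.

```java
import java.util.HashMap;
import java.util.Map;

final class ClassifySketch {

    /** Hypothetical node type: a leaf carries a label, an internal node carries a feature name. */
    static final class Node {
        String feature;                                 // feature tested here (null for a leaf)
        String label;                                   // class label (null for an internal node)
        Map<String, Node> children = new HashMap<>();   // feature value -> subtree

        static Node leaf(String label) {
            Node n = new Node();
            n.label = label;
            return n;
        }

        static Node internal(String feature) {
            Node n = new Node();
            n.feature = feature;
            return n;
        }
    }

    /** Walk from the root to a leaf, following the branch that matches the instance. */
    static String classify(Node root, Map<String, String> instance) {
        Node node = root;
        while (node.label == null) {
            Node next = node.children.get(instance.get(node.feature));
            if (next == null) {
                return null;  // no branch for this value: no rule applies
            }
            node = next;
        }
        return node.label;
    }

    /** Example rules: if owns house = yes then yes; otherwise decide by has job. */
    static Node exampleTree() {
        Node job = Node.internal("has job");
        job.children.put("yes", Node.leaf("yes"));
        job.children.put("no", Node.leaf("no"));
        Node root = Node.internal("owns house");
        root.children.put("yes", Node.leaf("yes"));
        root.children.put("no", job);
        return root;
    }

    public static void main(String[] args) {
        Map<String, String> applicant = new HashMap<>();
        applicant.put("owns house", "no");
        applicant.put("has job", "yes");
        System.out.println(classify(exampleTree(), applicant));  // prints "yes"
    }
}
```

Each root-to-leaf path corresponds to one if-then rule, which is why the model stays readable.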

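2. Feature selection

Feature selection means picking the features that have discriminative power on the training data. The usual selection criterion is the information gain or the information gain ratio. To illustrate, the book gives an example:

The goal is to learn a decision tree for loan applications from the given training data, so that when a new customer applies for a loan, the tree decides from the applicant's features whether the loan can be granted.

Intuitively, I think of feature selection as finding features that are representative and highly discriminative for the classes, so that instances can be classified quickly and accurately. Mathematically, this involves entropy from information theory and probability. I won't go into detail here and will simply give the feature selection algorithm (information gain).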

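Input: training data set D and feature A;

Output: the information gain and the information gain ratio of feature A with respect to D

(1) Compute the empirical entropy of the data set D

(2) Compute the empirical conditional entropy of D given feature A

(3) Compute the information gain

(4) Compute the information gain ratio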
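
For reference, the four quantities in these steps are usually defined as follows. The notation is an assumption made for this sketch: $D$ has $K$ classes $C_k$, feature $A$ takes $n$ distinct values that split $D$ into subsets $D_i$, and $D_{ik} = D_i \cap C_k$.

```latex
\begin{align}
H(D) &= -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|} \\
H(D \mid A) &= \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i)
             = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_i|} \log_2 \frac{|D_{ik}|}{|D_i|} \\
g(D, A) &= H(D) - H(D \mid A) \\
g_R(D, A) &= \frac{g(D, A)}{H_A(D)}, \qquad
H_A(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|}
\end{align}
```

Here $H_A(D)$ is the entropy of $D$ with respect to the values of $A$. Dividing by it penalizes features that take many distinct values, which plain information gain tends to favor.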

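For the example in the book, first compute the empirical entropy. The data set has 15 instances, 9 of class "yes" and 6 of class "no", so

$$H(D) = -\frac{9}{15}\log_2\frac{9}{15} - \frac{6}{15}\log_2\frac{6}{15} \approx 0.971$$

Then compute the information gain of each feature. Let $A_1, A_2, A_3, A_4$ denote the four features age, has a job, owns a house, and credit rating. Then

$$g(D, A_1) = H(D) - \left[\frac{5}{15}H(D_1) + \frac{5}{15}H(D_2) + \frac{5}{15}H(D_3)\right] \approx 0.971 - 0.888 = 0.083$$

where $D_1, D_2, D_3$ are the subsets of young, middle-aged, and old applicants.

Computing the information gain of $A_2, A_3, A_4$ in the same way gives about 0.324, 0.420, and 0.363. Since $A_3$ has the largest information gain, it is chosen as the optimal feature. The information gain ratio could of course be computed and used as the selection criterion instead.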

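3. Decision tree generation

ID3 and C4.5 are essentially the same algorithm. The only difference is the feature selection criterion, where C4.5 uses the improved information gain ratio. This post therefore only walks through ID3.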

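ID3 algorithm steps

Input: training data set D, feature set A, threshold e

Output: decision tree T

(1) If all instances in D belong to the same class Ck, T is a single-node tree; label the node with class Ck and return T;

(2) If A is empty, T is a single-node tree; label the node with the class Ck that has the most instances in D and return T;

(3) Otherwise, compute the information gain of each feature in A with respect to D and select the feature Ag with the largest information gain;

(4) If the information gain of Ag is below the threshold e, T is a single-node tree; label the node with the majority class Ck in D and return T;

(5) Otherwise, for each possible value ai of Ag, split D by Ag = ai into subsets Di, label each Di with its majority class to build a child node, let the node and its children form the tree T, and return T;

(6) For the i-th child node, use Di as the training set and A - {Ag} as the feature set, and recursively apply steps (1) to (5) to obtain the subtree Ti; return Ti.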

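On paper, generating a decision tree looks simple and clear, but in the actual implementation building the tree is by far the hardest part. There are many details to watch out for; it took me two evenings to get it working, and I fell into plenty of traps along the way.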

Code block 1: information gain class

package org.juefan.decisiontree;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

public class InfoGain {

    // Stores one data instance: feature values x and class label y
    public static class Data {
        public ArrayList<Object> x;
        public Object y;

        /** Parse one line of input into the standard format: label first, then the features */
        public Data(String content){
            String[] strings = content.split("\t| |:");
            ArrayList<Object> xList = new ArrayList<>();
            for(int i = 1; i < strings.length; i++){
                xList.add(strings[i]);
            }
            this.x = xList;
            this.y = strings[0];
        }

        public Data(){
            x = new ArrayList<>();
            y = 0;
        }

        @Override
        public String toString(){
            StringBuilder builder = new StringBuilder();
            builder.append("[ ");
            for(int i = 0; i < x.size(); i++){
                if(i > 0)
                    builder.append(",");
                builder.append(x.get(i).toString());
            }
            builder.append(" ]");
            return builder.toString();
        }
    }

    // Returns the base-2 logarithm
    public static double log2(double d){
        return Math.log(d) / Math.log(2);
    }

    /**
     * Compute the empirical entropy
     * @param datas the current data set, which may be a subset of the training data
     * @return the empirical entropy of the current data set
     */
    public double getEntropy(ArrayList<Data> datas){
        int counts = datas.size();
        double entropy = 0;
        Map<Object, Double> map = new HashMap<>();
        for(Data data: datas){
            if(map.containsKey(data.y)){
                map.put(data.y, map.get(data.y) + 1);
            }else {
                map.put(data.y, 1D);
            }
        }
        for(double v: map.values())
            entropy -= (v / counts * log2(v / counts));
        return entropy;
    }

    /**
     * Compute the conditional entropy
     * @param datas the current data set, which may be a subset of the training data
     * @param feature index of the feature to evaluate
     * @return the conditional entropy given the feature at this index
     */
    public double getCondiEntropy(ArrayList<Data> datas, int feature){
        int counts = datas.size();
        double condiEntropy = 0;
        // Group the instances by their value of the feature
        Map<Object, ArrayList<Data>> tmMap = new HashMap<>();
        for(Data data: datas){
            if(tmMap.containsKey(data.x.get(feature))){
                tmMap.get(data.x.get(feature)).add(data);
            }else {
                ArrayList<Data> tmDatas = new ArrayList<>();
                tmDatas.add(data);
                tmMap.put(data.x.get(feature), tmDatas);
            }
        }
        // Weight the entropy of each group by its share of the data
        for(ArrayList<Data> datas2: tmMap.values()){
            condiEntropy += (double)datas2.size() / counts * getEntropy(datas2);
        }
        return condiEntropy;
    }

    /**
     * Compute the information gain (ID3 algorithm)
     * @param datas the current data set, which may be a subset of the training data
     * @param feature index of the feature to evaluate
     * @return the information gain of the feature at this index
     */
    public double getInfoGain(ArrayList<Data> datas, int feature){
        return getEntropy(datas) - getCondiEntropy(datas, feature);
    }

    /**
     * Compute the entropy of the data set with respect to the values of a feature
     * @param datas the current data set, which may be a subset of the training data
     * @param feature index of the feature to evaluate
     * @return the entropy of the feature's value distribution
     */
    public double getFeatureEntropy(ArrayList<Data> datas, int feature){
        int counts = datas.size();
        double entropy = 0;
        Map<Object, Double> map = new HashMap<>();
        for(Data data: datas){
            Object value = data.x.get(feature);
            if(map.containsKey(value)){
                map.put(value, map.get(value) + 1);
            }else {
                map.put(value, 1D);
            }
        }
        for(double v: map.values())
            entropy -= (v / counts * log2(v / counts));
        return entropy;
    }

    /**
     * Compute the information gain ratio (C4.5 algorithm):
     * the information gain divided by the entropy of the feature's values
     * @param datas the current data set, which may be a subset of the training data
     * @param feature index of the feature to evaluate
     * @return the information gain ratio of the feature at this index
     */
    public double getInfoGainRatio(ArrayList<Data> datas, int feature){
        double featureEntropy = getFeatureEntropy(datas, feature);
        // A feature with a single value splits nothing; avoid dividing by zero
        if(featureEntropy == 0)
            return 0;
        return getInfoGain(datas, feature) / featureEntropy;
    }
}

Code block 2: decision tree node class

package org.juefan.decisiontree;

import java.util.ArrayList;
import java.util.List;

public class TreeNode {
    private String feature;            // feature this node splits on
    private List<TreeNode> childTreeNode;
    private String targetFunValue;     // feature value on the branch leading to this node
    private String nodeName;           // class label of the node

    public TreeNode(String nodeName){
        this.nodeName = nodeName;
        this.childTreeNode = new ArrayList<TreeNode>();
    }

    public TreeNode(){
        this.childTreeNode = new ArrayList<TreeNode>();
    }

    // Print this node, then each child preceded by the feature this node splits on
    public void printTree(){
        if(targetFunValue != null)
            System.out.print("feature value: " + targetFunValue + "\t");
        if(nodeName != null)
            System.out.print("class: " + nodeName + "\t");
        System.out.println();
        for(TreeNode treeNode: childTreeNode){
            System.out.println("current feature: " + feature);
            treeNode.printTree();
        }
    }

    public String getAttributeValue() {
        return feature;
    }

    public void setAttributeValue(String attributeValue) {
        this.feature = attributeValue;
    }

    public List<TreeNode> getChildTreeNode() {
        return childTreeNode;
    }

    public void setChildTreeNode(List<TreeNode> childTreeNode) {
        this.childTreeNode = childTreeNode;
    }

    public String getTargetFunValue() {
        return targetFunValue;
    }

    public void setTargetFunValue(String targetFunValue) {
        this.targetFunValue = targetFunValue;
    }

    public String getNodeName() {
        return nodeName;
    }

    public void setNodeName(String nodeName) {
        this.nodeName = nodeName;
    }
}

Code block 3: decision tree generation

package org.juefan.decisiontree;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.juefan.basic.FileIO;   // file-reading helper from the author's project
import org.juefan.decisiontree.InfoGain.Data;

public class DecisionTree {
    // Threshold: stop splitting when the best information gain falls below it
    public static final double e = 0.1;
    public InfoGain infoGain = new InfoGain();

    /**
     * Build the tree recursively
     * @param datas instances at the current node
     * @param featureName header names; index 0 is the class column, so feature i is at index i + 1
     */
    public TreeNode buildTree(ArrayList<Data> datas, ArrayList<String> featureName){
        TreeNode treeNode = new TreeNode();
        if(isSingle(datas) || getMaxInfoGain(datas) < e){
            // Leaf node: label it with the majority class
            treeNode.setNodeName(getLabel(datas).toString());
            return treeNode;
        }else {
            int feature = getMaxInfoGainFeature(datas);
            treeNode.setAttributeValue(featureName.get(feature + 1));
            // Split the data by the value of the chosen feature, dropping that feature from each instance
            Map<Object, ArrayList<Data>> tMap = new HashMap<>();
            for(Data data: datas){
                Data tData = new Data();
                for(int i = 0; i < data.x.size(); i++)
                    if(i != feature)
                        tData.x.add(data.x.get(i));
                tData.y = data.y;
                Object value = data.x.get(feature);
                if(!tMap.containsKey(value))
                    tMap.put(value, new ArrayList<Data>());
                tMap.get(value).add(tData);
            }
            List<TreeNode> treeNodes = new ArrayList<>();
            int child = 0;
            for(Object key: tMap.keySet()){
                // This step was a real trap. Java's copy semantics have so many pitfalls,
                // and this one cost me half a day: each child needs its own copy of the list
                ArrayList<String> tList2 = new ArrayList<>(featureName);
                tList2.remove(feature + 1);
                treeNodes.add(buildTree(tMap.get(key), tList2));
                treeNodes.get(child++).setTargetFunValue(key.toString());
            }
            treeNode.setChildTreeNode(treeNodes);
        }
        return treeNode;
    }

    /**
     * Get the majority class among the instances
     * @param datas the set of instances
     * @return the class that occurs most often
     */
    public Object getLabel(ArrayList<Data> datas){
        Map<Object, Integer> map = new HashMap<Object, Integer>();
        Object label = null;
        int max = 0;
        for(Data data: datas){
            int count = map.containsKey(data.y) ? map.get(data.y) + 1 : 1;
            map.put(data.y, count);
            // Also checked on the first occurrence, so a class seen only once can be returned
            if(count > max){
                max = count;
                label = data.y;
            }
        }
        return label;
    }

    /**
     * Compute the maximum information gain over all features
     * @param datas the set of instances
     * @return the largest information gain
     */
    public double getMaxInfoGain(ArrayList<Data> datas){
        double max = 0;
        for(int i = 0; i < datas.get(0).x.size(); i++){
            double temp = infoGain.getInfoGain(datas, i);
            if(temp > max)
                max = temp;
        }
        return max;
    }

    /** Index of the feature with the largest information gain */
    public int getMaxInfoGainFeature(ArrayList<Data> datas){
        double max = 0;
        int feature = 0;
        for(int i = 0; i < datas.get(0).x.size(); i++){
            double temp = infoGain.getInfoGain(datas, i);
            if(temp > max){
                max = temp;
                feature = i;
            }
        }
        return feature;
    }

    /** Check whether all instances belong to a single class */
    public boolean isSingle(ArrayList<Data> datas){
        Set<Object> set = new HashSet<>();
        for(Data data: datas)
            set.add(data.y);
        return set.size() == 1;
    }

    public static void main(String[] args) {
        ArrayList<Data> datas = new ArrayList<>();
        FileIO fileIO = new FileIO();
        DecisionTree decisionTree = new DecisionTree();
        fileIO.setFileName(".//file//decision.tree.txt");
        fileIO.FileRead("utf-8");
        ArrayList<String> featureName = new ArrayList<>();
        // Read the header line of the file
        for(String string: fileIO.fileList.get(0).split("\t"))
            featureName.add(string);
        for(int i = 1; i < fileIO.fileList.size(); i++){
            datas.add(new Data(fileIO.fileList.get(i)));
        }
        TreeNode treeNode = decisionTree.buildTree(datas, featureName);
        treeNode.printTree();
    }
}
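
The copy pitfall mentioned in the comment inside `buildTree` comes from the fact that assigning one `ArrayList` variable to another copies only the reference, not the list. A minimal sketch of the difference follows; the class and method names are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ListCopySketch {

    /** Removes index 1 through an alias; the caller's list shrinks too. */
    static int sizeAfterAliasRemove(List<String> names) {
        List<String> alias = names;                  // same object under a second name
        alias.remove(1);
        return names.size();
    }

    /** Removes index 1 from a copy; the caller's list is untouched. */
    static int sizeAfterCopyRemove(List<String> names) {
        List<String> copy = new ArrayList<>(names);  // new list holding the same elements
        copy.remove(1);
        return names.size();
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("class", "age", "has job"));
        System.out.println(sizeAfterCopyRemove(names));   // prints 3
        System.out.println(sizeAfterAliasRemove(names));  // prints 2
    }
}
```

In a recursive tree builder, sharing one feature-name list across sibling subtrees means a removal made for one child leaks into the next, which is why each child gets its own copy.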

?運(yùn)行情況：

輸入文件?".//file//decision.tree.txt" 內(nèi)容為:

類型年齡有工作有自己的房子信貸情況
否青年否否一般
否青年否否好
是青年是否好
是青年是是一般
否青年否否一般
否中年否否一般
否中年否否好
是中年是是好
是中年否是非常好
是中年否是非常好
是老年否是非常好
是老年否是好
是老年是否好
是老年是否非常好
否老年否否一般

運(yùn)行結(jié)果為：

當(dāng)前特征為：有自己的房子
特征值: 是類型: 是
當(dāng)前特征為：有自己的房子
特征值: 否
當(dāng)前特征為：有工作
特征值: 是類型: 是
當(dāng)前特征為：有工作
特征值: 否類型: 否

If you are interested in the code, it is available on my GitHub: https://github.com/JueFan/StatisticsLearningMethod/

The repository also contains the example data.

Statistical Learning Methods (5): Decision Trees
