
Python sklearn: drawing a decision tree and saving it as a PDF

Author: Dragon水魅 | Updated: 2022-09-07

Drawing a decision tree with sklearn and saving it as a PDF

Download Graphviz

Download it from the official website and install it:

https://graphviz.gitlab.io/_pages/Download/Download_windows.html

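Then add Graphviz's bin directory to the PATH environment variable (the author's install location is shown; substitute your own):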

  • D:\software\Graphviz\bin

Test it in cmd:

  • dot -version
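The same check can be done from Python before calling Graphviz from a script. This is a minimal sketch using only the standard library; the `find_dot` helper is hypothetical, and the default directory is just the example install path above. If `dot` is not on PATH, the helper appends that directory to PATH for the current Python process only; it does not change the system setting.

```python
import os
import shutil
import subprocess

# Assumed install location, copied from the example path above; adjust as needed.
GRAPHVIZ_BIN = r"D:\software\Graphviz\bin"


def find_dot(extra_dir=GRAPHVIZ_BIN):
    """Return the full path of Graphviz's `dot` executable, or None if not found.

    If `dot` is not already on PATH and `extra_dir` exists, `extra_dir` is
    appended to PATH for this Python process only.
    """
    dot = shutil.which("dot")
    if dot is None and os.path.isdir(extra_dir):
        os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + extra_dir
        dot = shutil.which("dot")
    return dot


if __name__ == "__main__":
    dot = find_dot()
    if dot is None:
        print("Graphviz 'dot' not found; check the PATH setting")
    else:
        # `dot -V` prints the version string; Graphviz writes it to stderr
        result = subprocess.run([dot, "-V"], capture_output=True, text=True)
        print(result.stderr.strip() or result.stdout.strip())
```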

Python code

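import numpy as np
from sklearn import tree
import graphviz

# x (features) and y (labels) are the data to fit with sklearn;
# exam_train and classes_train are the author's own training data (not shown)
x = np.array(exam_train)
y = np.array(classes_train)
clf = tree.DecisionTreeClassifier(criterion='entropy', class_weight='balanced', max_depth=25)
clf = clf.fit(x, y)
# Export the fitted tree as DOT source; the key parameters here can be customized
dot_data = tree.export_graphviz(clf, out_file=None, feature_names=None, filled=True, rounded=True)
graph = graphviz.Source(dot_data)
# Writes the DOT source to "decisiontree_pdf", renders "decisiontree_pdf.pdf" and opens it
graph.render(view=True, format="pdf", filename="decisiontree_pdf")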
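The snippet above depends on `exam_train` and `classes_train`, which the post does not show. As a self-contained sketch, the same pipeline can be run on scikit-learn's bundled iris dataset, swapped in here purely for illustration. This version stops at the DOT text, so it needs only scikit-learn; the shallower `max_depth=3` is an arbitrary choice to keep the plot small.

```python
from sklearn import tree
from sklearn.datasets import load_iris

# Iris is a stand-in for the post's own exam_train / classes_train data.
iris = load_iris()
x, y = iris.data, iris.target

# Same settings as above except max_depth, lowered to keep the tree readable.
clf = tree.DecisionTreeClassifier(criterion="entropy", class_weight="balanced",
                                  max_depth=3, random_state=0)
clf.fit(x, y)

# out_file=None makes export_graphviz return the DOT source as a string.
dot_data = tree.export_graphviz(
    clf,
    out_file=None,
    feature_names=iris.feature_names,  # label split conditions with column names
    class_names=iris.target_names,     # label nodes with class names
    filled=True,                       # color nodes by majority class
    rounded=True,                      # rounded node boxes
)
print(dot_data.splitlines()[0])
```

With the `graphviz` Python package and the Graphviz binaries installed, `graphviz.Source(dot_data).render("iris_tree", format="pdf")` then writes `iris_tree.pdf`, following the same pattern as the snippet above.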

This produces a pretty slick decision-tree PDF.

Applying sklearn decision trees in Python

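Data format (tree.csv). The columns are shown space-separated below; because the code reads the file with csv.reader's default settings, the file itself should be comma-separated: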

age look income orderly target
older ugly low yes no
young ugly high no no
young handsome low no no
young handsome high yes yes
young handsome medium yes yes
young handsome medium no no
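The source code below chooses splits by entropy (see its `criterion` comment). As a worked example of what that measures, the following plain-Python sketch (no sklearn) computes the target entropy H(S) = -Σ p·log2(p) and, for each feature, the information gain H(S) - Σ (|S_v|/|S|)·H(S_v) on the six rows above. It uses the textbook definitions; sklearn's internal computation is organized differently.

```python
from collections import Counter
from math import log2

# The six rows of tree.csv shown above: (age, look, income, orderly) -> target
HEADER = ["age", "look", "income", "orderly"]
ROWS = [
    (("older", "ugly", "low", "yes"), "no"),
    (("young", "ugly", "high", "no"), "no"),
    (("young", "handsome", "low", "no"), "no"),
    (("young", "handsome", "high", "yes"), "yes"),
    (("young", "handsome", "medium", "yes"), "yes"),
    (("young", "handsome", "medium", "no"), "no"),
]


def entropy(labels):
    """Shannon entropy -sum(p * log2 p) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, col):
    """Target entropy minus the weighted entropy after splitting on column `col`."""
    labels = [target for _, target in rows]
    groups = {}
    for features, target in rows:
        groups.setdefault(features[col], []).append(target)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


for i, name in enumerate(HEADER):
    print(f"{name}: gain = {information_gain(ROWS, i):.3f}")
```

On these rows the target entropy is about 0.918 bits (2 "yes", 4 "no"), and `orderly` has the largest gain (about 0.459), because its "no" branch is entirely "no". An entropy-based tree would therefore be expected to split on `orderly` first.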

Python source code:

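# -*- coding: utf-8 -*-
# Convert the dicts into the numeric matrix format that sklearn works with
from sklearn.feature_extraction import DictVectorizer
import csv
from sklearn import preprocessing
from sklearn import tree

# Python 3's csv module expects a text-mode file opened with newline=''
allElectronicsData = open('c:/pic/data/tree.csv', 'r', newline='')
reader = csv.reader(allElectronicsData)
header = next(reader)  # the first row holds the column names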
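# print(header)
## Data preprocessing
featureList = []
labelList = []
for row in reader:
    # print(row[-1])
    labelList.append(row[-1])  # the last column (target) is the class label
    # Put each row's feature values into a dict so that sklearn's DictVectorizer
    # can convert the categorical values directly into 0/1 values
    rowDict = {}
    for i in range(len(row) - 1):  # every column except the target; tree.csv has no ID column
        rowDict[header[i]] = row[i]
    featureList.append(rowDict)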

for each in featureList:
    print(each)

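# Vectorize features (one 0/1 column per "feature=value" pair)
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
print("dummyX:" + str(dummyX))
print(vec.get_feature_names_out())  # get_feature_names() was removed in scikit-learn 1.2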

# Convert the labels directly with preprocessing's LabelBinarizer
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
print("dummyY:"+str(dummyY))
print("labelList:"+str(labelList))

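# criterion selects the standard for choosing splits at tree nodes; here it is entropy
# (information gain, the measure used by ID3). The default is the Gini index, as in CART.
clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)
print("clf:" + str(clf))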
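# Visualize the decision tree.
# export_graphviz writes a .dot file (here c:/tree.dot); Graphviz must be installed
# to convert it to PDF or PNG. Put the bin directory of the Graphviz installation
# on the Path environment variable, then run the following command in cmd:
# dot -Tpdf c:/tree.dot -o c:/tree.pdf
# Download page: http://www.graphviz.org/Download_windows.php
# Generate the .dot file (export_graphviz writes into f and returns None)
with open("c:/tree.dot", 'w') as f:
    tree.export_graphviz(clf, feature_names=vec.get_feature_names_out(), out_file=f)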

Original article: https://blog.csdn.net/qq_43650934/article/details/107286860
