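The scikit-learn library
scikit-learn already packages many data mining algorithms.
This post shows how to assemble a data mining workflow from its three building blocks:
1. Transformer: used for data preprocessing and data transformation.
2. Pipeline: chains the steps of a data mining workflow so that it can be reused (encapsulation).
3. Estimator: used for classification, clustering, and regression analysis (the various algorithm objects).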
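To make the three roles concrete, here is a minimal sketch that chains a transformer and an estimator inside a pipeline. The tiny dataset, the step names "scale" and "predict", and the choice of MinMaxScaler are assumptions made for illustration; they do not come from the original post.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: two well-separated groups whose features
# live on very different scales.
X_demo = np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 15.0],
                   [8.0, 900.0], [9.0, 950.0], [10.0, 1000.0]])
y_demo = np.array([False, False, False, True, True, True])

# Transformer (MinMaxScaler) followed by estimator (KNeighborsClassifier),
# wrapped in a Pipeline so both steps are fitted and applied together.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("predict", KNeighborsClassifier(n_neighbors=3)),
])
pipeline.fit(X_demo, y_demo)

demo_predictions = pipeline.predict(np.array([[0.5, 12.0], [9.5, 980.0]]))
print(demo_predictions)
```

Because the pipeline itself exposes fit() and predict(), it can be used anywhere a plain estimator is expected.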
Every estimator provides the following two functions:
fit(): training
Usage: estimator.fit(X_train, y_train)
estimator = KNeighborsClassifier() is a scikit-learn algorithm object
X_train = dataset.data is a numpy array
y_train = dataset.target is a numpy array
predict(): prediction
Usage: estimator.predict(X_test)
estimator = KNeighborsClassifier() is a scikit-learn algorithm object
X_test = dataset.data is a numpy array
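The fit/predict contract described above can be illustrated with a toy class written in plain Python. ToyNearestNeighbor is a hypothetical name and a deliberately simplified 1-nearest-neighbour rule; it is not how scikit-learn implements KNeighborsClassifier, only a sketch of the same interface.

```python
import math


class ToyNearestNeighbor:
    """Hypothetical 1-nearest-neighbour classifier with a fit/predict interface."""

    def fit(self, X_train, y_train):
        # For nearest neighbours, "training" only stores the labelled samples.
        self.X_train_ = [list(row) for row in X_train]
        self.y_train_ = list(y_train)
        return self

    def predict(self, X_test):
        predictions = []
        for sample in X_test:
            # Euclidean distance from this sample to every stored sample.
            distances = [math.dist(sample, row) for row in self.X_train_]
            nearest = distances.index(min(distances))
            predictions.append(self.y_train_[nearest])
        return predictions


toy = ToyNearestNeighbor().fit([[0, 0], [0, 1], [5, 5], [6, 5]],
                               ["b", "b", "g", "g"])
toy_predictions = toy.predict([[0.2, 0.4], [5.5, 5.1]])
print(toy_predictions)  # ['b', 'g']
```

fit() takes both features and labels, while predict() takes features only and returns one label per input row.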
Example
%matplotlib inline
# Ionosphere dataset
# https://archive.ics.uci.edu/ml/machine-learning-databases/ionosphere/
# Download ionosphere.data and ionosphere.names and place them in the ./data/ directory (the path built below)
import csv
import os

import numpy as np

home_folder = os.path.expanduser("~")
print(home_folder)  # the home directory
# Change this to the location of your dataset
home_folder = "."  # use the current directory instead
data_folder = os.path.join(home_folder, "data")
print(data_folder)
data_filename = os.path.join(data_folder, "ionosphere.data")
print(data_filename)

# The shape of the dataset is known in advance: 351 samples, 34 features
X = np.zeros((351, 34), dtype='float')
y = np.zeros((351,), dtype='bool')
with open(data_filename, 'r') as input_file:
    reader = csv.reader(input_file)
    for i, row in enumerate(reader):
        # Get the data, converting each item to a float
        data = [float(datum) for datum in row[:-1]]
        # Overwrite the zero-initialised row with the real data
        X[i] = data
        # True if the class is 'g', False otherwise
        y[i] = row[-1] == 'g'
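The row parsing used in the loading loop can be tried without downloading the file. The sketch below feeds two made-up rows (shortened to four features, not real Ionosphere records) through the same csv.reader logic; sample_text, features, and labels are names introduced here for illustration.

```python
import csv
import io

# Two hypothetical rows in the same layout as ionosphere.data:
# numeric features followed by a class label, 'g' or 'b'.
sample_text = "1,0,0.5,-0.25,g\n1,0,1.0,0.75,b\n"

features = []
labels = []
for row in csv.reader(io.StringIO(sample_text)):
    # Every column except the last is a numeric feature.
    features.append([float(datum) for datum in row[:-1]])
    # The last column is the class: True for 'g', False otherwise.
    labels.append(row[-1] == "g")

print(features)
print(labels)
```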
# Data preprocessing: split into training and test sets
# (sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection)
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=14)
print("Training set: {} samples".format(X_train.shape[0]))
print("Test set: {} samples".format(X_test.shape[0]))
print("Each sample has {} features".format(X_train.shape[1]))
Output:
Training set: 263 samples
Test set: 88 samples
Each sample has 34 features
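The 263/88 split follows from the default test fraction of train_test_split, which is 0.25 when neither test_size nor train_size is given. A short sketch of the arithmetic, assuming the test count is rounded up and the remainder goes to training:

```python
import math

n_samples = 351
test_fraction = 0.25  # default used when no size argument is passed

# 351 * 0.25 = 87.75, which rounds up to 88 test samples.
n_test = math.ceil(test_fraction * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 263 88
```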
# Instantiate the algorithm object -> train -> predict -> evaluate
from sklearn.neighbors import KNeighborsClassifier

estimator = KNeighborsClassifier()
estimator.fit(X_train, y_train)
y_predicted = estimator.predict(X_test)
accuracy = np.mean(y_test == y_predicted) * 100
print("Accuracy: {0:.1f}%".format(accuracy))

# Another way to evaluate: cross-validation
# (sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection)
# Note: the default number of folds changed from 3 to 5 in scikit-learn 0.22,
# so the scores can differ between versions.
from sklearn.model_selection import cross_val_score

scores = cross_val_score(estimator, X, y, scoring='accuracy')
average_accuracy = np.mean(scores) * 100
print("Average accuracy: {0:.1f}%".format(average_accuracy))

# Try n_neighbors from 1 to 20 and record the cross-validation scores
avg_scores = []
all_scores = []
parameter_values = list(range(1, 21))  # Including 20
for n_neighbors in parameter_values:
    estimator = KNeighborsClassifier(n_neighbors=n_neighbors)
    scores = cross_val_score(estimator, X, y, scoring='accuracy')
    avg_scores.append(np.mean(scores))
    all_scores.append(scores)
Output:
Accuracy: 86.4%
Average accuracy: 82.3%
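The accuracy computed with np.mean(y_test == y_predicted) is simply the share of predictions that match the true labels. A plain-Python sketch of the same calculation, where accuracy_percent is a hypothetical helper and the label lists are made up:

```python
def accuracy_percent(y_true, y_pred):
    """Return the percentage of positions where prediction and truth agree."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return 100.0 * matches / len(y_true)


# Hypothetical labels: 4 of the 5 predictions are correct.
demo_accuracy = accuracy_percent([True, True, False, False, True],
                                 [True, False, False, False, True])
print("{0:.1f}%".format(demo_accuracy))  # 80.0%
```

Cross-validation repeats this measurement on several train/test splits and averages the results, which makes the estimate less dependent on one particular split.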
# Plot the average accuracy for each value of n_neighbors
from matplotlib import pyplot as plt

plt.figure(figsize=(32, 20))
plt.plot(parameter_values, avg_scores, '-o', linewidth=5, markersize=24)
#plt.axis([0, max(parameter_values), 0, 1.0])

# Plot the individual fold scores for each value of n_neighbors
for parameter, scores in zip(parameter_values, all_scores):
    n_scores = len(scores)
    plt.plot([parameter] * n_scores, scores, '-o')

plt.plot(parameter_values, all_scores, 'bx')

# Repeat 10-fold cross-validation 100 times for each value of n_neighbors
# Note: with an integer cv the folds are not shuffled, so every repetition
# yields the same scores.
from collections import defaultdict

all_scores = defaultdict(list)
parameter_values = list(range(1, 21))  # Including 20
for n_neighbors in parameter_values:
    for i in range(100):
        estimator = KNeighborsClassifier(n_neighbors=n_neighbors)
        scores = cross_val_score(estimator, X, y, scoring='accuracy', cv=10)
        all_scores[n_neighbors].append(scores)
for parameter in parameter_values:
    scores = all_scores[parameter]
    n_scores = len(scores)
    plt.plot([parameter] * n_scores, scores, '-o')

plt.plot(parameter_values, avg_scores, '-o')
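Besides reading the best setting off the plot, the averaged scores can be inspected directly. The sketch below uses made-up numbers (not results from the Ionosphere run) under the names demo_parameter_values and demo_avg_scores to show one way of picking the n_neighbors value with the highest average score.

```python
# Hypothetical averaged scores, one per candidate value of n_neighbors.
demo_parameter_values = [1, 2, 3, 4, 5]
demo_avg_scores = [0.86, 0.88, 0.85, 0.84, 0.83]

# Index of the highest average score; ties resolve to the first occurrence.
best_index = max(range(len(demo_avg_scores)), key=demo_avg_scores.__getitem__)
best_n_neighbors = demo_parameter_values[best_index]

print(best_n_neighbors, demo_avg_scores[best_index])  # 2 0.88
```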
Original article: https://blog.csdn.net/qq_42034590/article/details/129243282