1 Linear Regression
1.1 Simple Linear Regression
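In simple linear regression, the parameters a and b are adjusted to fit a linear relationship from x to y. The objective to minimize is the MSE (Mean Squared Error), except that the averaging step (division by m) is omitted:

$$J(a,b)=\sum_{i=1}^{m}\left(y^{(i)}-a x^{(i)}-b\right)^2$$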
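Simple linear regression has only two parameters, a and b. Setting the partial derivatives of this objective to zero (the least-squares method) gives the optimal values in closed form:

$$a=\frac{\sum_{i=1}^{m}(x^{(i)}-\bar{x})(y^{(i)}-\bar{y})}{\sum_{i=1}^{m}(x^{(i)}-\bar{x})^2},\qquad b=\bar{y}-a\bar{x}$$

Training a simple linear regression model therefore amounts to computing these two values from the data.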
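As a quick check of these formulas, the sketch below applies them to a made-up five-point dataset and cross-checks the result against NumPy's `np.polyfit`. The helper names `fit_simple_lr` and `sse` are hypothetical, introduced only for this illustration.

```python
import numpy as np


def fit_simple_lr(x, y):
    """Closed-form least-squares estimates for y ≈ a*x + b."""
    x_mean, y_mean = x.mean(), y.mean()
    a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b = y_mean - a * x_mean
    return a, b


def sse(a, b, x, y):
    """Objective J(a, b): sum of squared errors, i.e. MSE without dividing by m."""
    return np.sum((y - (a * x + b)) ** 2)


# Toy data, made up for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.0, 2.0, 3.0, 5.0])

a, b = fit_simple_lr(x, y)
print(round(a, 6), round(b, 6))  # 0.8 0.4

# Cross-check against NumPy's degree-1 polynomial fit (returns [slope, intercept]).
slope, intercept = np.polyfit(x, y, 1)
print(np.allclose([a, b], [slope, intercept]))  # True

# Moving either parameter away from the closed-form optimum increases J.
print(sse(a, b, x, y) < sse(a + 0.1, b, x, y))  # True
print(sse(a, b, x, y) < sse(a, b - 0.1, x, y))  # True
```

By hand: x̄ = 3, ȳ = 2.8, Σ(x−x̄)(y−ȳ) = 8 and Σ(x−x̄)² = 10, so a = 0.8 and b = 2.8 − 0.8·3 = 0.4.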
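The example below performs simple linear regression on the Boston housing dataset, using only the feature at index 5, RM (average number of rooms per dwelling). The evaluation metrics are:

- MSE: $\frac{1}{m}\sum_{i=1}^{m}(y^{(i)}-\hat{y}^{(i)})^2$
- RMSE: $\sqrt{\mathrm{MSE}}$
- MAE: $\frac{1}{m}\sum_{i=1}^{m}|y^{(i)}-\hat{y}^{(i)}|$
- $R^2$: $1-\frac{\sum_{i=1}^{m}(y^{(i)}-\hat{y}^{(i)})^2}{\sum_{i=1}^{m}(y^{(i)}-\bar{y})^2}=1-\frac{\mathrm{MSE}}{\mathrm{Var}(y)}$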
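The evaluation metrics used here (MSE, RMSE, MAE and R²) can be sketched directly in NumPy. The function names and the toy arrays below are illustrative, not from the original; the `r2` function also shows why the script's `1 - mse / np.var(y_test)` is the same quantity as the textbook definition.

```python
import numpy as np


def mse(y_true, y_pred):
    # mean of squared residuals
    return np.mean((y_true - y_pred) ** 2)


def rmse(y_true, y_pred):
    # square root of MSE, in the same units as y
    return np.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    # mean of absolute residuals
    return np.mean(np.abs(y_true - y_pred))


def r2(y_true, y_pred):
    # 1 - SS_res / SS_tot; dividing both sums by m turns this into 1 - MSE / Var(y)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

print(mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))  # 0.5625 0.75 0.625
print(round(r2(y_true, y_pred), 4))  # 0.8475
print(np.isclose(r2(y_true, y_pred), 1 - mse(y_true, y_pred) / np.var(y_true)))  # True
```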
# Wrap the simple linear regression algorithm in a scikit-learn-style class
import numpy as np
import sklearn.datasets as datasets
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, mean_absolute_error

np.random.seed(123)


class SimpleLinearRegression():
    def __init__(self):
        """initialize model parameters"""
        self.a_ = None
        self.b_ = None

    def fit(self, x_train, y_train):
        """
        training model parameters

        Parameters
        ----------
        x_train: train x, shape: [N,]
        y_train: train y, shape: [N,]
        """
        assert (x_train.ndim == 1 and y_train.ndim == 1), \
            """Simple Linear Regression model can only solve single feature training data"""
        assert len(x_train) == len(y_train), \
            """the size of x_train must be equal to y_train"""
        x_mean = np.mean(x_train)
        y_mean = np.mean(y_train)
        # closed-form least-squares solution for slope a and intercept b
        self.a_ = np.vdot((x_train - x_mean), (y_train - y_mean)) / np.vdot((x_train - x_mean), (x_train - x_mean))
        self.b_ = y_mean - self.a_ * x_mean
        return self

    def predict(self, input_x):
        """
        make predictions based on a batch of data
        input_x: shape -> [N,]
        """
        assert input_x.ndim == 1, \
            """Simple Linear Regression model can only solve single feature data"""
        return np.array([self.pred_(x) for x in input_x])

    def pred_(self, x):
        """give a prediction based on single input x"""
        return self.a_ * x + self.b_

    def __repr__(self):
        return "SimpleLinearRegressionModel"


if __name__ == '__main__':
    # NOTE: load_boston was removed in scikit-learn 1.2, so this script needs scikit-learn < 1.2
    boston_data = datasets.load_boston()
    x = boston_data['data'][:, 5]  # feature RM, total x data (506,)
    y = boston_data['target']      # total y data (506,)
    # keep data with target value less than 50.
    x = x[y < 50]  # total x data (490,)
    y = y[y < 50]  # total y data (490,)
    plt.scatter(x, y)
    plt.show()
    # train size: (343,) test size: (147,)
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
    regs = SimpleLinearRegression()
    regs.fit(x_train, y_train)
    y_hat = regs.predict(x_test)
    rmse = np.sqrt(np.sum((y_hat - y_test) ** 2) / len(x_test))
    mse = mean_squared_error(y_test, y_hat)
    mae = mean_absolute_error(y_test, y_hat)
    # R^2 = 1 - MSE / Var(y_test); np.var divides by m, matching the MSE
    R_squared_Error = 1 - mse / np.var(y_test)
    print('mean squared error:%.2f' % (mse))
    print('root mean squared error:%.2f' % (rmse))
    print('mean absolute error:%.2f' % (mae))
    print('R squared Error:%.2f' % (R_squared_Error))
Output:
mean squared error:26.74
root mean squared error:5.17
mean absolute error:3.85
R squared Error:0.50
Data visualization: scatter plot of RM (x-axis) against house price (y-axis).
1.2 Multiple Linear Regression
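In multiple linear regression, each sample x has several features $x_1, x_2, \dots, x_n$, and the model predicts

$$\hat{y}^{(i)}=\theta_0+\theta_1x_1^{(i)}+\theta_2x_2^{(i)}+\cdots+\theta_nx_n^{(i)}$$

This structure can be written as a vector product, with $\theta=(\theta_0,\theta_1,\dots,\theta_n)^T$ and $x_b^{(i)}=(x_0^{(i)},x_1^{(i)},\dots,x_n^{(i)})$:

$$\hat{y}^{(i)}=x_b^{(i)}\,\theta$$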
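For convenience, a constant feature $x_0^{(i)}=1$ is usually prepended to each sample, so that the intercept (bias) $\theta_0$ is handled by the same product. Stacking all m samples as rows gives the matrix $X_b$, whose first column is all ones, and the predictions for the whole dataset are $\hat{y}=X_b\theta$.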
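The optimization objective of multiple linear regression is the same as in simple linear regression:

$$J(\theta)=\sum_{i=1}^{m}\left(y^{(i)}-x_b^{(i)}\theta\right)^2=(y-X_b\theta)^T(y-X_b\theta)$$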
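Setting the matrix derivative of this objective to zero yields a closed-form solution, the normal equation:

$$\theta=(X_b^TX_b)^{-1}X_b^Ty$$

Its computational cost is high, however: forming $X_b^TX_b$ takes $O(mn^2)$ operations and inverting the resulting $(n+1)\times(n+1)$ matrix takes roughly $O(n^3)$, so the method scales poorly as the number of features grows.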
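The matrix-derivative step can be written out as follows. This is the standard derivation, assuming $X_b^TX_b$ is invertible; it uses $y^TX_b\theta=\theta^TX_b^Ty$ (both are scalars) and $\nabla_\theta(\theta^TA\theta)=2A\theta$ for symmetric $A$:

$$
\begin{aligned}
J(\theta) &= (y-X_b\theta)^T(y-X_b\theta) = y^Ty - 2\theta^TX_b^Ty + \theta^TX_b^TX_b\theta \\
\nabla_\theta J(\theta) &= -2X_b^Ty + 2X_b^TX_b\theta = 0 \\
X_b^TX_b\,\theta &= X_b^Ty \\
\theta &= (X_b^TX_b)^{-1}X_b^Ty
\end{aligned}
$$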
Below, the normal-equation solution is used to fit a multiple linear regression on all features of the Boston housing dataset.
import numpy as np
from sklearn.model_selection import train_test_split
import sklearn.datasets as datasets
# The original imported r2_score and root_mean_squared_error from PlayML.metrics,
# the author's own helper package; scikit-learn equivalents are used here instead.
from sklearn.metrics import r2_score, mean_squared_error

np.random.seed(123)


def root_mean_squared_error(y_true, y_pred):
    return np.sqrt(mean_squared_error(y_true, y_pred))


class LinearRegression():
    def __init__(self):
        self.coef_ = None       # coefficients
        self.intercept_ = None  # intercept
        self.theta_ = None

    def fit_normal(self, x_train, y_train):
        """
        use the normal equation solution for multiple linear regression
        as model parameters

        theta = (X^T * X)^-1 * X^T * y
        """
        assert x_train.shape[0] == y_train.shape[0], \
            """size of the x_train must be equal to y_train"""
        X_b = np.hstack([np.ones((len(x_train), 1)), x_train])
        self.theta_ = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)  # shape: (n_features + 1,)
        self.coef_ = self.theta_[1:]
        self.intercept_ = self.theta_[0]
        return self

    def predict(self, x_pred):
        """Given the data set x_pred, return the vector of predictions for x_pred"""
        assert self.intercept_ is not None and self.coef_ is not None, \
            "must fit before predict!"
        assert x_pred.shape[1] == len(self.coef_), \
            "the feature number of X_predict must be equal to X_train"
        X_b = np.hstack([np.ones((len(x_pred), 1)), x_pred])
        return X_b.dot(self.theta_)

    def score(self, x_test, y_test):
        """
        Calculate the evaluation score (R squared)

        Parameters
        ----------
        x_test: x test data
        y_test: true label y for x test data
        """
        y_pred = self.predict(x_test)
        return r2_score(y_test, y_pred)

    def __repr__(self):
        return "LinearRegression"


if __name__ == '__main__':
    # use boston house price dataset for test
    # NOTE: load_boston was removed in scikit-learn 1.2, so this script needs scikit-learn < 1.2
    boston_data = datasets.load_boston()
    x = boston_data['data']    # total x data (506, 13)
    y = boston_data['target']  # total y data (506,)
    # keep data with target value less than 50.
    x = x[y < 50]  # total x data (490, 13)
    y = y[y < 50]  # total y data (490,)
    # train size: (343, 13) test size: (147, 13)
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=123)
    regs = LinearRegression()
    regs.fit_normal(x_train, y_train)
    # calc error
    score = regs.score(x_test, y_test)
    rmse = root_mean_squared_error(y_test, regs.predict(x_test))
    print('R squared error:%.2f' % (score))
    print('Root mean squared error:%.2f' % (rmse))
Output:
R squared error:0.79
Root mean squared error:3.36
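`fit_normal` above evaluates the normal equation with an explicit `np.linalg.inv`. As a sketch on synthetic data (the sample count, seed, noise level and true parameters are arbitrary assumptions), the same θ can be obtained with `np.linalg.solve`, which solves $X_b^TX_b\theta=X_b^Ty$ without forming the inverse, or with `np.linalg.lstsq`, which works on $X_b$ directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (made up): 200 samples, 3 features, known parameters.
m, n = 200, 3
X = rng.normal(size=(m, n))
true_theta = np.array([4.0, 1.5, -2.0, 0.5])  # [intercept, coef_1, coef_2, coef_3]
X_b = np.hstack([np.ones((m, 1)), X])
y = X_b @ true_theta + rng.normal(scale=0.1, size=m)

# 1) Explicit inverse, as in fit_normal.
theta_inv = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
# 2) Solve the normal equations without forming the inverse.
theta_solve = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)
# 3) Least squares directly on X_b (never forms X_b^T X_b).
theta_lstsq, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print(np.allclose(theta_inv, theta_solve), np.allclose(theta_solve, theta_lstsq))
print(np.round(theta_lstsq, 2))
```

On well-conditioned data like this the three agree. Forming $X_b^TX_b$ squares the condition number, so the first two approaches lose accuracy first when features are strongly correlated; `lstsq` avoids that product and also returns a (minimum-norm) solution when $X_b$ is rank-deficient.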
1.3 Using the Linear Regression Model in sklearn
import sklearn.datasets as datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np
from sklearn.model_selection import train_test_split

np.random.seed(123)

if __name__ == '__main__':
    # use boston house price dataset
    # NOTE: load_boston was removed in scikit-learn 1.2, so this script needs scikit-learn < 1.2
    boston_data = datasets.load_boston()
    x = boston_data['data']    # total x size (506, 13)
    y = boston_data['target']  # total y size (506,)
    # keep data with target value less than 50.
    x = x[y < 50]  # total x size (490, 13)
    y = y[y < 50]  # total y size (490,)
    # train size: (343, 13) test size: (147, 13)
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=123)
    regs = LinearRegression()
    regs.fit(x_train, y_train)
    # calc error
    score = regs.score(x_test, y_test)  # R squared
    # the original used PlayML.metrics.root_mean_squared_error (the author's own helper)
    rmse = np.sqrt(mean_squared_error(y_test, regs.predict(x_test)))
    print('R squared error:%.2f' % (score))
    print('Root mean squared error:%.2f' % (rmse))
    print('coefficient:', regs.coef_.shape)
    print('intercept:', regs.intercept_.shape)

Output:

R squared error:0.79
Root mean squared error:3.36
coefficient: (13,)
intercept: ()
Original article: https://blog.csdn.net/Demon_LMMan/article/details/123114890