Calling Python's sklearn to Implement the Logistic Regression Algorithm
First, how to implement it. I used to be unclear about how importing a library relates to its classes and methods, but now I understand it.
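from sklearn.datasets import load_iris  # import the dataset loader
# load the dataset: iris
iris = load_iris()
samples = iris.data
# print(samples)
target = iris.target
# import the LogisticRegression class
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()  # instantiate the class with all default parameters
classifier.fit(samples, target)  # train on the data; the return value is not needed
x = classifier.predict([[5, 3, 5, 2.5]])  # test data must be 2D (one row per sample); returns the class labels
print(x)
# What is imported is a class from sklearn.linear_model, LogisticRegression, which has many methods.
# The commonly used ones are fit (train the classification model) and predict (predict the labels of test samples).
# No method returns the weight vector w learned by the LR model, but after fit it is available through the coef_ attribute (with the bias in intercept_).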
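As a small follow-up sketch, predict also accepts several samples at once, and the integer labels it returns can be mapped back to species names through iris.target_names. The measurements in new_samples are illustrative values chosen here, and max_iter=1000 is an assumption added so that the default solver has room to converge on unscaled data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
# max_iter=1000 is an assumption, raised from the default to help convergence
classifier = LogisticRegression(max_iter=1000)
classifier.fit(iris.data, iris.target)

# Illustrative measurements: one row per sample, four features per row
new_samples = np.array([
    [5.0, 3.0, 5.0, 2.5],
    [5.1, 3.5, 1.4, 0.2],
])
labels = classifier.predict(new_samples)  # integer class labels, one per row
names = iris.target_names[labels]         # map each label to its species name
for row, label, name in zip(new_samples, labels, names):
    print(row, "->", label, name)
```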
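The line used above,
classifier = LogisticRegression()  # instantiate the class with all default parameters
leaves every parameter at its default value, but many of them can be set explicitly. Doing so requires the official parameter documentation, reproduced below: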
sklearn.linear_model.LogisticRegression
- class sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None)
Logistic Regression (aka logit, MaxEnt) classifier.
In the multiclass case, the training algorithm uses a one-vs.-all (OvA) scheme, rather than the “true” multinomial LR.
This class implements L1 and L2 regularized logistic regression using the liblinear library. It can handle both dense and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal performance; any other input format will be converted (and copied).
Parameters: penalty : string, ‘l1’ or ‘l2’ (the type of penalty term)
Used to specify the norm used in the penalization.
dual : boolean
Dual or primal formulation. Dual formulation is only implemented for l2 penalty. Prefer dual=False when n_samples > n_features.
C : float, optional (default=1.0)
Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
fit_intercept : bool, default: True
Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
intercept_scaling : float, default: 1
When self.fit_intercept is True, instance vector x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic feature weight. Note! The synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept), intercept_scaling has to be increased.
class_weight : {dict, ‘auto’}, optional (accounts for class imbalance, similar to cost-sensitive learning)
Over-/undersamples the samples of each class according to the given weights. If not given, all classes are supposed to have weight one. The ‘auto’ mode selects weights inversely proportional to class frequencies in the training set.
random_state: int seed, RandomState instance, or None (default) :
The seed of the pseudo random number generator to use when shuffling the data.
tol: float, optional :
Tolerance for stopping criteria.
Attributes: `coef_` : array, shape = [n_classes, n_features]
Coefficient of the features in the decision function.
coef_ is readonly property derived from raw_coef_ that follows the internal memory layout of liblinear.
`intercept_` : array, shape = [n_classes]
Intercept (a.k.a. bias) added to the decision function. If fit_intercept is set to False, the intercept is set to zero.
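To make the role of C concrete, here is a minimal sketch that fits the iris data with three values of C and compares the size of the learned weights. The specific C values and max_iter=5000 are assumptions chosen for illustration, not recommendations from the documentation above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# C is the inverse of regularization strength: smaller C means a stronger
# penalty, which pulls the weight vector toward zero.
norms = {}
for C in (0.01, 1.0, 100.0):
    # max_iter=5000 is an assumption so that weakly regularized fits converge
    clf = LogisticRegression(C=C, tol=1e-4, fit_intercept=True, max_iter=5000)
    clf.fit(X, y)
    norms[C] = np.linalg.norm(clf.coef_)
    print("C =", C, "| weight norm =", norms[C])
```

The printed norms should grow as C grows, which matches the "smaller values specify stronger regularization" description. Note that recent scikit-learn versions accept class_weight='balanced' rather than the 'auto' value quoted above.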
The LogisticRegression class has the following methods; the ones we use most often are fit and predict.
Methods
decision_function(X) | Predict confidence scores for samples. |
densify() | Convert coefficient matrix to dense array format. |
fit(X, y) | Fit the model according to the given training data. Used to train the LR classifier; X holds the training samples and y is the corresponding label vector. |
fit_transform(X[, y]) | Fit to data, then transform it. |
get_params([deep]) | Get parameters for this estimator. |
predict(X) | Predict class labels for samples in X. Used to predict the labels of test samples, i.e. to classify them; X is the set of test samples. |
predict_log_proba(X) | Log of probability estimates. |
predict_proba(X) | Probability estimates. |
score(X, y[, sample_weight]) | Returns the mean accuracy on the given test data and labels. |
set_params(**params) | Set the parameters of this estimator. |
sparsify() | Convert coefficient matrix to sparse format. |
transform(X[, threshold]) | Reduce X to its most important features. |
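A few of the methods listed above can be tried together in one short sketch. The 70/30 train/test split, random_state=0, and max_iter=1000 are assumptions made here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
# Hold out 30% of the samples for testing (an illustrative choice)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

scores = clf.decision_function(X_test)  # confidence scores, one column per class
proba = clf.predict_proba(X_test)       # class probabilities; each row sums to 1
pred = clf.predict(X_test)              # predicted class labels
accuracy = clf.score(X_test, y_test)    # mean accuracy on the held-out data
params = clf.get_params()               # dict of constructor parameters

print("first three probability rows:")
print(np.round(proba[:3], 3))
print("accuracy:", accuracy)
print("C from get_params():", params["C"])
```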
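predict returns the label vector for the test samples. Personally, I felt the classifier should also expose an important quantity from the LR model: the weight vector, with one entry per feature. There is no method that returns it, which is what first gave me the idea of implementing the LR algorithm myself so that the weight vector could be output. In fact, the coef_ attribute listed above already holds the learned weights after fit, with one row of n_features weights per class.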
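The learned weights can be read from the fitted estimator through the coef_ and intercept_ attributes documented in the Attributes list above. The sketch below prints their shapes and checks that the decision scores are the linear function of the features they define. The variable names and max_iter=1000 are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
# max_iter=1000 is an assumption, raised from the default to help convergence
clf = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

weights = clf.coef_      # shape (n_classes, n_features)
bias = clf.intercept_    # shape (n_classes,)
print("weights shape:", weights.shape)
print("bias shape:", bias.shape)

# The decision scores are X @ w^T + b, so they can be rebuilt by hand
manual_scores = iris.data @ weights.T + bias
matches = np.allclose(manual_scores, clf.decision_function(iris.data))
print("manual scores match decision_function:", matches)
```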
Reference links:
http://www.cnblogs.com/xupeizhi/archive/2013/07/05/3174703.html
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression