Feature Selection | Wrapper (Part 2), Python sklearn Implementation

Jasmine

May 18, 2021

The previous post introduced Filter methods, which use statistical metrics to filter features by how strongly they are correlated.

Feature Selection | Filter (Part 1), Python sklearn Implementation

jasmine880809.medium.com

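Next comes the second approach, Wrapper methods. Starting from the features in the data, a Wrapper method generates candidate feature combinations (subsets), trains a model on each subset, and keeps the feature subset that produces the best model.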
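The most literal version of this idea tries every feature subset and keeps the one with the best cross-validated recall. The sketch below assumes scikit-learn's built-in breast cancer dataset, a linear SVM with standardized inputs, and only the first 5 columns, which leaves 2**5 - 1 = 31 subsets. The helper `best_subset_exhaustive` is hypothetical. An exhaustive search like this is only feasible for a handful of features, which is why the methods below search greedily instead.

```python
from itertools import combinations

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: breast cancer data, restricted to its first 5 columns to keep the search small
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X = X.iloc[:, :5]


def best_subset_exhaustive(X, y, scoring="recall", cv=5):
    """Hypothetical helper: score every non-empty feature subset with the model itself."""
    model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    best_score, best_cols = -1.0, None
    for k in range(1, X.shape[1] + 1):
        for cols in combinations(X.columns, k):
            # Train and evaluate the model on this candidate subset (recall of class 1)
            score = cross_val_score(model, X[list(cols)], y, scoring=scoring, cv=cv).mean()
            if score > best_score:
                best_score, best_cols = score, list(cols)
    return best_cols, best_score


best_cols, best_score = best_subset_exhaustive(X, y)
print("Best subset:", best_cols, "recall:", round(best_score, 3))
```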

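Filter methods screen features using statistical criteria alone. Wrapper methods, by contrast, can detect interactions between features. The cost is speed, because every candidate subset requires training a model. Wrapper methods usually search greedily, adding or removing one feature at a time instead of trying every subset, but even so they remain much slower than Filter methods.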
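A rough count of model fits shows where the cost comes from. The plain-Python sketch below compares an exhaustive search over all subsets of d features with a greedy forward search that picks k features, one per round. It ignores the extra factor contributed by cross-validation folds, and the function names are hypothetical.

```python
from math import comb


def exhaustive_fits(d):
    # Every non-empty subset of d features: 2**d - 1 candidate models
    return sum(comb(d, k) for k in range(1, d + 1))


def forward_fits(d, k):
    # Round i (0-based) tries each of the d - i features not selected yet
    return sum(d - i for i in range(k))


for d in (10, 20, 30):
    print(d, exhaustive_fits(d), forward_fits(d, k=5))
```

Greedy search reduces the count from exponential to roughly d*k. Each of those candidates is still a full model fit, multiplied again by the number of CV folds.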

2. Wrapper Method

Depending on how the feature subsets are chosen, Wrapper methods fall into three types:

Forward Selection
Backward Elimination
Recursive Feature Elimination

2.1 Forward Selection

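Forward selection works iteratively. It starts with no features and, at each step, adds the feature that improves the model the most. It stops when adding any further feature no longer improves performance. In this example, "improves the most" is measured by recall, and the search stops once 5 features have been selected (n_features_to_select=5).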

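# SVM with Forward selection
from sklearn.svm import SVC
from sklearn.feature_selection import SequentialFeatureSelector  # scikit-learn >= 0.24

clf = SVC(kernel='linear')
# direction='forward' is the default; recall is the score used to compare candidate features
fsSvm = SequentialFeatureSelector(clf, direction='forward', scoring='recall', n_features_to_select=5)
fsSvm.fit(X_train, y_train)

# Selected Features (X_train is assumed to be a pandas DataFrame)
selectedFeatureIndices = fsSvm.get_support(indices=True)
selectedFeatureColNames = X_train.columns[selectedFeatureIndices]
print("Selected Features (Forward selection):")
print(list(selectedFeatureColNames))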
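The sklearn call above stops at a fixed number of features. The stopping rule from the description (stop once no additional feature improves the score) can be written out by hand, as in the sketch below. It is an illustration only, not how `SequentialFeatureSelector` is implemented internally. The helper `forward_selection`, the `min_gain` threshold, and the breast cancer dataset are all assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def forward_selection(X, y, scoring="recall", cv=5, min_gain=1e-4):
    """Hypothetical helper: greedily add features until no candidate improves the score."""
    model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    selected, remaining = [], list(X.columns)
    best_score = float("-inf")
    while remaining:
        # Score every candidate subset "selected + one more feature"
        scores = {
            col: cross_val_score(model, X[selected + [col]], y, scoring=scoring, cv=cv).mean()
            for col in remaining
        }
        col, score = max(scores.items(), key=lambda t: t[1])
        if score - best_score < min_gain:
            break  # adding any further feature no longer improves the model
        selected.append(col)
        remaining.remove(col)
        best_score = score
    return selected, best_score


X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X = X.iloc[:, :10]  # assumption: first 10 columns only, to keep the demo fast
selected, score = forward_selection(X, y)
print("Selected:", selected, "recall:", round(score, 3))
```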

2.2 Backward Elimination

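This is the reverse of Forward Selection. It starts with all features and repeatedly removes the feature that contributes least to the model. It stops when removing any of the remaining features would hurt performance.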
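Backward elimination can be run with the same `SequentialFeatureSelector` used in 2.1 by setting `direction='backward'`. The sketch below assumes scikit-learn's breast cancer dataset in place of the article's `X_train`/`y_train`, adds a `StandardScaler` step, and uses `n_features_to_select=25`. The value 25 is an arbitrary choice that keeps the run short. As in 2.1, this fixes the stopping point instead of applying a no-improvement rule.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: breast cancer data stands in for the article's X_train / y_train
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Scaling inside the pipeline keeps the linear SVM fast and well conditioned
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
bsSvm = SequentialFeatureSelector(
    clf, direction="backward", scoring="recall", n_features_to_select=25
)
bsSvm.fit(X_train, y_train)

selectedFeatureColNames = X_train.columns[bsSvm.get_support(indices=True)]
print("Selected Features (Backward elimination):")
print(list(selectedFeatureColNames))
```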

2.3 Recursive Feature Elimination

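Step 1: Train the model on the original features and compute an importance score for each feature according to the chosen criterion (for a linear SVM, the magnitude of its coefficients, coef_).
Step 2: Remove the N least important features, retrain the model on the remaining features, and recompute the scores.
Step 3: Repeat Step 2 until the number of features equals NFeatures.
NFeatures: n_features_to_select
N: step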
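The three steps above can be written out by hand to show roughly what `RFE` does. This sketch assumes a linear SVM whose importance score is the absolute value of `coef_`, which gives the same ordering as RFE's default `importance_getter='auto'` for a binary linear model. The helper `manual_rfe` and the standardized breast cancer data are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def manual_rfe(X, y, n_features_to_select, step=1):
    """Hypothetical helper mirroring the steps above; returns the kept column indices."""
    remaining = np.arange(X.shape[1])
    while len(remaining) > n_features_to_select:
        # Steps 1-2: train on the remaining features and score each one by |coef|
        model = SVC(kernel="linear").fit(X[:, remaining], y)
        importance = np.abs(model.coef_).ravel()
        # Remove the `step` least important features, never going below the target
        n_drop = min(step, len(remaining) - n_features_to_select)
        drop = np.argsort(importance)[:n_drop]
        remaining = np.delete(remaining, drop)
        # Step 3: loop until len(remaining) == n_features_to_select
    return remaining


data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)  # scaling makes |coef| comparable across features
kept = manual_rfe(X, data.target, n_features_to_select=10, step=1)
print("Kept features:", list(data.feature_names[kept]))
```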

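# SVM with Recursive Feature Elimination
from sklearn.svm import SVC
from sklearn.feature_selection import RFE
from sklearn.pipeline import make_pipeline
from tabulate import tabulate  # third-party package: pip install tabulate

clf = SVC(kernel='linear')
rfeSelector = RFE(estimator=clf, n_features_to_select=10, step=1)
rfeSvm = make_pipeline(rfeSelector, clf)  # RFE selects features, then the SVM is trained on them
rfeSvm.fit(X_train, y_train)
rfePredictions = rfeSvm.predict(X_test)

# Table - feature ranking
fittedSelector = rfeSvm.named_steps['rfe']
ranking = fittedSelector.ranking_  # ranking[j] is the rank of column j; selected features have rank 1
selectedFeatureIndices = fittedSelector.get_support(indices=True)
selectedFeatureColNames = X_train.columns[selectedFeatureIndices]
# Pair each rank with its column name; a dict keyed by rank would drop the tied rank-1 features
rankingRows = sorted(zip(ranking, X_train.columns), key=lambda t: t[0])
print("Feature Ranking:\n")
print(tabulate(rankingRows, headers=['rank', 'columnName'], tablefmt='orgtbl'))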
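The snippet above relies on `X_train`, `X_test`, `y_train` and `y_test` being defined elsewhere. Below is a self-contained sketch of the same pipeline that makes these assumptions:

- scikit-learn's breast cancer dataset supplies the data.
- A `StandardScaler` step is added before RFE.
- The predictions are scored with recall on the held-out split, so `rfePredictions` gets used.
- The ranking table is built with pandas instead of `tabulate`, which avoids the extra dependency.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: breast cancer data stands in for the article's dataset
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

rfeSvm = make_pipeline(
    StandardScaler(),
    RFE(estimator=SVC(kernel="linear"), n_features_to_select=10, step=1),
    SVC(kernel="linear"),
)
rfeSvm.fit(X_train, y_train)
rfePredictions = rfeSvm.predict(X_test)
print("Test recall:", round(recall_score(y_test, rfePredictions), 3))

# ranking_[j] is the rank of column j (1 = selected), so pair them explicitly
rfeSelector = rfeSvm.named_steps["rfe"]
rankingTable = (
    pd.DataFrame({"rank": rfeSelector.ranking_, "columnName": X_train.columns})
    .sort_values("rank", kind="mergesort")  # stable sort keeps column order within ties
    .reset_index(drop=True)
)
print(rankingTable.to_string(index=False))
```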