A Brief Review of Fisher Score:
Fisher Score is one of the most widely used supervised feature selection methods. However, it scores each feature independently under the Fisher criterion, which can lead to a suboptimal subset of features. Kydavra's 0.3 release provides a generalized Fisher score to jointly select features.
The key idea of the Fisher score is to find a subset of features such that, in the data space spanned by the selected features, the distances between data points in different classes are as large as possible, while the distances between data points in the same class are as small as possible. The heuristic algorithm ranks features by score and keeps the top m. Because each feature's score is computed independently, the resulting subset can be suboptimal.
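To make the idea above concrete, here is a minimal pure-Python sketch of the per-feature score: the ratio of between-class scatter to within-class scatter. This is illustrative only, not Kydavra's actual implementation.

```python
def fisher_score(values, labels):
    """Ratio of between-class scatter to within-class scatter for one feature."""
    classes = set(labels)
    overall_mean = sum(values) / len(values)
    between, within = 0.0, 0.0
    for c in classes:
        xs = [v for v, y in zip(values, labels) if y == c]
        mean_c = sum(xs) / len(xs)
        var_c = sum((x - mean_c) ** 2 for x in xs) / len(xs)
        # Classes whose means sit far from the overall mean increase the score;
        # spread inside a class decreases it.
        between += len(xs) * (mean_c - overall_mean) ** 2
        within += len(xs) * var_c
    return between / within if within else float("inf")

# A feature that separates the two classes well scores higher than a noisy one.
well_separated = fisher_score([0.1, 0.2, 0.9, 1.0], [0, 0, 1, 1])
noisy = fisher_score([0.5, 0.9, 0.4, 1.0], [0, 0, 1, 1])
```

Features are then ranked by this score, which is exactly why independent per-feature scoring can miss features that are only useful in combination.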
Using Kydavra FisherSelector:
To get started, make sure you have Kydavra installed on your machine (pip install kydavra).
Next, we need to import the selector, create an instance, and apply it to our data.
from kydavra import FisherSelector
selector = FisherSelector(10)
selected_cols = selector.select(df, 'target')
The select function takes as parameters the pandas data frame and the name of the target column.
FisherSelector() takes the following parameter:
n_features (int, default=5): the number of top features (ranked by Fisher score) to retain after feature selection is applied.
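Once every feature has a score, the selection step reduces to a simple top-m ranking. The sketch below shows that step with hypothetical scores; the column names are illustrative, not taken from any real dataset.

```python
# Hypothetical per-column Fisher scores (illustrative values only).
scores = {"crim": 3.2, "rm": 8.7, "age": 1.1, "lstat": 9.4, "tax": 2.5}

def top_features(scores, n_features=5):
    """Return the n_features column names with the largest scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_features]

top_features(scores, n_features=3)  # → ['lstat', 'rm', 'crim']
```

This is the behavior n_features controls: a larger value keeps more of the ranked columns, a smaller value keeps only the strongest ones.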
In our test, we use the load_boston dataset provided by the sklearn library (note that load_boston has been removed in recent scikit-learn versions).
Note: The algorithm works only on numeric data.
After some cleaning, our selector returned the following result:
Now let’s compare the results before and after applying the selector.
Going further, I compared the Mean Squared Error for each algorithm.
MSE for SVR (all features): 34.728436105564406
MSE for SVR (selected features): 30.82835204515224
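For reference, the metric used above is the mean of squared residuals, equivalent to scikit-learn's mean_squared_error. A minimal pure-Python version:

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

mse([3.0, 5.0, 2.0], [2.5, 5.0, 3.0])  # → 0.4166666...
```

Lower is better, so the drop from ~34.7 to ~30.8 means the SVR's predictions improved after feature selection.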
Results show that SVR performed better after the selection, while Linear Regression achieved almost the same accuracy as when trained on all features.
Made with ❤ from Sigmoid.
Follow us on Facebook, Instagram and LinkedIn: