Feature Selection with PhikSelector from Kydavra

We all know how important it is to select the best features in order to get the best possible accuracy score from a model. This is where PhikSelector can help.

𝜙k is a good way to measure the correlation between variables. It improves on Pearson’s correlation coefficient and similar measures in several ways: it works with ordinal, categorical, and continuous variables; it has built-in noise correction against statistical fluctuations; and it captures dependencies that are not linear. This makes it a good tool for feature selection.
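To make the point about non-linear dependencies concrete, here is a minimal sketch in plain Python. It does not compute 𝜙k itself; it only shows the blind spot of Pearson’s coefficient that 𝜙k is meant to address. The helper `pearson` and the sample data are written for this illustration and are not part of Kydavra.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative data: one linear and one purely quadratic relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]
quadratic = [x ** 2 for x in xs]

# The linear relationship is fully captured by Pearson's coefficient.
print(pearson(xs, linear))
# The quadratic one is completely determined by x, yet Pearson reports
# no correlation because the dependency is symmetric and not linear.
print(pearson(xs, quadratic))
```

The second value is zero even though `quadratic` is a deterministic function of `xs`, which is exactly the kind of dependency a measure restricted to linear relationships cannot see.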

Using 𝜙k with PhikSelector from Kydavra

First of all, you will need to install the Kydavra library:

pip install kydavra 

Don’t forget to update it in case it is already installed:

pip install --upgrade kydavra

Now we need to import the selector, create an instance, and apply it to our dataset:

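from kydavra import PhikSelector

# df is a pandas DataFrame that contains the features and the 'target' column
phks = PhikSelector()
# select() returns the names of the columns chosen by the selector
selected_columns = phks.select(df, 'target')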

The select function takes the dataframe and the name of the target column as parameters. The selector itself takes the following parameters:

  • min_corr (float, between 0 and 1, default = 0.5): The minimum positive correlation that a feature must have with the target.
  • max_corr (float, between 0 and 1, default = 0.8): The maximum positive correlation that a feature may have with the target.
  • erase_corr (boolean, default = False): If False, the selector keeps features that are highly correlated with each other. If True, the selector removes them.
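The effect of min_corr and max_corr can be sketched as a simple threshold filter. This is only an illustration of the idea, not Kydavra’s actual implementation: the function `filter_by_correlation`, the feature names, and the correlation values are all made up, and treating both bounds as inclusive is an assumption.

```python
def filter_by_correlation(correlations, min_corr=0.5, max_corr=0.8):
    """Keep features whose correlation with the target lies in [min_corr, max_corr].

    `correlations` maps a feature name to its correlation with the target.
    Inclusive bounds are an assumption made for this sketch.
    """
    return [
        name
        for name, corr in correlations.items()
        if min_corr <= corr <= max_corr
    ]


# Made-up correlation values, for illustration only.
example_correlations = {
    "feature_a": 0.65,  # inside the default band, kept
    "feature_b": 0.30,  # too weakly correlated, dropped
    "feature_c": 0.92,  # above max_corr, dropped
    "feature_d": 0.50,  # exactly on the lower bound, kept
}

selected = filter_by_correlation(example_correlations)
print(selected)  # ['feature_a', 'feature_d']
```

Widening the band, for example `filter_by_correlation(example_correlations, min_corr=0.0, max_corr=1.0)`, keeps every feature, which is a quick way to see how strongly the thresholds shape the result.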

Let us test the performance of PhikSelector on the Heart Disease UCI dataset.

First, we apply Linear Regression to all of the following features:

['age' 'sex' 'cp' 'trestbps' 'chol' 'fbs' 'restecg' 'thalach' 'exang' 'oldpeak' 'slope' 'ca' 'thal']

This gives the following mean absolute error: 0.323376589727377

After applying PhikSelector, the best features according to it are:

['age', 'sex', 'cp', 'thalach', 'exang', 'oldpeak', 'ca', 'thal']

Re-running Linear Regression on these features only gives the following mean absolute error: 0.2936848634351146.

As we can see, the error decreased by about 9%, which is the goal we wanted to achieve.
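For readers who want to check the comparison by hand, here is a small sketch of the metric involved. The helper `mean_absolute_error` is a hand-written stand-in for illustration (the article does not show which implementation was used), and the `y_true` / `y_pred` values are made up. Only the two error values at the end come from the results above.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up targets and predictions, for illustration only.
y_true = [1, 0, 1, 1]
y_pred = [0.8, 0.3, 0.6, 0.9]
mae = mean_absolute_error(y_true, y_pred)
print(mae)

# The two errors reported in this article.
mae_all_features = 0.323376589727377
mae_selected_features = 0.2936848634351146

# Relative reduction of the error after feature selection.
relative_improvement = (mae_all_features - mae_selected_features) / mae_all_features
print(round(relative_improvement, 3))  # 0.092
```

A lower mean absolute error means the predictions are, on average, closer to the true target values.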

Conclusion

PhikSelector is a useful tool if you want to train your model on the best features in order to achieve a good (or at least acceptable) score.

Made with ❤ by Sigmoid.

If you have tried Kydavra, we invite you to share your impressions by filling out this form.
