This post does not cover the feature selection process in detail; its purpose is to share some important points about feature selection.
Feature selection can be an important part of model selection. In supervised learning, including features that carry no information about the label is useless at best and may prevent generalization at worst.
Feature selection can involve several methods. Two important ones are:
- Eliminating features with zero or low variance. A zero variance feature has the same value in every row. A low variance feature has mostly identical values and only a few distinct ones. One way low variance features arise is from dummy variables for categories with very few members: the dummy variable will be mostly 0s with very few 1s.
- Eliminating uninformative features. Training machine learning models with uninformative features can create a variety of problems. An uninformative feature does not significantly improve model performance. In many cases, the noise in uninformative features will increase the variance of the model predictions. In other words, uninformative features are likely to reduce the ability of the machine learning model to generalize.
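
As a rough illustration of both points, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the toy data, the helper names `low_variance_columns` and `abs_correlation`, and the 0.1 variance threshold. The absolute correlation with the label is used only as a simple stand-in for "informativeness"; in practice you would more likely judge a feature by its effect on cross-validated model performance. Note also that variance depends on a feature's scale, so a single threshold only makes sense when features are on comparable scales.

```python
from statistics import mean, pvariance


def low_variance_columns(columns, threshold=0.0):
    """Return names of columns whose population variance is <= threshold."""
    return [name for name, values in columns.items()
            if pvariance(values) <= threshold]


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return abs(cov / (sx * sy))


# Hypothetical toy data: 10 rows and a binary label.
label = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
features = {
    "constant":   [1] * 10,                        # zero variance
    "rare_dummy": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],  # mostly 0s, a single 1
    "signal":     [0.1, 0.3, 0.2, 0.4, 0.2, 0.8, 0.9, 0.7, 1.0, 0.9],
    "noise":      [0.5, 0.0, 1.0, 0.2, 0.8, 0.5, 0.0, 1.0, 0.2, 0.8],
}

# Step 1: flag zero and low variance features.
# A 0/1 dummy with a fraction p of 1s has variance p * (1 - p),
# so a threshold of 0.1 flags a dummy with 10% ones (variance 0.09).
to_drop = low_variance_columns(features, threshold=0.1)
kept = {name: values for name, values in features.items()
        if name not in to_drop}

# Step 2: score the remaining features against the label.
scores = {name: abs_correlation(values, label)
          for name, values in kept.items()}

print("dropped for low variance:", to_drop)
for name, score in scores.items():
    print(f"{name}: |corr with label| = {score:.2f}")
```

In this toy data, `constant` and `rare_dummy` are flagged by the variance filter, while `noise` passes that filter but scores close to zero against the label. That is the reason for treating the two checks separately: a feature can vary a lot and still tell you nothing about the label.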