Today I worked on creating Python code to automatically select features for an SVM. The idea (based on Fabiana’s paper) was to start with no features, then iterate over all remaining features and add the one that gave the best result, repeating until either no features remained or there was no improvement to the model. A rough sketch of the kind of loop I had in mind is below.
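
This isn’t my actual attempt, just a minimal sketch of the greedy loop I was aiming for. It assumes X is a pandas DataFrame, and that y and model_params are defined as elsewhere in this post; the score is a cross-validated Matthews correlation coefficient, but any scorer would do.

import numpy as np
from sklearn import svm
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer, matthews_corrcoef

scorer = make_scorer(matthews_corrcoef)

def score_subset(features):
    # Cross-validated score of an SVM trained on just these features.
    clf = svm.SVC(**model_params)
    return cross_val_score(clf, X[features], y, scoring=scorer).mean()

selected = []
remaining = list(X.columns)
best_score = -np.inf

while remaining:
    # Try each remaining feature and keep the one that helps most.
    trial_scores = {f: score_subset(selected + [f]) for f in remaining}
    best_feature = max(trial_scores, key=trial_scores.get)
    if trial_scores[best_feature] <= best_score:
        break  # no improvement, so stop
    best_score = trial_scores[best_feature]
    selected.append(best_feature)
    remaining.remove(best_feature)

print(selected, best_score)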

My first attempt didn’t work out, but I then found that scikit-learn has a class called SequentialFeatureSelector, which sounds similar to what I was trying to do:

Forward-SFS is a greedy procedure that iteratively finds the best new feature to add to the set of selected features. Concretely, we initially start with zero features and find the one feature that maximizes a cross-validated score when an estimator is trained on this single feature. Once that first feature is selected, we repeat the procedure by adding a new feature to the set of selected features. The procedure stops when the desired number of selected features is reached, as determined by the n_features_to_select parameter. Backward-SFS follows the same idea but works in the opposite direction: instead of starting with no features and greedily adding features, we start with all the features and greedily remove features from the set. The direction parameter controls whether forward or backward SFS is used.

(Source: https://scikit-learn.org/stable/modules/feature_selection.html#sequential-feature-selection)

The Python code is given below:

# X, y and model_params are defined earlier in this post.
from sklearn import svm
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.pipeline import make_pipeline

# Score candidate feature subsets by Matthews correlation coefficient.
scorer = make_scorer(matthews_corrcoef)
clf_svm = svm.SVC(**model_params)
clf_sfs = SequentialFeatureSelector(
    clf_svm,
    scoring=scorer,
)
# The final SVM in the pipeline is trained only on the selected features.
pipeline = make_pipeline(clf_sfs, clf_svm)
pipeline.fit(X, y)
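
By default the selector picks a fixed number of features rather than stopping when there is no improvement (which was my original plan). If I understand the docs correctly, newer scikit-learn versions (1.1+) accept n_features_to_select="auto" with a tol threshold to get that behaviour; something like this sketch should work, though I haven’t tried it yet (the tol value of 1e-4 is just a placeholder):

clf_sfs = SequentialFeatureSelector(
    clf_svm,
    n_features_to_select="auto",  # keep adding features...
    tol=1e-4,                     # ...until the CV score improves by less than tol
    scoring=scorer,
)
pipeline = make_pipeline(clf_sfs, clf_svm)
pipeline.fit(X, y)

# Boolean mask of which features were kept:
selected_mask = pipeline.named_steps["sequentialfeatureselector"].get_support()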