Support Vector Machines (SVM) Concepts

Support Vector Machines (SVMs) find the decision boundary that maximises the margin between classes, so that the boundary sits as far as possible from the nearest training examples of each class.

The Maximum Margin Classifier

Among all hyperplanes that separate two classes, an SVM selects the one with the largest margin, that is, the largest distance from the hyperplane to the nearest data points of each class (the support vectors). A wider margin generally leads to more robust generalisation.
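To make the margin concrete, the following is a minimal sketch that fits a linear SVM and computes the margin width as 2 / ||w||, where w is the learned weight vector. The synthetic dataset, the cluster centres, and the very large C (used here to approximate a hard margin) are illustrative assumptions, not part of the original text.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic clusters (illustrative data only).
X, y = make_blobs(n_samples=60, centers=[[-3, -3], [3, 3]],
                  cluster_std=0.8, random_state=0)

# A very large C approximates a hard-margin classifier on separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]                         # normal vector of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)   # distance between the two margin lines

print(f"Margin width: {margin_width:.3f}")
print(f"Support vectors: {len(clf.support_vectors_)} of {len(X)} points")
```

Only a few of the 60 points should end up as support vectors; the rest lie outside the margin.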

Support Vectors

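Support vectors are the training examples that lie closest to the decision boundary (in a soft-margin SVM, this also includes points inside the margin or on the wrong side of it). They are the only points that determine where the boundary is placed: removing any other training example and refitting yields the same model. This keeps the fitted model compact, since only the support vectors need to be stored, and makes it insensitive to correctly classified points far from the boundary.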
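One way to check this claim is to refit the model on the support vectors alone and compare predictions. The sketch below assumes synthetic, well-separated data and uses the `support_` attribute of a fitted `SVC`, which holds the indices of the support vectors; small numerical differences between the two fits are possible, so it compares predictions rather than raw coefficients.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Illustrative synthetic data: two well-separated clusters.
X, y = make_blobs(n_samples=100, centers=[[-3, -3], [3, 3]],
                  cluster_std=1.0, random_state=1)

full = SVC(kernel="linear", C=1.0).fit(X, y)

# Indices of the support vectors within the training set.
sv_idx = full.support_

# Refit using only the support vectors, discarding every other point.
reduced = SVC(kernel="linear", C=1.0).fit(X[sv_idx], y[sv_idx])

same_predictions = np.array_equal(full.predict(X), reduced.predict(X))
print(f"{len(sv_idx)} of {len(X)} points are support vectors")
print(f"Predictions identical after refit: {same_predictions}")
```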

Hard vs. Soft Margin

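A hard-margin SVM requires perfect linear separability: if no hyperplane can place every point on the correct side, no solution exists. The soft-margin SVM allows some points to violate the margin or be misclassified, with the hyperparameter C setting the penalty for each violation. A small C tolerates more violations in exchange for a wider margin, which helps with noisy data; a large C penalises violations heavily and approaches hard-margin behaviour.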
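The effect of C can be sketched by fitting the same linear SVM with different values and comparing the resulting margin width (2 / ||w||) and number of support vectors. The overlapping synthetic clusters and the three C values below are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping clusters, so no hyperplane separates the classes perfectly.
X, y = make_blobs(n_samples=200, centers=[[-1, -1], [1, 1]],
                  cluster_std=1.2, random_state=0)

results = {}
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    width = 2.0 / np.linalg.norm(clf.coef_[0])   # margin width for this C
    n_sv = int(clf.n_support_.sum())             # support vectors across both classes
    results[C] = (width, n_sv)
    print(f"C={C:<6} margin width={width:.3f} support vectors={n_sv}")
```

Smaller values of C should yield a wider margin and more support vectors, because more points are allowed to fall inside the margin.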

SVM in scikit-learn

scikit-learn's SVC class implements soft-margin SVMs with several kernel options, including "linear", "poly", "rbf" (the default) and "sigmoid".

Basic SVC Usage

<pre><code class="language-python">from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load the data and hold out 20% for testing.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# SVMs are sensitive to feature scale, so standardise before fitting.
svm = make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf"))
svm.fit(X_tr, y_tr)

print(f"SVM Accuracy: {svm.score(X_te, y_te):.4f}")</code></pre>
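As a possible follow-up, the kernel options can be compared on the same dataset. This is a minimal sketch that assumes 5-fold cross-validation and the default C=1.0 for every kernel; a fair comparison would normally tune C (and kernel-specific parameters such as gamma) for each kernel separately.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    # Scaling lives inside the pipeline so it is refit on each training fold.
    model = make_pipeline(StandardScaler(), SVC(C=1.0, kernel=kernel))
    scores[kernel] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{kernel:>8}: {scores[kernel]:.4f}")
```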