Support Vector Machines Filter

Uses the Support Vector Machines algorithm to classify data.

Category

Classification

Node

svm_node

Parameters

Solver: The SVM solver to use (currently only C-SVC is available)

Classification Type: When more than two groups are used, whether to run the SVM algorithm pairwise between groups (One-Versus-One), or between each group and all remaining groups combined (One-Versus-All)

C-Value: Which hyperplanes the SVM should prioritize. A large value will try to place the hyperplane so that it separates the training data as exactly as possible, even if outliers push the separating plane closer to one of the distributions; a smaller value tolerates more outliers and tends to separate the bulk of the training distributions better. Which is preferable depends on how representative the training data sample is of the future data that will be classified

Kernel Type: Which kernel to use (see below)

a, b, c, d: the kernel parameters for the non-linear case

Classification Threshold: the minimum similarity between a data point and a group required for the point to be assigned to that group. If the calculated similarity is below the given threshold, the point will not be considered part of that group, even if that group is the best match

Output Configuration/SimilarityLevel: whether to also output the similarity level in a second optional output
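For reference, the C-Value corresponds to the constant C in the standard soft-margin C-SVC problem. The formulation below is the general textbook one, not documentation of this filter's internal solver. Here \phi is the feature map implied by the chosen kernel, so that K(\mathbf{x}_m,\mathbf{x}_n) = \phi(\mathbf{x}_m)^{\mathrm{T}} \phi(\mathbf{x}_n); y_i \in \{-1, +1\} labels which of the two groups being separated training point i belongs to; and the offset is written w_0 to avoid a clash with the kernel parameter b:

```latex
\min_{\mathbf{w},\, w_0,\, \boldsymbol{\xi}} \;
  \frac{1}{2}\lVert\mathbf{w}\rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
  y_i\left(\mathbf{w}^{\mathrm{T}} \phi(\mathbf{x}_i) + w_0\right) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0, \quad i = 1, \dots, n
```

Each slack variable \xi_i measures how far training point i violates the margin. A large C makes such violations expensive, so the solver shifts the hyperplane to accommodate outliers; a small C lets outliers keep a positive slack in exchange for a wider margin between the bulk of the distributions.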

Inputs

Input: The input data

Outputs

Classification: the (single) classification result

SimilarityLevel (optional): the similarity level of the data with the group of the classification result (or 0 if no classification could be determined)

Effect of the Filter

The filter requires input data in which individual data points (pixels) have been assigned to various groups. The classifier is trained on this data using the chosen support vector machine (SVM) solver. During execution it then applies the trained solution to the input data, finding the unique group that best fits each individual input data point (pixel). (Alternatively, depending on settings such as the Classification Threshold, it indicates that no group matches a given pixel well.)

Only groups that have been marked Applicable in the parameters are considered for this filter. (By default all groups are considered applicable.)
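The split between a one-time training step and per-pixel execution can be sketched as follows. This is a minimal sketch under stated assumptions, not the filter's implementation: `run_filter`, `Model` and the `train` callback are hypothetical names, and the SVM training itself is passed in as a function because the filter does not expose its solver.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Spectrum = Tuple[float, ...]
# Hypothetical model signature: pixel spectrum -> (group or None, similarity level).
Model = Callable[[Spectrum], Tuple[Optional[str], float]]


def run_filter(
    labelled: Dict[str, List[Spectrum]],  # group name -> spectra assigned to it
    applicable: Set[str],                 # groups marked Applicable
    train: Callable[[Dict[str, List[Spectrum]]], Model],
    image: Sequence[Spectrum],            # pixels to classify during execution
) -> List[Tuple[Optional[str], float]]:
    # Only groups marked Applicable enter the training set.
    training = {g: xs for g, xs in labelled.items() if g in applicable}
    # Training runs once; its cost grows with the number of training spectra.
    model = train(training)
    # Execution classifies every pixel independently with the trained model.
    return [model(x) for x in image]


# Usage with a stand-in trainer (NOT an SVM): it records which groups it was
# given and assigns pixels by the sign of their first channel.
seen_groups: Set[str] = set()


def stand_in_train(training: Dict[str, List[Spectrum]]) -> Model:
    seen_groups.update(training)  # iterating a dict yields its keys (group names)
    return lambda x: ("A", 1.0) if x[0] > 0 else (None, 0.0)


result = run_filter(
    {"A": [(1.0,)], "B": [(-1.0,)], "C": [(0.0,)]},
    applicable={"A", "B"},
    train=stand_in_train,
    image=[(2.0,), (-3.0,)],
)
print(sorted(seen_groups), result)  # ['A', 'B'] [('A', 1.0), (None, 0.0)]
```

Group C never reaches the trainer because it is not marked Applicable, which mirrors the rule stated above.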

The training time will increase with the number of spectra assigned to each group (at times significantly), but the execution time will only depend on the parameters and the number of groups.

One-Versus-One and One-Versus-All

Since the SVM algorithm separates two groups from each other, there is only a single way to classify data when the training data set contains only two groups.

If there are more than two groups, then there are two possible variants that may be employed:

  • In the One-Versus-One case, an SVM solution is found for each pair of groups, and the group with the most wins against the other groups is taken as the classification result. With N groups this requires N(N-1)/2 classifiers, so it scales as \mathcal{O}(N^2) with the number of groups.

  • In the One-Versus-All case, an SVM solution is found for each group against the data of all remaining groups combined, so N groups require N classifiers. For example, if there are three groups A, B, and C, then three classifiers are generated: A vs. B+C, B vs. A+C, and C vs. A+B. If an individual group wins against the combined remainder of the data, that group is taken as the classification result. (If multiple groups win, the one with the closest distance is used.)
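The two combination schemes can be sketched as below. This is an illustrative sketch, not the filter's code: the binary SVMs are assumed to be already trained and are represented as decision functions (positive value means the first group wins), and in the usage part they are replaced by hand-written linear functions. Two further choices are assumptions of this sketch: "closest distance" in One-Versus-All is read as "largest decision value", and One-Versus-All returns `None` when no group wins.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
# A binary decision function: a positive value means "the first group wins".
Decision = Callable[[Vector], float]


def one_vs_one(groups: List[str], pair_svms: Dict[Tuple[str, str], Decision], x: Vector) -> str:
    """Every pairwise classifier casts one vote; the group with most wins is returned."""
    votes = {g: 0 for g in groups}
    for a, b in combinations(groups, 2):  # N*(N-1)/2 classifiers
        votes[a if pair_svms[(a, b)](x) > 0 else b] += 1
    # max() keeps the first maximum, so ties go to the group listed first.
    return max(groups, key=lambda g: votes[g])


def one_vs_all(groups: List[str], rest_svms: Dict[str, Decision], x: Vector) -> Optional[str]:
    """Each group is tested against all other groups combined (N classifiers)."""
    scores = {g: rest_svms[g](x) for g in groups}
    winners = [g for g in groups if scores[g] > 0]
    if not winners:
        return None  # assumption: no winning group means "unclassified"
    # Assumption: "closest distance" is interpreted as the largest decision value.
    return max(winners, key=lambda g: scores[g])


# Hand-written stand-ins for trained SVMs: A sits near (0, 0), B near (5, 0), C near (0, 5).
groups = ["A", "B", "C"]
pair_svms = {
    ("A", "B"): lambda x: 2.5 - x[0],
    ("A", "C"): lambda x: 2.5 - x[1],
    ("B", "C"): lambda x: x[0] - x[1],
}
rest_svms = {
    "A": lambda x: 2.5 - x[0] - x[1],
    "B": lambda x: x[0] - x[1] - 2.5,
    "C": lambda x: x[1] - x[0] - 2.5,
}

for point in [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (3.0, 3.0)]:
    print(point, one_vs_one(groups, pair_svms, point), one_vs_all(groups, rest_svms, point))
```

For the ambiguous point (3, 3), One-Versus-One still returns a group by majority vote, while this One-Versus-All sketch finds no group that beats the rest and returns `None`.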

Kernel Types

The user may select from several kernel types, which implicitly map the data into a different space for classification, so that the hyperplanes separating the data may be curved in the original space.

Note that switching from the linear kernel to any non-linear kernel makes a large difference in performance.

The following kernels are implemented:

Linear: K(\mathbf{x}_m,\mathbf{x}_n) = \mathbf{x}_m^{\mathrm{T}} \mathbf{x}_n

Polynomial: K(\mathbf{x}_m,\mathbf{x}_n) = \left(a \cdot \mathbf{x}_m^{\mathrm{T}} \mathbf{x}_n + b\right)^d

Gaussian: K(\mathbf{x}_m,\mathbf{x}_n) = \exp\left( -a \lVert\mathbf{x}_m - \mathbf{x}_n\rVert^2 \right)

Sigmoidal: K(\mathbf{x}_m,\mathbf{x}_n) = \tanh\left(a \cdot \mathbf{x}_m^{\mathrm{T}} \mathbf{x}_n + b\right)
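The four kernels can be written directly from the formulas above, using the same parameter names a, b and d. The sketch also illustrates one general reason (a property of kernel SVMs in general, assumed rather than documented for this filter) why non-linear kernels are slower: a kernel SVM's decision value needs one kernel evaluation per support vector, whereas for the linear kernel all support vectors can be folded into a single weight vector beforehand. The helper names and the toy support vectors and coefficients are made up for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def linear(xm: Vector, xn: Vector) -> float:
    return dot(xm, xn)


def polynomial(xm: Vector, xn: Vector, a: float, b: float, d: int) -> float:
    return (a * dot(xm, xn) + b) ** d


def gaussian(xm: Vector, xn: Vector, a: float) -> float:
    return math.exp(-a * sum((p - q) ** 2 for p, q in zip(xm, xn)))


def sigmoidal(xm: Vector, xn: Vector, a: float, b: float) -> float:
    return math.tanh(a * dot(xm, xn) + b)


def decision_value(x: Vector, support_vectors: List[Vector], coefs: List[float],
                   offset: float, kernel: Callable[[Vector, Vector], float]) -> float:
    """Generic kernel SVM decision: one kernel evaluation per support vector."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coefs)) + offset


def collapse_linear(support_vectors: List[Vector], coefs: List[float]) -> List[float]:
    """Linear kernel only: fold all support vectors into one weight vector w."""
    dim = len(support_vectors[0])
    return [sum(c * sv[i] for sv, c in zip(support_vectors, coefs)) for i in range(dim)]


# Toy support vectors and coefficients (alpha_i * y_i), invented for illustration.
svs: List[Vector] = [(1.0, 0.0), (0.0, 1.0)]
coefs = [2.0, -1.0]
x = (3.0, 4.0)
slow = decision_value(x, svs, coefs, 0.5, linear)  # loops over all support vectors
fast = dot(collapse_linear(svs, coefs), x) + 0.5   # a single dot product
# For the Gaussian kernel no such shortcut exists: every support vector is needed.
rbf = decision_value(x, svs, coefs, 0.5, lambda u, v: gaussian(u, v, a=0.1))
print(slow, fast)  # 2.5 2.5
```

With many support vectors, the per-pixel cost of the non-linear case grows accordingly, while the collapsed linear case stays at one dot product per pixel.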

Threshold

The threshold may be used to exclude matches that are not similar enough to the training distributions. A threshold of 0 means that every match found by the SVM algorithm is accepted. The value ranges from 0 to 1, where 1 corresponds to an infinitely strict criterion. The similarity level can also be output using the optional second output of the filter.
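How the threshold shapes the two outputs can be sketched as follows. This assumes that a similarity level in the range 0 to 1 has already been computed for each pixel's best-matching group; how the filter derives that similarity from SVM distances is not described here. `apply_threshold` is a hypothetical helper, and `None` stands for "no classification".

```python
from typing import List, Optional, Sequence, Tuple


def apply_threshold(
    results: Sequence[Tuple[Optional[str], float]],  # per pixel: (best group, similarity)
    threshold: float,
) -> Tuple[List[Optional[str]], List[float]]:
    """Return the Classification and SimilarityLevel outputs for all pixels."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0 and 1")
    classification: List[Optional[str]] = []
    similarity_level: List[float] = []
    for group, similarity in results:
        if group is None or similarity < threshold:
            # Below the threshold: unclassified, similarity level reported as 0.
            classification.append(None)
            similarity_level.append(0.0)
        else:
            classification.append(group)
            similarity_level.append(similarity)
    return classification, similarity_level


pixels = [("A", 0.9), ("B", 0.3), (None, 0.0)]
print(apply_threshold(pixels, 0.5))  # (['A', None, None], [0.9, 0.0, 0.0])
print(apply_threshold(pixels, 0.0))  # (['A', 'B', None], [0.9, 0.3, 0.0])
```

With a threshold of 0 every match is kept, while raising it turns weak matches such as the 0.3 pixel into unclassified pixels with a similarity level of 0.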

See also