Principal Component Analysis Filter

Reduces the dimensionality of data via the Principal Component Analysis.

Category

Dimensionality Reduction

Node

pca_node

Parameters

CutoffMethod: whether the number of output dimensions of the PCA is determined dynamically from the explained variance (ByVariance, see below) or fixed by the user (Fixed)

CutOffAdditionalVariance: when the dimensionality is determined dynamically, the minimum fraction of variance the next principal component must explain in order to be included (range is from 0 to 1, where 1 is 100%)

CutOffPrincipalComponents: when the dimensionality is determined dynamically, the fraction of variance that the principal components selected so far may explain before the remaining components are cut off (range is from 0 to 1, where 1 is 100%)

MaxDimensionality: the maximum number of principal components to find in the dynamic case before forcing a cutoff

Dimensionality: in the fixed case this specifies the number of principal components to find

Scaling: how the resulting data should be scaled (see below)

Inputs

Input: the high-dimensional input data

Outputs

Output: the data projected onto the subspace

Effect of the Filter

This filter may be used to reduce the dimensionality of the input data. During training it processes all input data that it was given (except for pixels that were marked as ignored) and organizes the training data into principal components, ordered by the variance they explain.

All data is then projected onto the subspace spanned by the selected principal components to reduce the dimensionality.
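The training and projection steps described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the filter's actual implementation: it uses NumPy, and the function names fit_pca and project as well as the ignore_mask argument are hypothetical stand-ins for the node's behavior. How the Scaling parameter affects the result is not modeled here.

```python
import numpy as np


def fit_pca(train, ignore_mask=None):
    """Fit a PCA on rows of `train` (samples x features).

    `ignore_mask` is a hypothetical boolean array marking samples
    (e.g. ignored pixels) that must not contribute to training.
    Returns (mean, components, explained_ratio), with components
    sorted by decreasing explained variance.
    """
    X = np.asarray(train, dtype=float)
    if ignore_mask is not None:
        X = X[~np.asarray(ignore_mask, dtype=bool)]

    mean = X.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal
    # directions, already ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)

    variance = s ** 2
    # Assumes the training data is not constant (variance.sum() > 0).
    explained_ratio = variance / variance.sum()
    return mean, vt, explained_ratio


def project(data, mean, components, n_components):
    """Project `data` onto the first `n_components` principal components."""
    centered = np.asarray(data, dtype=float) - mean
    return centered @ components[:n_components].T
```

With a fixed dimensionality of 2, for instance, calling project(data, mean, components, 2) yields one two-dimensional row per input sample.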

For example, if a data set with the following set of spectra is loaded and used in conjunction with the PCA:

_images/pca_input.png

Then the PCA can easily separate the different spectra, in this case with two principal components:

_images/pca_output.png

Fixed vs. By Variance Cutoff

The number of principal components used can either be fixed by the user (by setting CutoffMethod to Fixed), or it can be determined automatically (by setting CutoffMethod to ByVariance).

When the number is determined automatically, all possible principal components are first ordered by decreasing amount of input-data variance that they explain. Three different cutoffs are then checked:

  • If the number of principal components reaches the maximum number as specified in the MaxDimensionality parameter, then the cutoff happens at this point

  • If the next principal component that would be added to the already existing list of principal components explains less variance than what the parameter CutOffAdditionalVariance requires, that component will not be included, and the algorithm will terminate at this point

  • If after adding a principal component the total variance that is now explained is larger than the parameter CutOffPrincipalComponents, the algorithm will terminate at this point

This dynamic cutoff method allows the user to determine how much variance there actually is in the data, without fitting to noise that is present in the training data.
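The three cutoff checks can be illustrated with a short sketch. It assumes the explained-variance fractions are already sorted in decreasing order and sum to at most 1; the function name select_num_components and the order in which the checks are evaluated are assumptions made for illustration, not a description of the filter's internal code.

```python
def select_num_components(explained_ratio, max_dimensionality,
                          cutoff_additional_variance,
                          cutoff_principal_components):
    """Return how many leading principal components to keep.

    `explained_ratio` holds the fraction of variance explained by each
    component, sorted in decreasing order. The three remaining
    arguments mirror MaxDimensionality, CutOffAdditionalVariance and
    CutOffPrincipalComponents.
    """
    count = 0
    total = 0.0
    for ratio in explained_ratio:
        # Cutoff 1: the maximum number of components has been reached.
        if count >= max_dimensionality:
            break
        # Cutoff 2: the next component explains too little variance,
        # so it is not included.
        if ratio < cutoff_additional_variance:
            break
        count += 1
        total += ratio
        # Cutoff 3: the components kept so far explain enough variance.
        if total > cutoff_principal_components:
            break
    return count


ratios = [0.6, 0.25, 0.1, 0.04, 0.01]
print(select_num_components(ratios, 10, 0.05, 0.99))
```

In this example the fourth component explains only 4% of the variance, which is below the 5% required by the CutOffAdditionalVariance stand-in, so three components are kept.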

See Also