Inductive Solutions, Inc. - RUNPCA Principal Component Analysis

Inductive Solutions, Inc.

380 Rector Place, Suite 4A, New York, New York 10280

Telephone: +1 (212)945.0630 Fax: +1 (212)945.0367

RunPCA

How do you mine a set of data for the most significant factors? How can you pre-condition a set of data to increase the speed and accuracy of regression analysis or neural network training? How do you "reduce the dimensionality" of high dimensional problems?

RunPCA is an information discovery ("data mining") tool based on Principal Component Analysis, a statistical method that transforms a set of data inputs into a new, smaller set of uncorrelated inputs ordered by information content (variance).

RunPCA Features

Computes means, variances, covariances, and correlations of large data sets

Computes and ranks principal components and their variances

Automatically transforms data sets

Can analyze data sets of up to 50,000 rows and 200 columns
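
The computations listed above can be sketched in a few lines of NumPy. This is not RunPCA's implementation or API, only a minimal illustration of the underlying method. It assumes the data is a rows-by-columns array; the function name `pca_summary` and the synthetic data set are hypothetical.

```python
import numpy as np

def pca_summary(data):
    """Column statistics plus ranked principal components (illustrative sketch)."""
    data = np.asarray(data, dtype=float)
    means = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)        # covariance matrix; columns are the variables
    variances = np.diag(cov)                # per-column variances
    corr = np.corrcoef(data, rowvar=False)  # correlation matrix

    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order,
    # so reorder to rank the components from highest to lowest variance.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    pc_var = eigvals[order]
    pc_axes = eigvecs[:, order]

    # Transform the data: centre it, then project onto the principal axes.
    scores = (data - means) @ pc_axes
    return means, variances, cov, corr, pc_var, pc_axes, scores

# Synthetic data (an assumption for illustration): the third column is an exact
# linear combination of the first two, so one principal component has zero variance.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 2))
data = np.column_stack([x[:, 0], x[:, 1], x[:, 0] + x[:, 1]])

means, variances, cov, corr, pc_var, pc_axes, scores = pca_summary(data)
print("column variances:   ", variances)
print("component variances:", pc_var)
```

The total variance is the same before and after the transformation; only its distribution across the columns changes.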

Benefits

Easy-to-Learn and Easy-to-Use Excel Spreadsheet User Interface

Computation is very fast

The RunPCA C/C++ Library is available for further customization

License

The standard single-user license is for Microsoft Windows. Licensing plans for other platforms are also available: contact us about versions for other operating systems (such as Linux or Solaris), site licenses, or academic discounts.

Example

Suppose we have a table of 10,000 rows and 3 columns (or "factors"), and we want to discover a relationship among the three columns. The following table shows how the variance of the data is distributed across the columns:

Variance	Fraction	Accumulated
0.381745	66.57	66.57
0.095436	16.64	83.21
0.096271	16.79	100

Most of the information (the highest variance) is contained in the first column: almost two-thirds, as the first row of the table shows. The remaining information is split almost evenly between the other two factors, as the next two rows show.
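
The Fraction and Accumulated columns follow directly from the variances: each fraction is a column's variance as a percentage of the total variance, and the accumulated value is the running sum of the fractions. A short sketch of that arithmetic, using the three variances from the table (the variable names are illustrative):

```python
from itertools import accumulate

# Column variances taken from the table above.
variances = [0.381745, 0.095436, 0.096271]

total = sum(variances)
fractions = [100 * v / total for v in variances]  # percent of total variance
accumulated = list(accumulate(fractions))         # running sum of the percentages

print("Variance\tFraction\tAccumulated")
for v, f, a in zip(variances, fractions, accumulated):
    print(f"{v:.6f}\t{f:.2f}\t{a:.2f}")
```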

After processing by RunPCA, the three original columns are transformed to "principal factors." Now the variance of the transformed data (consisting of the three principal factors) is distributed as follows:

Variance	Fraction	Accumulated
0.494189	86.18	86.18
0.079263	13.82	100
0	0	100

Now most of the information (highest variance) is contained in the first principal factor (86% of the information). The remaining information is contained entirely in the second principal factor; the third has zero variance and carries no information. This effectively reduces the dimension of the data by one-third, from three columns to two. If we have additional data of observed responses (or target outputs), we can therefore perform regression (or train a neural network) using only two columns of the transformed data, rather than the three columns of the original data. This can improve the speed and accuracy of the training or regression.
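
As a rough illustration of this last point, the sketch below fits a linear regression twice: once on three original columns and once on the first two principal factors. It uses NumPy rather than RunPCA. The data is synthetic and constructed so that the third column is an exact linear combination of the other two; the helper `fit_rss` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(1)
base = rng.normal(size=(500, 2))

# Synthetic inputs (assumption): the third column depends linearly on the first two.
X = np.column_stack([base[:, 0], base[:, 1], 0.5 * base[:, 0] - base[:, 1]])
# Synthetic target: a linear response plus a little noise.
y = 2.0 * X[:, 0] - X[:, 1] + 0.3 * X[:, 2] + rng.normal(scale=0.1, size=500)

# Principal factors: centre the data, then project onto the covariance eigenvectors,
# ordered from highest to lowest variance.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
factors = Xc @ eigvecs[:, order]
reduced = factors[:, :2]  # keep only the two factors that carry variance

def fit_rss(inputs, target):
    """Least-squares fit with an intercept; returns the residual sum of squares."""
    A = np.column_stack([np.ones(len(inputs)), inputs])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return float(np.sum((target - A @ coef) ** 2))

rss_full = fit_rss(X, y)           # three original columns
rss_reduced = fit_rss(reduced, y)  # two principal factors
print(rss_full, rss_reduced)
```

In this constructed case the two fits have the same residual up to rounding, because the dropped factor has zero variance. With real data the discarded factors usually carry some variance, so the reduced fit is an approximation.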
