Correlate is an Excel plug-in that performs sparse canonical correlation analysis (sparse CCA).
If two assays (e.g., gene expression and DNA copy number) have been performed on the same set of patient samples, sparse CCA can be used to find a sparse linear combination of the variables in assay 1 that is maximally correlated with a sparse linear combination of the variables in assay 2.
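To make the idea concrete, here is a minimal NumPy sketch of sparse CCA in the spirit of the penalized matrix decomposition. It is not the code that Correlate or PMA runs: the function names and the bounds `c1` and `c2` are illustrative assumptions, the data are synthetic, and the within-data-set covariance is treated as the identity for simplicity.

```python
import numpy as np

def soft_threshold(a, delta):
    """Shrink every entry of a toward zero by delta."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def l1_constrained_unit_vector(a, c, n_iter=50):
    """Unit-length soft-thresholded copy of a whose L1 norm is at most c (c >= 1)."""
    u = a / np.linalg.norm(a)
    if np.abs(u).sum() <= c:
        return u  # already sparse enough, no thresholding needed
    lo, hi = 0.0, np.abs(a).max()
    for _ in range(n_iter):  # bisection on the threshold
        mid = (lo + hi) / 2
        s = soft_threshold(a, mid)
        norm = np.linalg.norm(s)
        if norm > 0 and np.abs(s / norm).sum() <= c:
            hi = mid
        else:
            lo = mid
    s = soft_threshold(a, hi)
    if not s.any():  # ties at the maximum: keep only the single largest entry
        i = np.argmax(np.abs(a))
        s[i] = a[i]
    return s / np.linalg.norm(s)

def sparse_cca(X, Z, c1, c2, n_iter=100):
    """Alternate between the two weight vectors (hypothetical helper)."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    K = X.T @ Z  # cross-product matrix between the two data sets
    # Start from the leading right singular vector of K.
    v = np.linalg.svd(K, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = l1_constrained_unit_vector(K @ v, c1)
        v = l1_constrained_unit_vector(K.T @ u, c2)
    corr = np.corrcoef(X @ u, Z @ v)[0, 1]
    return u, v, corr

# Synthetic example: 50 samples, a shared signal in the first 5 variables of each set.
rng = np.random.default_rng(0)
n = 50
latent = rng.normal(size=n)
X = rng.normal(size=(n, 30))
X[:, :5] += latent[:, None]
Z = rng.normal(size=(n, 20))
Z[:, :5] += latent[:, None]

u, v, corr = sparse_cca(X, Z, c1=2.0, c2=2.0)
print("nonzero weights in data set 1:", np.flatnonzero(u))
print("nonzero weights in data set 2:", np.flatnonzero(v))
print("correlation of the two combinations:", corr)
```

Smaller values of `c1` and `c2` force more weights to exactly zero. These bounds play the role of the tuning parameters in this sketch.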
Overview of Correlate:
- Correlate is a flexible tool for correlating any pair of data sets with measurements taken on the same set of samples. For instance, you can use it to correlate a set of clinical variables with a set of genomic measurements.
- Correlate is a point-and-click Excel interface for the R package PMA.
- Correlate implements methods proposed in the following paper: Witten DM, Tibshirani R, and Hastie T (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10(3): 515-534. [pdf]
- See some Correlate screenshots.
- Meet the authors!
DOWNLOAD (37MB) and please EMAIL Rob Tibshirani to let him know.
If you have visited this site recently, you may need to clear your browser history and cache, because a recent change affected a redirect.
Getting started with Correlate:
- Download Correlate and follow the installation instructions.
- Flip through the Correlate manual.
- Step through a typical Correlate analysis:
  - Put the data in a single Excel workbook containing two worksheets: one containing data set 1 and the other containing data set 2. An example is here.
  - Open the Add-Ins menu in Excel and click "Correlate".
  - Load the data into Correlate.
  - Run Correlate using automatic tuning parameter selection.
  - Inspect the resulting plots to choose a tuning parameter value.
  - Re-run Correlate using a large number of permutations to get a meaningful p-value.
  - The nonzero entries of the resulting weight vectors identify a set of variables in the first data set that is maximally correlated with a set of variables in the second data set.
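
The permutation step in the workflow above can be sketched as follows. This is only an illustration of the general idea, not Correlate's implementation: the function names are hypothetical, the data are synthetic, and the statistic is a simple stand-in (the leading singular value of the cross-correlation matrix) where a real analysis would use the sparse CCA correlation. The "+1" correction in the p-value is a common convention and may differ from what Correlate reports.

```python
import numpy as np

def standardize(A):
    """Center each column and scale it to unit standard deviation."""
    return (A - A.mean(axis=0)) / A.std(axis=0)

def leading_cross_correlation(X, Z):
    """Stand-in statistic: largest singular value of the cross-correlation matrix."""
    n = X.shape[0]
    C = standardize(X).T @ standardize(Z) / n
    return np.linalg.svd(C, compute_uv=False)[0]

def permutation_pvalue(X, Z, statistic, n_perm=200, seed=0):
    """Compare the observed statistic with values obtained after shuffling samples."""
    rng = np.random.default_rng(seed)
    observed = statistic(X, Z)
    # Shuffling the rows of X breaks the pairing between samples in X and Z,
    # so each permuted value is a draw from the "no association" distribution.
    null = np.array([
        statistic(X[rng.permutation(len(X))], Z) for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p_value

# Synthetic example: 50 samples with a shared signal in the first 5 variables.
rng = np.random.default_rng(1)
n = 50
latent = rng.normal(size=n)
X = rng.normal(size=(n, 30))
X[:, :5] += latent[:, None]
Z = rng.normal(size=(n, 20))
Z[:, :5] += latent[:, None]

observed, null, p_value = permutation_pvalue(X, Z, leading_cross_correlation)
print("observed statistic:", observed)
print("permutation p-value:", p_value)
```

With `n_perm` permutations, the smallest p-value this estimate can take is 1 / (n_perm + 1), which is one reason a large number of permutations is needed for a meaningful p-value.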