What is nearest shrunken centroid classification?

PAM uses the nearest shrunken centroid methodology described in:

  • Tibshirani, Hastie, Narasimhan, and Chu (2002):
    "Diagnosis of multiple cancer types by shrunken centroids of gene expression".
    PNAS 2002 99:6567-6572 (May 14).

Briefly, the method computes a standardized centroid for each class. This is the average expression of each gene in that class, divided by the within-class standard deviation for that gene.
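This computation can be sketched in a few lines of numpy. The toy data below are hypothetical, and the sketch follows the simplified description above (the full method also adds a small positive constant to the denominator to guard against genes with near-zero variance):

```python
import numpy as np

# Toy data (hypothetical values): genes in rows, samples in columns.
X = np.array([[1.0, 1.2, 3.1, 2.9],    # gene 1
              [0.2, 0.4, 0.1, 0.3]])   # gene 2
y = np.array([0, 0, 1, 1])             # class label of each sample

def standardized_centroids(X, y):
    """Per-gene class mean divided by the pooled within-class standard deviation."""
    classes = np.unique(y)
    # Residuals of each sample about its own class mean, gene by gene.
    resid = np.hstack([X[:, y == k] - X[:, y == k].mean(axis=1, keepdims=True)
                       for k in classes])
    # Pooled within-class standard deviation (degrees of freedom n - #classes).
    s = np.sqrt((resid ** 2).sum(axis=1) / (X.shape[1] - len(classes)))
    return np.column_stack([X[:, y == k].mean(axis=1) for k in classes]) / s[:, None]

centroids = standardized_centroids(X, y)   # shape: (genes, classes)
```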

Nearest centroid classification takes the gene expression profile of a new sample and compares it to each of these class centroids. The class whose centroid it is closest to, in squared distance, is the predicted class for that new sample.

Nearest shrunken centroid classification makes one important modification to standard nearest centroid classification. It "shrinks" each of the class centroids toward the overall centroid for all classes by an amount we call the threshold. This shrinkage moves each centroid component toward zero by the threshold, setting it to zero if it would cross zero. For example, if the threshold were 2.0, a centroid of 3.2 would be shrunk to 1.2, a centroid of -3.4 would be shrunk to -1.4, and a centroid of 1.2 would be shrunk to zero.
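This rule is soft thresholding, applied to each centroid's deviation from the overall centroid. A minimal numpy sketch reproducing the three examples from the text:

```python
import numpy as np

def soft_threshold(c, t):
    """Move each component of c toward zero by t, stopping at zero."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

# The three examples from the text, with threshold 2.0:
shrunk = soft_threshold(np.array([3.2, -3.4, 1.2]), 2.0)
# shrunk is approximately [1.2, -1.4, 0.0]
```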

After shrinking the centroids, the new sample is classified by the usual nearest centroid rule, but using the shrunken class centroids.
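A sketch of that classification step, using a hypothetical shrunken centroid matrix (the full method also standardizes the distance by the within-class standard deviations and can incorporate class prior probabilities; those refinements are omitted here):

```python
import numpy as np

def predict(x, shrunken):
    """Nearest (shrunken) centroid rule: smallest squared distance wins."""
    d = ((shrunken - x[:, None]) ** 2).sum(axis=0)
    return int(np.argmin(d))

# Hypothetical shrunken centroids: genes in rows, classes in columns.
shrunken = np.array([[1.2, -1.4],
                     [0.0,  0.0]])  # gene 2 was shrunk to zero in both classes
x_new = np.array([-1.0, 5.0])
label = predict(x_new, shrunken)
```

Note that gene 2 contributes the same amount to the distance for every class, so it cannot influence the decision.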

This shrinkage has two advantages: (1) it can make the classifier more accurate by reducing the effect of noisy genes, and (2) it performs automatic gene selection. In particular, if a gene is shrunk to zero for all classes, it is eliminated from the prediction rule. Alternatively, it may be set to zero for all classes except one, in which case we learn that high or low expression of that gene characterizes that class.
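The gene selection falls out of the shrunken centroid matrix directly. A small sketch with hypothetical values:

```python
import numpy as np

# Hypothetical shrunken centroid matrix: genes in rows, classes in columns.
shrunken = np.array([[ 1.2, -1.4],
                     [ 0.0,  0.0],   # zero in every class: eliminated
                     [ 0.0,  0.7]])  # nonzero in one class only

# A gene that is zero in all classes adds the same amount to every
# distance, so it can be dropped from the prediction rule entirely.
active = np.any(shrunken != 0.0, axis=1)
kept = np.flatnonzero(active)   # indices of the genes that remain
```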

The user decides on the value of the threshold, and typically examines a number of different choices. To guide this choice, PAM does K-fold cross-validation for a range of threshold values. The samples are divided at random into K roughly equal-sized parts. For each part in turn, the classifier is built on the other K-1 parts and then tested on the remaining part. This is done for a range of threshold values, and the cross-validated misclassification error rate is reported for each threshold value. Typically, the user would choose the threshold value giving the minimum cross-validated misclassification error rate.
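The cross-validation loop above can be sketched as follows. This is a simplified illustration, not PAM's implementation: per-gene standardization is omitted, and class labels are assumed to be 0, 1, ... so that an argmin index is itself a label.

```python
import numpy as np

def soft_threshold(c, t):
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def fit_shrunken_centroids(X, y, t):
    """Class centroids shrunk toward the overall centroid by threshold t."""
    overall = X.mean(axis=1, keepdims=True)
    C = np.column_stack([X[:, y == k].mean(axis=1) for k in np.unique(y)])
    return overall + soft_threshold(C - overall, t)

def cv_error(X, y, thresholds, K=5, seed=0):
    """K-fold cross-validated misclassification rate for each threshold."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(X.shape[1]) % K   # random split into K parts
    errs = np.zeros(len(thresholds))
    for k in range(K):
        train, test = folds != k, folds == k
        for j, t in enumerate(thresholds):
            S = fit_shrunken_centroids(X[:, train], y[train], t)
            # Squared distance of each held-out sample to each shrunken centroid.
            d = ((X[:, test][:, :, None] - S[:, None, :]) ** 2).sum(axis=0)
            errs[j] += np.sum(d.argmin(axis=1) != y[test])
    return errs / X.shape[1]
```

One would then plot `cv_error` against the thresholds and pick the threshold with the smallest error rate.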

What one gets from this is a classifier that is typically accurate and simple to understand.