Comparison of classifiers

Dettling (2004) compared PAM to six other classifiers on a number of real datasets. Full details are in the Bioinformatics paper; a slightly outdated version of the paper is also available.

The datasets are available at the Bagboost site.

We reran PAM on the datasets, using the default parameter choices in every case. Our results were similar to Dettling's, except on the Prostate and Lymphoma datasets, where ours were noticeably better:
Dettling's reported PAM results are not within 2 standard errors (SEs) of ours.

The following table gives a summary of the results. All figures are from Dettling's Table 1, except for the PAM results, which we produced ourselves.

Misclassification rates (%) on the test sets

         Leuk Colon  Pros  Lym SRBCT Brain  Ave error  Ave rank
Bagboost 4.08 16.10  7.53 1.62 1.24 23.86     9.07      3.3
Boosting 5.67 19.14  8.71 6.29 6.19 27.57    12.26      5.7
RanFor   1.92 14.86  9.00 1.24 3.71 33.71    10.74      4.0
SVM      1.83 15.05  7.88 1.62 2.00 28.29     9.45      3.1
PAM      3.55 13.53  8.87 1.65 2.40 23.45     8.87      3.2
DLDA     2.92 12.86 14.18 2.19 2.19 28.57    10.48      4.3        
kNN      3.83 16.38 10.59 1.52 1.43 29.71    10.58      4.5


Bagboost is Dettling's method combining bagging and boosting; RanFor is Breiman's random forests; SVM is the support vector machine; DLDA is diagonal linear discriminant analysis, which is similar to PAM but without a facility for shrinkage; and kNN is k-nearest neighbors.
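
As a check on the two summary columns, the averages can be recomputed directly from the per-dataset error rates in the table. Below is a minimal sketch in Python, with the table values hard-coded; small discrepancies with the printed "Ave error" and "Ave rank" figures are to be expected, since the rates shown above are rounded and most of the summary figures come from Dettling's Table 1.

import numpy as np
from scipy.stats import rankdata

# Error rates (%) copied from the table above: one row per classifier,
# columns in the order Leuk, Colon, Pros, Lym, SRBCT, Brain.
rates = {
    "Bagboost": [4.08, 16.10,  7.53, 1.62, 1.24, 23.86],
    "Boosting": [5.67, 19.14,  8.71, 6.29, 6.19, 27.57],
    "RanFor":   [1.92, 14.86,  9.00, 1.24, 3.71, 33.71],
    "SVM":      [1.83, 15.05,  7.88, 1.62, 2.00, 28.29],
    "PAM":      [3.55, 13.53,  8.87, 1.65, 2.40, 23.45],
    "DLDA":     [2.92, 12.86, 14.18, 2.19, 2.19, 28.57],
    "kNN":      [3.83, 16.38, 10.59, 1.52, 1.43, 29.71],
}

names = list(rates)
errs = np.array([rates[m] for m in names])       # 7 classifiers x 6 datasets

ave_error = errs.mean(axis=1)                    # average error over the 6 datasets
ranks = np.apply_along_axis(rankdata, 0, errs)   # rank within each dataset (1 = best)
ave_rank = ranks.mean(axis=1)

for m, e, r in zip(names, ave_error, ave_rank):
    print(f"{m:8s}  ave error {e:5.2f}   ave rank {r:3.1f}")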

Standard errors (SEs) for our PAM results:


Leukemia  (.04%)
Colon     (.83%)
Prostate  (.63%)
Lymphoma  (.32%)
SRBCT     (.29%)
Brain     (1.7%)
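
These SEs are computed over the repeated random training/test splits used in the comparison. As a rough sketch of how such an SE, and the 2-SE comparison mentioned above, can be computed, consider the Python fragment below; the per-split error rates and the comparison figure in it are placeholders, not actual values from this study.

import numpy as np

def mean_and_se(split_errors):
    # Mean test-set misclassification rate (%) and its standard error
    # across repeated random training/test splits.
    e = np.asarray(split_errors, dtype=float)
    return e.mean(), e.std(ddof=1) / np.sqrt(len(e))

# Placeholder per-split error rates (%) for one dataset -- illustrative only.
pam_split_errors = [13.0, 14.5, 12.8, 13.9, 13.4]
other_reported = 16.1   # placeholder figure to compare against

m, se = mean_and_se(pam_split_errors)
print(f"mean {m:.2f}%, SE {se:.2f}%")
# "Not within 2 SEs": the comparison figure lies outside mean +/- 2*SE.
print("outside 2 SEs" if abs(other_reported - m) > 2 * se else "within 2 SEs")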


Overall, PAM has the lowest average error rate and is just slightly behind SVM in average rank. PAM is probably the simplest of all of these methods, and does automatic feature (gene) selection as well.
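
To illustrate how shrinkage gives PAM its automatic gene selection, here is a minimal sketch of the shrunken-centroid idea in Python. It is a simplified illustration, not the pamr implementation: it omits details of the published algorithm such as the exact standardization factors, class priors, and the cross-validated choice of the shrinkage threshold, and the function names and the delta parameter are our own.

import numpy as np

def fit_shrunken_centroids(X, y, delta):
    # Shrink each class centroid toward the overall centroid by soft-thresholding
    # the standardized centroid differences.  Genes whose differences shrink to
    # zero in every class drop out of the classifier (automatic gene selection).
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    overall = X.mean(axis=0)
    # Pooled within-class standard deviation per gene, plus a small offset
    # (the median of the s values) to guard against tiny variances.
    within_ss = sum(((X[y == k] - X[y == k].mean(axis=0)) ** 2).sum(axis=0)
                    for k in classes)
    s = np.sqrt(within_ss / (len(y) - len(classes)))
    s = s + np.median(s)
    centroids = {}
    for k in classes:
        d = (X[y == k].mean(axis=0) - overall) / s            # standardized difference
        d = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)   # soft-threshold by delta
        centroids[k] = overall + s * d                        # shrunken class centroid
    return centroids, s

def predict(X, centroids, s):
    # Assign each sample to the class whose shrunken centroid is nearest,
    # with per-gene distances standardized by s.
    X = np.asarray(X, dtype=float)
    classes = sorted(centroids)
    dists = np.column_stack([((X - centroids[k]) ** 2 / s ** 2).sum(axis=1)
                             for k in classes])
    return np.array(classes)[dists.argmin(axis=1)]

In practice the shrinkage threshold would be chosen by cross-validation, trading off error rate against the number of genes retained.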