Daisy Yi Ding, Shuangning Li, Balasubramanian Narasimhan, Robert Tibshirani
Multiview analysis with “-omics” data such as genomics and proteomics measured on a common set of samples represents an increasingly important challenge in biology and medicine.
Cooperative learning combines the usual squared-error loss of predictions with an “agreement” penalty that encourages the predictions from different data views to agree. The method can be especially powerful when the different data views share an underlying relationship in their signals that can be exploited to boost the signal.
Links: PNAS paper · arXiv · multiview R package · vignette

Let \(X \in \mathbb{R}^{n \times p_x}\) and \(Z \in \mathbb{R}^{n \times p_z}\) represent two data views, and let \(\mathbf{y} \in \mathbb{R}^{n}\) be the real-valued response (target). Fixing the hyperparameter \(\rho \geq 0\), we propose to minimize:
\(\min_{f_X, f_Z} \; \mathrm{E}\Bigl[\tfrac{1}{2} \bigl(\mathbf{y}-f_X(X)-f_Z(Z)\bigr)^2+ \tfrac{\rho}{2}\bigl(f_X(X)-f_Z(Z)\bigr)^2\Bigr]\)
The first term is the usual prediction error, while the second term is an “agreement” penalty, encouraging the predictions from different views to agree.
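A useful way to see how the agreement penalty couples the two views is to hold \(f_Z\) fixed and minimize the objective over \(f_X\) (and vice versa). Setting the derivative to zero gives the fixed-point conditions

\(f_X(X) = \mathrm{E}\Bigl[\dfrac{\mathbf{y} - (1-\rho)\, f_Z(Z)}{1+\rho} \,\Big|\, X\Bigr], \qquad f_Z(Z) = \mathrm{E}\Bigl[\dfrac{\mathbf{y} - (1-\rho)\, f_X(X)}{1+\rho} \,\Big|\, Z\Bigr].\)

With \(\rho = 0\), each view is simply fit to the residual left by the other (a backfitting-style update); with \(\rho = 1\), the updates decouple and each view is fit to \(\mathbf{y}/2\) on its own, so the combined prediction is the average of the two separate regressions, a simple form of late fusion. Intermediate values of \(\rho\) shrink each view's fit toward the other's prediction.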
The idea behind the method: Imagine that there are hidden underlying factors that are correlated with both of the feature views and the response. The agreement penalty exploits this underlying structure by encouraging the predictions from different views to align.
Commonly used approaches to the multiview problem can be broadly categorized into early and late fusion. Early fusion begins by transforming all data views into a single representation, which is then used as the input to a supervised learning model of choice. Late fusion works by first developing first-level models from the individual data views and then combining their predictions by training a second-level model as the final predictor.
By varying the weight of the agreement penalty, cooperative learning yields a continuum of solutions that include early and late fusion: with \(\rho = 0\) the objective reduces to early fusion on the concatenated views, while larger values of \(\rho\) increasingly encourage the view-specific predictions to agree.
Cooperative learning chooses the degree of agreement in an adaptive manner, using a validation set or cross-validation to estimate test set prediction error.
In the setting of cooperative regularized linear regression, the method combines the lasso penalty with the agreement penalty, yielding feature sparsity:
\(\min_{\theta_x, \theta_z} \; \tfrac{1}{2} \|\mathbf{y}-X\theta_x- Z\theta_z\|^2+ \tfrac{\rho}{2}\|X\theta_x- Z\theta_z\|^2 + \lambda \bigl(\|\theta_x\|_1+ \|\theta_z\|_1\bigr). \)
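For a fixed \(\rho\), this problem can be rewritten as an ordinary lasso on an augmented data set

\(\tilde{X} = \begin{pmatrix} X & Z \\ -\sqrt{\rho}\, X & \sqrt{\rho}\, Z \end{pmatrix}, \qquad \tilde{\mathbf{y}} = \begin{pmatrix} \mathbf{y} \\ \mathbf{0} \end{pmatrix},\)

since the second block of rows reproduces the agreement penalty. The R sketch below illustrates this reduction with glmnet; it assumes the columns of \(X\) and \(Z\) are standardized and \(\mathbf{y}\) is centered, and the helper name coop_lasso is ours, not part of any package. The multiview R package linked above provides a full implementation, including cross-validation over \(\lambda\).

```r
library(glmnet)

# Cooperative regularized regression for a fixed rho, via the augmented lasso.
# Illustrative sketch: assumes standardized columns and centered y (no intercept).
coop_lasso <- function(X, Z, y, rho, nlambda = 50) {
  n <- nrow(X)
  # First block of rows gives the squared-error term;
  # second block gives (rho/2) * ||X theta_x - Z theta_z||^2.
  X_aug <- rbind(cbind(X, Z),
                 cbind(-sqrt(rho) * X, sqrt(rho) * Z))
  y_aug <- c(y, rep(0, n))
  # Ordinary lasso path on the augmented data
  # (glmnet scales the loss by 1/(2 * nrow), so lambda is on a slightly different scale).
  glmnet(X_aug, y_aug, family = "gaussian",
         standardize = FALSE, intercept = FALSE, nlambda = nlambda)
}

# Usage (illustrative): fit and predict at one value of lambda.
# fit  <- coop_lasso(X, Z, y, rho = 0.5)
# yhat <- predict(fit, newx = cbind(X_test, Z_test), s = fit$lambda[20])
```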
One version of our fitting procedure is modular: we can choose different fitting mechanisms (e.g., lasso, random forests, boosting, or neural networks) appropriate for the different data views.
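As a rough illustration of this modularity, the fixed-point conditions above suggest a simple alternating (one-at-a-time) scheme: repeatedly refit each view's learner to the adjusted target \((\mathbf{y} - (1-\rho)\,\hat{f}_{\mathrm{other}})/(1+\rho)\). The sketch below is a minimal version of that idea with user-supplied fit and predict functions; the function names, zero initialization, and fixed iteration count are our assumptions, not the paper's exact algorithm.

```r
# Generic cooperative learning by alternating updates (illustrative sketch).
# fit_x/fit_z: (features, target) -> model; pred_x/pred_z: (model, features) -> numeric vector.
# Any learner can be plugged in for each view (lasso, random forest, boosting, ...).
coop_generic <- function(X, Z, y, rho, fit_x, pred_x, fit_z, pred_z, n_iter = 10) {
  fz <- rep(0, length(y))                               # start with a zero fit for view Z
  for (iter in seq_len(n_iter)) {
    mx <- fit_x(X, (y - (1 - rho) * fz) / (1 + rho))    # update f_X holding f_Z fixed
    fx <- pred_x(mx, X)
    mz <- fit_z(Z, (y - (1 - rho) * fx) / (1 + rho))    # update f_Z holding f_X fixed
    fz <- pred_z(mz, Z)
  }
  list(model_x = mx, model_z = mz)  # overall prediction: pred_x(model_x, .) + pred_z(model_z, .)
}

# Example (illustrative): a lasso on X and a random forest on Z.
# library(glmnet); library(randomForest)
# fit <- coop_generic(X, Z, y, rho = 0.3,
#   fit_x  = function(x, r) cv.glmnet(x, r),
#   pred_x = function(m, x) as.numeric(predict(m, x, s = "lambda.min")),
#   fit_z  = function(z, r) randomForest(z, r),
#   pred_z = function(m, z) as.numeric(predict(m, z)))
```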
We compare cooperative learning in the regression setting with early and late fusion methods in simulations. We generated Gaussian data with \(n=200\) and \(p=500\) in each of two views \(X\) and \(Z\), and created correlation between them using latent factors. The response \(\mathbf{y}\) was generated as a linear combination of the latent factors, corrupted by Gaussian noise. Data were simulated with different levels of correlation between the two data views \(X\) and \(Z\), different relative contributions of \(X\) and \(Z\) to the signal, and different signal-to-noise ratios (SNR).
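The exact data-generating details below are our guess at one plausible instantiation of this setup (the number of latent factors, loading strengths, and noise level are assumptions chosen only to make the mechanism concrete):

```r
set.seed(1)
n <- 200; p <- 500; k <- 2        # two latent factors (assumed)
t_x <- 2; t_z <- 2; sigma <- 3    # factor strengths and noise sd (assumed; control correlation/SNR)

U <- matrix(rnorm(n * k), n, k)   # latent factors shared by both views
X <- matrix(rnorm(n * p), n, p)   # view 1: Gaussian noise ...
Z <- matrix(rnorm(n * p), n, p)   # view 2: Gaussian noise ...
X[, 1:k] <- X[, 1:k] + t_x * U    # ... plus the factors loaded on the first k columns
Z[, 1:k] <- Z[, 1:k] + t_z * U

y <- as.numeric(U %*% rep(1, k)) + rnorm(n, sd = sigma)  # response driven by the factors
```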
We compare the following methods: (1) separate \(X\) and separate \(Z\): the standard lasso is applied on the separate data views of \(X\) and \(Z\) with 10-fold CV; (2) early fusion: the standard lasso is applied on the concatenated data views of \(X\) and \(Z\) with 10-fold CV (note that this is equivalent to cooperative learning with \(\rho = 0\)); (3) late fusion: separate lasso models are first fitted on \(X\) and \(Z\) independently with 10-fold CV, and the two resulting predictors are then combined through linear least squares for the final prediction; (4) cooperative learning (regression) and adaptive cooperative learning. We evaluated the performance based on the mean-squared error (MSE) on a test set and conducted each simulation experiment 10 times.
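For reference, baselines (1)-(3) can be written in a few lines of glmnet, continuing from the simulated X, Z, and y above; using lambda.min and an in-sample lm stack for late fusion are our simplifying assumptions about the implementation details:

```r
library(glmnet)

# (1) Separate models on each view (cv.glmnet uses 10-fold CV by default).
fit_x <- cv.glmnet(X, y)
fit_z <- cv.glmnet(Z, y)

# (2) Early fusion: lasso on the concatenated views (cooperative learning with rho = 0).
fit_xz <- cv.glmnet(cbind(X, Z), y)

# (3) Late fusion: combine the two separate predictors by linear least squares.
px <- as.numeric(predict(fit_x, X, s = "lambda.min"))
pz <- as.numeric(predict(fit_z, Z, s = "lambda.min"))
stack <- lm(y ~ px + pz)

# Test-set predictions for late fusion (given held-out X_test, Z_test):
# px_te <- as.numeric(predict(fit_x, X_test, s = "lambda.min"))
# pz_te <- as.numeric(predict(fit_z, Z_test, s = "lambda.min"))
# yhat_late <- predict(stack, newdata = data.frame(px = px_te, pz = pz_te))
```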
Cooperative learning performs the best in terms of test MSE across the range of SNR and correlation settings. It is most helpful when the data views are correlated and both contain signal. When the correlation between data views is higher, higher values of \(\rho\) are more likely to be selected.
We applied cooperative learning to a data set of labor onset, collected from a cohort of women who went into labor spontaneously, as described in Stelzer et al., 2021. Proteome and metabolome were measured from blood samples collected from the patients during the last 120 days of pregnancy. The goal of the analysis is to predict time to spontaneous labor using proteomics and metabolomics data.
As shown in Table 1, cooperative learning outperforms early and late fusion, achieving the lowest MSE on the test set. In addition, cooperative learning identifies C1q as one of the most important features. C1q, which is not identified by the other methods, plays a critical role in the complement cascade (a pathway that influences implantation and fetal development) and is worth further investigation for its role in predicting labor onset.
Table 1. Prediction of labor onset from proteomics and metabolomics data: test MSE, MSE difference relative to early fusion, and number of features selected.
| Methods | Test MSE (mean) | Test MSE (std) | MSE difference vs. early fusion (mean) | MSE difference vs. early fusion (std) | Features selected (mean) |
|---|---|---|---|---|---|
| Separate Proteomics | 475.51 | 80.89 | 69.14 | 81.44 | 26 |
| Separate Metabolomics | 381.13 | 36.88 | -25.24 | 30.91 | 11 |
| Early fusion | 406.37 | 44.77 | 0 | 0 | 15 |
| Late fusion | 493.34 | 63.44 | 86.97 | 68.13 | 21 |
| Cooperative learning | 335.84 | 38.51 | -70.53 | 32.60 | 52 |
Contact: dingd@stanford.edu and tibs@stanford.edu