Title: | Penalised Multivariate Regression ('Multi-Target Learning') |
---|---|
Description: | Implements penalised multivariate regression (i.e., for multiple outcomes and many features) by stacked generalisation (<doi:10.1093/bioinformatics/btab576>). For positively correlated outcomes, a single multivariate regression is typically more predictive than multiple univariate regressions. Includes functions for model fitting, extracting coefficients, outcome prediction, and performance measurement. For optional comparisons, install 'remMap' from GitHub (<https://github.com/cran/remMap>). |
Authors: | Armin Rauschenberger [aut, cre] |
Maintainer: | Armin Rauschenberger <[email protected]> |
License: | GPL-3 |
Version: | 1.0.0 |
Built: | 2024-10-27 06:06:11 UTC |
Source: | https://github.com/rauschenberger/joinet |
The R package joinet implements multivariate ridge and lasso regression using stacked generalisation. This multivariate regression typically outperforms univariate regression at predicting correlated outcomes. It provides predictive and interpretable models in high-dimensional settings.
Use the function joinet for model fitting. Type library(joinet) and then ?joinet or help("joinet") to open its help file. See the vignette for further examples: type vignette("joinet") or browseVignettes("joinet") to open it.
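As a hedged quick-start sketch of the claim above (not part of the package examples; the simulation settings mirror those used further below):

library(joinet)
set.seed(1)
n <- 50; p <- 100; q <- 3                                    # samples, inputs, outputs
X <- matrix(rnorm(n*p), nrow=n, ncol=p)                      # shared inputs
Y <- replicate(n=q, expr=rnorm(n=n, mean=rowSums(X[,1:5])))  # positively correlated outputs
loss <- cv.joinet(Y=Y, X=X)                                  # cross-validated loss
loss  # compare row "base" (univariate) with row "meta" (multivariate)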
Maintainer: Armin Rauschenberger ([email protected])
Armin Rauschenberger and Enrico Glaab (2021). "Predicting correlated outcomes from molecular data". Bioinformatics 37(21):3889–3895. doi:10.1093/bioinformatics/btab576.
Useful links:
Report bugs at https://github.com/rauschenberger/joinet/issues
## Not run:
#--- data simulation ---
n <- 50; p <- 100; q <- 3
X <- matrix(rnorm(n*p), nrow=n, ncol=p)
Y <- replicate(n=q, expr=rnorm(n=n, mean=rowSums(X[,1:5])))
# n samples, p inputs, q outputs

#--- model fitting ---
object <- joinet(Y=Y, X=X)
# slot "base": univariate
# slot "meta": multivariate

#--- make predictions ---
y_hat <- predict(object, newx=X)
# n x q matrix "base": univariate
# n x q matrix "meta": multivariate

#--- extract coefficients ---
coef <- coef(object)
# effects of inputs on outputs
# q vector "alpha": intercepts
# p x q matrix "beta": slopes

#--- model comparison ---
loss <- cv.joinet(Y=Y, X=X)
# cross-validated loss
# row "base": univariate
# row "meta": multivariate
## End(Not run)
Extracts pooled coefficients. (The meta learner linearly combines the coefficients from the base learners.)
## S3 method for class 'joinet'
coef(object, ...)
object | joinet object
... | further arguments (not applicable)
This function returns the pooled coefficients. The slot alpha contains the intercepts in a vector of length q, and the slot beta contains the slopes in a matrix with p rows (inputs) and q columns (outputs).
## Not run:
n <- 50; p <- 100; q <- 3
X <- matrix(rnorm(n*p), nrow=n, ncol=p)
Y <- replicate(n=q, expr=rnorm(n=n, mean=rowSums(X[,1:5])))
object <- joinet(Y=Y, X=X)
coef <- coef(object)
## End(Not run)
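A minimal follow-up sketch, assuming the fitted object from the example above, to make the returned shapes concrete:

pool <- coef(object)
length(pool$alpha)  # q intercepts (one per output)
dim(pool$beta)      # p x q slopes (inputs by outputs)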
Compares univariate and multivariate regression.
cv.joinet(
  Y,
  X,
  family = "gaussian",
  nfolds.ext = 5,
  nfolds.int = 10,
  foldid.ext = NULL,
  foldid.int = NULL,
  type.measure = "deviance",
  alpha.base = 1,
  alpha.meta = 1,
  compare = FALSE,
  mice = FALSE,
  cvpred = FALSE,
  times = FALSE,
  ...
)
Y | outputs: numeric matrix with n rows (samples) and q columns (variables)
X | inputs: numeric matrix with n rows (samples) and p columns (variables)
family | distribution: vector of length 1 or q with entries "gaussian", "binomial" or "poisson"
nfolds.ext | number of external folds
nfolds.int | number of internal folds
foldid.ext | external fold identifiers: vector of length n with entries between 1 and nfolds.ext; or NULL
foldid.int | internal fold identifiers: vector of length n with entries between 1 and nfolds.int; or NULL
type.measure | loss function: vector of length 1 or q with entries "deviance", "class", "mse" or "mae" (see cv.glmnet)
alpha.base | elastic net mixing parameter for base learners: numeric between 0 (ridge) and 1 (lasso)
alpha.meta | elastic net mixing parameter for meta learners: numeric between 0 (ridge) and 1 (lasso)
compare | experimental arguments: character vector with entries "mnorm", "spls", "mrce", "sier", "mtps", "rmtl", "gpm" and others (requires the corresponding packages)
mice | missing data imputation: logical (mice=TRUE requires the package mice)
cvpred | return cross-validated predictions: logical
times | measure computation time: logical
... | further arguments passed to joinet
This function returns a matrix with q columns, including the cross-validated loss from the univariate models (base), the multivariate models (meta), and the intercept-only models (none).
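A hedged sketch for reading that matrix, assuming the simulated data from above and rows labelled "base", "meta" and "none" as described:

loss <- cv.joinet(Y=Y, X=X)
loss["meta", ] < loss["base", ]  # TRUE where the multivariate model beats the univariate one
loss["base", ] < loss["none", ]  # TRUE where the inputs carry predictive signal at all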
## Not run:
n <- 50; p <- 100; q <- 3
X <- matrix(rnorm(n*p), nrow=n, ncol=p)
Y <- replicate(n=q, expr=rnorm(n=n, mean=rowSums(X[,1:5])))
cv.joinet(Y=Y, X=X)
## End(Not run)

## Not run:
# correlated features
n <- 50; p <- 100; q <- 3
mu <- rep(0, times=p)
Sigma <- 0.90^abs(col(diag(p)) - row(diag(p)))
X <- MASS::mvrnorm(n=n, mu=mu, Sigma=Sigma)
mu <- rowSums(X[, sample(seq_len(p), size=5)])
Y <- replicate(n=q, expr=rnorm(n=n, mean=mu))
#Y <- t(MASS::mvrnorm(n=q, mu=mu, Sigma=diag(n)))
cv.joinet(Y=Y, X=X)
## End(Not run)

## Not run:
# other distributions
n <- 50; p <- 100; q <- 3
X <- matrix(rnorm(n*p), nrow=n, ncol=p)
eta <- rowSums(X[,1:5])
Y <- replicate(n=q, expr=rbinom(n=n, size=1, prob=1/(1+exp(-eta))))
cv.joinet(Y=Y, X=X, family="binomial")
Y <- replicate(n=q, expr=rpois(n=n, lambda=exp(scale(eta))))
cv.joinet(Y=Y, X=X, family="poisson")
## End(Not run)

## Not run:
# uncorrelated outcomes
n <- 50; p <- 100; q <- 3
X <- matrix(rnorm(n*p), nrow=n, ncol=p)
y <- rnorm(n=n, mean=rowSums(X[,1:5]))
Y <- cbind(y, matrix(rnorm(n*(q-1)), nrow=n, ncol=q-1))
cv.joinet(Y=Y, X=X)
## End(Not run)

## Not run:
# sparse and dense models
n <- 50; p <- 100; q <- 3
X <- matrix(rnorm(n*p), nrow=n, ncol=p)
Y <- replicate(n=q, expr=rnorm(n=n, mean=rowSums(X[,1:5])))
set.seed(1)  # fix folds
cv.joinet(Y=Y, X=X, alpha.base=1)  # lasso
set.seed(1)
cv.joinet(Y=Y, X=X, alpha.base=0)  # ridge
## End(Not run)
Implements multivariate elastic net regression.
joinet(
  Y,
  X,
  family = "gaussian",
  nfolds = 10,
  foldid = NULL,
  type.measure = "deviance",
  alpha.base = 1,
  alpha.meta = 1,
  weight = NULL,
  sign = NULL,
  ...
)
Y | outputs: numeric matrix with n rows (samples) and q columns (variables)
X | inputs: numeric matrix with n rows (samples) and p columns (variables)
family | distribution: vector of length 1 or q with entries "gaussian", "binomial" or "poisson"
nfolds | number of folds
foldid | fold identifiers: vector of length n with entries between 1 and nfolds; or NULL
type.measure | loss function: vector of length 1 or q with entries "deviance", "class", "mse" or "mae" (see cv.glmnet)
alpha.base | elastic net mixing parameter for base learners: numeric between 0 (ridge) and 1 (lasso)
alpha.meta | elastic net mixing parameter for meta learners: numeric between 0 (ridge) and 1 (lasso)
weight | input-output relations: matrix with p rows (inputs) and q columns (outputs); or NULL (see details)
sign | output-output relations: matrix with q rows and q columns (outputs); or NULL (see details)
... | further arguments passed to glmnet and cv.glmnet
input-output relations: In this matrix with p rows (inputs) and q columns (outputs), the entry in the i-th row and the j-th column indicates whether the i-th input may be used for modelling the j-th output (where 0 means "exclude" and 1 means "include"). By default (weight=NULL), all entries are set to 1.
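A minimal sketch of such a constraint matrix, assuming X, Y, p and q from the examples in this document:

W <- matrix(1, nrow=p, ncol=q)        # default: include every input for every output
W[11:p, 1] <- 0                       # exclude inputs 11..p when modelling the first output
object <- joinet(Y=Y, X=X, weight=W)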
output-output relations: In this matrix with q rows and q columns, the entry in the i-th row and the j-th column indicates how the i-th output may be used for modelling the j-th output (where -1 means negative effect, 0 means no effect, 1 means positive effect, and NA means any effect). There are three short-cuts for filling up this matrix: (1) sign=1 sets all entries to 1 (non-negativity constraints). This is useful if all pairs of outcomes are assumed to be positively correlated (potentially after changing the sign of some outcomes). (2) sign=NA sets all diagonal entries to 1 and all off-diagonal entries to NA (no constraints). (3) sign=NULL uses Spearman correlation to determine the entries, with -1 for significant negative, 0 for insignificant, and 1 for significant positive correlations.
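A minimal sketch of short-cut (2) written out by hand, assuming q from the examples in this document:

S <- matrix(NA, nrow=q, ncol=q)       # off-diagonal: any effect allowed
diag(S) <- 1                          # diagonal: each outcome has a positive effect on itself
object <- joinet(Y=Y, X=X, sign=S)    # equivalent to the short-cut sign=NA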
elastic net: alpha.base controls the input-output effects and alpha.meta controls the output-output effects; the lasso renders sparse models (alpha=1), the ridge renders dense models (alpha=0).
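A hedged sketch contrasting the two extremes, assuming X and Y from the examples in this document:

sparse <- joinet(Y=Y, X=X, alpha.base=1, alpha.meta=1)  # lasso: sparse effects
dense  <- joinet(Y=Y, X=X, alpha.base=0, alpha.meta=0)  # ridge: dense effects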
This function returns an object of class joinet. Available methods include predict, coef, and weights. The slots base and meta each contain cv.glmnet-like objects.
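A hedged sketch for inspecting those slots, assuming list-like access with $ (an assumption, not documented API):

object <- joinet(Y=Y, X=X)
str(object$base, max.level=1)  # cv.glmnet-like fits of the univariate stage
str(object$meta, max.level=1)  # cv.glmnet-like fits of the multivariate stage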
Armin Rauschenberger and Enrico Glaab (2021). "Predicting correlated outcomes from molecular data". Bioinformatics 37(21):3889–3895. doi:10.1093/bioinformatics/btab576.
See also: cv.joinet, vignette("joinet").
## Not run:
n <- 50; p <- 100; q <- 3
X <- matrix(rnorm(n*p), nrow=n, ncol=p)
Y <- replicate(n=q, expr=rnorm(n=n, mean=rowSums(X[,1:5])))
object <- joinet(Y=Y, X=X)
## End(Not run)

## Not run:
browseVignettes("joinet")  # further examples
## End(Not run)
Predicts outcome from features with stacked model.
## S3 method for class 'joinet'
predict(object, newx, type = "response", ...)
object | joinet object
newx | covariates: numeric matrix with n rows (samples) and p columns (variables)
type | character "link" or "response"
... | further arguments (not applicable)
This function returns predictions from the base and meta learners. The slots base and meta each contain a matrix with n rows (samples) and q columns (variables).
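A minimal sketch, assuming the fitted object from the example below:

pred <- predict(object, newx=X)
dim(pred$base)  # n x q univariate predictions
dim(pred$meta)  # n x q multivariate predictions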
## Not run:
n <- 50; p <- 100; q <- 3
X <- matrix(rnorm(n*p), nrow=n, ncol=p)
Y <- replicate(n=q, expr=rnorm(n=n, mean=rowSums(X[,1:5])))
Y[,1] <- 1*(Y[,1] > median(Y[,1]))
object <- joinet(Y=Y, X=X, family=c("binomial","gaussian","gaussian"))
predict(object, newx=X)
## End(Not run)
Extracts coefficients from the meta learner, i.e. the weights for the base learners.
## S3 method for class 'joinet'
weights(object, ...)
object | joinet object
... | further arguments (not applicable)
This function returns a matrix with 1+q rows and q columns. The first row contains the intercepts, and the other rows contain the slopes, which are the effects of the outcomes in the rows on the outcomes in the columns.
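A minimal sketch for reading that matrix, assuming the fitted object from the example below:

W <- weights(object)
dim(W)   # 1+q rows, q columns
W[1, ]   # intercepts of the meta learners
W[-1, ]  # effects of the outcomes (rows) on the outcomes (columns)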
## Not run:
n <- 50; p <- 100; q <- 3
X <- matrix(rnorm(n*p), nrow=n, ncol=p)
Y <- replicate(n=q, expr=rnorm(n=n, mean=rowSums(X[,1:5])))
object <- joinet(Y=Y, X=X)
weights(object)
## End(Not run)