Package 'cornet'

Title: Penalised Regression for Dichotomised Outcomes
Description: Implements lasso and ridge regression for dichotomised outcomes (<doi:10.1080/02664763.2023.2233057>), i.e., numerical outcomes that were transformed to binary outcomes. Such artificial binary outcomes indicate whether an underlying measurement is greater than a threshold.
Authors: Armin Rauschenberger [aut, cre]
Maintainer: Armin Rauschenberger <[email protected]>
License: GPL-3
Version: 1.0.0
Built: 2024-10-26 05:57:37 UTC
Source: https://github.com/rauschenberger/cornet



Arguments

Description

Verifies whether an argument matches formal requirements.

Usage

.check(
  x,
  type,
  dim = NULL,
  miss = FALSE,
  min = NULL,
  max = NULL,
  values = NULL,
  inf = FALSE,
  null = FALSE
)

Arguments

x

argument

type

character: "string", "scalar", "vector", or "matrix"

dim

vector/matrix dimensionality: integer scalar/vector

miss

accept missing values: logical

min

lower limit: numeric

max

upper limit: numeric

values

only accept specific values: vector

inf

accept infinite (Inf or -Inf) values: logical

null

accept NULL: logical

Examples

cornet:::.check(0.5,type="scalar",min=0,max=1)

Equality

Description

Verifies whether two or more arguments are identical.

Usage

.equal(..., na.rm = FALSE)

Arguments

...

scalars, vectors, or matrices of equal dimensions

na.rm

remove missing values: logical

Examples

cornet:::.equal(1,1,1)

Data simulation

Description

Simulates data for unit tests.

Usage

.simulate(n, p, cor = 0, prob = 0.1, sd = 1, exp = 1, frac = 1)

Arguments

n

sample size: positive integer

p

covariate space: positive integer

cor

correlation coefficient: numeric between 0 and 1

prob

effect proportion: numeric between 0 and 1

sd

standard deviation: positive numeric

exp

exponent: positive numeric

frac

class proportion: numeric between 0 and 1

Details

For simulating correlated features (cor > 0), this function requires the R package MASS (see mvrnorm).
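To illustrate what drawing correlated features with MASS::mvrnorm involves, here is a minimal sketch. It assumes an exchangeable (equicorrelation) structure, in which every pair of features has correlation cor; the correlation structure used inside .simulate is not documented here, so this is an illustration rather than the package's own code.

```r
set.seed(1)
n <- 10; p <- 20; cor <- 0.5  # illustrative values

# Assumption: exchangeable correlation, 1 on the diagonal and cor elsewhere
Sigma <- matrix(cor, nrow = p, ncol = p)
diag(Sigma) <- 1

# Each row of X is one sample drawn from a multivariate normal N(0, Sigma);
# MASS is a recommended package that ships with standard R installations
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma)
dim(X)
```

With cor = 0 the covariance matrix is the identity and the features are independent, which is why MASS is only needed for cor > 0.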

Value

Returns an invisible list with elements y and X.

Examples

data <- cornet:::.simulate(n=10,p=20)
names(data)

Single-split test

Description

Compares models for a continuous response with a cut-off value.

Usage

.test(y, cutoff, X, alpha = 1, type.measure = "deviance")

Arguments

y

continuous outcome: vector of length n

cutoff

cut-off point for dichotomising outcome into classes: meaningful value between min(y) and max(y)

X

features: numeric matrix with n rows (samples) and p columns (variables)

alpha

elastic net mixing parameter: numeric between 0 (ridge) and 1 (lasso)

type.measure

loss function for binary classification: character "deviance", "mse", "mae", or "class" (see cv.glmnet)

Details

Splits samples into 80 percent for training and 20 percent for testing, calculates the squared deviance residuals of logistic and combined regression, conducts the paired one-sided Wilcoxon signed rank test, and returns the p-value. For the multi-split test, use the median p-value from 50 single-split tests (van de Wiel 2009).
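The last two steps (squared deviance residuals and the paired one-sided test) can be sketched in base R as follows. The outcome and predicted probabilities are hypothetical placeholders; in .test they would come from logistic and combined regression fitted on the training samples. The direction of the one-sided alternative (logistic residuals larger than combined residuals) is an assumption.

```r
set.seed(1)
# Hypothetical test-set outcome and predicted probabilities
y_test <- rbinom(20, size = 1, prob = 0.5)
prob_logistic <- runif(20, min = 0.05, max = 0.95)
prob_combined <- runif(20, min = 0.05, max = 0.95)

# Squared deviance residual of a binary observation:
# -2 * [y * log(p) + (1 - y) * log(1 - p)]
sq_dev <- function(y, p) -2 * (y * log(p) + (1 - y) * log(1 - p))
res_logistic <- sq_dev(y_test, prob_logistic)
res_combined <- sq_dev(y_test, prob_combined)

# Paired one-sided Wilcoxon signed rank test
# (assumed alternative: logistic regression has larger residuals)
pval <- stats::wilcox.test(res_logistic, res_combined,
                           paired = TRUE, alternative = "greater")$p.value
pval
```

For the multi-split test, the same computation would be repeated over 50 random 80/20 splits, and the median of the resulting p-values reported.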

Examples

n <- 100; p <- 200
y <- rnorm(n)
X <- matrix(rnorm(n*p),nrow=n,ncol=p)
cornet:::.test(y=y,cutoff=0,X=X)

Extract estimated coefficients

Description

Extracts estimated coefficients from linear and logistic regression, under the penalty parameter that minimises the cross-validated loss.

Usage

## S3 method for class 'cornet'
coef(object, ...)

Arguments

object

cornet object

...

further arguments (not applicable)

Value

This function returns a matrix with one row per coefficient (the intercept and the p covariates) and two columns. It includes the estimated coefficients from linear regression (1st column: "beta") and logistic regression (2nd column: "gamma").

Examples

n <- 100; p <- 200
y <- rnorm(n)
X <- matrix(rnorm(n*p),nrow=n,ncol=p)
net <- cornet(y=y,cutoff=0,X=X)
coef(net)

Combined regression

Description

Implements lasso and ridge regression for dichotomised outcomes. Such outcomes are not naturally but artificially binary. They indicate whether an underlying measurement is greater than a threshold.
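The dichotomisation itself is a single comparison with the threshold. A minimal sketch, using the same kind of simulated outcome as the examples below; cornet receives both the numeric outcome y and the cutoff, so the binary outcome is derived from them.

```r
set.seed(1)
y <- rnorm(100)   # numeric outcome, as in the examples
cutoff <- 0       # threshold

# Artificial binary outcome: 1 if the measurement is greater than the threshold
z <- as.numeric(y > cutoff)
table(z)
```

Because z is derived from y, the numeric outcome carries more information than the binary one, which is what the combination of linear and logistic regression exploits.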

Usage

cornet(
  y,
  cutoff,
  X,
  alpha = 1,
  npi = 101,
  pi = NULL,
  nsigma = 99,
  sigma = NULL,
  nfolds = 10,
  foldid = NULL,
  type.measure = "deviance",
  ...
)

Arguments

y

continuous outcome: vector of length n

cutoff

cut-off point for dichotomising outcome into classes: meaningful value between min(y) and max(y)

X

features: numeric matrix with n rows (samples) and p columns (variables)

alpha

elastic net mixing parameter: numeric between 0 (ridge) and 1 (lasso)

npi

number of pi values (weighting)

pi

pi sequence: vector of increasing values in the unit interval; or NULL (default sequence)

nsigma

number of sigma values (scaling)

sigma

sigma sequence: vector of increasing positive values; or NULL (default sequence)

nfolds

number of folds: integer between 3 and n

foldid

fold identifiers: vector with entries between 1 and nfolds; or NULL (balance)

type.measure

loss function for binary classification: character "deviance", "mse", "mae", or "class" (see cv.glmnet)

...

further arguments passed to glmnet

Details

The argument family is unavailable, because this function fits a gaussian model for the numeric response, and a binomial model for the binary response.

Linear regression uses the loss function "deviance" (or "mse"), but the loss is incomparable between linear and logistic regression.

The loss function "auc" is unavailable for internal cross-validation. If needed, use "auc" for external cross-validation only.
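For reading the loss values, it may help to see the four binary loss functions accepted by type.measure written out. This sketch uses their usual definitions for a 0/1 outcome y and predicted probability p; the hypothetical helper binary_loss is not part of cornet, and glmnet's internal conventions (for example, clipping of extreme probabilities) may differ slightly.

```r
# Usual definitions of the binary loss functions named in type.measure
# y: binary outcome (0/1); p: predicted probability of class 1
binary_loss <- function(y, p, type.measure = "deviance") {
  switch(type.measure,
    deviance = -2 * mean(y * log(p) + (1 - y) * log(1 - p)),
    mse      = mean((y - p)^2),
    mae      = mean(abs(y - p)),
    class    = mean(y != as.numeric(p > 0.5)),  # misclassification rate
    stop("unsupported type.measure")
  )
}

y <- c(1, 0, 1, 1, 0)
p <- c(0.9, 0.2, 0.6, 0.4, 0.1)
sapply(c("deviance", "mse", "mae", "class"),
       function(m) binary_loss(y, p, m))
```

For these toy values, "mse" is 0.116, "mae" is 0.28 and "class" is 0.2 (one of five samples misclassified).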

Value

Returns an object of class cornet, a list with multiple slots:

  • gaussian: fitted linear model, class glmnet

  • binomial: fitted logistic model, class glmnet

  • sigma: scaling parameters sigma, vector of length nsigma

  • pi: weighting parameters pi, vector of length npi

  • cvm: evaluation loss, matrix with nsigma rows and npi columns

  • sigma.min: optimal scaling parameter, positive scalar

  • pi.min: optimal weighting parameter, scalar in unit interval

  • cutoff: threshold for dichotomisation
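The roles of sigma.min and pi.min can be pictured with a short sketch. It rests on an assumption: that combined regression maps the linear-regression prediction to a probability through a Gaussian distribution function centred at the cutoff with standard deviation sigma, and then takes a pi-weighted average of that probability and the logistic-regression probability. The helper combine_prob and its inputs are hypothetical; see References for the exact definition.

```r
# Hypothetical helper illustrating the assumed combination
combine_prob <- function(y_hat, p_logit, cutoff, sigma.min, pi.min) {
  # Probability that a Gaussian outcome centred at y_hat (sd = sigma.min)
  # exceeds the cutoff, i.e. pnorm((y_hat - cutoff) / sigma.min)
  p_linear <- stats::pnorm(y_hat, mean = cutoff, sd = sigma.min)
  # Assumed convex combination, weighted by pi.min
  pi.min * p_linear + (1 - pi.min) * p_logit
}

y_hat   <- c(-1.2, 0.1, 0.8)    # hypothetical linear-regression predictions
p_logit <- c(0.15, 0.55, 0.70)  # hypothetical logistic-regression probabilities
prob <- combine_prob(y_hat, p_logit, cutoff = 0, sigma.min = 1, pi.min = 0.3)
prob
```

Under this assumption, pi.min = 0 reduces to logistic regression, and a smaller sigma.min makes the transformed linear prediction more decisive around the cutoff.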

References

Armin Rauschenberger and Enrico Glaab (2024). "Predicting dichotomised outcomes from high-dimensional data in biomedicine". Journal of Applied Statistics 51(9):1756-1771. doi:10.1080/02664763.2023.2233057.

See Also

Methods for objects of class cornet include coef and predict.

Examples

n <- 100; p <- 200
y <- rnorm(n)
X <- matrix(rnorm(n*p),nrow=n,ncol=p)
net <- cornet(y=y,cutoff=0,X=X)
net

Performance measurement

Description

Compares models for a continuous response with a cut-off value.

Usage

cv.cornet(
  y,
  cutoff,
  X,
  alpha = 1,
  nfolds.ext = 5,
  nfolds.int = 10,
  foldid.ext = NULL,
  foldid.int = NULL,
  type.measure = "deviance",
  rf = FALSE,
  xgboost = FALSE,
  ...
)

Arguments

y

continuous outcome: vector of length n

cutoff

cut-off point for dichotomising outcome into classes: meaningful value between min(y) and max(y)

X

features: numeric matrix with n rows (samples) and p columns (variables)

alpha

elastic net mixing parameter: numeric between 0 (ridge) and 1 (lasso)

nfolds.ext

number of external folds

nfolds.int

number of internal folds

foldid.ext

external fold identifiers: vector of length n with entries between 1 and nfolds.ext; or NULL

foldid.int

internal fold identifiers: vector of length n with entries between 1 and nfolds.int; or NULL

type.measure

loss function for binary classification: character "deviance", "mse", "mae", or "class" (see cv.glmnet)

rf

comparison with random forest: logical

xgboost

comparison with extreme gradient boosting: logical

...

further arguments passed to cornet or glmnet

Details

Computes the cross-validated loss of logistic and combined regression.
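cv.cornet nests two cross-validation loops: external folds (nfolds.ext, foldid.ext) for estimating the loss, and internal folds (nfolds.int, foldid.int) for tuning within each external training set. A minimal base R sketch of how such fold identifiers could be generated; the balancing the package applies when the identifiers are NULL may differ, and the model fitting is left as a comment.

```r
set.seed(1)
n <- 100
nfolds.ext <- 5
nfolds.int <- 10

# External fold identifiers: each sample is assigned to one of nfolds.ext folds
foldid.ext <- sample(rep(seq_len(nfolds.ext), length.out = n))

for (k in seq_len(nfolds.ext)) {
  train <- foldid.ext != k  # samples used for fitting in this external fold
  # Internal fold identifiers, defined only on the external training samples
  foldid.int <- sample(rep(seq_len(nfolds.int), length.out = sum(train)))
  # Here one would fit the models on the training samples using foldid.int,
  # then evaluate the loss on the held-out samples (foldid.ext == k)
}
table(foldid.ext)
```

Keeping the internal folds inside each external training set ensures that the held-out samples never influence the tuning of sigma and pi.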

Examples

## Not run:
n <- 100; p <- 200
y <- rnorm(n)
X <- matrix(rnorm(n*p),nrow=n,ncol=p)
start <- Sys.time()
loss <- cv.cornet(y=y,cutoff=0,X=X)
end <- Sys.time()
end - start

loss
## End(Not run)

Plot loss matrix

Description

Plots the loss for different combinations of scaling (sigma) and weighting (pi) parameters.

Usage

## S3 method for class 'cornet'
plot(x, ...)

Arguments

x

cornet object

...

further arguments (not applicable)

Value

This function plots the evaluation loss (cvm). Whereas the matrix has sigma in the rows and pi in the columns, the plot has sigma on the x-axis and pi on the y-axis. For each combination of sigma and pi, the colour indicates the loss. If the R package RColorBrewer is installed, blue represents low loss; otherwise, red represents low loss. White always represents high loss.
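The layout described above maps directly onto base R's image(), which expects a z matrix with one row per x value and one column per y value. The sketch uses a toy loss matrix with illustrative values only and a blue sequential palette chosen for illustration; the package's own plotting code and colours may differ.

```r
# Toy loss matrix (illustrative values only): rows = sigma, columns = pi
sigma  <- c(0.5, 1, 2, 4)
pi.seq <- seq(0, 1, by = 0.25)
cvm <- outer(sigma, pi.seq, function(s, w) log(s)^2 + (w - 0.5)^2)

grDevices::pdf(NULL)  # null device, so the sketch also runs non-interactively
# Rows of cvm correspond to x (sigma) and columns to y (pi): no transposition
graphics::image(x = sigma, y = pi.seq, z = cvm,
                col = grDevices::hcl.colors(12, "Blues"),
                xlab = "sigma", ylab = "pi")
invisible(grDevices::dev.off())
```

Replacing pdf(NULL) with an interactive device (or removing it) displays the plot.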

Examples

n <- 100; p <- 200
y <- rnorm(n)
X <- matrix(rnorm(n*p),nrow=n,ncol=p)
net <- cornet(y=y,cutoff=0,X=X)
plot(net)

Predict binary outcome

Description

Predicts the binary outcome with linear, logistic, and combined regression.

Usage

## S3 method for class 'cornet'
predict(object, newx, type = "probability", ...)

Arguments

object

cornet object

newx

covariates: numeric matrix with n rows (samples) and p columns (variables)

type

"probability", "odds", "log-odds"

...

further arguments (not applicable)

Details

For linear regression, this function tentatively transforms the predicted values to predicted probabilities, using a Gaussian distribution with a fixed mean (threshold) and a fixed variance (estimated variance of the numeric outcome).
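The transformation for linear regression and the three output scales of type can be written in a few lines of base R. The sketch assumes that y_hat holds hypothetical linear-regression predictions and estimates the standard deviation with sd(y), following the Details above; odds and log-odds follow from the probability by their standard definitions.

```r
set.seed(1)
y <- rnorm(100)           # numeric outcome used for fitting
cutoff <- 0               # threshold
y_hat <- c(-0.5, 0, 0.7)  # hypothetical linear-regression predictions

# Gaussian with mean at the threshold and variance estimated from y:
# pnorm((y_hat - cutoff) / sd(y))
prob <- stats::pnorm(y_hat, mean = cutoff, sd = stats::sd(y))
odds <- prob / (1 - prob)  # type = "odds"
log_odds <- log(odds)      # type = "log-odds"
cbind(probability = prob, odds = odds, log_odds = log_odds)
```

A prediction exactly at the threshold yields probability 0.5, odds 1 and log-odds 0.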

Examples

n <- 100; p <- 200
y <- rnorm(n)
X <- matrix(rnorm(n*p),nrow=n,ncol=p)
net <- cornet(y=y,cutoff=0,X=X)
predict(net,newx=X)

Combined regression

Description

Prints summary of cornet object.

Usage

## S3 method for class 'cornet'
print(x, ...)

Arguments

x

cornet object

...

further arguments (not applicable)

Value

Returns sample size n, number of covariates p, information on dichotomisation, tuned scaling parameter (sigma), tuned weighting parameter (pi), and corresponding loss.

Examples

n <- 100; p <- 200
y <- rnorm(n)
X <- matrix(rnorm(n*p),nrow=n,ncol=p)
net <- cornet(y=y,cutoff=0,X=X)
print(net)