Select features using a supervised or unsupervised kernel method. Supervised feature selection is performed if Y is provided.

select.features(
  X,
  Y = NULL,
  kx.func = c("linear", "gaussian.radial.basis", "bray"),
  ky.func = c("linear", "gaussian.radial.basis"),
  keepX = NULL,
  method = c("kernel", "kpca", "graph"),
  lambda = NULL,
  n_components = 2,
  Lg = NULL,
  mu = 1,
  max_iter = 100,
  nstep = 50,
  ...
)

Arguments

X

a numeric matrix (or data frame) used to select variables. NAs not allowed.

Y

a numeric matrix (or data frame) used to guide variable selection in the supervised setting. NAs not allowed. Default: NULL, in which case the selection is unsupervised.

kx.func

the kernel function name to use on X. Widely used kernel functions are pre-implemented, and can be directly used by setting kx.func to one of the following values: "linear", "gaussian.radial.basis" or "bray". Default: "linear". If Y is provided, the kernel "bray" is not allowed.

ky.func

the kernel function name to use on Y. Available kernels are: "linear", and "gaussian.radial.basis". Default: "linear". This value is ignored when Y is not provided.

keepX

the number of variables to select.

method

the method to use: an unsupervised variable selection method ("kernel"), a kernel PCA oriented variable selection method ("kpca") or a structure-driven variable selection method ("graph"). Default: "kernel".

lambda

the penalization parameter that controls the trade-off between the minimization of the distortion and the sparsity of the solution parameter.

n_components

how many principal components should be used with method "kpca". Required with method "kpca". Default: 2.

Lg

the Laplacian matrix of the graph representing relations between the input dataset variables. Required with method "graph".

mu

the penalization parameter that controls the trade-off between the distortion and the influence of the graph. Default: 1.

max_iter

the maximum number of iterations. Default: 100.

nstep

the number of values used for the regularization path. Default: 50.

...

the kernel function arguments. In particular, sigma (for "gaussian.radial.basis"): double, the inverse kernel width used by "gaussian.radial.basis".
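To make the role of sigma as an inverse kernel width concrete, here is a minimal base R sketch of a Gaussian radial basis kernel. It assumes the common convention k(x, x') = exp(-sigma * ||x - x'||^2); the exact formula used internally by "gaussian.radial.basis" is not stated on this page, and rbf_kernel is a hypothetical helper, not a package function.

```r
## Hypothetical helper (not part of the package): Gaussian RBF kernel
## under the assumed convention k(x, x') = exp(-sigma * ||x - x'||^2).
rbf_kernel <- function(X, sigma = 1) {
  X <- as.matrix(X)
  sq_dist <- as.matrix(dist(X))^2  # squared Euclidean distances between rows
  exp(-sigma * sq_dist)
}

## Three toy observations in two dimensions
X <- rbind(c(0, 0), c(1, 0), c(0, 2))
K <- rbf_kernel(X, sigma = 0.5)
```

Under this convention, a larger sigma makes the similarity decay faster with distance, so the kernel becomes narrower.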

Value

select.features returns a vector of sorted indexes of the selected features.

References

Brouard C., Mariette J., Flamary R. and Vialaneix N. (2022). Feature selection for kernel methods in systems biology. NAR Genomics and Bioinformatics, 4(1), lqac014. DOI: 10.1093/nargab/lqac014.

Author

Celine Brouard <celine.brouard@inrae.fr>, Jerome Mariette <jerome.mariette@inrae.fr>, Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>

Examples

## These examples require the installation of python modules
## See installation instructions at: http://mixkernel.clementine.wf

data("Koren.16S")
if (FALSE) {
 sf.res <- select.features(Koren.16S$data.raw, kx.func = "bray", lambda = 1,
                           keepX = 40, nstep = 1)
 colnames(Koren.16S$data.raw)[sf.res]
}

data("nutrimouse")
if (FALSE) {
 grb.func <- "gaussian.radial.basis"
 genes <- center.scale(nutrimouse$gene)
 lipids <- center.scale(nutrimouse$lipid)
 sf.res <- select.features(genes, lipids, kx.func = grb.func, 
                           ky.func = grb.func, keepX = 40)
 colnames(nutrimouse$gene)[sf.res]
}
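
The examples above do not cover method = "graph", which requires Lg. The sketch below builds a Laplacian from a toy adjacency matrix in base R. It assumes that the unnormalized Laplacian (degree matrix minus adjacency matrix) is an acceptable input and that Lg needs one row and one column per column of X; neither point is stated on this page. The graph and the data are made up for illustration.

```r
## Toy data: 10 observations of 4 hypothetical variables
set.seed(1)
vars <- paste0("v", 1:4)
X.toy <- matrix(rnorm(40), nrow = 10, ncol = 4, dimnames = list(NULL, vars))

## Toy undirected graph between the variables (a path: v1 - v2 - v3 - v4)
A <- matrix(0, nrow = 4, ncol = 4, dimnames = list(vars, vars))
A[cbind(1:3, 2:4)] <- 1  # edges v1-v2, v2-v3, v3-v4
A <- A + t(A)            # make the adjacency matrix symmetric

## Unnormalized Laplacian: degree matrix minus adjacency matrix (assumption)
Lg <- diag(rowSums(A)) - A

if (FALSE) {
 ## Not run: arguments follow the usage section above
 sf.res <- select.features(X.toy, method = "graph", Lg = Lg, keepX = 2)
 colnames(X.toy)[sf.res]
}
```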