Select important features

Select features using a supervised or unsupervised kernel method. Supervised feature selection is performed if Y is provided; otherwise the selection is unsupervised.

select.features(
  X,
  Y = NULL,
  kx.func = c("linear", "gaussian.radial.basis", "bray"),
  ky.func = c("linear", "gaussian.radial.basis"),
  keepX = NULL,
  method = c("kernel", "kpca", "graph"),
  lambda = NULL,
  n_components = 2,
  Lg = NULL,
  mu = 1,
  max_iter = 100,
  nstep = 50,
  ...
)

Arguments

X: a numeric matrix (or data frame) used to select variables. NAs not allowed.
Y: an optional numeric matrix (or data frame) used to guide variable selection. If provided, a supervised selection is performed. NAs not allowed. Default: NULL.
kx.func: the kernel function name to use on X. Widely used kernel functions are pre-implemented, and can be directly used by setting kx.func to one of the following values: "linear", "gaussian.radial.basis" or "bray". Default: "linear". If Y is provided, the kernel "bray" is not allowed.
ky.func: the kernel function name to use on Y. Available kernels are: "linear", and "gaussian.radial.basis". Default: "linear". This value is ignored when Y is not provided.
keepX: the number of variables to select.
method: the method to use. Either an unsupervised variable selection method ("kernel"), a kernel PCA oriented variable selection method ("kpca") or a structure driven variable selection method ("graph"). Default: "kernel".
lambda: the penalization parameter that controls the trade-off between the minimization of the distortion and the sparsity of the solution parameter.
n_components: how many principal components should be used with method "kpca". Required with method "kpca". Default: 2.
Lg: the Laplacian matrix of the graph representing relations between the input dataset variables. Required with method "graph".
mu: the penalization parameter that controls the trade-off between the distortion and the influence of the graph. Default: 1.
max_iter: the maximum number of iterations. Default: 100.
nstep: the number of values used for the regularization path. Default: 50.
...: the kernel function arguments. In particular, sigma (for "gaussian.radial.basis"): double. The inverse kernel width used by "gaussian.radial.basis".
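The Lg argument expects a Laplacian matrix, which usually has to be derived from a graph over the variables. As a minimal sketch, assuming the unnormalized Laplacian (degree matrix minus adjacency matrix) is an acceptable input and using a hypothetical adjacency matrix A for four variables, it can be built in base R as follows:

```r
# Hypothetical symmetric adjacency matrix for 4 variables
# (1 = the two variables are related, 0 = they are not).
A <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0),
            nrow = 4, byrow = TRUE)

# Unnormalized graph Laplacian: degree matrix minus adjacency matrix.
# This choice of Laplacian is an assumption, not confirmed by this page.
Lg <- diag(rowSums(A)) - A

# Sanity checks: the Laplacian is symmetric and each row sums to zero.
stopifnot(isSymmetric(Lg), all(rowSums(Lg) == 0))
```

Such a matrix would then be passed as Lg together with method = "graph". Since the graph describes relations between variables, Lg presumably needs one row and one column per column of X.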

Value

select.features returns a vector of sorted selected feature indexes.

References

Brouard C., Mariette J., Flamary R. and Vialaneix N. (2022). Feature selection for kernel methods in systems biology. NAR Genomics and Bioinformatics, 4(1), lqac014. DOI: 10.1093/nargab/lqac014.

Author

Celine Brouard <celine.brouard@inrae.fr>, Jerome Mariette <jerome.mariette@inrae.fr>, Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>

Examples

## These examples require Python modules to be installed.
## See the installation instructions at: http://mixkernel.clementine.wf

data("Koren.16S")
if (FALSE) { # \dontrun{
 sf.res <- select.features(Koren.16S$data.raw, kx.func = "bray", lambda = 1,
                           keepX = 40, nstep = 1)
 colnames(Koren.16S$data.raw)[sf.res]
} # }

data("nutrimouse")
if (FALSE) { # \dontrun{
 grb.func <- "gaussian.radial.basis"
 genes <- center.scale(nutrimouse$gene)
 lipids <- center.scale(nutrimouse$lipid)
 sf.res <- select.features(genes, lipids, kx.func = grb.func,
                           ky.func = grb.func, keepX = 40)
 colnames(nutrimouse$gene)[sf.res]
} # }
