Trains a Random Forest of Projection-Pursuit oblique decision trees.
pprf.RdThis function trains a Random Forest of Projection-Pursuit oblique decision tree using either a formula and data frame interface or a matrix-based interface. When using the formula interface, specify the model formula and the data frame containing the variables. For the matrix-based interface, provide matrices for the features and labels directly.
The number of trees is controlled by the size parameter. Each tree is trained on a stratified bootstrap sample drawn from the data.
The number of variables to consider at each split is controlled by the n_vars parameter.
If lambda = 0, the model is trained using Linear Discriminant Analysis (LDA). If lambda > 0, the model is trained using Penalized Discriminant Analysis (PDA).
Usage
pprf(
formula = NULL,
data = NULL,
x = NULL,
y = NULL,
mode = NULL,
size = 100,
lambda = 0,
n_vars = NULL,
p_vars = NULL,
seed = NULL,
max_retries = 3L,
threads = NULL,
pp = NULL,
vars = NULL,
cutpoint = NULL,
stop = NULL,
binarize = NULL,
grouping = NULL,
leaf = NULL
)Arguments
- formula
A formula of the form
y ~ x1 + x2 + ..., whereyis a vector of labels andx1,x2, ... are the features.- data
A data frame containing the variables in the formula.
- x
A matrix containing the features for each observation.
- y
A matrix containing the labels for each observation.
- mode
Training mode: either
"classification"or"regression". WhenNULL(default), mode is auto-detected fromy's type — factor or character vectors trigger classification, numeric vectors trigger regression. Setting it explicitly is useful for the binary-integer-labels case (mode = "classification"with integer 0/1 labels) and for failing fast on a type mismatch (mode = "regression"with a factoryerrors immediately).- size
The number of trees in the forest (default: 100).
- lambda
A regularization parameter. If
lambda = 0, the model is trained using Linear Discriminant Analysis (LDA). Iflambda > 0, the model is trained using Penalized Discriminant Analysis (PDA). Cannot be used together withpp.- n_vars
The number of variables to consider at each split (integer). These are chosen uniformly in each split. By default, half of the variables are used (
p_vars = 0.5). Cannot be used together withp_varsordr.- p_vars
The proportion of variables to consider at each split (number between 0 and 1, exclusive). For example,
p_vars = 0.5uses half the features. Cannot be used together withn_varsordr.- seed
An optional integer seed for reproducibility. If
NULL(default), a seed is drawn from R's RNG, soset.seed()controls reproducibility. If an integer is provided, that value is used directly. The same seed is used for training and for computing permuted variable importance.- max_retries
Maximum number of retries for degenerate trees (default: 3). When a bootstrap sample yields a singular covariance matrix, the tree is retrained with a different seed up to this many times.
- threads
The number of threads to use. The default is the number of cores available.
- pp
A projection pursuit strategy object created by
pp_pda. Cannot be used together withlambda.- vars
A variable selection strategy object created by
vars_uniformorvars_all. Cannot be used together withn_varsorp_vars.- cutpoint
A split cutpoint strategy object created by
cutpoint_mean_of_means(default).- stop
A stopping rule object. Default depends on mode:
stop_pure_node()for classification, andstop_any(stop_min_size(5), stop_min_variance(0.01))for regression.- binarize
A binarization strategy object. Default depends on mode:
binarize_largest_gap()for classification, andbinarize_disabled()for regression (regression's default grouping always yields a 2-group partition, so no binarization is needed).- grouping
A grouping strategy object. Default depends on mode:
grouping_by_label()for classification, andgrouping_by_cutpoint()for regression.- leaf
A leaf strategy object. Default depends on mode:
leaf_majority_vote()for classification, andleaf_mean_response()for regression.
Value
A pprf model. Its S3 class vector is
c("pprf_classification", "pprf", "ppmodel") or
c("pprf_regression", "pprf", "ppmodel") depending on the mode.
Details
Mode is taken from the mode argument when explicit, and otherwise auto-detected from `y` (factor/character → classification, numeric → regression). Pass mode = "classification" to force classification on integer labels (e.g. binary 0/1), or mode = "regression" to assert intent on numeric responses.
OOB error, OOB predictions, permuted variable importance, and weighted variable importance are computed lazily on first access via the accessor functions (`oob_error()`, `oob_predictions()`, `permuted_importance()`, `weighted_importance()`). Training itself is fast because these OOB-based computations are deferred.
See also
predict.pprf_classification, predict.pprf_regression, formula.ppmodel, oob_error, save_json, load_json, pp_rand_forest for parsnip integration, vignette("introduction") for a tutorial
Examples
# Example 1: formula interface with the `iris` dataset
pprf(Species ~ ., data = iris)
#>
#> Random Forest of Projection-Pursuit Oblique Decision Trees
#> Call: pprf(formula = Species ~ ., data = iris)
#> Trees: 100
#> Mode: classification
#> Group names: setosa, versicolor, virginica
#> Formula: Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width - 1
#>
# Example 2: formula interface with the `iris` dataset with regularization
pprf(Species ~ ., data = iris, lambda = 0.5)
#>
#> Random Forest of Projection-Pursuit Oblique Decision Trees
#> Call: pprf(formula = Species ~ ., data = iris, lambda = 0.5)
#> Trees: 100
#> Mode: classification
#> Group names: setosa, versicolor, virginica
#> Formula: Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width - 1
#>
# Example 3: matrix interface with the `iris` dataset
pprf(x = iris[, 1:4], y = iris[, 5])
#>
#> Random Forest of Projection-Pursuit Oblique Decision Trees
#> Call: pprf(x = iris[, 1:4], y = iris[, 5])
#> Trees: 100
#> Mode: classification
#> Group names: setosa, versicolor, virginica
#>
# Example 4: matrix interface with the `iris` dataset with regularization
pprf(x = iris[, 1:4], y = iris[, 5], lambda = 0.5)
#>
#> Random Forest of Projection-Pursuit Oblique Decision Trees
#> Call: pprf(x = iris[, 1:4], y = iris[, 5], lambda = 0.5)
#> Trees: 100
#> Mode: classification
#> Group names: setosa, versicolor, virginica
#>
# Example 5: formula interface with the `crabs` dataset
pprf(Type ~ ., data = crabs)
#>
#> Random Forest of Projection-Pursuit Oblique Decision Trees
#> Call: pprf(formula = Type ~ ., data = crabs)
#> Trees: 100
#> Mode: classification
#> Group names: B, O
#> Formula: Type ~ sex + FL + RW + CL + CW + BD - 1
#>
# Example 6: formula interface with the `crabs` dataset with regularization
pprf(Type ~ ., data = crabs, lambda = 0.5)
#>
#> Random Forest of Projection-Pursuit Oblique Decision Trees
#> Call: pprf(formula = Type ~ ., data = crabs, lambda = 0.5)
#> Trees: 100
#> Mode: classification
#> Group names: B, O
#> Formula: Type ~ sex + FL + RW + CL + CW + BD - 1
#>