Trains a Random Forest of Projection-Pursuit oblique decision trees.
This function trains a Random Forest of Projection-Pursuit oblique decision trees using either a formula and data frame interface or a matrix-based interface. When using the formula interface, specify the model formula and the data frame containing the variables. For the matrix-based interface, provide the feature and label matrices directly.
The number of trees is controlled by the size parameter. Each tree is trained on a stratified bootstrap sample drawn from the data.
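As a rough illustration of a stratified bootstrap, the base R sketch below resamples row indices with replacement within each class, so class counts are preserved. It assumes this is what "stratified" means here; `stratified_bootstrap` is a hypothetical helper, not a function of this package.

```r
# Hypothetical helper, not part of the package: draw one stratified
# bootstrap sample of row indices by resampling within each class.
stratified_bootstrap <- function(labels) {
  idx_by_class <- split(seq_along(labels), labels)
  sampled <- lapply(idx_by_class, function(idx) {
    # Index into idx so a single-element class is not expanded to 1:idx.
    idx[sample.int(length(idx), size = length(idx), replace = TRUE)]
  })
  unlist(sampled, use.names = FALSE)
}

set.seed(1)
labels <- factor(rep(c("a", "b", "c"), times = c(50, 30, 20)))
boot_idx <- stratified_bootstrap(labels)

# Class counts in the bootstrap sample match the original data.
table(labels[boot_idx])
```

Under this assumption, each tree would be fitted on the rows selected by its own `boot_idx`.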
The number of variables to consider at each split is controlled by the n_vars parameter.
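A minimal sketch of how a candidate subset might be drawn for one split, using either a count (n_vars) or a proportion (p_vars). The helper name `sample_split_vars` and the rounding of the proportion are assumptions for illustration; the package may resolve p_vars differently.

```r
# Hypothetical helper: pick the candidate variables for a single split.
# n_vars and p_vars are mutually exclusive, as documented for pprf().
sample_split_vars <- function(n_features, n_vars = NULL, p_vars = NULL) {
  if (!is.null(n_vars) && !is.null(p_vars)) {
    stop("specify only one of n_vars and p_vars")
  }
  if (!is.null(p_vars)) {
    # Assumption: the proportion is rounded to a count of at least 1.
    n_vars <- max(1L, as.integer(round(p_vars * n_features)))
  }
  if (is.null(n_vars)) {
    n_vars <- n_features  # documented default: all variables
  }
  # Uniform sampling without replacement, returned in column order.
  sort(sample.int(n_features, size = n_vars))
}

set.seed(42)
sample_split_vars(4, n_vars = 2)    # two of the four columns
sample_split_vars(4, p_vars = 0.5)  # also two columns under the rounding assumption
sample_split_vars(4)                # all four columns
```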
If lambda = 0, the model is trained using Linear Discriminant Analysis (LDA). If lambda > 0, the model is trained using Penalized Discriminant Analysis (PDA).
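To make the role of lambda concrete, the sketch below computes a leading discriminant direction in base R. It assumes one common form of the PDA penalty, in which the off-diagonal entries of the within-class scatter matrix are shrunk by (1 - lambda); the exact penalty used by the package is not stated here and may differ. `discriminant_direction` is a hypothetical helper.

```r
# Hypothetical helper: leading discriminant direction for a numeric
# feature matrix x and class labels y.
discriminant_direction <- function(x, y, lambda = 0) {
  x <- as.matrix(x)
  y <- factor(y)
  p <- ncol(x)
  grand_mean <- colMeans(x)
  W <- matrix(0, p, p)  # within-class scatter
  B <- matrix(0, p, p)  # between-class scatter
  for (cls in levels(y)) {
    xc <- x[y == cls, , drop = FALSE]
    mu <- colMeans(xc)
    W <- W + crossprod(sweep(xc, 2, mu))
    B <- B + nrow(xc) * tcrossprod(mu - grand_mean)
  }
  # Assumed penalty: lambda = 0 leaves W unchanged (plain LDA);
  # lambda > 0 shrinks the off-diagonal entries of W toward zero.
  W_pen <- (1 - lambda) * W + lambda * diag(diag(W), p)
  a <- Re(eigen(solve(W_pen, B))$vectors[, 1])
  a / sqrt(sum(a^2))  # unit-length direction
}

set.seed(7)
x <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 3), ncol = 2)
)
y <- rep(c("a", "b"), each = 20)

a_lda <- discriminant_direction(x, y, lambda = 0)
a_pda <- discriminant_direction(x, y, lambda = 0.5)
```

With this form of penalty, the penalized matrix stays invertible even when some features are perfectly collinear, which is one reason to prefer lambda > 0 on highly correlated data.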
Usage
pprf(
  formula = NULL,
  data = NULL,
  x = NULL,
  y = NULL,
  size = 2,
  lambda = 0,
  n_vars = NULL,
  p_vars = NULL,
  seed = NULL,
  max_retries = 3L,
  threads = NULL,
  pp = NULL,
  dr = NULL,
  sr = NULL
)
Arguments
- formula
A formula of the form y ~ x1 + x2 + ..., where y is a vector of labels and x1, x2, ... are the features.
- data
A data frame containing the variables in the formula.
- x
A matrix containing the features for each observation.
- y
A matrix containing the labels for each observation.
- size
The number of trees in the forest.
- lambda
A regularization parameter. If lambda = 0, the model is trained using Linear Discriminant Analysis (LDA). If lambda > 0, the model is trained using Penalized Discriminant Analysis (PDA). Cannot be used together with pp.
- n_vars
The number of variables to consider at each split (integer). These are chosen uniformly at random at each split. The default is all variables. Cannot be used together with p_vars or dr.
- p_vars
The proportion of variables to consider at each split (number between 0 and 1, exclusive). For example, p_vars = 0.5 uses half the features. Cannot be used together with n_vars or dr.
- seed
An optional integer seed for reproducibility. If NULL (default), a seed is drawn from R's RNG, so set.seed() controls reproducibility. If an integer is provided, that value is used directly. The same seed is used for training and for computing permuted variable importance.
- max_retries
Maximum number of retries for degenerate trees (default: 3). When a bootstrap sample yields a singular covariance matrix, the tree is retrained with a different seed up to this many times.
- threads
The number of threads to use. The default is the number of cores available.
- pp
A projection pursuit strategy object created by pp_pda. Cannot be used together with lambda.
- dr
A dimensionality reduction strategy object created by dr_uniform or dr_noop. Cannot be used together with n_vars or p_vars.
- sr
A split rule strategy object created by sr_mean_of_means (default).
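The retry behavior described for max_retries can be pictured with the base R sketch below. Both functions are hypothetical stand-ins: `fit_tree` only checks the covariance matrix of the bootstrap sample, and incrementing the seed on each attempt is an assumed reseeding scheme, not necessarily the one the package uses.

```r
# Hypothetical stand-in for training one tree: fails when the sample's
# covariance matrix is numerically singular.
fit_tree <- function(x) {
  S <- cov(x)
  if (rcond(S) < 1e-10) stop("singular covariance matrix")
  S
}

# One initial attempt plus up to max_retries retries, each with a new seed.
# Note: set.seed() changes the global RNG state; acceptable for a sketch.
fit_with_retries <- function(x, seed, max_retries = 3L) {
  for (attempt in 0:max_retries) {
    set.seed(seed + attempt)  # assumed reseeding scheme
    idx <- sample.int(nrow(x), replace = TRUE)
    fit <- tryCatch(fit_tree(x[idx, , drop = FALSE]), error = function(e) NULL)
    if (!is.null(fit)) {
      return(list(fit = fit, attempts = attempt + 1L))
    }
  }
  stop("tree still degenerate after ", max_retries, " retries")
}

set.seed(123)
x_ok <- matrix(rnorm(200), ncol = 2)
result <- fit_with_retries(x_ok, seed = 123)
result$attempts
```

Data whose columns are exact copies of each other would exhaust all retries in this sketch, since every bootstrap sample has a singular covariance matrix.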
See also
predict.pprf, formula.pprf, summary.pprf, print.pprf, save_json, load_json, pp_rand_forest for parsnip integration, pp_pda, dr_uniform, dr_noop, sr_mean_of_means, vignette("introduction") for a tutorial
Examples
# Example 1: formula interface with the `iris` dataset
pprf(Type ~ ., data = iris)
#>
#> Random Forest of Project-Pursuit Oblique Decision Tree
#> -------------------------------------
#> Tree 1:
#> If ([ 0.01 0.05 -0.03 -0.02 ] * x) < 0.06262772:
#> If ([ 0.04 0.09 -0.1 -0.13 ] * x) < -0.2146854:
#> Predict: virginica
#> Else:
#> Predict: versicolor
#> Else:
#> Predict: setosa
#>
#> Tree 2:
#> If ([ 0 0.05 -0.04 -0.01 ] * x) < 0.06345474:
#> If ([ 0.06 0.05 -0.1 -0.12 ] * x) < -0.1598488:
#> Predict: virginica
#> Else:
#> Predict: versicolor
#> Else:
#> Predict: setosa
#>
#>
# Example 2: formula interface with the `iris` dataset with regularization
pprf(Type ~ ., data = iris, lambda = 0.5)
#>
#> Random Forest of Project-Pursuit Oblique Decision Tree
#> -------------------------------------
#> Tree 1:
#> If ([ 0 -0.04 0.02 0.03 ] * x) < 0.02115907:
#> Predict: setosa
#> Else:
#> If ([ 0 0.04 -0.06 -0.16 ] * x) < -0.4436958:
#> Predict: virginica
#> Else:
#> Predict: versicolor
#>
#> Tree 2:
#> If ([ 0 -0.03 0.03 0.04 ] * x) < 0.03752498:
#> Predict: setosa
#> Else:
#> If ([ 0 0.02 -0.06 -0.16 ] * x) < -0.5112084:
#> Predict: virginica
#> Else:
#> Predict: versicolor
#>
#>
# Example 3: matrix interface with the `iris` dataset
pprf(x = iris[, 1:4], y = iris[, 5])
#>
#> Random Forest of Project-Pursuit Oblique Decision Tree
#> -------------------------------------
#> Tree 1:
#> If ([ 0.02 0.04 -0.04 -0.02 ] * x) < 0.08615907:
#> If ([ 0.1 0 -0.12 -0.12 ] * x) < -0.1784111:
#> Predict: virginica
#> Else:
#> Predict: versicolor
#> Else:
#> Predict: setosa
#>
#> Tree 2:
#> If ([ 0.02 0.03 -0.05 0 ] * x) < 0.03522417:
#> If ([ 0.05 0.04 -0.09 -0.16 ] * x) < -0.2597719:
#> Predict: virginica
#> Else:
#> Predict: versicolor
#> Else:
#> Predict: setosa
#>
#>
# Example 4: matrix interface with the `iris` dataset with regularization
pprf(x = iris[, 1:4], y = iris[, 5], lambda = 0.5)
#>
#> Random Forest of Project-Pursuit Oblique Decision Tree
#> -------------------------------------
#> Tree 1:
#> If ([ 0 -0.05 0.02 0.04 ] * x) < -0.04241288:
#> Predict: setosa
#> Else:
#> If ([ 0 0.04 -0.05 -0.16 ] * x) < -0.3842216:
#> Predict: virginica
#> Else:
#> Predict: versicolor
#>
#> Tree 2:
#> If ([ 0 -0.04 0.02 0.04 ] * x) < 0.009132806:
#> Predict: setosa
#> Else:
#> If ([ 0 0.03 -0.06 -0.14 ] * x) < -0.4294196:
#> Predict: virginica
#> Else:
#> Predict: versicolor
#>
#>
# Example 5: formula interface with the `crabs` dataset
pprf(Type ~ ., data = crabs)
#>
#> Random Forest of Project-Pursuit Oblique Decision Tree
#> -------------------------------------
#> Tree 1:
#> If ([ 0.13 0 0 0 0 0 0 ] * x) < 0.06243492:
#> Predict: O
#> Else:
#> Predict: B
#>
#> Tree 2:
#> If ([ 0.14 0 0 0 0 0 0 ] * x) < 0.0746413:
#> Predict: O
#> Else:
#> Predict: B
#>
#>
# Example 6: formula interface with the `crabs` dataset with regularization
pprf(Type ~ ., data = crabs, lambda = 0.5)
#>
#> Random Forest of Project-Pursuit Oblique Decision Tree
#> -------------------------------------
#> Tree 1:
#> If ([ 0.01 -0.01 0.01 0.01 0 0 0.01 ] * x) < 0.3329027:
#> Predict: B
#> Else:
#> Predict: O
#>
#> Tree 2:
#> If ([ 0.01 -0.01 -0.01 0 0 0 -0.01 ] * x) < -0.3170045:
#> Predict: O
#> Else:
#> Predict: B
#>
#>
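To help read the printed trees, the sketch below re-implements Tree 1 from Example 1 by hand. It assumes that `[ ... ] * x` denotes the dot product of the coefficient vector with an observation's unscaled features, taken in column order; the observations passed in are hypothetical. The printed coefficients are rounded to two decimals, so results near a threshold may differ from the fitted model.

```r
# Coefficients and thresholds copied (rounded) from Tree 1 of Example 1.
predict_tree1 <- function(x) {
  if (sum(c(0.01, 0.05, -0.03, -0.02) * x) < 0.06262772) {
    if (sum(c(0.04, 0.09, -0.1, -0.13) * x) < -0.2146854) {
      "virginica"
    } else {
      "versicolor"
    }
  } else {
    "setosa"
  }
}

# Hypothetical observations with four features each.
predict_tree1(c(5.1, 3.5, 1.4, 0.2))  # projection 0.18 is not below 0.0626: "setosa"
predict_tree1(c(5.5, 2.3, 4.0, 1.3))  # 0.024, then -0.142: "versicolor"
predict_tree1(c(6.3, 3.3, 6.0, 2.5))  # -0.002, then -0.376: "virginica"
```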