Custom strategies
ppforest2 trains trees by composing six pluggable strategies:
| Strategy | Purpose | Built-in |
|---|---|---|
| PP (projection pursuit) | Find the projection that best separates groups | pp_pda() (Penalized Discriminant Analysis) |
| Vars (variable selection) | Select which variables are available at each split | vars_uniform(), vars_all() |
| Threshold (split cutpoint) | Compute the split cutpoint in projected space | cutpoint_mean_of_means() |
| Stop (stopping rule) | Decide when to stop growing | stop_pure_node() |
| Binarize (binarization) | Reduce multiclass to binary at each node | binarize_largest_gap() |
| Grouping (group partition) | Route observations to children | grouping_by_label() |
You can add new strategies without modifying the core tree-building logic. This vignette walks through the process.
How strategies work
Each strategy is an R list with a name field that
identifies it, a display_name for summaries, and any
parameters the strategy needs. The name must be identical to the
name under which the corresponding C++ strategy is registered.
library(ppforest2)
pp_pda(0.5)
#> $name
#> [1] "pda"
#>
#> $display_name
#> [1] "PDA"
#>
#> $lambda
#> [1] 0.5
#>
#> attr(,"class")
#> [1] "pp_strategy"
vars_uniform(n_vars = 2)
#> $name
#> [1] "uniform"
#>
#> $display_name
#> [1] "Uniform random"
#>
#> $count
#> [1] 2
#>
#> $p_vars
#> NULL
#>
#> attr(,"class")
#> [1] "vars_strategy"
cutpoint_mean_of_means()
#> $name
#> [1] "mean_of_means"
#>
#> $display_name
#> [1] "Mean of means"
#>
#> attr(,"class")
#> [1] "cutpoint_strategy"
When you call pptr() or pprf(), the
strategy lists are passed to C++, where the name field
dispatches to the corresponding C++ implementation. The actual
computation (optimization, variable selection, cutpoint) happens
entirely in C++.
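To make the name-based dispatch concrete, here is a minimal, self-contained sketch of a name-to-factory registry. It is not the ppforest2 implementation: `DemoStrategy`, `DemoPda`, `demo_register()`, and `demo_create()` are hypothetical names invented for illustration, and the real registry may be structured differently.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a strategy base class (not the ppforest2 API).
struct DemoStrategy {
  virtual ~DemoStrategy() = default;
  virtual std::string display_name() const = 0;
};

// Hypothetical concrete strategy.
struct DemoPda : DemoStrategy {
  std::string display_name() const override { return "PDA"; }
};

using DemoFactory = std::function<std::shared_ptr<DemoStrategy>()>;

// Name -> factory table. A function-local static avoids
// static-initialization-order problems across translation units.
inline std::map<std::string, DemoFactory>& demo_registry() {
  static std::map<std::string, DemoFactory> registry;
  return registry;
}

// Register a factory under a name (overwrites an existing entry).
inline void demo_register(const std::string& name, DemoFactory factory) {
  assert(factory != nullptr);
  demo_registry()[name] = std::move(factory);
}

// Look up the name and build the strategy; unknown names are an error.
inline std::shared_ptr<DemoStrategy> demo_create(const std::string& name) {
  const auto it = demo_registry().find(name);
  if (it == demo_registry().end()) {
    throw std::invalid_argument("unknown strategy: " + name);
  }
  return it->second();
}
```

In this sketch, a name that was never registered fails loudly at lookup time, which is why the R-side name has to agree with the C++ registration.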
Adding a new strategy
Adding a strategy requires work on both sides:
- C++: Implement the strategy class (the computation).
- R: Write a constructor function (the user-facing API).
Step 1: C++ implementation
Each strategy family has a base class with pure virtual methods. Your new strategy inherits from the appropriate base and implements them.
For example, a new projection pursuit strategy needs to implement
optimize() (find the best projection):
// File: core/src/models/strategies/pp/MyMethod.hpp
#pragma once
#include <memory>
#include <string>
#include "models/strategies/pp/ProjectionPursuit.hpp"
#include "models/strategies/Strategy.hpp"
#include "utils/JsonValidation.hpp"
namespace ppforest2::pp {
struct MyMethod : public ProjectionPursuit {
  explicit MyMethod(float alpha) : alpha_(alpha) {}
  std::string display_name() const override { return "My method"; }
  Result optimize(
      const types::FeatureMatrix& x,
      const stats::GroupPartition& group_spec) const override {
    // Find the optimal projector for the data.
    // Return Result{ projector_vector, index_value }.
    ...
  }
  // Serialize the registered name and all parameters.
  nlohmann::json to_json() const override {
    return {{"name", "my_method"}, {"alpha", alpha_}};
  }
  static ProjectionPursuit::Ptr from_json(const nlohmann::json& j) {
    // Reject unexpected keys before reading any parameter.
    JsonReader{j, "my_method"}.only_keys({"name", "alpha"});
    // Construct directly: the my_method() factory is declared after this
    // struct, so it is not visible at this point.
    return std::make_shared<MyMethod>(j.at("alpha").get<float>());
  }
  PPFOREST2_REGISTER_STRATEGY(ProjectionPursuit, "my_method")
private:
  const float alpha_;
};
// Convenience factory for C++ callers.
inline ProjectionPursuit::Ptr my_method(float alpha) {
  return std::make_shared<MyMethod>(alpha);
}
} // namespace ppforest2::pp
The key pieces:
- to_json() serializes the strategy name and parameters. This is used for model persistence.
- from_json() deserializes from JSON and validates that no unexpected keys are present.
- PPFOREST2_REGISTER_STRATEGY registers the factory so JSON deserialization finds it automatically.
- display_name() returns a human-readable label for summaries.
- The factory function (my_method()) is a convenience wrapper.
The same pattern applies to variable selection strategies
(select()), cutpoint strategies (cutpoint()),
and the other strategy families. See the C++ documentation for complete
interface definitions and examples.
If your strategy has a .cpp file (the header-only example above does not), add it to
core/src/models/CMakeLists.txt.
Step 2: R constructor
Write an R function that creates a strategy list. The
name field must match the C++ registration name
exactly.
#' My custom projection pursuit strategy.
#'
#' @param alpha A tuning parameter.
#' @return A \code{pp_strategy} object.
#' @export
pp_my_method <- function(alpha = 1.0) {
  if (!is.numeric(alpha) || length(alpha) != 1)
    stop("`alpha` must be a single number.")
  structure(
    list(name = "my_method", display_name = "My method", alpha = alpha),
    class = "pp_strategy"
  )
}
The constructor should:
- Validate parameters before they reach C++. Catching errors early with clear messages is better than a C++ exception.
- Set the S3 class to pp_strategy, vars_strategy, cutpoint_strategy, stop_strategy, binarize_strategy, or grouping_strategy. This is checked by resolve_strategies().
- Include display_name for readable output in summary().
- Use the same parameter names as to_json() in C++. The R list is converted to JSON and passed to from_json() on the C++ side.
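The reason parameter names must agree on both sides is that from_json() rejects keys it does not expect. The sketch below illustrates that idea with a plain std::map standing in for the parsed JSON object; `DemoParams`, `demo_only_keys()`, and `demo_read_alpha()` are hypothetical names, the string-valued name field is left out for brevity, and the actual JsonReader in ppforest2 may behave differently.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a parsed JSON object holding numeric parameters.
using DemoParams = std::map<std::string, double>;

// Throw if `params` contains a key that is not in `allowed`.
inline void demo_only_keys(const DemoParams& params,
                           const std::set<std::string>& allowed) {
  for (const auto& entry : params) {
    if (allowed.count(entry.first) == 0) {
      throw std::invalid_argument("unexpected key: " + entry.first);
    }
  }
}

// Validate the keys first, then read the one expected parameter.
// A missing "alpha" surfaces as std::out_of_range from map::at().
inline double demo_read_alpha(const DemoParams& params) {
  demo_only_keys(params, {"alpha"});
  return params.at("alpha");
}
```

Under these assumptions, a misspelled R list element such as `alhpa` is reported as an unexpected key rather than being silently ignored.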
Step 3: Use it
Once both sides are in place, the new strategy works like any built-in:
# Single tree
tree <- pptr(Species ~ ., data = iris, pp = pp_my_method(alpha = 0.5))
# Forest
forest <- pprf(Species ~ ., data = iris, pp = pp_my_method(alpha = 0.5), vars = vars_uniform(n_vars = 2))
# Summary shows the strategy
summary(tree)
The strategy is also available from the CLI.
Models trained with the new strategy can be saved and loaded as usual; the JSON registry handles serialization automatically.
Strategy families reference
PP: Projection pursuit
Controls how the tree finds the best linear combination of variables at each node.
index(x, group_spec, projector) -> scalar
optimize(x, group_spec) -> Result{projector, index}
optimize() is the main method. It receives the data
matrix and group partition and returns the best projection vector.
index() evaluates a given projection (used for variable
importance calculations).
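As an illustration of what an index() method computes, the following sketch scores a given projector by the between-group share of the total variance of the projected values. This particular index is an assumption chosen for simplicity, not the PDA index used by pp_pda(), and `DemoRow`, `demo_project()`, and `demo_index()` are hypothetical names using std::vector in place of the library's matrix types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using DemoRow = std::vector<double>;

// Dot product of one observation with the projector.
inline double demo_project(const DemoRow& row, const std::vector<double>& projector) {
  assert(row.size() == projector.size());
  double value = 0.0;
  for (std::size_t j = 0; j < row.size(); ++j) value += row[j] * projector[j];
  return value;
}

// Illustrative index: between-group sum of squares divided by the total
// sum of squares of the projected data. The result lies in [0, 1].
inline double demo_index(const std::vector<DemoRow>& x,
                         const std::vector<int>& labels,
                         std::size_t n_groups,
                         const std::vector<double>& projector) {
  assert(!x.empty() && x.size() == labels.size());
  std::vector<double> z(x.size());
  std::vector<double> group_sum(n_groups, 0.0);
  std::vector<std::size_t> group_count(n_groups, 0);
  double total = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const std::size_t g = static_cast<std::size_t>(labels[i]);
    assert(labels[i] >= 0 && g < n_groups);
    z[i] = demo_project(x[i], projector);
    group_sum[g] += z[i];
    group_count[g] += 1;
    total += z[i];
  }
  const double grand_mean = total / static_cast<double>(x.size());
  double between = 0.0;
  for (std::size_t g = 0; g < n_groups; ++g) {
    if (group_count[g] == 0) continue;
    const double diff = group_sum[g] / static_cast<double>(group_count[g]) - grand_mean;
    between += static_cast<double>(group_count[g]) * diff * diff;
  }
  double total_ss = 0.0;
  for (double value : z) total_ss += (value - grand_mean) * (value - grand_mean);
  return total_ss > 0.0 ? between / total_ss : 0.0;
}
```

An optimize() method would search over projectors for the one with the highest index; that search is omitted here.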
Vars: Variable selection
Controls which variables are available to projection pursuit at each split. This is what makes random forests “random”.
select(x, group_spec, rng) -> Result{selected_indices, original_cols}
The returned VariableSelection::Result tracks which
columns were selected so the reduced-space projector can be expanded
back to the full feature space.
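The two halves of this step, picking columns and expanding the reduced projector, can be sketched as follows. This is a hedged illustration only: `demo_select_uniform()` and `demo_expand_projector()` are hypothetical helpers, and the way vars_uniform() actually samples columns is not shown in this vignette.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Draw `count` distinct column indices uniformly at random, returned sorted.
inline std::vector<std::size_t> demo_select_uniform(std::size_t n_cols,
                                                    std::size_t count,
                                                    std::mt19937& rng) {
  assert(count <= n_cols);
  std::vector<std::size_t> cols(n_cols);
  std::iota(cols.begin(), cols.end(), std::size_t{0});
  std::shuffle(cols.begin(), cols.end(), rng);
  cols.resize(count);
  std::sort(cols.begin(), cols.end());
  return cols;
}

// Place each reduced-space coefficient at its original column position.
// Columns that were not selected get a coefficient of zero.
inline std::vector<double> demo_expand_projector(const std::vector<double>& reduced,
                                                 const std::vector<std::size_t>& selected_cols,
                                                 std::size_t n_cols) {
  assert(reduced.size() == selected_cols.size());
  std::vector<double> full(n_cols, 0.0);
  for (std::size_t k = 0; k < reduced.size(); ++k) {
    assert(selected_cols[k] < n_cols);
    full[selected_cols[k]] = reduced[k];
  }
  return full;
}
```

Keeping the original column indices alongside the selection is what allows a projector fitted on two columns to be applied later to rows with all columns present.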
Threshold: Split cutpoint
Controls where the split cutpoint is placed in the projected space.
cutpoint(group_1, group_2, projector) -> scalar
Receives the two groups (already partitioned by projection pursuit) and the projection vector. Returns the cutpoint value.
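A plausible reading of the built-in cutpoint_mean_of_means() is the midpoint between the two projected group means. The sketch below implements that reading; it is an assumption based on the name, and `DemoRow`, `demo_projected_mean()`, and `demo_cutpoint_mean_of_means()` are hypothetical names using std::vector in place of the library's types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using DemoRow = std::vector<double>;

// Mean of the projected values (row . projector) over one group.
inline double demo_projected_mean(const std::vector<DemoRow>& group,
                                  const std::vector<double>& projector) {
  assert(!group.empty());
  double total = 0.0;
  for (const DemoRow& row : group) {
    assert(row.size() == projector.size());
    for (std::size_t j = 0; j < row.size(); ++j) total += row[j] * projector[j];
  }
  return total / static_cast<double>(group.size());
}

// Cutpoint halfway between the two projected group means.
inline double demo_cutpoint_mean_of_means(const std::vector<DemoRow>& group_1,
                                          const std::vector<DemoRow>& group_2,
                                          const std::vector<double>& projector) {
  return 0.5 * (demo_projected_mean(group_1, projector) +
                demo_projected_mean(group_2, projector));
}
```

Because each group contributes its mean rather than its observations, the cutpoint in this sketch does not shift toward the larger group when group sizes are unbalanced.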
Stop: Stopping rule
Controls when to stop growing the tree.
should_stop(group_partition, depth) -> bool
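For example, a pure-node rule in the spirit of stop_pure_node() would stop as soon as a node contains a single group. The sketch below assumes the partition can be represented as one integer label per observation; `demo_should_stop()` is a hypothetical function, and the depth argument is accepted only to mirror the signature above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stop when every observation in the node carries the same label.
// `depth` is unused by this rule; a depth-limited rule would compare it
// against a maximum instead.
inline bool demo_should_stop(const std::vector<int>& labels, int depth) {
  (void)depth;
  assert(!labels.empty());
  const int first = labels.front();
  return std::all_of(labels.begin(), labels.end(),
                     [first](int label) { return label == first; });
}
```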
Binarize: Binarization
Controls how multiclass nodes (>2 groups) are reduced to a binary problem.
regroup(projected_x, group_partition) -> Result
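One way to read binarize_largest_gap() is: order the groups by their mean in projected space and cut where adjacent means are furthest apart. The sketch below implements that interpretation, which is an assumption drawn from the name; `demo_largest_gap()` is a hypothetical function that takes precomputed group means rather than the library's partition type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Given one projected mean per group, return a 0/1 side for each group:
// groups below the largest gap between sorted means get 0, the rest get 1.
inline std::vector<int> demo_largest_gap(const std::vector<double>& group_means) {
  assert(group_means.size() >= 2);
  // Group indices ordered by ascending projected mean.
  std::vector<std::size_t> order(group_means.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(), [&group_means](std::size_t a, std::size_t b) {
    return group_means[a] < group_means[b];
  });
  // Position in `order` where the upper super-group starts.
  std::size_t best = 1;
  double best_gap = -1.0;
  for (std::size_t k = 1; k < order.size(); ++k) {
    const double gap = group_means[order[k]] - group_means[order[k - 1]];
    if (gap > best_gap) {
      best_gap = gap;
      best = k;
    }
  }
  std::vector<int> side(group_means.size(), 0);
  for (std::size_t k = best; k < order.size(); ++k) side[order[k]] = 1;
  return side;
}
```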
Grouping: Group partition
Controls how observations are routed to children after a split.
split(partition, lower_group, upper_group) -> Result
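A label-based routing rule, as the name grouping_by_label() suggests, could send each observation to the child that owns its group. The sketch below assumes one integer label per observation and sets of group ids for the two children; `DemoChildren` and `demo_split_by_label()` are hypothetical, and the Result type returned by the real split() is not described in this vignette.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <vector>

// Observation indices routed to each child.
struct DemoChildren {
  std::vector<std::size_t> lower;
  std::vector<std::size_t> upper;
};

// Route every observation by its label: labels in `lower_groups` go to the
// lower child, labels in `upper_groups` to the upper child.
inline DemoChildren demo_split_by_label(const std::vector<int>& labels,
                                        const std::set<int>& lower_groups,
                                        const std::set<int>& upper_groups) {
  DemoChildren children;
  for (std::size_t i = 0; i < labels.size(); ++i) {
    if (lower_groups.count(labels[i]) != 0) {
      children.lower.push_back(i);
    } else if (upper_groups.count(labels[i]) != 0) {
      children.upper.push_back(i);
    } else {
      throw std::invalid_argument("label belongs to neither child");
    }
  }
  return children;
}
```

In this sketch the routing depends only on the training labels, not on which side of the cutpoint an observation falls.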
Checklist
- Create core/src/models/strategies/<family>/MyStrategy.hpp (and .cpp if needed).
- Inherit from the appropriate base class (ProjectionPursuit, VariableSelection, Cutpoint, StopRule, Binarization, or Grouping).
- Implement the pure virtual methods.
- Implement to_json() with a "name" field.
- Implement display_name() for human-readable summaries.
- Add static Ptr from_json() with key validation.
- Add PPFOREST2_REGISTER_STRATEGY(Base, "name").
- Add a factory function in the strategy’s namespace.
- Add the .cpp to core/src/models/CMakeLists.txt.
- Write tests in MyStrategy.test.cpp (JSON round-trip + functional).
- Write the R constructor function with validation, display_name, and the correct S3 class.
- Export and document the R function.