Tree training in ppforest2 is parameterised by three pluggable strategy interfaces. You can add new optimisation criteria, variable selection methods, or splitting rules without modifying the tree-building logic.
| Interface | Purpose | Override |
| PPStrategy | Projection pursuit index optimisation | index(), optimize(), to_json() |
| DRStrategy | Variable subset selection | select(), to_json() |
| SRStrategy | Split threshold computation | threshold(), to_json() |
Architecture
All three strategy families share the same infrastructure via the CRTP base class Strategy<Derived>. This base provides:
using Ptr = std::shared_ptr<Derived> — the canonical pointer type.
to_json() — pure virtual, every strategy must serialize itself.
from_json(json) — static dispatcher that looks up the "name" field in the JSON and delegates to the registered factory.
register_strategy(name, factory) — adds a factory to the registry.
Each strategy family (PPStrategy, DRStrategy, SRStrategy) gets its own independent registry because Strategy<PPStrategy> and Strategy<DRStrategy> are different template instantiations.
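The per-family registry can be illustrated with a small standalone sketch. This is not the ppforest2 source: `Params` is a stand-in for `nlohmann::json`, `ToyPP` is a hypothetical strategy, and the base class is reduced to the members listed above. It only shows why each template instantiation ends up with its own registry.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for nlohmann::json so the sketch has no third-party dependency.
using Params = std::map<std::string, std::string>;

// Simplified CRTP base (assumption: the real Strategy<Derived> is richer).
template <typename Derived>
struct Strategy {
  using Ptr = std::shared_ptr<Derived>;
  using Factory = std::function<Ptr(const Params&)>;

  virtual ~Strategy() = default;
  virtual Params to_json() const = 0;

  static void register_strategy(const std::string& name, Factory factory) {
    assert(factory && "factory must be callable");
    registry()[name] = std::move(factory);
  }

  // Look up the "name" field and delegate to the registered factory.
  static Ptr from_json(const Params& j) {
    auto name = j.find("name");
    if (name == j.end()) throw std::invalid_argument("missing \"name\" field");
    auto entry = registry().find(name->second);
    if (entry == registry().end())
      throw std::invalid_argument("unknown strategy: " + name->second);
    return entry->second(j);
  }

 private:
  // The function-local static is instantiated once per Derived, so
  // Strategy<PPStrategy> and Strategy<DRStrategy> never share entries.
  static std::map<std::string, Factory>& registry() {
    static std::map<std::string, Factory> instance;
    return instance;
  }
};

struct PPStrategy : Strategy<PPStrategy> {};
struct DRStrategy : Strategy<DRStrategy> {};

// Hypothetical concrete strategy used only for this illustration.
struct ToyPP : PPStrategy {
  Params to_json() const override { return {{"name", "toy"}}; }
};
```

Registering `"toy"` with `PPStrategy::register_strategy` makes it visible to `PPStrategy::from_json` only; in this sketch, asking `DRStrategy::from_json` for the same name throws `std::invalid_argument`.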
Conventions
All strategies follow the same structural conventions:
- Immutable configuration: store parameters as const private members set in the constructor.
- to_json(): serialises the strategy's name and parameters to JSON for model persistence and display.
- from_json(): static method that deserializes from JSON, validating keys and returning a Ptr.
- PPFOREST2_REGISTER_STRATEGY: macro that registers the strategy's from_json at static initialization time.
- Factory function: a free function in the strategy's namespace that returns a Ptr (e.g. pp::pda(lambda)).
- Convenience operator(): calls the main virtual method so strategies can be used as functors.
Strategies are held via shared_ptr and are immutable after construction, so they can be freely shared across trees without deep cloning.
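As a rough illustration of these conventions, the sketch below uses a hypothetical `ThresholdRule` class (not part of ppforest2) with a const private parameter, a factory function, and a forwarding `operator()`. Two toy trees then hold the same instance through `shared_ptr` without copying it.

```cpp
#include <cassert>
#include <memory>

// Hypothetical strategy-like class; all names are illustrative only.
struct ThresholdRule {
  using Ptr = std::shared_ptr<ThresholdRule>;

  explicit ThresholdRule(float offset) : offset_(offset) {}
  virtual ~ThresholdRule() = default;

  // Main virtual method: midpoint of two group means, shifted by offset_.
  virtual float threshold(float mean_1, float mean_2) const {
    return 0.5f * (mean_1 + mean_2) + offset_;
  }

  // Convenience operator(): forwards to the virtual method.
  float operator()(float mean_1, float mean_2) const {
    return threshold(mean_1, mean_2);
  }

 private:
  const float offset_;  // immutable configuration, set once in the constructor
};

// Factory function returning the canonical pointer type.
inline ThresholdRule::Ptr shifted_midpoint(float offset) {
  return std::make_shared<ThresholdRule>(offset);
}

// Toy tree that stores the shared strategy instead of a private copy.
struct ToyTree {
  ThresholdRule::Ptr rule;

  float split(float mean_1, float mean_2) const {
    assert(rule && "tree needs a strategy");
    return (*rule)(mean_1, mean_2);
  }
};
```

Because the rule has no mutable state, handing the same pointer to many trees is safe in this sketch: copying a `ToyTree` only bumps the reference count.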
Self-Registration
A new strategy becomes available for JSON deserialization once the concrete class provides two things:
- A static Ptr from_json(const nlohmann::json& j) method.
- The PPFOREST2_REGISTER_STRATEGY(Base, "name") macro.
The macro uses inline static initialization to register the factory at program startup. No central registry file needs to be modified.
struct MyStrategy : public PPStrategy {
  static PPStrategy::Ptr from_json(const nlohmann::json& j);
  PPFOREST2_REGISTER_STRATEGY(PPStrategy, "my_strategy");
};
After this, PPStrategy::from_json({"name": "my_strategy", ...}) will automatically dispatch to MyStrategy::from_json.
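How such a macro can register a factory without a central file is sketched below. The macro `SKETCH_REGISTER_STRATEGY`, the `ToyBase` class, and the parameterless `from_json` are all hypothetical and simplified; the real PPFOREST2_REGISTER_STRATEGY may expand differently. The sketch assumes C++17 inline variables.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct ToyBase {
  using Ptr = std::shared_ptr<ToyBase>;
  using Factory = std::function<Ptr()>;

  virtual ~ToyBase() = default;
  virtual std::string name() const = 0;

  // Function-local static: constructed on first use, so it is ready even
  // when registration runs during static initialisation.
  static std::map<std::string, Factory>& registry() {
    static std::map<std::string, Factory> instance;
    return instance;
  }

  static bool register_strategy(const std::string& name, Factory factory) {
    assert(registry().count(name) == 0 && "strategy name registered twice");
    registry()[name] = std::move(factory);
    return true;
  }
};

// Hypothetical macro: expands to an inline static member whose initialiser
// performs the registration as a side effect.
#define SKETCH_REGISTER_STRATEGY(StrategyBase, strategy_name) \
  inline static const bool registered_ =                      \
      StrategyBase::register_strategy(strategy_name, &from_json)

struct ToyStrategy : ToyBase {
  std::string name() const override { return "toy"; }

  // Simplified: the real from_json takes a JSON object.
  static ToyBase::Ptr from_json() { return std::make_shared<ToyStrategy>(); }

  SKETCH_REGISTER_STRATEGY(ToyBase, "toy");
};
```

Defining the class is enough for `"toy"` to be in the registry; nothing else has to name `ToyStrategy`. Keeping the map in a function-local static avoids depending on the initialisation order of separate globals.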
Adding a new PPStrategy
A PPStrategy defines how to evaluate a projection index and how to find the optimal projection for a given dataset.
Interface to implement
struct PPStrategy : public Strategy<PPStrategy> {
  virtual types::Feature index(
    const types::FeatureMatrix& x,
    const stats::GroupPartition& group_spec,
    const Projector& projector
  ) const = 0;

  virtual PPResult optimize(
    const types::FeatureMatrix& x,
    const stats::GroupPartition& group_spec
  ) const = 0;
};
Example: a random projection strategy
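#pragma once
// Project headers declaring PPStrategy, PPResult, Projector and
// PPFOREST2_REGISTER_STRATEGY are omitted here.
#include <memory>

namespace pp {
  // Factory, defined below; declared early so from_json can call it.
  inline PPStrategy::Ptr random();

  struct PPRandomStrategy : public PPStrategy {
    // A random projection optimises nothing, so the index is constant.
    types::Feature index(
        const types::FeatureMatrix& x,
        const stats::GroupPartition& group_spec,
        const Projector& projector) const override {
      return 0;
    }

    // Draw a random direction and scale it to unit length.
    PPResult optimize(
        const types::FeatureMatrix& x,
        const stats::GroupPartition& group_spec) const override {
      Projector proj = Projector::Random(x.cols());
      proj.normalize();
      return PPResult{ proj, 0 };
    }

    void to_json(nlohmann::json& j) const override {
      j = {{"name", "random"}};
    }

    // No parameters to read; validate the keys of j here before constructing.
    static PPStrategy::Ptr from_json(const nlohmann::json& j) {
      return random();
    }

    PPFOREST2_REGISTER_STRATEGY(PPStrategy, "random");
  };

  inline PPStrategy::Ptr random() {
    return std::make_shared<PPRandomStrategy>();
  }
}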
Adding a new DRStrategy
A DRStrategy selects a subset of variables (columns) before projection pursuit runs. This reduces cost and introduces diversity in forests.
Interface to implement
struct DRStrategy : public Strategy<DRStrategy> {
  virtual DRResult select(
    const types::FeatureMatrix& x,
    const stats::GroupPartition& group_spec,
    stats::RNG& rng
  ) const = 0;
};
The returned DRResult records which columns were selected and the original column count, so the reduced-space projector can be expanded back to the full space via DRResult::expand().
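The expansion step can be pictured with a standalone sketch. `ReducedSelection` is a hypothetical stand-in for DRResult, and filling unselected columns with zeros is an assumption about what expanding "back to the full space" means here, not a statement about the actual DRResult::expand() implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for DRResult: remembers which columns were kept.
struct ReducedSelection {
  std::vector<int> selected;  // indices into the original columns
  std::size_t original_size;  // number of columns before selection

  // Scatter reduced-space coefficients back to full length. Columns that
  // were not selected receive a zero coefficient (assumed behaviour).
  std::vector<float> expand(const std::vector<float>& reduced) const {
    assert(reduced.size() == selected.size());
    std::vector<float> full(original_size, 0.0f);
    for (std::size_t i = 0; i < selected.size(); ++i) {
      full[static_cast<std::size_t>(selected[i])] = reduced[i];
    }
    return full;
  }
};
```

With columns 3 and 0 selected out of 5, a reduced projector (0.6, 0.8) would expand to (0.8, 0, 0, 0.6, 0) under this assumption, so projecting the full data with the expanded vector matches projecting the reduced data with the reduced one.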
Example: top-variance variable selection
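#pragma once
// Project headers declaring DRStrategy, DRResult and
// PPFOREST2_REGISTER_STRATEGY are omitted here.
#include <algorithm>
#include <memory>
#include <utility>
#include <vector>

namespace dr {
  // Factory, defined below; declared early so from_json can call it.
  inline DRStrategy::Ptr variance(int n_vars);

  struct DRVarianceStrategy : public DRStrategy {
    explicit DRVarianceStrategy(int n_vars) : n_vars_(n_vars) {}

    DRResult select(
        const types::FeatureMatrix& x,
        const stats::GroupPartition& group_spec,
        stats::RNG& rng) const override {
      // Score each column by its sum of squared deviations. All columns have
      // the same number of rows, so this ranks them the way variance would.
      std::vector<std::pair<float, int>> var_idx;
      for (int j = 0; j < x.cols(); ++j) {
        float var = (x.col(j).array() - x.col(j).mean()).square().sum();
        var_idx.push_back({var, j});
      }
      // Sort by descending score; stable_sort keeps column order on ties.
      std::stable_sort(var_idx.begin(), var_idx.end(),
          [](const auto& a, const auto& b) { return a.first > b.first; });
      // Keep the n_vars_ highest-scoring columns (or all, if there are fewer).
      std::vector<int> selected;
      for (int i = 0; i < n_vars_ && i < static_cast<int>(var_idx.size()); ++i)
        selected.push_back(var_idx[i].second);
      return DRResult(selected, x.cols());
    }

    void to_json(nlohmann::json& j) const override {
      j = {{"name", "variance"}, {"n_vars", n_vars_}};
    }

    static DRStrategy::Ptr from_json(const nlohmann::json& j) {
      return variance(j.at("n_vars").get<int>());
    }

    PPFOREST2_REGISTER_STRATEGY(DRStrategy, "variance");

  private:
    const int n_vars_;
  };

  inline DRStrategy::Ptr variance(int n_vars) {
    return std::make_shared<DRVarianceStrategy>(n_vars);
  }
}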
Adding a new SRStrategy
An SRStrategy computes the split threshold in the projected space that separates two groups.
Interface to implement
struct SRStrategy : public Strategy<SRStrategy> {
  virtual types::Feature threshold(
    const types::FeatureMatrix& group_1,
    const types::FeatureMatrix& group_2,
    const pp::Projector& projector
  ) const = 0;
};
Example: median-based split
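#pragma once
// Project headers declaring SRStrategy, pp::Projector and
// PPFOREST2_REGISTER_STRATEGY are omitted here.
#include <algorithm>
#include <memory>
#include <vector>

namespace sr {
  // Factory, defined below; declared early so from_json can call it.
  inline SRStrategy::Ptr median();

  struct SRMedianStrategy : public SRStrategy {
    types::Feature threshold(
        const types::FeatureMatrix& group_1,
        const types::FeatureMatrix& group_2,
        const pp::Projector& projector) const override {
      // eval() materialises the products. Assuming Eigen types, a plain auto
      // would hold an unevaluated expression without element access.
      const auto proj_1 = (group_1 * projector).eval();
      const auto proj_2 = (group_2 * projector).eval();
      // Pool the projected observations of both groups.
      std::vector<float> all;
      for (int i = 0; i < proj_1.rows(); ++i) all.push_back(proj_1(i));
      for (int i = 0; i < proj_2.rows(); ++i) all.push_back(proj_2(i));
      // Partially sort so the middle element is the (upper) median.
      std::nth_element(all.begin(), all.begin() + all.size() / 2, all.end());
      return all[all.size() / 2];
    }

    void to_json(nlohmann::json& j) const override {
      j = {{"name", "median"}};
    }

    static SRStrategy::Ptr from_json(const nlohmann::json& j) {
      return median();
    }

    PPFOREST2_REGISTER_STRATEGY(SRStrategy, "median");
  };

  inline SRStrategy::Ptr median() {
    return std::make_shared<SRMedianStrategy>();
  }
}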
Composing strategies
Once you have a new strategy, compose it into a TrainingSpec:
TrainingSpec spec(
    pp::pda(0.5),     // projection pursuit strategy
    dr::variance(4),  // variable selection strategy
    sr::median(),     // split rule strategy
    100, 0);
auto model = Model::train(spec, x, y);
stats::RNG rng(0);
Tree tree = Tree::train(spec, x, y, rng);
TrainingSpec is a concrete class — there is no need to subclass it. New strategies are plugged in via the constructor. Strategies are held via shared_ptr, so TrainingSpec can be freely copied and shared across trees.
Checklist for adding a new strategy
- Create models/MyStrategy.hpp (and .cpp if needed).
- Inherit from the appropriate base (PPStrategy, DRStrategy, or SRStrategy).
- Implement the pure virtual methods (index/optimize, select, or threshold).
- Implement to_json(): it must include a "name" field.
- Implement display_name(): a human-readable label for summaries.
- Add a static Ptr from_json(const nlohmann::json& j): validate keys, construct via factory.
- Add PPFOREST2_REGISTER_STRATEGY(Base, "name") inside the struct.
- Add a factory function (e.g. inline Ptr my_strategy(...)) in the namespace.
- Add the .cpp to src/models/CMakeLists.txt.
- Add tests in MyStrategy.test.cpp (JSON round-trip + functional).
That's it — no central registry file or TrainingSpec subclass needed.
See also
Strategy, PPStrategy, DRStrategy, SRStrategy, TrainingSpec