ppforest2 v0.1.0
Projection Pursuit Decision Trees and Random Forests
Extending: Custom Strategies

Tree training in ppforest2 is parameterised by three pluggable strategy interfaces. You can add new optimisation criteria, variable selection methods, or splitting rules without modifying the tree-building logic.

Interface    Purpose                                  Methods to override
PPStrategy   Projection pursuit index optimisation    index(), optimize(), to_json()
DRStrategy   Variable subset selection                select(), to_json()
SRStrategy   Split threshold computation              threshold(), to_json()

Architecture

All three strategy families share the same infrastructure via the CRTP base class Strategy<Derived>. This base provides:

  • using Ptr = std::shared_ptr<Derived> — the canonical pointer type.
  • to_json() — pure virtual, every strategy must serialize itself.
  • from_json(json) — static dispatcher that looks up the "name" field in the JSON and delegates to the registered factory.
  • register_strategy(name, factory) — adds a factory to the registry.

Each strategy family (PPStrategy, DRStrategy, SRStrategy) gets its own independent registry because Strategy<PPStrategy> and Strategy<DRStrategy> are different template instantiations.
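A minimal, self-contained sketch of this pattern follows. It is not ppforest2's Strategy.hpp: Json is a std::map<std::string, std::string> stand-in for nlohmann::json, and is_registered(), SplitRule, VarSelect and Midpoint are hypothetical names. It shows a registry held inside the template, the name-based dispatch in from_json(), and the fact that each instantiation owns a separate registry.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for nlohmann::json (assumption): a flat string-to-string object.
using Json = std::map<std::string, std::string>;

// CRTP base: every instantiation Strategy<Derived> has its own registry.
template <typename Derived>
struct Strategy {
  using Ptr = std::shared_ptr<Derived>;
  using Factory = std::function<Ptr(const Json&)>;

  virtual ~Strategy() = default;
  virtual void to_json(Json& j) const = 0;

  static void register_strategy(const std::string& name, Factory factory) {
    assert(factory);  // an empty std::function could never be called
    registry()[name] = std::move(factory);
  }

  // Reads j["name"] and delegates to the factory registered under it.
  static Ptr from_json(const Json& j) {
    const auto name = j.find("name");
    if (name == j.end()) throw std::invalid_argument("missing \"name\" field");
    const auto it = registry().find(name->second);
    if (it == registry().end())
      throw std::invalid_argument("unknown strategy: " + name->second);
    return it->second(j);
  }

  // Hypothetical helper, used here only to inspect the registry.
  static bool is_registered(const std::string& name) {
    return registry().count(name) > 0;
  }

 private:
  // Function-local static: built on first use, so a factory registering
  // during static initialization never sees an unconstructed map.
  static std::map<std::string, Factory>& registry() {
    static std::map<std::string, Factory> instance;
    return instance;
  }
};

// Two families, hence two distinct template instantiations and registries.
struct SplitRule : Strategy<SplitRule> {};
struct VarSelect : Strategy<VarSelect> {};

struct Midpoint : SplitRule {
  void to_json(Json& j) const override { j = Json{{"name", "midpoint"}}; }
};
```

After SplitRule::register_strategy("midpoint", ...), SplitRule::from_json can build a Midpoint from {{"name", "midpoint"}}, while VarSelect's registry stays empty and rejects the same JSON. Keeping the map in a function-local static rather than a static data member sidesteps the order of static initializers across translation units.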

Conventions

All strategies follow the same structural conventions:

  • Immutable configuration — store parameters as const private members set in the constructor.
  • to_json() — serialises the strategy's name and parameters to JSON for model persistence and display.
  • from_json() — static method that deserializes from JSON, validating keys and returning a Ptr.
  • PPFOREST2_REGISTER_STRATEGY — macro that registers the strategy's from_json at static initialization time.
  • Factory function — a free function in the strategy's namespace that returns a Ptr (e.g. pp::pda(lambda)).
  • Convenience operator() — calls the main virtual method so strategies can be used as functors.
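The functor and immutability conventions can be sketched in isolation. SplitRule and WeightedMidpoint below are hypothetical types that work on std::vector<double> rather than ppforest2's Eigen-based types: the virtual threshold() does the work, a non-virtual operator() forwards to it, and the only parameter is a const member fixed in the constructor.

```cpp
#include <cassert>
#include <vector>

// Hypothetical split-rule family on plain vectors of projected values.
struct SplitRule {
  virtual ~SplitRule() = default;
  // The main virtual method: one threshold for two groups.
  virtual double threshold(const std::vector<double>& a,
                           const std::vector<double>& b) const = 0;
  // Convenience functor: forwards to threshold().
  double operator()(const std::vector<double>& a,
                    const std::vector<double>& b) const {
    return threshold(a, b);
  }
};

// Immutable configuration: weight_ is a const private member set once.
class WeightedMidpoint : public SplitRule {
 public:
  explicit WeightedMidpoint(double weight) : weight_(weight) {
    assert(weight >= 0.0 && weight <= 1.0);
  }
  // Point between the group means, weight_ of the way from a's mean to b's.
  double threshold(const std::vector<double>& a,
                   const std::vector<double>& b) const override {
    return (1.0 - weight_) * mean(a) + weight_ * mean(b);
  }

 private:
  static double mean(const std::vector<double>& v) {
    assert(!v.empty());
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum / static_cast<double>(v.size());
  }
  const double weight_;
};
```

For example, WeightedMidpoint(0.5) applied to the groups {0, 2} and {4, 6} returns 3, halfway between the group means 1 and 5, whether it is called through threshold() or as a functor.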

Strategies are held via shared_ptr and are immutable after construction, so they can be freely shared across trees without deep cloning.

Self-Registration

A new strategy becomes available for JSON deserialization automatically once you add two things to the concrete class:

  1. A static Ptr from_json(const nlohmann::json& j) method.
  2. The PPFOREST2_REGISTER_STRATEGY(Base, "name") macro.

The macro uses inline static initialization to register the factory at program startup. No central registry file needs to be modified.

struct MyStrategy : public PPStrategy {
  static PPStrategy::Ptr from_json(const nlohmann::json& j);
  PPFOREST2_REGISTER_STRATEGY(PPStrategy, "my_strategy")
};

After this, calling PPStrategy::from_json on a JSON object whose "name" is "my_strategy" (in nlohmann::json initializer syntax, {{"name", "my_strategy"}, ...}) dispatches automatically to MyStrategy::from_json.
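One way such a macro can work is sketched below, assuming it relies on a C++17 inline static data member; ppforest2's actual PPFOREST2_REGISTER_STRATEGY may be defined differently. REGISTER_STRATEGY, Rule and Median are hypothetical names, and Json is a std::map stand-in for nlohmann::json. The member's initializer calls register_strategy, so defining the class is what registers its factory.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

using Json = std::map<std::string, std::string>;  // stand-in for nlohmann::json

// Minimal hypothetical strategy base with a name -> factory registry.
struct Rule {
  using Ptr = std::shared_ptr<Rule>;
  using Factory = Ptr (*)(const Json&);
  virtual ~Rule() = default;

  // Returns true so that it can initialize a static bool member.
  static bool register_strategy(const std::string& name, Factory f) {
    assert(f != nullptr);
    registry()[name] = f;
    return true;
  }
  static Ptr from_json(const Json& j) { return registry().at(j.at("name"))(j); }
  static std::map<std::string, Factory>& registry() {
    static std::map<std::string, Factory> instance;  // built on first use
    return instance;
  }
};

// Sketch of a self-registration macro: a C++17 inline static data member
// whose initializer registers the enclosing class's from_json().
#define REGISTER_STRATEGY(Base, name) \
  inline static const bool registered_ = Base::register_strategy(name, &from_json);

struct Median : Rule {
  static Rule::Ptr from_json(const Json&) { return std::make_shared<Median>(); }
  REGISTER_STRATEGY(Rule, "median")
};
```

Strictly, the standard lets an implementation defer the dynamic initialization of an inline variable until the variable is first odr-used; mainstream compilers run it before main(), which is what this kind of self-registration relies on in practice.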

Adding a new PPStrategy

A PPStrategy defines how to evaluate a projection index and how to find the optimal projection for a given dataset.

Interface to implement

struct PPStrategy : public Strategy<PPStrategy> {
  // Evaluate the projection index for a given projector.
  virtual types::Feature index(
      const types::FeatureMatrix& x,            // (n x p)
      const stats::GroupPartition& group_spec,
      const Projector& projector                // (p)
  ) const = 0;
  // Find the optimal projector for the data.
  virtual PPResult optimize(
      const types::FeatureMatrix& x,            // (n x p)
      const stats::GroupPartition& group_spec
  ) const = 0;
  // Inherited from Strategy<PPStrategy>:
  // virtual void to_json(nlohmann::json& j) const = 0;
  // static Ptr from_json(const nlohmann::json& j);
};

Example: a random projection strategy

// File: models/PPRandomStrategy.hpp
#pragma once
#include <memory>
// ...plus the ppforest2 headers that declare PPStrategy, PPResult and validate_json_keys.
namespace ppforest2::pp {
// Factory function (convention). Declared before the struct so that
// from_json() below can call it; defined after the struct.
inline PPStrategy::Ptr random();
// A toy strategy that returns a random unit projector.
struct PPRandomStrategy : public PPStrategy {
  types::Feature index(
      const types::FeatureMatrix& /*x*/,
      const stats::GroupPartition& /*group_spec*/,
      const Projector& /*projector*/) const override {
    return 0;  // toy: every projection gets the same score
  }
  PPResult optimize(
      const types::FeatureMatrix& x,
      const stats::GroupPartition& /*group_spec*/) const override {
    // Assuming Projector is an Eigen vector, Random() draws uniformly from
    // [-1, 1] via std::rand(), not from a seeded ppforest2 RNG.
    Projector proj = Projector::Random(x.cols());
    proj.normalize();
    return PPResult{ proj, 0 };
  }
  void to_json(nlohmann::json& j) const override {
    j = {{"name", "random"}};
  }
  static PPStrategy::Ptr from_json(const nlohmann::json& j) {
    validate_json_keys(j, "random PP", {"name"});
    return random();
  }
  PPFOREST2_REGISTER_STRATEGY(PPStrategy, "random")
};
inline PPStrategy::Ptr random() {
  return std::make_shared<PPRandomStrategy>();
}
} // namespace ppforest2::pp
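The toy optimize() above normalizes a uniform draw from the cube [-1, 1]^p, which does not give a uniformly distributed direction: it over-weights the cube's diagonals. If a random-projection strategy should be both uniform and reproducible, a common approach is to normalize a vector of standard normal draws taken from an explicitly seeded engine. The sketch below uses std::vector<double> and std::mt19937_64 in place of Projector and stats::RNG, whose interfaces are not shown here; random_unit_projector is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Draws a direction uniformly distributed on the unit sphere in R^p.
// Standard normal coordinates are rotation-invariant, so normalizing them
// gives a uniform direction.
std::vector<double> random_unit_projector(int p, std::mt19937_64& rng) {
  assert(p > 0);
  std::normal_distribution<double> normal(0.0, 1.0);
  std::vector<double> v(static_cast<std::size_t>(p));
  double norm2 = 0.0;
  do {
    norm2 = 0.0;
    for (double& x : v) {
      x = normal(rng);
      norm2 += x * x;
    }
  } while (norm2 == 0.0);  // practically never repeats; guards the division
  const double norm = std::sqrt(norm2);
  for (double& x : v) x /= norm;
  return v;
}
```

Two engines seeded with the same value produce the same projector, which ties the result to the training seed instead of to the global std::rand() state.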

Adding a new DRStrategy

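A DRStrategy selects a subset of variables (columns) before projection pursuit runs. Working on fewer columns lowers the cost of each projection pursuit step and, when the subset varies between nodes or trees, makes the trees of a forest more diverse.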

Interface to implement

struct DRStrategy : public Strategy<DRStrategy> {
  // Select a subset of columns.
  virtual DRResult select(
      const types::FeatureMatrix& x,            // (n x p)
      const stats::GroupPartition& group_spec,
      stats::RNG& rng
  ) const = 0;
  // Inherited from Strategy<DRStrategy>:
  // virtual void to_json(nlohmann::json& j) const = 0;
  // static Ptr from_json(const nlohmann::json& j);
};

The returned DRResult records which columns were selected and the original column count, so the reduced-space projector can be expanded back to the full space via DRResult::expand().
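A minimal sketch of what the expansion step amounts to, assuming expand() places each reduced coefficient at its original column and fills the unselected columns with zeros. SelectionResult is a hypothetical stand-in for DRResult built on std::vector. Zero-filling is what makes projecting the full data with the expanded vector equal to projecting the selected columns with the reduced one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for DRResult: selected column indices plus the
// original column count p.
struct SelectionResult {
  std::vector<int> selected;
  int original_cols;

  // Scatter a projector fitted on the selected columns back to length p:
  // coefficient k goes to column selected[k]; other columns stay 0.
  std::vector<double> expand(const std::vector<double>& reduced) const {
    assert(reduced.size() == selected.size());
    std::vector<double> full(static_cast<std::size_t>(original_cols), 0.0);
    for (std::size_t k = 0; k < selected.size(); ++k) {
      assert(selected[k] >= 0 && selected[k] < original_cols);
      full[static_cast<std::size_t>(selected[k])] = reduced[k];
    }
    return full;
  }
};
```

With selected = {3, 0} and p = 5, the reduced projector {0.6, 0.8} expands to {0.8, 0, 0, 0.6, 0}.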

Example: top-variance variable selection

// File: models/DRVarianceStrategy.hpp
#pragma once
#include <algorithm>
#include <memory>
#include <utility>
#include <vector>
namespace ppforest2::dr {
// Factory function (convention). Declared before the struct so that
// from_json() below can call it; defined after the struct.
inline DRStrategy::Ptr variance(int n_vars);
// Selects the n_vars columns with the highest variance.
struct DRVarianceStrategy : public DRStrategy {
  explicit DRVarianceStrategy(int n_vars) : n_vars_(n_vars) {}
  DRResult select(
      const types::FeatureMatrix& x,
      const stats::GroupPartition& /*group_spec*/,
      stats::RNG& /*rng*/) const override {
    // Per-column sum of squared deviations: proportional to the variance,
    // which is all the ranking needs.
    std::vector<std::pair<types::Feature, int>> var_idx;
    for (int j = 0; j < x.cols(); ++j) {
      types::Feature var = (x.col(j).array() - x.col(j).mean()).square().sum();
      var_idx.emplace_back(var, j);
    }
    // Sort descending by variance (ties keep column order), pick top n_vars.
    std::stable_sort(var_idx.begin(), var_idx.end(),
        [](const auto& a, const auto& b) { return a.first > b.first; });
    std::vector<int> selected;
    for (int i = 0; i < n_vars_ && i < static_cast<int>(var_idx.size()); ++i)
      selected.push_back(var_idx[i].second);
    return DRResult(selected, x.cols());
  }
  void to_json(nlohmann::json& j) const override {
    j = {{"name", "variance"}, {"n_vars", n_vars_}};
  }
  static DRStrategy::Ptr from_json(const nlohmann::json& j) {
    validate_json_keys(j, "variance DR", {"name", "n_vars"});
    return variance(j.at("n_vars").get<int>());
  }
  PPFOREST2_REGISTER_STRATEGY(DRStrategy, "variance")
private:
  const int n_vars_;
};
inline DRStrategy::Ptr variance(int n_vars) {
  return std::make_shared<DRVarianceStrategy>(n_vars);
}
} // namespace ppforest2::dr
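The variance rule above is deterministic: given the same data it always returns the same columns. Diversity across trees more commonly comes from random selection. A hedged sketch of uniform selection without replacement follows, assuming plain int column indices and a std::mt19937 in place of stats::RNG (whose interface is not shown here); random_columns is a hypothetical name, not a ppforest2 strategy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical random variable selection: choose n_vars of the p column
// indices uniformly without replacement, using an explicitly seeded engine
// so that a fixed seed reproduces the same subsets.
std::vector<int> random_columns(int p, int n_vars, std::mt19937& rng) {
  assert(p >= 0 && n_vars >= 0);
  std::vector<int> all(static_cast<std::size_t>(p));
  std::iota(all.begin(), all.end(), 0);  // 0, 1, ..., p - 1
  std::vector<int> chosen;
  chosen.reserve(static_cast<std::size_t>(std::min(p, n_vars)));
  // With forward iterators std::sample is stable, so the chosen indices
  // come out in ascending order.
  std::sample(all.begin(), all.end(), std::back_inserter(chosen), n_vars, rng);
  return chosen;
}
```

Seeding two engines identically reproduces the same subset, which is the property a seeded forest needs; if n_vars exceeds p, std::sample returns all p columns.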

Adding a new SRStrategy

An SRStrategy computes the split threshold in the projected space that separates two groups.

Interface to implement

struct SRStrategy : public Strategy<SRStrategy> {
  // Compute the split threshold for two groups.
  virtual types::Feature threshold(
      const types::FeatureMatrix& group_1,      // (n1 x p)
      const types::FeatureMatrix& group_2,      // (n2 x p)
      const pp::Projector& projector            // (p)
  ) const = 0;
  // Inherited from Strategy<SRStrategy>:
  // virtual void to_json(nlohmann::json& j) const = 0;
  // static Ptr from_json(const nlohmann::json& j);
};
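To make the contract concrete, here is a sketch of one possible rule: project every row of each group onto the projector (one dot product per row) and place the threshold halfway between the two projected group means. It uses std::vector<double> rows in place of types::FeatureMatrix and pp::Projector, and project, mean and midpoint_threshold are hypothetical names; it is an illustration, not a ppforest2 strategy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Row = std::vector<double>;

// Projects each row (length p) onto the projector: one dot product per row.
std::vector<double> project(const std::vector<Row>& group, const Row& projector) {
  std::vector<double> out;
  out.reserve(group.size());
  for (const Row& row : group) {
    assert(row.size() == projector.size());
    double dot = 0.0;
    for (std::size_t k = 0; k < row.size(); ++k) dot += row[k] * projector[k];
    out.push_back(dot);
  }
  return out;
}

double mean(const std::vector<double>& v) {
  assert(!v.empty());
  double sum = 0.0;
  for (double x : v) sum += x;
  return sum / static_cast<double>(v.size());
}

// Threshold halfway between the projected means of the two groups.
double midpoint_threshold(const std::vector<Row>& group_1,
                          const std::vector<Row>& group_2,
                          const Row& projector) {
  return 0.5 * (mean(project(group_1, projector)) + mean(project(group_2, projector)));
}
```

For groups {(0, 0), (2, 0)} and {(6, 1), (8, -1)} projected onto (1, 0), the projected means are 1 and 7, so the threshold is 4.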

Example: median-based split

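// File: models/SRMedianStrategy.hpp
#pragma once
#include <algorithm>
#include <memory>
#include <vector>
namespace ppforest2::sr {
// Factory function (convention). Declared before the struct so that
// from_json() below can call it; defined after the struct.
inline SRStrategy::Ptr median();
// Splits at the median of the combined projected values
// (the upper middle value when the combined count is even).
struct SRMedianStrategy : public SRStrategy {
  types::Feature threshold(
      const types::FeatureMatrix& group_1,
      const types::FeatureMatrix& group_2,
      const pp::Projector& projector) const override {
    // .eval() materialises each product into a vector. With plain auto,
    // Eigen (assumed here) would keep an unevaluated product expression,
    // which does not support element access such as proj_1(i).
    const auto proj_1 = (group_1 * projector).eval();
    const auto proj_2 = (group_2 * projector).eval();
    std::vector<types::Feature> all;
    for (int i = 0; i < proj_1.size(); ++i) all.push_back(proj_1(i));
    for (int i = 0; i < proj_2.size(); ++i) all.push_back(proj_2(i));
    // Assumes the two groups together hold at least one observation.
    std::nth_element(all.begin(), all.begin() + all.size() / 2, all.end());
    return all[all.size() / 2];
  }
  void to_json(nlohmann::json& j) const override {
    j = {{"name", "median"}};
  }
  static SRStrategy::Ptr from_json(const nlohmann::json& j) {
    validate_json_keys(j, "median SR", {"name"});
    return median();
  }
  PPFOREST2_REGISTER_STRATEGY(SRStrategy, "median")
};
inline SRStrategy::Ptr median() {
  return std::make_shared<SRMedianStrategy>();
}
} // namespace ppforest2::sr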
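Calling nth_element at index size() / 2 yields the upper of the two middle values when the pooled count is even. If the conventional median is preferred, the lower middle value can be recovered after the same nth_element call, since every element before the nth position is no greater than it. A std-only sketch on already-projected values, with pooled_median as a hypothetical helper:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of the pooled projected values of both groups. For an even count
// it averages the two middle values.
double pooled_median(std::vector<double> a, const std::vector<double>& b) {
  a.insert(a.end(), b.begin(), b.end());
  assert(!a.empty());
  const auto mid = a.begin() + static_cast<std::ptrdiff_t>(a.size() / 2);
  std::nth_element(a.begin(), mid, a.end());
  const double upper = *mid;
  if (a.size() % 2 == 1) return upper;
  // After nth_element every element before mid is <= *mid, so the lower
  // middle value is the largest element of [begin, mid).
  const double lower = *std::max_element(a.begin(), mid);
  return 0.5 * (lower + upper);
}
```

Either variant pools the two groups and ignores their labels, so with very unequal group sizes the threshold can fall inside the larger group; rules built on per-group statistics avoid that.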

Composing strategies

Once you have a new strategy, compose it into a TrainingSpec:

// Mix and match any PP + DR + SR strategies:
TrainingSpec spec(
    pp::pda(0.5),      // existing PDA strategy
    dr::variance(4),   // your new DR strategy
    sr::median(),      // your new SR strategy
    100, 0);           // size, seed
// Train via the unified entry point:
auto model = Model::train(spec, x, y);
// Or train directly:
stats::RNG rng(0);
Tree tree = Tree::train(spec, x, y, rng);

TrainingSpec is a concrete class, so there is no need to subclass it: new strategies are passed to its constructor. Because strategies are held via shared_ptr and are immutable, copying a TrainingSpec copies pointers rather than strategies, so a spec can be freely copied and shared across trees.
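A small sketch of that sharing, with hypothetical Rule and Spec types standing in for a strategy and TrainingSpec: copying the spec copies the shared_ptr, so both copies point at the same immutable object and its reference count rises to 2.

```cpp
#include <cassert>
#include <memory>

// Hypothetical immutable strategy: its only state is fixed at construction.
struct Rule {
  explicit Rule(double w) : weight(w) {}
  const double weight;
};

// Hypothetical spec holding the strategy through a shared_ptr to const.
struct Spec {
  std::shared_ptr<const Rule> rule;
  int size;

  const Rule& get_rule() const {
    assert(rule != nullptr);
    return *rule;
  }
};
```

Because the pointee is const and has no mutable state, trees trained concurrently can read it without locking; only the reference count changes on copy, and shared_ptr updates it atomically.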

Checklist for adding a new strategy

  1. Create models/MyStrategy.hpp (and .cpp if needed).
  2. Inherit from the appropriate base (PPStrategy, DRStrategy, or SRStrategy).
  3. Implement the pure virtual methods (index/optimize, select, or threshold).
  4. Implement to_json() — must include a "name" field.
  5. Implement display_name() — a human-readable label for summaries.
  6. Add a static Ptr from_json(const nlohmann::json& j) — validate keys, construct via factory.
  7. Add PPFOREST2_REGISTER_STRATEGY(Base, "name") inside the struct.
  8. Add a factory function (e.g. inline Ptr my_strategy(...)) in the namespace.
  9. If you created a .cpp file, add it to src/models/CMakeLists.txt.
  10. Add tests in MyStrategy.test.cpp (JSON round-trip + functional).

That's it — no central registry file or TrainingSpec subclass needed.

See also
Strategy, PPStrategy, DRStrategy, SRStrategy, TrainingSpec