ppforest2 v0.1.0
Projection Pursuit Decision Trees and Random Forests
Extending: Custom Strategies

Tree training in ppforest2 is parameterised by seven pluggable strategy interfaces. You can add new optimisation criteria, variable selection methods, splitting rules, stop conditions, binarization schemes, grouping methods, or leaf creation logic without modifying the tree-building logic.

Interface                  Namespace  Purpose
pp::ProjectionPursuit      pp         Projection pursuit index optimisation
vars::VariableSelection    vars       Variable subset selection
cutpoint::Cutpoint         cutpoint   Split cutpoint computation
stop::StopRule             stop       Node stopping condition
binarize::Binarization     binarize   Multiclass → binary regrouping
grouping::Grouping         grouping   Group partition management
leaf::LeafStrategy         leaf       Leaf node creation

Architecture

All seven strategy families share the same infrastructure via the CRTP base class Strategy<Derived>. This base provides:

  • using Ptr = std::shared_ptr<Derived> — the canonical pointer type.
  • to_json() — pure virtual, every strategy must serialize itself.
  • from_json(json) — static dispatcher that looks up the "name" field in the JSON and delegates to the registered factory.
  • register_strategy(name, factory) — adds a factory to the registry.

Each strategy family gets its own independent registry because Strategy<ProjectionPursuit> and Strategy<VariableSelection> are different template instantiations.
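
The per-family registry can be illustrated with a minimal sketch. It is not the ppforest2 implementation: Json is a hypothetical stand-in for nlohmann::json that carries only the "name" field, and registered_count() is a helper added purely for the illustration. The point it shows is that a function-local static inside a class template gives one map per instantiation, so Strategy<ProjectionPursuit> and Strategy<VariableSelection> never share entries.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a JSON object: only the "name" field.
struct Json { std::string name; };

template <typename Derived>
class Strategy {
public:
    using Ptr = std::shared_ptr<Derived>;
    using Factory = std::function<Ptr(const Json&)>;

    virtual ~Strategy() = default;
    virtual Json to_json() const = 0;

    // Add a factory under the given name.
    static void register_strategy(const std::string& name, Factory factory) {
        registry()[name] = std::move(factory);
    }

    // Look up the "name" field and delegate to the registered factory.
    static Ptr from_json(const Json& j) {
        auto it = registry().find(j.name);
        if (it == registry().end()) {
            throw std::invalid_argument("unknown strategy: " + j.name);
        }
        return it->second(j);
    }

    // Illustration-only helper: number of registered factories.
    static std::size_t registered_count() { return registry().size(); }

private:
    // One map per template instantiation, created on first use.
    static std::map<std::string, Factory>& registry() {
        static std::map<std::string, Factory> instance;
        return instance;
    }
};

class ProjectionPursuit : public Strategy<ProjectionPursuit> {};
class VariableSelection : public Strategy<VariableSelection> {};

class Random : public ProjectionPursuit {
public:
    Json to_json() const override { return {"random"}; }
};
```

Registering "random" on ProjectionPursuit in this sketch leaves the VariableSelection registry empty, which is the isolation the paragraph above describes.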

All strategies share a uniform (NodeContext&, RNG&) interface. NodeContext is a mutable struct that accumulates intermediate results as each strategy in the training pipeline executes.

NVI pattern (public entry + protected compute)

Every strategy family uses the non-virtual interface (NVI) pattern:

  • The public entry point (e.g. optimize() on ProjectionPursuit, select() on VariableSelection, cutpoint() on Cutpoint, should_stop() on StopRule, regroup() on Binarization, split() on Grouping, create_leaf() on LeafStrategy) is a non-virtual method on the base class. It is the only public way to invoke the strategy, and it handles shared concerns such as skipping work when ctx.aborted is set.
  • The protected virtual compute() on the concrete subclass is where the strategy-specific logic lives. Subclasses override compute() only; they never override or re-expose the public entry point.

As a result, a concrete strategy never exposes a callable method of its own: callers always go through the base class's public entry point, which in turn calls compute().
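
A minimal sketch of the pattern, assuming simplified stand-ins for NodeContext and the RNG type (the real types carry far more state). ConstantIndex is a hypothetical strategy used only to show where the override goes.

```cpp
#include <cassert>

// Simplified stand-ins; the real NodeContext and RNG are richer.
struct NodeContext {
    bool aborted = false;
    double pp_index_value = 0.0;
};
struct RNG {};

class ProjectionPursuit {
public:
    virtual ~ProjectionPursuit() = default;

    // Public, non-virtual entry point: handles the shared abort check,
    // then forwards to the strategy-specific hook.
    void optimize(NodeContext& ctx, RNG& rng) const {
        if (ctx.aborted) {
            return;
        }
        compute(ctx, rng);
    }

protected:
    // The only method a concrete strategy overrides.
    virtual void compute(NodeContext& ctx, RNG& rng) const = 0;
};

// Hypothetical concrete strategy: writes a fixed index value.
class ConstantIndex : public ProjectionPursuit {
protected:
    void compute(NodeContext& ctx, RNG&) const override {
        ctx.pp_index_value = 1.0;
    }
};
```

Because compute() stays protected, calling it directly on a ConstantIndex from outside the class hierarchy does not compile; optimize() is the only way in, so the abort check cannot be bypassed.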

Conventions

All strategies follow the same structural conventions:

  • class, not struct — every strategy is a class with explicit visibility sections (public: / protected: / private:).
  • Immutable configuration — store parameters as const private members set in the constructor.
  • to_json() — returns a JSON object with the strategy's name and parameters for model persistence and display.
  • from_json() — static method that deserializes from JSON, validating keys and returning a Ptr.
  • PPFOREST2_REGISTER_STRATEGY — macro that registers the strategy's from_json at static initialization time.
  • Factory function — a free function in the strategy's namespace that returns a Ptr (e.g. pp::pda(lambda)).
  • compute() — the only virtual method subclasses override. It is protected on every concrete strategy. Callers invoke the base class's public entry point instead.

Strategies are held via shared_ptr and are immutable after construction, so they can be freely shared across trees without deep cloning.
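
The sharing claim can be sketched as follows. Shrink, shrink() and Tree are hypothetical names invented for this illustration; only the shape (a const private parameter, a factory function returning a Ptr, several holders of one instance) follows the conventions listed above.

```cpp
#include <cassert>
#include <memory>

// Hypothetical strategy with one immutable parameter.
class Shrink {
public:
    using Ptr = std::shared_ptr<Shrink>;

    explicit Shrink(float lambda) : lambda_(lambda) {}
    float lambda() const { return lambda_; }

private:
    const float lambda_;  // set once in the constructor, never modified
};

// Factory function returning the canonical pointer type.
inline Shrink::Ptr shrink(float lambda) {
    return std::make_shared<Shrink>(lambda);
}

// Hypothetical holder: each tree keeps a pointer, not a copy.
struct Tree {
    Shrink::Ptr pp;
};
```

Two Tree objects built from the same Ptr refer to a single Shrink instance. Since the parameter is const, neither tree can change what the other observes.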

Self-Registration

A new strategy becomes available for JSON deserialization as soon as its concrete class provides two things:

  1. A static Ptr from_json(const nlohmann::json& j) method.
  2. The PPFOREST2_REGISTER_STRATEGY(Base, "name") macro.

The macro uses inline static initialization to register the factory at program startup. No central registry file needs to be modified.

class MyStrategy : public ProjectionPursuit {
public:
    static ProjectionPursuit::Ptr from_json(const nlohmann::json& j);
    PPFOREST2_REGISTER_STRATEGY(ProjectionPursuit, "my_strategy")
protected:
    void compute(NodeContext& ctx, stats::RNG& rng) const override;
};

After this, ProjectionPursuit::from_json({"name": "my_strategy", ...}) will automatically dispatch to MyStrategy::from_json.
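
To make the mechanism concrete, here is one way such a macro could be written with a C++17 inline static member. This is a sketch under assumptions, not the definition in Strategy.hpp: SKETCH_REGISTER_STRATEGY, the Json stand-in, the registered_ member and the name() method are all hypothetical. The sketch also relies on the compiler initializing the inline static before main() runs, which mainstream toolchains do when the class is defined in the translation unit being linked.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a JSON object: only the "name" field.
struct Json { std::string name; };

class ProjectionPursuit {
public:
    using Ptr = std::shared_ptr<ProjectionPursuit>;
    using Factory = std::function<Ptr(const Json&)>;

    virtual ~ProjectionPursuit() = default;
    virtual std::string name() const = 0;

    // Returns a value so it can be used as a static initializer.
    static bool register_strategy(const std::string& name, Factory factory) {
        registry()[name] = std::move(factory);
        return true;
    }

    static Ptr from_json(const Json& j) { return registry().at(j.name)(j); }

private:
    // Function-local static avoids depending on initialization order.
    static std::map<std::string, Factory>& registry() {
        static std::map<std::string, Factory> instance;
        return instance;
    }
};

// Hypothetical macro: must appear after the class's own from_json
// declaration, because the lambda calls it by unqualified name.
#define SKETCH_REGISTER_STRATEGY(StrategyBase, strategy_name)          \
    inline static const bool registered_ =                             \
        StrategyBase::register_strategy(                               \
            strategy_name, [](const Json& j) { return from_json(j); });

class MyStrategy : public ProjectionPursuit {
public:
    static ProjectionPursuit::Ptr from_json(const Json&) {
        return std::make_shared<MyStrategy>();
    }
    SKETCH_REGISTER_STRATEGY(ProjectionPursuit, "my_strategy")

    std::string name() const override { return "my_strategy"; }
};
```

In this sketch, including the header that defines MyStrategy is enough to populate the registry; no other file mentions the class.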

Adding a new pp::ProjectionPursuit

A ProjectionPursuit defines how to find the optimal 1D projection for a given dataset and group partition.

Interface to implement

class ProjectionPursuit : public Strategy<ProjectionPursuit> {
public:
    // Public NVI entry point (non-virtual, provided by the base):
    void optimize(NodeContext& ctx, stats::RNG& rng) const;
    // Inherited from Strategy<ProjectionPursuit>:
    // virtual nlohmann::json to_json() const = 0;
    // static Ptr from_json(const nlohmann::json& j);
protected:
    // Concrete strategies override this and nothing else:
    virtual void compute(NodeContext& ctx, stats::RNG& rng) const = 0;
};

Example: a random projection strategy

// File: models/strategies/pp/Random.hpp
#pragma once
namespace ppforest2::pp {
class Random : public ProjectionPursuit {
public:
    // Deserialize: accept only the "name" key, then build via the factory.
    static ProjectionPursuit::Ptr from_json(const nlohmann::json& j) {
        JsonReader{j, "random"}.only_keys({"name"});
        return random();
    }
    nlohmann::json to_json() const override {
        return {{"name", "random"}};
    }
    std::string display_name() const override { return "Random"; }
    std::set<types::Mode> supported_modes() const override {
        return {types::Mode::Classification, types::Mode::Regression};
    }
    PPFOREST2_REGISTER_STRATEGY(ProjectionPursuit, "random")
protected:
    void compute(NodeContext& ctx, stats::RNG& rng) const override {
        // Draw a random direction over the selected columns only.
        auto const& cols = ctx.var_selection->selected_cols;
        ppforest2::pp::Projector reduced = ppforest2::pp::Projector::Random(cols.size());
        reduced.normalize();
        // Expand the reduced projector back to the original variable space.
        ctx.projector = ctx.var_selection->expand(reduced);
        // No index is optimised, so report 0.
        ctx.pp_index_value = 0;
    }
};
// Factory function for the strategy.
inline ProjectionPursuit::Ptr random() {
    return std::make_shared<Random>();
}
} // namespace ppforest2::pp

Adding a new vars::VariableSelection

A VariableSelection selects a subset of variables (columns) before projection pursuit runs. This reduces cost and introduces diversity in forests.

Interface to implement

class VariableSelection : public Strategy<VariableSelection> {
public:
    // Public NVI entry point:
    void select(NodeContext& ctx, stats::RNG& rng) const;
    // Inherited from Strategy<VariableSelection>:
    // virtual nlohmann::json to_json() const = 0;
    // static Ptr from_json(const nlohmann::json& j);
protected:
    virtual void compute(NodeContext& ctx, stats::RNG& rng) const = 0;
};

The result is written to ctx.var_selection (a VariableSelection::Result that records which columns were selected and allows expanding a reduced-dimension projector back to the original space).
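
The select-then-expand idea can be sketched independently of the library. SelectionResult and select_uniform below are hypothetical stand-ins that use std::vector and std::mt19937 in place of the library's own Result, projector and RNG types. The sketch assumes that expanding means placing each reduced coefficient at its original column index and leaving unselected columns at zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical stand-in for a variable selection result.
struct SelectionResult {
    std::vector<std::size_t> selected_cols;  // indices into the original columns
    std::size_t n_total;                     // number of original columns

    // Map a projector over the selected columns back to the full space;
    // coefficients of unselected columns are zero.
    std::vector<double> expand(const std::vector<double>& reduced) const {
        assert(reduced.size() == selected_cols.size());
        std::vector<double> full(n_total, 0.0);
        for (std::size_t i = 0; i < selected_cols.size(); ++i) {
            full[selected_cols[i]] = reduced[i];
        }
        return full;
    }
};

// Draw n_vars distinct column indices uniformly at random, in ascending order.
inline SelectionResult select_uniform(std::size_t n_total, std::size_t n_vars,
                                      std::mt19937& rng) {
    assert(n_vars <= n_total);
    std::vector<std::size_t> cols(n_total);
    std::iota(cols.begin(), cols.end(), std::size_t{0});
    std::shuffle(cols.begin(), cols.end(), rng);
    cols.resize(n_vars);
    std::sort(cols.begin(), cols.end());
    return {cols, n_total};
}
```

A projection pursuit step then only needs to work in the reduced space (three coefficients when three columns are selected) and can call expand() once at the end.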

Adding a new cutpoint::Cutpoint

A Cutpoint computes the split cutpoint in the projected space that separates two groups.

Interface to implement

class Cutpoint : public Strategy<Cutpoint> {
public:
    // Public NVI entry point:
    void cutpoint(NodeContext& ctx, stats::RNG& rng) const;
    // Inherited from Strategy<Cutpoint>:
    // virtual nlohmann::json to_json() const = 0;
    // static Ptr from_json(const nlohmann::json& j);
protected:
    virtual void compute(NodeContext& ctx, stats::RNG& rng) const = 0;
};
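
As an illustration of what a cutpoint rule computes, the sketch below implements a mean-of-means rule on plain vectors: the cutpoint is the midpoint between the two groups' mean projected values. The function names are hypothetical, and the sketch assumes this is the usual meaning of "mean of means"; it does not reproduce the library's own cutpoint strategy, which operates on NodeContext.

```cpp
#include <cassert>
#include <vector>

// Arithmetic mean of a non-empty vector of projected values.
inline double mean_of(const std::vector<double>& values) {
    assert(!values.empty());
    double sum = 0.0;
    for (double v : values) {
        sum += v;
    }
    return sum / static_cast<double>(values.size());
}

// Midpoint between the two group means in the projected (1D) space.
inline double mean_of_means_cutpoint(const std::vector<double>& group_a,
                                     const std::vector<double>& group_b) {
    return 0.5 * (mean_of(group_a) + mean_of(group_b));
}
```

For projected values {1, 3} and {7, 9} the group means are 2 and 8, so the cutpoint is 5.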

Composing strategies

Once you have a new strategy, compose it into a TrainingSpec via the builder:

using namespace ppforest2;
auto spec = TrainingSpec::builder(types::Mode::Classification)
    .size(100)               // forest of 100 trees
    .seed(0)
    .pp(pp::pda(0.5))        // existing PDA strategy
    .vars(vars::uniform(4))  // uniform variable selection
    .build();
// Train via the unified entry point:
auto model = Model::train(spec, x, y);

TrainingSpec is a concrete class — there is no need to subclass it. New strategies are plugged in via the builder. Strategies are held via shared_ptr, so TrainingSpec can be freely copied and shared across trees.

Checklist for adding a new strategy

  1. Create models/strategies/<family>/MyStrategy.hpp (and .cpp if needed).
  2. Declare it as a class inheriting from the appropriate base (e.g. ProjectionPursuit, VariableSelection).
  3. Override compute() (protected) — the single virtual method. Do not override the public entry point on the base class.
  4. Implement to_json() — must return a JSON object with a "name" field.
  5. Implement display_name() — a human-readable label for summaries.
  6. Implement supported_modes() — set of types::Mode values the strategy works with.
  7. Add a static Ptr from_json(const nlohmann::json& j) — validate keys, construct via factory.
  8. Add PPFOREST2_REGISTER_STRATEGY(Base, "name") inside the class.
  9. Add a factory function (e.g. inline Ptr my_strategy(...)) in the namespace.
  10. If you created a .cpp, add it to CMakeLists.txt.
  11. Add tests in MyStrategy.test.cpp (JSON round-trip + functional).

That's it — no central registry file or TrainingSpec subclass needed.

See also
Strategy, ProjectionPursuit, VariableSelection, Cutpoint, StopRule, Binarization, Grouping, LeafStrategy, TrainingSpec