ppforest2 v0.1.0
Projection Pursuit Decision Trees and Random Forests
Tree training in ppforest2 is parameterised by seven pluggable strategy interfaces. You can add new optimisation criteria, variable selection methods, splitting rules, stop conditions, binarization schemes, grouping methods, or leaf creation logic without modifying the tree-building logic.
| Interface | Namespace | Purpose |
|---|---|---|
| pp::ProjectionPursuit | pp | Projection pursuit index optimisation |
| vars::VariableSelection | vars | Variable subset selection |
| cutpoint::Cutpoint | cutpoint | Split cutpoint computation |
| stop::StopRule | stop | Node stopping condition |
| binarize::Binarization | binarize | Multiclass → binary regrouping |
| grouping::Grouping | grouping | Group partition management |
| leaf::LeafStrategy | leaf | Leaf node creation |
All seven strategy families share the same infrastructure via the CRTP base class Strategy<Derived>. This base provides:
- using Ptr = std::shared_ptr<Derived>: the canonical pointer type.
- to_json(): pure virtual; every strategy must serialize itself.
- from_json(json): static dispatcher that looks up the "name" field in the JSON and delegates to the registered factory.
- register_strategy(name, factory): adds a factory to the registry.

Each strategy family gets its own independent registry because Strategy<ProjectionPursuit> and Strategy<VariableSelection> are different template instantiations.
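The per-family registries can be sketched as follows. This is a minimal illustration, not the library's source: the Json alias is a std::map stand-in for nlohmann::json, and registered_count() and AllVariables are hypothetical names added only for the example.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Assumption: a plain map stands in for nlohmann::json to avoid a dependency.
using Json = std::map<std::string, std::string>;

// Sketch of a CRTP base: one instantiation (and so one registry) per family.
template <typename Derived>
class Strategy {
public:
    using Ptr = std::shared_ptr<Derived>;
    using Factory = std::function<Ptr(const Json&)>;

    virtual ~Strategy() = default;

    // Every strategy must be able to serialize itself.
    virtual Json to_json() const = 0;

    // Adds a factory to this family's registry.
    static void register_strategy(const std::string& name, Factory factory) {
        registry()[name] = std::move(factory);
    }

    // Looks up the "name" field and delegates to the registered factory.
    static Ptr from_json(const Json& j) {
        auto name = j.find("name");
        if (name == j.end()) throw std::invalid_argument("missing \"name\" field");
        auto entry = registry().find(name->second);
        if (entry == registry().end())
            throw std::invalid_argument("unknown strategy: " + name->second);
        return entry->second(j);
    }

    // Hypothetical helper, present only so the example can inspect the registry.
    static std::size_t registered_count() { return registry().size(); }

private:
    // Function-local static: a separate map for each Strategy<Derived>.
    static std::map<std::string, Factory>& registry() {
        static std::map<std::string, Factory> instance;
        return instance;
    }
};

// Two families, each inheriting from its own instantiation of Strategy.
class ProjectionPursuit : public Strategy<ProjectionPursuit> {};
class VariableSelection : public Strategy<VariableSelection> {};

// A hypothetical concrete variable selection strategy.
class AllVariables : public VariableSelection {
public:
    Json to_json() const override { return {{"name", "all"}}; }
};
```

Registering a factory on VariableSelection leaves the ProjectionPursuit registry untouched, because each Strategy<Derived> instantiation owns its own function-local map.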
All strategies share a uniform (NodeContext&, RNG&) interface. NodeContext is a mutable struct that accumulates intermediate results as each strategy in the training pipeline executes.
Every strategy family uses the non-virtual interface (NVI) pattern:
- The public entry point (optimize() on ProjectionPursuit, select() on VariableSelection, cutpoint() on Cutpoint, should_stop() on StopRule, regroup() on Binarization, split() on Grouping, create_leaf() on LeafStrategy) is a non-virtual method on the base class. It is the single public way to invoke the strategy and handles shared concerns such as skipping work when ctx.aborted is set.
- compute() on the concrete subclass is where the strategy-specific logic lives. Subclasses override compute() only; they never override or re-expose the public entry point.

This means concrete strategies do not define their own public callable method: they override compute() only, and callers always go through the base class's public entry point.
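A minimal sketch of the pattern, using StopRule as the example family. The reduced NodeContext fields (n_samples, stop), the empty RNG placeholder, the void return type, and the MinSize rule are assumptions made for illustration; only ctx.aborted and the should_stop()/compute() split come from the description above.

```cpp
// Assumption: a reduced NodeContext with only the fields this sketch needs.
struct NodeContext {
    bool aborted = false;  // set when an earlier pipeline step gave up
    int n_samples = 0;     // hypothetical field: observations in the node
    bool stop = false;     // hypothetical field: result of the stop rule
};

struct RNG {};  // placeholder for the random number generator type

class StopRule {
public:
    virtual ~StopRule() = default;

    // Public, non-virtual entry point: shared concerns are handled here.
    void should_stop(NodeContext& ctx, RNG& rng) const {
        if (ctx.aborted) return;  // skip work once the pipeline has aborted
        compute(ctx, rng);
    }

protected:
    // The only customisation point; subclasses override this and nothing else.
    virtual void compute(NodeContext& ctx, RNG& rng) const = 0;
};

// Hypothetical concrete rule: stop when the node has too few observations.
class MinSize : public StopRule {
public:
    explicit MinSize(int min_size) : min_size_(min_size) {}

protected:
    void compute(NodeContext& ctx, RNG&) const override {
        ctx.stop = ctx.n_samples < min_size_;
    }

private:
    const int min_size_;
};
```

Because compute() is protected, a caller holding a MinSize can only reach it through should_stop(), so the ctx.aborted check cannot be bypassed.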
All strategies follow the same structural conventions:
- class, not struct: every strategy is a class with explicit visibility sections (public: / protected: / private:).
- State lives in const private members set in the constructor.
- Instances are held via Ptr.
- Each strategy registers its from_json at static initialization time.
- A free factory function returns a Ptr (e.g. pp::pda(lambda)).
- compute() is protected on every concrete strategy. Callers invoke the base class's public entry point instead.

Strategies are held via shared_ptr and are immutable after construction, so they can be freely shared across trees without deep cloning.
A new strategy becomes available for JSON deserialization once you add two things to the concrete class:
- A static Ptr from_json(const nlohmann::json& j) method.
- The PPFOREST2_REGISTER_STRATEGY(Base, "name") macro.

The macro uses inline static initialization to register the factory at program startup. No central registry file needs to be modified.
After this, ProjectionPursuit::from_json({"name": "my_strategy", ...}) will automatically dispatch to MyStrategy::from_json.
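One way such a macro can work is sketched below, assuming C++17 inline static members. SKETCH_REGISTER_STRATEGY is a hypothetical stand-in, not the definition of PPFOREST2_REGISTER_STRATEGY; the base class is simplified (no CRTP) and Json is a std::map stand-in for nlohmann::json.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Assumption: a plain map stands in for nlohmann::json.
using Json = std::map<std::string, std::string>;

// Simplified base with its own registry (the real base is a CRTP template).
class ProjectionPursuit {
public:
    using Ptr = std::shared_ptr<ProjectionPursuit>;
    using Factory = std::function<Ptr(const Json&)>;

    virtual ~ProjectionPursuit() = default;
    virtual std::string display_name() const = 0;

    // Returns a value so the call can initialise a static member.
    static bool register_strategy(const std::string& name, Factory factory) {
        registry()[name] = std::move(factory);
        return true;
    }

    // Dispatches on the "name" field; throws std::out_of_range if unknown.
    static Ptr from_json(const Json& j) { return registry().at(j.at("name"))(j); }

private:
    static std::map<std::string, Factory>& registry() {
        static std::map<std::string, Factory> instance;  // built on first use
        return instance;
    }
};

// Hypothetical macro: declares an inline static member whose initialiser
// performs the registration before main() runs. It must appear after the
// class's own from_json declaration.
#define SKETCH_REGISTER_STRATEGY(Base, name) \
    inline static const bool registered_ = Base::register_strategy(name, &from_json)

class MyStrategy : public ProjectionPursuit {
public:
    std::string display_name() const override { return "My strategy"; }

    static Ptr from_json(const Json&) { return std::make_shared<MyStrategy>(); }

    SKETCH_REGISTER_STRATEGY(ProjectionPursuit, "my_strategy");
};
```

Keeping the registry in a function-local static avoids depending on the initialization order of static objects: the map is created the first time any strategy registers itself.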
A ProjectionPursuit defines how to find the optimal 1D projection for a given dataset and group partition.
A VariableSelection selects a subset of variables (columns) before projection pursuit runs. This reduces the cost of the projection search and introduces diversity among the trees of a forest.
The result is written to ctx.var_selection (a VariableSelection::Result that records which columns were selected and allows expanding a reduced-dimension projector back to the original space).
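Expanding a reduced projector amounts to scattering its coefficients back to the selected column positions and leaving every other position at zero. A minimal sketch, assuming a hypothetical SelectionResult struct with selected, n_vars, and expand(); the real VariableSelection::Result may be shaped differently.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for VariableSelection::Result.
struct SelectionResult {
    std::vector<std::size_t> selected;  // indices of the chosen columns
    std::size_t n_vars = 0;             // number of columns in the original data

    // Map a projector over the selected columns back to the original space.
    std::vector<double> expand(const std::vector<double>& reduced) const {
        if (reduced.size() != selected.size())
            throw std::invalid_argument("projector length does not match selection");
        std::vector<double> full(n_vars, 0.0);
        for (std::size_t k = 0; k < selected.size(); ++k) {
            full.at(selected[k]) = reduced[k];  // at() rejects out-of-range columns
        }
        return full;
    }
};
```

For example, with columns 1 and 3 selected out of 5, the reduced projector (0.6, -0.8) expands to (0, 0.6, 0, -0.8, 0).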
A Cutpoint computes the split cutpoint in the projected space that separates two groups.
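As an illustration, one simple rule places the cutpoint midway between the two group means of the projected values. The function names below are hypothetical, and the rule is an example rather than a statement of which rules ppforest2 ships.

```cpp
#include <numeric>
#include <stdexcept>
#include <vector>

// Arithmetic mean of a non-empty set of projected values.
double projected_mean(const std::vector<double>& values) {
    if (values.empty()) throw std::invalid_argument("empty group");
    return std::accumulate(values.begin(), values.end(), 0.0) /
           static_cast<double>(values.size());
}

// Illustrative rule: the midpoint between the two projected group means.
double midpoint_cutpoint(const std::vector<double>& group_a,
                         const std::vector<double>& group_b) {
    return 0.5 * (projected_mean(group_a) + projected_mean(group_b));
}
```

With projected values {1, 2, 3} for one group and {7, 9} for the other, the group means are 2 and 8, so the cutpoint is 5.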
Once you have a new strategy, compose it into a TrainingSpec via the builder.
TrainingSpec is a concrete class — there is no need to subclass it. New strategies are plugged in via the builder. Strategies are held via shared_ptr, so TrainingSpec can be freely copied and shared across trees.
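The builder API itself is not shown in this section, so the sketch below uses hypothetical method names (pp(), cutpoint(), build()) and reduced stand-in strategy types. It only illustrates why holding strategies by shared_ptr makes copying a spec cheap.

```cpp
#include <memory>
#include <string>
#include <utility>

// Reduced stand-ins for two strategy families (assumption: the real ones are
// classes with behaviour; here they carry only a name).
struct ProjectionPursuit { std::string name; };
struct Cutpoint { std::string name; };

class TrainingSpec {
public:
    class Builder;

    const std::shared_ptr<const ProjectionPursuit>& pp() const { return pp_; }
    const std::shared_ptr<const Cutpoint>& cutpoint() const { return cutpoint_; }

private:
    std::shared_ptr<const ProjectionPursuit> pp_;
    std::shared_ptr<const Cutpoint> cutpoint_;
};

// Hypothetical builder: each setter stores a shared pointer and returns *this.
class TrainingSpec::Builder {
public:
    Builder& pp(std::shared_ptr<const ProjectionPursuit> strategy) {
        spec_.pp_ = std::move(strategy);
        return *this;
    }
    Builder& cutpoint(std::shared_ptr<const Cutpoint> strategy) {
        spec_.cutpoint_ = std::move(strategy);
        return *this;
    }
    // Copying the spec copies pointers only; the strategies are shared.
    TrainingSpec build() const { return spec_; }

private:
    TrainingSpec spec_;
};
```

A copy of the resulting spec points at the same strategy objects as the original, which is safe as long as strategies stay immutable after construction.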
1. Create models/strategies/<family>/MyStrategy.hpp (and .cpp if needed).
2. Define a class inheriting from the appropriate base (e.g. ProjectionPursuit, VariableSelection).
3. Implement compute() (protected): the single virtual method for strategy logic. Do not override the public entry point on the base class.
4. Implement to_json(): it must return a JSON object with a "name" field.
5. Implement display_name(): a human-readable label for summaries.
6. Implement supported_modes(): the set of types::Mode values the strategy works with.
7. Add static Ptr from_json(const nlohmann::json& j): validate keys, construct via factory.
8. Add PPFOREST2_REGISTER_STRATEGY(Base, "name") inside the class.
9. Add a factory function (inline Ptr my_strategy(...)) in the namespace.
10. Add the .cpp to CMakeLists.txt.
11. Add tests in MyStrategy.test.cpp (JSON round-trip + functional).

That's it: no central registry file or TrainingSpec subclass is needed.