ppforest2 v0.1.0
Projection Pursuit Decision Trees and Random Forests
ppforest2::stats::GroupPartition Class Reference

Contiguous-block representation of grouped observations.

#include <GroupPartition.hpp>

Public Types

using SplitSizes = std::map<types::GroupId, int>
 

Public Member Functions

 GroupPartition (int start, int end)
 Construct a single-group partition covering rows [start, end].
 
 GroupPartition (types::GroupIdVector const &y)
 Construct from a sorted response vector.
 
 GroupPartition (types::OutcomeVector const &y)
 Construct from a float-typed response vector.
 
types::FeatureMatrix bgss (types::FeatureMatrix const &x) const
 Between-group sum of squares matrix (p × p).
 
GroupPartition bisect (int mid) const
 Bisect a single-group partition at row index mid into two groups.
 
GroupPartition collapse () const
 Collapse all groups into a single supergroup.
 
template<typename Derived>
auto data (Eigen::MatrixBase< Derived > const &x) const
 Extract all rows across all groups.
 
Group first_group () const
 Smallest group label in the partition.
 
template<typename Derived>
auto group (Eigen::MatrixBase< Derived > const &x, Group const &group) const
 Extract rows belonging to a group (or supergroup).
 
int group_end (Group const &group) const
 Last row index (inclusive) of the block for group.
 
int group_size (Group const &group) const
 Number of observations in group.
 
int group_start (Group const &group) const
 First row index of the block for group.
 
types::FeatureVector mean (types::FeatureMatrix const &x) const
 Overall mean of all grouped rows (p).
 
GroupPartition remap (GroupMap const &mapping) const
 Merge groups according to a mapping.
 
std::pair< GroupPartition, GroupPartition > split (SplitSizes const &left_sizes) const
 Split each group's block into left and right children.
 
GroupPartition subset (GroupSet const &groups) const
 Create a partition containing only the given groups.
 
int total_size () const
 Total number of observations across all groups in the partition.
 
types::FeatureMatrix wgss (types::FeatureMatrix const &x) const
 Within-group sum of squares matrix (p × p).
 

Static Public Member Functions

static bool is_contiguous (GroupVector const &y)
 Check whether all equal values in y form a single contiguous block.
 

Public Attributes

GroupSet const groups
 Set of all group labels in this partition.
 
GroupInvMap const subgroups
 Maps each group to its set of subgroups.
 
GroupMap const supergroups
 Maps each group to its supergroup (identity if no merge).
 

Detailed Description

Contiguous-block representation of grouped observations.

Assumes the response vector is sorted so that observations of the same group are contiguous. Stores the start/end row indices of each group block and provides efficient extraction, subsetting, and computation of between- and within-group statistics.

Groups can be hierarchically merged via remap(), which assigns supergroup labels while tracking the original subgroups.

// y must be sorted so equal values are contiguous.
GroupPartition y_part(y);
// Extract rows belonging to group 0 into a concrete matrix (group() returns a view):
types::FeatureMatrix x_group0 = y_part.group(x, 0);
// Between- and within-group statistics:
auto B = y_part.bgss(x); // between-group sum of squares (p × p)
auto W = y_part.wgss(x); // within-group sum of squares (p × p)
// Restrict to a subset of groups:
GroupPartition sub = y_part.subset({0, 2});
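
To make the contiguous-block idea concrete, the following is a minimal standalone sketch of how start/end indices could be derived from a label vector in one pass. It is not the library's implementation: `BlockMap` and `build_blocks` are hypothetical names, plain `std::vector<int>` stands in for `types::GroupIdVector`, and throwing on non-contiguous input is an assumption about error handling.

```cpp
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for the block map: label -> (first row, last row), both inclusive.
using BlockMap = std::map<int, std::pair<int, int>>;

// Scan y once and record where each label's block starts and ends.
// Throws if a label reappears after its block has ended (y is not contiguous).
BlockMap build_blocks(std::vector<int> const& y) {
  BlockMap blocks;
  for (int i = 0; i < static_cast<int>(y.size()); ++i) {
    auto it = blocks.find(y[i]);
    if (it == blocks.end()) {
      blocks.emplace(y[i], std::make_pair(i, i));  // a new block opens at row i
    } else if (it->second.second == i - 1) {
      it->second.second = i;                       // extend the block that is still open
    } else {
      throw std::invalid_argument("labels are not contiguous");
    }
  }
  return blocks;
}
```

With the labels {0, 0, 1, 1, 1, 2}, this sketch records the blocks [0, 1], [2, 4], and [5, 5] for groups 0, 1, and 2. Once the blocks are known, extracting a group is a row-range lookup rather than a scan over y.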

Member Typedef Documentation

◆ SplitSizes

Constructor & Destructor Documentation

◆ GroupPartition() [1/3]

ppforest2::stats::GroupPartition::GroupPartition ( types::GroupIdVector const & y)
explicit

Construct from a sorted response vector.

Parameters
y: Outcome vector (n) with contiguous group blocks.

◆ GroupPartition() [2/3]

ppforest2::stats::GroupPartition::GroupPartition ( types::OutcomeVector const & y)
explicit

Construct from a float-typed response vector.

Classification y is carried as OutcomeVector (float) throughout the training pipeline; this overload casts to integer labels internally before building the block map. Values must encode integer labels.
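
The requirement that values encode integer labels can be illustrated with a small standalone sketch. `to_integer_labels` is a hypothetical helper operating on `std::vector<float>`; whether the library validates its input this way, or simply casts, is not stated here, so the check below is an assumption.

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

// Convert float-encoded labels to integers, rejecting values such as 1.5f
// (or NaN/infinity) that do not encode an integer label.
std::vector<int> to_integer_labels(std::vector<float> const& y) {
  std::vector<int> labels;
  labels.reserve(y.size());
  for (float value : y) {
    if (!std::isfinite(value) || std::round(value) != value) {
      throw std::invalid_argument("label is not integer-valued");
    }
    labels.push_back(static_cast<int>(value));
  }
  return labels;
}
```

A vector such as {0.0f, 0.0f, 1.0f, 2.0f} converts cleanly, while {0.0f, 1.5f} is rejected in this sketch.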

◆ GroupPartition() [3/3]

ppforest2::stats::GroupPartition::GroupPartition ( int start,
int end )

Construct a single-group partition covering rows [start, end].

Group label is 0. Use bisect(mid) to split into a 2-group partition.

Member Function Documentation

◆ bgss()

types::FeatureMatrix ppforest2::stats::GroupPartition::bgss ( types::FeatureMatrix const & x) const

Between-group sum of squares matrix (p × p).

◆ bisect()

GroupPartition ppforest2::stats::GroupPartition::bisect ( int mid) const

Bisect a single-group partition at row index mid into two groups.

Group 0 covers [start, mid - 1], group 1 covers [mid, end]. The receiver must be a single-group partition (typically built with the (int, int) constructor). This differs from split(SplitSizes) below, which divides a multi-group partition along its existing group structure.

Exceptions
via invariant if the partition has more than one group, or if mid is outside (start, end].
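
The index arithmetic can be sketched in isolation. `Range` and `bisect_range` are hypothetical names, and the sketch reports an invalid mid with `std::out_of_range`, which stands in for the library's invariant mechanism and is not the same thing.

```cpp
#include <stdexcept>
#include <utility>

using Range = std::pair<int, int>;  // inclusive [first, last]

// Cut [start, end] at mid: group 0 gets [start, mid - 1], group 1 gets [mid, end].
// mid must lie in (start, end] so that both halves are non-empty.
std::pair<Range, Range> bisect_range(int start, int end, int mid) {
  if (mid <= start || mid > end) {
    throw std::out_of_range("mid must satisfy start < mid <= end");
  }
  return {Range{start, mid - 1}, Range{mid, end}};
}
```

Bisecting [0, 9] at mid = 4 yields [0, 3] and [4, 9]. The half-open lower bound on mid is what guarantees that group 0 is never empty.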

◆ collapse()

GroupPartition ppforest2::stats::GroupPartition::collapse ( ) const

Collapse all groups into a single supergroup.

Returns
New GroupPartition with one supergroup containing all groups.

◆ data()

template<typename Derived>
auto ppforest2::stats::GroupPartition::data ( Eigen::MatrixBase< Derived > const & x) const
inline

Extract all rows across all groups.

Parameters
x: Feature matrix (n × p).
Returns
Sub-matrix with all grouped rows.

◆ first_group()

Group ppforest2::stats::GroupPartition::first_group ( ) const
inline

Smallest group label in the partition.

groups is a std::set<GroupId> so iteration is in ascending key order; this returns the first such label. Caller must ensure the partition is non-empty.

◆ group()

template<typename Derived>
auto ppforest2::stats::GroupPartition::group ( Eigen::MatrixBase< Derived > const & x,
Group const & group ) const
inline

Extract rows belonging to a group (or supergroup).

Returns an Eigen block expression (a zero-copy view into x). Consume the result immediately or assign it to a concrete matrix; do not hold it in an auto variable across statements, because the view does not own its data.

Parameters
x: Feature matrix (n × p).
group: Group label.
Returns
Block expression over the rows of group.

◆ group_end()

int ppforest2::stats::GroupPartition::group_end ( Group const & group) const

Last row index (inclusive) of the block for group.

◆ group_size()

int ppforest2::stats::GroupPartition::group_size ( Group const & group) const

Number of observations in group.

◆ group_start()

int ppforest2::stats::GroupPartition::group_start ( Group const & group) const

First row index of the block for group.

◆ is_contiguous()

static bool ppforest2::stats::GroupPartition::is_contiguous ( GroupVector const & y)
static

Check whether all equal values in y form a single contiguous block.
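
One way such a check could work is sketched below. `all_blocks_contiguous` is a hypothetical function over `std::vector<int>`, not the library's code: it records each label when its block opens and fails if a label opens a second block.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Returns true if every distinct value in y occupies exactly one contiguous run.
bool all_blocks_contiguous(std::vector<int> const& y) {
  std::unordered_set<int> seen;  // labels that have already opened a block
  for (std::size_t i = 0; i < y.size(); ++i) {
    if (i > 0 && y[i] == y[i - 1]) continue;      // still inside the current block
    if (!seen.insert(y[i]).second) return false;  // label opens a second block
  }
  return true;
}
```

Under this definition {2, 2, 0, 0, 1} passes even though it is not sorted, while {0, 1, 0} fails because label 0 appears in two separate runs.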

◆ mean()

types::FeatureVector ppforest2::stats::GroupPartition::mean ( types::FeatureMatrix const & x) const

Overall mean of all grouped rows (p).

◆ remap()

GroupPartition ppforest2::stats::GroupPartition::remap ( GroupMap const & mapping) const

Merge groups according to a mapping.

Parameters
mapping: Maps original group labels to supergroup labels.
Returns
New GroupPartition with merged groups.
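
The relationship between the supergroups and subgroups attributes can be pictured as a map and its inverse. The sketch below assumes integer labels and uses the hypothetical aliases `LabelMap` and `LabelInvMap`; the library's actual `GroupMap` and `GroupInvMap` types may differ.

```cpp
#include <map>
#include <set>

using LabelMap = std::map<int, int>;               // group -> supergroup
using LabelInvMap = std::map<int, std::set<int>>;  // supergroup -> its subgroups

// Invert a group -> supergroup mapping so each supergroup lists its members.
LabelInvMap invert(LabelMap const& mapping) {
  LabelInvMap inverse;
  for (auto const& entry : mapping) {
    inverse[entry.second].insert(entry.first);
  }
  return inverse;
}
```

Mapping groups 0 and 1 to supergroup 0 and group 2 to supergroup 1 gives the inverse {0: {0, 1}, 1: {2}}, so the original labels remain recoverable after the merge.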

◆ split()

std::pair< GroupPartition, GroupPartition > ppforest2::stats::GroupPartition::split ( SplitSizes const & left_sizes) const

Split each group's block into left and right children.

For each leaf group, left_sizes specifies how many rows go to the left child (the first rows of the block). The remaining rows go to the right child. Groups absent from left_sizes go entirely to the right child (left_count = 0).

The caller is responsible for having already reordered rows within each block so that left-bound observations come first.

Parameters
left_sizes: Maps each leaf group to its left child row count.
Returns
Pair of {left, right} GroupPartitions.
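
The block arithmetic described above can be sketched on plain index ranges. `Blocks`, `Sizes`, and `split_blocks` are hypothetical names. Dropping a group from a child that receives none of its rows, and throwing on an out-of-range count, are assumptions of this sketch rather than documented behavior.

```cpp
#include <map>
#include <stdexcept>
#include <utility>

using Blocks = std::map<int, std::pair<int, int>>;  // label -> inclusive [start, end]
using Sizes = std::map<int, int>;                   // label -> left child row count

// For each block, send the first n_left rows left and the remainder right.
std::pair<Blocks, Blocks> split_blocks(Blocks const& blocks, Sizes const& left_sizes) {
  Blocks left, right;
  for (auto const& entry : blocks) {
    int const label = entry.first;
    int const start = entry.second.first;
    int const end = entry.second.second;
    auto it = left_sizes.find(label);
    int const n_left = (it == left_sizes.end()) ? 0 : it->second;  // absent: all rows go right
    if (n_left < 0 || n_left > end - start + 1) {
      throw std::invalid_argument("left size exceeds block size");
    }
    if (n_left > 0) left[label] = {start, start + n_left - 1};
    if (start + n_left <= end) right[label] = {start + n_left, end};
  }
  return {left, right};
}
```

With blocks 0: [0, 3], 1: [4, 9], 2: [10, 11] and left sizes {0: 1, 1: 6}, the left child holds 0: [0, 0] and 1: [4, 9], and the right child holds 0: [1, 3] and 2: [10, 11]. Because only row counts are passed, the result is meaningful only if rows were already reordered within each block.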

◆ subset()

GroupPartition ppforest2::stats::GroupPartition::subset ( GroupSet const & groups) const

Create a partition containing only the given groups.

Parameters
groups: Set of group labels to keep.
Returns
New GroupPartition restricted to groups.

◆ total_size()

int ppforest2::stats::GroupPartition::total_size ( ) const

Total number of observations across all groups in the partition.

Used by the tree builder to detect "no-progress" grouping splits: if a child partition covers the same row count as its parent, the split failed to partition the data and the builder converts the node to a leaf to avoid unbounded recursion.

◆ wgss()

types::FeatureMatrix ppforest2::stats::GroupPartition::wgss ( types::FeatureMatrix const & x) const

Within-group sum of squares matrix (p × p).

Member Data Documentation

◆ groups

GroupSet const ppforest2::stats::GroupPartition::groups

Set of all group labels in this partition.

◆ subgroups

GroupInvMap const ppforest2::stats::GroupPartition::subgroups

Maps each group to its set of subgroups.

◆ supergroups

GroupMap const ppforest2::stats::GroupPartition::supergroups

Maps each group to its supergroup (identity if no merge).

