ppforest2 v0.1.0
Projection Pursuit Decision Trees and Random Forests
Contiguous-block representation of grouped observations.
#include <GroupPartition.hpp>
Public Types | |
| using | SplitSizes = std::map<types::GroupId, int> |
Public Member Functions | |
| GroupPartition (int start, int end) | |
| Construct a single-group partition covering rows [start, end]. | |
| GroupPartition (types::GroupIdVector const &y) | |
| Construct from a sorted response vector. | |
| GroupPartition (types::OutcomeVector const &y) | |
| Construct from a float-typed response vector. | |
| types::FeatureMatrix | bgss (types::FeatureMatrix const &x) const |
| Between-group sum of squares matrix (p × p). | |
| GroupPartition | bisect (int mid) const |
| Bisect a single-group partition at row index mid into two groups. | |
| GroupPartition | collapse () const |
| Collapse all groups into a single supergroup. | |
| template<typename Derived> | |
| auto | data (Eigen::MatrixBase< Derived > const &x) const |
| Extract all rows across all groups. | |
| Group | first_group () const |
| Smallest group label in the partition. | |
| template<typename Derived> | |
| auto | group (Eigen::MatrixBase< Derived > const &x, Group const &group) const |
| Extract rows belonging to a group (or supergroup). | |
| int | group_end (Group const &group) const |
| Last row index (inclusive) of the block for group. | |
| int | group_size (Group const &group) const |
| Number of observations in group. | |
| int | group_start (Group const &group) const |
| First row index of the block for group. | |
| types::FeatureVector | mean (types::FeatureMatrix const &x) const |
| Overall mean of all grouped rows (p). | |
| GroupPartition | remap (GroupMap const &mapping) const |
| Merge groups according to a mapping. | |
| std::pair< GroupPartition, GroupPartition > | split (SplitSizes const &left_sizes) const |
| Split each group's block into left and right children. | |
| GroupPartition | subset (GroupSet const &groups) const |
| Create a partition containing only the given groups. | |
| int | total_size () const |
| Total number of observations across all groups in the partition. | |
| types::FeatureMatrix | wgss (types::FeatureMatrix const &x) const |
| Within-group sum of squares matrix (p × p). | |
Static Public Member Functions | |
| static bool | is_contiguous (GroupVector const &y) |
| Check whether all equal values in y form a single contiguous block. | |
Public Attributes | |
| GroupSet const | groups |
| Set of all group labels in this partition. | |
| GroupInvMap const | subgroups |
| Maps each group to its set of subgroups. | |
| GroupMap const | supergroups |
| Maps each group to its supergroup (identity if no merge). | |
Contiguous-block representation of grouped observations.
Assumes the response vector is sorted so that observations of the same group are contiguous. Stores the start/end row indices of each group block and provides efficient extraction, subsetting, and computation of between- and within-group statistics.
Groups can be hierarchically merged via remap(), which assigns supergroup labels while tracking the original subgroups.
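The block layout described above can be illustrated with a minimal standalone sketch. It does not use the ppforest2 types: `BlockMap`, `build_blocks`, and the plain `std::vector<int>` label vector are assumptions made for illustration, and the library's internal storage may differ.

```cpp
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in: label -> (first row, last row), both inclusive.
using BlockMap = std::map<int, std::pair<int, int>>;

// Scan the label vector once and record each group's contiguous block.
// Throws if a label reappears after its block has already ended.
BlockMap build_blocks(std::vector<int> const& y) {
  BlockMap blocks;
  for (int i = 0; i < static_cast<int>(y.size()); ++i) {
    auto it = blocks.find(y[i]);
    if (it == blocks.end()) {
      blocks[y[i]] = {i, i};                  // a new block starts here
    } else if (it->second.second == i - 1) {
      it->second.second = i;                  // extend the current block
    } else {
      throw std::invalid_argument("group labels are not contiguous");
    }
  }
  return blocks;
}
```

With start and end indices stored per group, extracting a group's rows reduces to taking a contiguous row range, which is what makes zero-copy views possible.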
| using ppforest2::stats::GroupPartition::SplitSizes = std::map<types::GroupId, int> |
| ppforest2::stats::GroupPartition::GroupPartition | ( | types::GroupIdVector const & | y | ) | explicit |
Construct from a sorted response vector.
| y | Outcome vector (n) with contiguous group blocks. |
| ppforest2::stats::GroupPartition::GroupPartition | ( | types::OutcomeVector const & | y | ) | explicit |
Construct from a float-typed response vector.
Classification y is carried as OutcomeVector (float) throughout the training pipeline; this overload casts to integer labels internally before building the block map. Values must encode integer labels.
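The float-to-integer cast can be sketched as follows. This is an illustration only: `to_int_labels` is a hypothetical helper operating on `std::vector<float>` rather than `types::OutcomeVector`, and the validation step is an assumption, since the source only states that values must encode integer labels.

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

// Hypothetical helper: convert float-encoded labels to int labels.
// Rejects values that are not finite or not whole numbers (assumed policy).
std::vector<int> to_int_labels(std::vector<float> const& y) {
  std::vector<int> labels;
  labels.reserve(y.size());
  for (float v : y) {
    if (!std::isfinite(v) || std::round(v) != v) {
      throw std::invalid_argument("value does not encode an integer label");
    }
    labels.push_back(static_cast<int>(v));
  }
  return labels;
}
```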
| ppforest2::stats::GroupPartition::GroupPartition | ( | int | start, |
| int | end ) |
Construct a single-group partition covering rows [start, end].
Group label is 0. Use bisect(mid) to split into a 2-group partition.
| types::FeatureMatrix ppforest2::stats::GroupPartition::bgss | ( | types::FeatureMatrix const & | x | ) | const |
Between-group sum of squares matrix (p × p).
| GroupPartition ppforest2::stats::GroupPartition::bisect | ( | int | mid | ) | const |
Bisect a single-group partition at row index mid into two groups.
Group 0 covers [start, mid - 1] and group 1 covers [mid, end]. The receiver must be a single-group partition (typically built with the (int, int) constructor). This differs from split(SplitSizes), which divides a multi-group partition along its existing group structure.
Precondition (checked via an invariant): the partition has exactly one group and mid lies within (start, end].
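The index arithmetic of bisect can be sketched on a plain block map. `BlockMap` and `bisect_sketch` are hypothetical names, and `assert` stands in for whatever invariant mechanism the library actually uses.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical stand-in: label -> (first row, last row), both inclusive.
using BlockMap = std::map<int, std::pair<int, int>>;

// Bisect the single block [start, end] at mid:
// group 0 gets [start, mid - 1], group 1 gets [mid, end].
BlockMap bisect_sketch(BlockMap const& blocks, int mid) {
  assert(blocks.size() == 1);           // must be a single-group partition
  int const start = blocks.begin()->second.first;
  int const end = blocks.begin()->second.second;
  assert(mid > start && mid <= end);    // mid must lie in (start, end]
  return BlockMap{{0, {start, mid - 1}}, {1, {mid, end}}};
}
```

The half-open bound (start, end] guarantees that both resulting groups contain at least one row.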
| GroupPartition ppforest2::stats::GroupPartition::collapse | ( | ) | const |
Collapse all groups into a single supergroup.
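| template<typename Derived> auto ppforest2::stats::GroupPartition::data | ( | Eigen::MatrixBase< Derived > const & | x | ) | const | inline |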
Extract all rows across all groups.
| x | Feature matrix (n × p). |
| Group ppforest2::stats::GroupPartition::first_group | ( | ) | const | inline |
Smallest group label in the partition.
groups is a std::set<GroupId> so iteration is in ascending key order; this returns the first such label. Caller must ensure the partition is non-empty.
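Because `std::set` iterates in ascending key order, the smallest label is simply the first element. A minimal sketch, using `int` in place of `GroupId` and a hypothetical `first_label` helper that makes the non-empty precondition explicit:

```cpp
#include <set>
#include <stdexcept>

// Hypothetical helper: smallest label in an ordered set of labels.
// Dereferencing begin() on an empty set is undefined, so guard first.
int first_label(std::set<int> const& groups) {
  if (groups.empty()) {
    throw std::logic_error("partition is empty");
  }
  return *groups.begin();
}
```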
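| template<typename Derived> auto ppforest2::stats::GroupPartition::group | ( | Eigen::MatrixBase< Derived > const & | x, |
| Group const & | group ) | const | inline |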
Extract rows belonging to a group (or supergroup).
Returns an Eigen block expression (zero-copy view) into x. Consume the result immediately or assign it to a concrete matrix; do not hold it in an auto variable across statements.
| x | Feature matrix (n × p). |
| group | Group label. |
| int ppforest2::stats::GroupPartition::group_end | ( | Group const & | group | ) | const |
Last row index (inclusive) of the block for group.
| int ppforest2::stats::GroupPartition::group_size | ( | Group const & | group | ) | const |
Number of observations in group.
| int ppforest2::stats::GroupPartition::group_start | ( | Group const & | group | ) | const |
First row index of the block for group.
| bool ppforest2::stats::GroupPartition::is_contiguous | ( | GroupVector const & | y | ) | static |
Check whether all equal values in y form a single contiguous block.
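One way to perform such a check is a single pass that remembers which labels have already finished their block. The sketch below is an assumption about how the check could work, not the library's code; `is_contiguous_sketch` is a hypothetical name and labels are plain `int`.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for the contiguity check on plain int labels.
bool is_contiguous_sketch(std::vector<int> const& y) {
  std::set<int> closed;  // labels whose block has already ended
  for (std::size_t i = 1; i < y.size(); ++i) {
    if (y[i] != y[i - 1]) {
      closed.insert(y[i - 1]);                    // previous block just ended
      if (closed.count(y[i]) > 0) return false;   // label starts a second block
    }
  }
  return true;
}
```

Note that this check only requires equal labels to be adjacent; it does not require the labels to appear in ascending order.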
| types::FeatureVector ppforest2::stats::GroupPartition::mean | ( | types::FeatureMatrix const & | x | ) | const |
Overall mean of all grouped rows (p).
| GroupPartition ppforest2::stats::GroupPartition::remap | ( | GroupMap const & | mapping | ) | const |
Merge groups according to a mapping.
| mapping | Maps original group labels to supergroup labels. |
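The relationship between a group-to-supergroup mapping and the per-supergroup subgroup sets can be sketched by inverting the map. The aliases `GroupMapSketch` and `GroupInvMapSketch` and the function `invert_mapping` are hypothetical; they only mirror the roles that the supergroups and subgroups attributes play, and are not the library's definitions.

```cpp
#include <map>
#include <set>

using GroupMapSketch = std::map<int, int>;               // group -> supergroup
using GroupInvMapSketch = std::map<int, std::set<int>>;  // supergroup -> subgroups

// Invert a group -> supergroup mapping so that each supergroup
// lists the original groups merged into it.
GroupInvMapSketch invert_mapping(GroupMapSketch const& mapping) {
  GroupInvMapSketch inverse;
  for (auto const& entry : mapping) {
    inverse[entry.second].insert(entry.first);
  }
  return inverse;
}
```

For example, mapping groups 0 and 1 to supergroup 0 and group 2 to supergroup 1 yields the subgroup sets {0, 1} and {2}.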
| std::pair< GroupPartition, GroupPartition > ppforest2::stats::GroupPartition::split | ( | SplitSizes const & | left_sizes | ) | const |
Split each group's block into left and right children.
For each leaf group, left_sizes specifies how many rows go to the left child (the first rows of the block). The remaining rows go to the right child. Groups absent from left_sizes go entirely to the right child (left_count = 0).
The caller is responsible for having already reordered rows within each block so that left-bound observations come first.
| left_sizes | Maps each leaf group to its left child row count. |
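The splitting rule can be sketched on a plain block map. Everything here is illustrative: `BlockMap`, `SplitSizesSketch`, and `split_sketch` are hypothetical names, the range check is an assumed policy, and dropping a group from a child when it receives no rows is an assumption the source does not confirm.

```cpp
#include <map>
#include <stdexcept>
#include <utility>

using BlockMap = std::map<int, std::pair<int, int>>;  // label -> (first, last), inclusive
using SplitSizesSketch = std::map<int, int>;          // label -> left row count

// Give the first n_left rows of each block to the left child and the
// remaining rows to the right child.
std::pair<BlockMap, BlockMap> split_sketch(BlockMap const& blocks,
                                           SplitSizesSketch const& left_sizes) {
  BlockMap left, right;
  for (auto const& entry : blocks) {
    int const label = entry.first;
    int const start = entry.second.first;
    int const end = entry.second.second;
    auto it = left_sizes.find(label);
    int const n_left = (it == left_sizes.end()) ? 0 : it->second;  // absent -> 0
    if (n_left < 0 || n_left > end - start + 1) {
      throw std::invalid_argument("left size out of range");
    }
    // Assumption: a group with no rows on one side is omitted from that child.
    if (n_left > 0) left[label] = {start, start + n_left - 1};
    if (start + n_left <= end) right[label] = {start + n_left, end};
  }
  return {left, right};
}
```

Because only row counts are passed, the result is meaningful only if the rows inside each block were already reordered so that left-bound observations come first.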
| GroupPartition ppforest2::stats::GroupPartition::subset | ( | GroupSet const & | groups | ) | const |
Create a partition containing only the given groups.
| groups | Set of group labels to keep. |
| int ppforest2::stats::GroupPartition::total_size | ( | ) | const |
Total number of observations across all groups in the partition.
Used by the tree builder to detect "no-progress" grouping splits: if a child partition covers the same row count as its parent, the split failed to partition the data and the builder converts the node to a leaf to avoid unbounded recursion.
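The no-progress test amounts to comparing row counts. A minimal sketch, assuming a plain block map in place of GroupPartition; `total_size_sketch` and `is_no_progress_split` are hypothetical names, not part of the library.

```cpp
#include <map>
#include <utility>

using BlockMap = std::map<int, std::pair<int, int>>;  // label -> (first, last), inclusive

// Sum the sizes of all blocks.
int total_size_sketch(BlockMap const& blocks) {
  int n = 0;
  for (auto const& entry : blocks) {
    n += entry.second.second - entry.second.first + 1;
  }
  return n;
}

// A child that still covers every parent row means the split did nothing.
bool is_no_progress_split(BlockMap const& parent, BlockMap const& child) {
  return total_size_sketch(child) == total_size_sketch(parent);
}
```

A builder that sees this condition can stop recursing and emit a leaf, since recursing on an unchanged row set would never terminate.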
| types::FeatureMatrix ppforest2::stats::GroupPartition::wgss | ( | types::FeatureMatrix const & | x | ) | const |
Within-group sum of squares matrix (p × p).
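For reference, the conventional definitions of the between-group and within-group sum of squares matrices are given below. The source does not spell out the formulas, so it is an assumption that bgss and wgss follow these exactly (for instance, without additional scaling). Here $x_i \in \mathbb{R}^p$ is row $i$, $n_g$ is the size of group $g$, $\bar{x}_g$ is the mean of group $g$, and $\bar{x}$ is the overall mean.

```latex
B = \sum_{g} n_g \,(\bar{x}_g - \bar{x})(\bar{x}_g - \bar{x})^{\top},
\qquad
W = \sum_{g} \sum_{i \in g} (x_i - \bar{x}_g)(x_i - \bar{x}_g)^{\top}
```

Both matrices are $p \times p$, and under these definitions their sum equals the total sum of squares $\sum_i (x_i - \bar{x})(x_i - \bar{x})^{\top}$.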
| GroupSet const ppforest2::stats::GroupPartition::groups |
Set of all group labels in this partition.
| GroupInvMap const ppforest2::stats::GroupPartition::subgroups |
Maps each group to its set of subgroups.
| GroupMap const ppforest2::stats::GroupPartition::supergroups |
Maps each group to its supergroup (identity if no merge).