ppforest2 v0.1.0
Projection Pursuit Decision Trees and Random Forests
ppforest2::stats Namespace Reference

Statistical infrastructure for training and evaluation.

Classes

struct  ConfusionMatrix
 A confusion matrix comparing predicted vs actual group labels.

struct  DataPacket
 Bundled dataset: features, responses, and group labels.

class  GroupPartition
 Contiguous-block representation of grouped observations.

class  Normal
 Normal (Gaussian) random number generator.

struct  SimulationParams
 Parameters for generating simulated classification data.

struct  Split
 Indices for a train/test split.

class  Uniform
 Discrete uniform random integer generator over [min, max].
 

Typedefs

using RNG = pcg32
 

Functions

float accuracy (types::ResponseVector const &predictions, types::ResponseVector const &actual)
 Accuracy of a prediction.
 
double error_rate (types::ResponseVector const &predictions, types::ResponseVector const &actual)
 Error rate of a prediction.
 
std::map< int, int > get_labels_map (types::ResponseVector const &groups)
 Build a sorted mapping from unique group labels to contiguous indices.
 
types::FeatureVector sd (types::FeatureMatrix const &data)
 Column-wise sample standard deviation of a matrix.
 
double sd (types::FeatureVector const &data)
 Sample standard deviation of a vector.
 
DataPacket simulate (int n, int p, int G, RNG &rng, SimulationParams const &params=SimulationParams{})
 Generate a simulated dataset with G groups, n rows, and p features.
 
void sort (types::FeatureMatrix &x, types::ResponseVector &y)
 Sort a feature matrix and a response vector by the response values.
 
Split split (DataPacket const &data, float train_ratio, RNG &rng)
 Perform a stratified random train/test split on a DataPacket.
 
std::set< types::Response > unique (types::ResponseVector const &column)
 Unique values of a response vector.
 

Detailed Description

Statistical infrastructure for training and evaluation.

Provides the random number generator (pcg32), discrete uniform sampling (Lemire's method), grouped-observation bookkeeping (GroupPartition), confusion matrices, data simulation, and basic descriptive statistics used throughout the training pipeline.
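Lemire's method draws a bounded integer by multiplying a 32-bit random value by the range size and keeping the high 32 bits; it rejects only the few low-word values that would bias the result, so it usually needs no division at all. The following is a minimal sketch of that technique, not the library's Uniform class: UniformSketch is a hypothetical name, and std::mt19937 stands in for pcg32 (both emit 32-bit outputs).

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical stand-in for ppforest2::stats::Uniform: draws integers in [min, max].
class UniformSketch {
public:
    UniformSketch(int min, int max)
        : min_(min),
          range_(static_cast<std::uint32_t>(static_cast<std::int64_t>(max) - min + 1)) {
        // The range size must fit in 32 bits (the full int range would wrap to 0).
        assert(min <= max && range_ != 0);
    }

    // Gen is any engine producing uniform 32-bit values (std::mt19937 here, pcg32 assumed in the library).
    template <typename Gen>
    int operator()(Gen& gen) const {
        return static_cast<int>(static_cast<std::int64_t>(min_) + bounded(gen));
    }

private:
    // Lemire's nearly divisionless bounded sampling over [0, range_).
    template <typename Gen>
    std::uint32_t bounded(Gen& gen) const {
        std::uint64_t m = static_cast<std::uint64_t>(static_cast<std::uint32_t>(gen())) * range_;
        std::uint32_t low = static_cast<std::uint32_t>(m);
        if (low < range_) {
            // threshold = 2^32 mod range_; low words below it would bias the result.
            std::uint32_t threshold = (0u - range_) % range_;
            while (low < threshold) {
                m = static_cast<std::uint64_t>(static_cast<std::uint32_t>(gen())) * range_;
                low = static_cast<std::uint32_t>(m);
            }
        }
        return static_cast<std::uint32_t>(m >> 32);  // high word is uniform in [0, range_)
    }

    int min_;
    std::uint32_t range_;
};
```

The modulo is computed only when the low word falls below the range size, which for small ranges is rare; that is the main reason this method is preferred over plain rejection sampling with a division on every draw.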

Typedef Documentation

◆ RNG

using ppforest2::stats::RNG = pcg32

Function Documentation

◆ accuracy()

float ppforest2::stats::accuracy ( types::ResponseVector const & predictions,
types::ResponseVector const & actual )

Accuracy of a prediction.

Parameters
predictions: Predicted response vector.
actual: Actual response vector.
Returns
Accuracy (0 to 1).

◆ error_rate()

double ppforest2::stats::error_rate ( types::ResponseVector const & predictions,
types::ResponseVector const & actual )

Error rate of a prediction.

Parameters
predictions: Predicted response vector.
actual: Actual response vector.
Returns
Error rate (0 to 1).
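Accuracy is the fraction of positions where the prediction equals the actual label, and the error rate is its complement. A minimal sketch of both, assuming types::ResponseVector behaves like a std::vector<int> of labels (the real type may differ, e.g. an Eigen vector) and that both inputs have the same non-zero length; accuracy_sketch and error_rate_sketch are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for types::ResponseVector.
using ResponseVectorSketch = std::vector<int>;

// Fraction of positions where the predicted label matches the actual label.
double accuracy_sketch(ResponseVectorSketch const& predictions, ResponseVectorSketch const& actual) {
    assert(predictions.size() == actual.size() && !actual.empty());
    std::size_t correct = 0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        if (predictions[i] == actual[i]) {
            ++correct;
        }
    }
    return static_cast<double>(correct) / static_cast<double>(actual.size());
}

// Error rate: fraction of mismatched positions, i.e. 1 - accuracy.
double error_rate_sketch(ResponseVectorSketch const& predictions, ResponseVectorSketch const& actual) {
    return 1.0 - accuracy_sketch(predictions, actual);
}
```

For predictions {0, 1, 1, 2} against actual {0, 1, 2, 2}, three of four labels match, giving an accuracy of 0.75 and an error rate of 0.25.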

◆ get_labels_map()

std::map< int, int > ppforest2::stats::get_labels_map ( types::ResponseVector const & groups)

Build a sorted mapping from unique group labels to contiguous indices.

Parameters
groups: A response vector containing group labels.
Returns
A map from label value to its 0-based index.
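Because labels need not be contiguous (for example 2, 5, 9), the map assigns each distinct label its rank in ascending order. A minimal sketch, assuming labels are ints held in a std::vector<int> in place of types::ResponseVector; get_labels_map_sketch is a hypothetical name.

```cpp
#include <map>
#include <set>
#include <vector>

// Maps each distinct label to its 0-based position in ascending label order.
std::map<int, int> get_labels_map_sketch(std::vector<int> const& groups) {
    std::set<int> labels(groups.begin(), groups.end());  // sorted and deduplicated
    std::map<int, int> index_of;
    int next = 0;
    for (int label : labels) {
        index_of[label] = next++;
    }
    return index_of;
}
```

With groups {5, 2, 5, 9, 2}, this yields 2 → 0, 5 → 1, 9 → 2, so arbitrary labels can index contiguous arrays.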

◆ sd() [1/2]

types::FeatureVector ppforest2::stats::sd ( types::FeatureMatrix const & data)

Column-wise sample standard deviation of a matrix.

Parameters
data: Feature matrix with at least 2 rows.
Returns
FeatureVector of size p (one σ per column).

◆ sd() [2/2]

double ppforest2::stats::sd ( types::FeatureVector const & data)

Sample standard deviation of a vector.

Parameters
data: Feature vector with at least 2 elements.
Returns
Sample standard deviation.
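The sample standard deviation divides the sum of squared deviations by n - 1, which is why at least two observations are required. A minimal sketch of both overloads, assuming a std::vector<double> for FeatureVector and a row-major std::vector<std::vector<double>> for FeatureMatrix (the library's real types may differ); sd_sketch is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sample standard deviation of a vector (n - 1 denominator).
double sd_sketch(std::vector<double> const& data) {
    assert(data.size() >= 2);
    double mean = 0.0;
    for (double x : data) mean += x;
    mean /= static_cast<double>(data.size());
    double sum_sq = 0.0;
    for (double x : data) sum_sq += (x - mean) * (x - mean);
    return std::sqrt(sum_sq / static_cast<double>(data.size() - 1));
}

// Column-wise version: one sample standard deviation per column of a row-major matrix.
std::vector<double> sd_sketch(std::vector<std::vector<double>> const& rows) {
    assert(rows.size() >= 2);
    std::size_t p = rows.front().size();
    std::vector<double> result(p);
    std::vector<double> column(rows.size());
    for (std::size_t j = 0; j < p; ++j) {
        for (std::size_t i = 0; i < rows.size(); ++i) column[i] = rows[i][j];
        result[j] = sd_sketch(column);  // reuse the vector overload
    }
    return result;
}
```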

◆ simulate()

DataPacket ppforest2::stats::simulate ( int n,
int p,
int G,
RNG & rng,
SimulationParams const & params = SimulationParams{} )

Generate a simulated dataset with G groups, n rows, and p features.

Observations in each group are drawn from a normal distribution whose mean is shifted from group to group. The resulting data is sorted by group label.

Parameters
n: Number of rows (observations).
p: Number of feature columns.
G: Number of groups (must be > 1).
rng: Random number generator.
params: Simulation parameters (mean, separation, sd).
Returns
A DataPacket with the simulated feature matrix and response vector.
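The documentation names the fields mean, separation, and sd but not exactly how they are combined, so the sketch below assumes that group g (0-based) is centered at mean + g * separation on every feature with standard deviation sd, and that rows are split as evenly as possible across groups. SimulationParamsSketch, DataPacketSketch, and simulate_sketch are hypothetical names, and std::mt19937 stands in for pcg32. Generating the groups in order makes the output sorted by label without a separate sort.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical mirror of the documented SimulationParams fields.
struct SimulationParamsSketch {
    double mean = 0.0;
    double separation = 1.0;
    double sd = 1.0;
};

// Hypothetical stand-in for DataPacket: row-major features plus one label per row.
struct DataPacketSketch {
    std::vector<std::vector<double>> x;
    std::vector<int> y;
};

DataPacketSketch simulate_sketch(int n, int p, int G, std::mt19937& rng,
                                 SimulationParamsSketch const& params = SimulationParamsSketch{}) {
    assert(G > 1 && n >= G && p > 0);
    DataPacketSketch data;
    data.x.reserve(static_cast<std::size_t>(n));
    data.y.reserve(static_cast<std::size_t>(n));
    for (int g = 0; g < G; ++g) {
        // Assumption: the first n % G groups receive one extra row.
        int rows_in_group = n / G + (g < n % G ? 1 : 0);
        // Assumption: group g is shifted by g * separation from the base mean.
        std::normal_distribution<double> dist(params.mean + g * params.separation, params.sd);
        for (int i = 0; i < rows_in_group; ++i) {
            std::vector<double> row(static_cast<std::size_t>(p));
            for (double& value : row) value = dist(rng);
            data.x.push_back(std::move(row));
            data.y.push_back(g);  // groups are emitted in order, so y is sorted
        }
    }
    return data;
}
```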

◆ sort()

void ppforest2::stats::sort ( types::FeatureMatrix & x,
types::ResponseVector & y )

Sort a feature matrix and a response vector by the response values, in place. The rows of x are reordered together with y, so each row stays paired with its response.

Parameters
x: Feature matrix.
y: Response vector.
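Sorting two containers by one key is commonly done by sorting an index permutation and then applying it to both. A minimal sketch of that approach, assuming row-major std::vector types in place of types::FeatureMatrix and types::ResponseVector; sort_sketch is a hypothetical name, and std::stable_sort is a choice made here (the documentation does not say whether ties keep their original order).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Reorders the rows of x and the entries of y so that y is ascending.
void sort_sketch(std::vector<std::vector<double>>& x, std::vector<int>& y) {
    assert(x.size() == y.size());
    std::vector<std::size_t> order(y.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // Stable: rows with equal responses keep their relative order.
    std::stable_sort(order.begin(), order.end(),
                     [&y](std::size_t a, std::size_t b) { return y[a] < y[b]; });
    std::vector<std::vector<double>> x_sorted;
    std::vector<int> y_sorted;
    x_sorted.reserve(x.size());
    y_sorted.reserve(y.size());
    for (std::size_t i : order) {
        x_sorted.push_back(x[i]);
        y_sorted.push_back(y[i]);
    }
    x = std::move(x_sorted);
    y = std::move(y_sorted);
}
```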

◆ split()

Split ppforest2::stats::split ( DataPacket const & data,
float train_ratio,
RNG & rng )

Perform a stratified random train/test split on a DataPacket.

Within each group, samples a train_ratio fraction of that group's indices for the training set and assigns the rest to the test set, so the group proportions are preserved in both sets.

Parameters
data: The full dataset.
train_ratio: Proportion of data to use for training; must lie in the open interval (0, 1).
rng: Random number generator.
Returns
A Split containing train and test index vectors.
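A stratified split can be sketched by bucketing indices by label, shuffling each bucket, and cutting it at train_ratio. This is a minimal illustration, not the library's implementation: it takes only the label vector instead of a DataPacket, uses std::mt19937 in place of pcg32, and assumes the per-group training count is rounded to the nearest integer. SplitSketch and split_sketch are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <random>
#include <vector>

// Hypothetical stand-in for Split: row indices for each side.
struct SplitSketch {
    std::vector<int> train;
    std::vector<int> test;
};

SplitSketch split_sketch(std::vector<int> const& y, float train_ratio, std::mt19937& rng) {
    assert(train_ratio > 0.0f && train_ratio < 1.0f);
    // Bucket row indices by group label.
    std::map<int, std::vector<int>> by_group;
    for (int i = 0; i < static_cast<int>(y.size()); ++i) by_group[y[i]].push_back(i);

    SplitSketch result;
    for (auto& entry : by_group) {
        std::vector<int>& indices = entry.second;
        std::shuffle(indices.begin(), indices.end(), rng);
        // Assumption: round the per-group training count to the nearest integer.
        auto n_train = static_cast<std::ptrdiff_t>(
            std::lround(train_ratio * static_cast<float>(indices.size())));
        result.train.insert(result.train.end(), indices.begin(), indices.begin() + n_train);
        result.test.insert(result.test.end(), indices.begin() + n_train, indices.end());
    }
    return result;
}
```

Because each group is cut separately, a group that makes up 40% of the data also makes up about 40% of each side, up to rounding.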

◆ unique()

std::set< types::Response > ppforest2::stats::unique ( types::ResponseVector const & column)

Unique values of a response vector.

Parameters
column: Response vector.
Returns
Set of unique response values.
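Since the function returns a std::set, the result is both deduplicated and sorted in ascending order. A minimal sketch, assuming int responses in a std::vector<int> in place of types::ResponseVector; unique_sketch is a hypothetical name.

```cpp
#include <set>
#include <vector>

// Collects each distinct response value once; std::set keeps them in ascending order.
std::set<int> unique_sketch(std::vector<int> const& column) {
    return std::set<int>(column.begin(), column.end());
}
```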