ppforest2 v0.1.0
Projection Pursuit Decision Trees and Random Forests
Statistical infrastructure for training and evaluation.
Classes | |
| struct | ConfusionMatrix |
| A confusion matrix comparing predicted vs actual group labels. | |
| struct | DataPacket |
| Bundled dataset: features, responses, and group labels. | |
| class | GroupPartition |
| Contiguous-block representation of grouped observations. | |
| class | Normal |
| Normal (Gaussian) random number generator. | |
| struct | SimulationParams |
| Parameters for generating simulated classification data. | |
| struct | Split |
| Indices for a train/test split. | |
| class | Uniform |
| Discrete uniform random integer generator over [min, max]. | |
Typedefs | |
| using | RNG = pcg32 |
Functions | |
| float | accuracy (types::ResponseVector const &predictions, types::ResponseVector const &actual) |
| Accuracy of a prediction. | |
| double | error_rate (types::ResponseVector const &predictions, types::ResponseVector const &actual) |
| Error rate of a prediction. | |
| std::map< int, int > | get_labels_map (types::ResponseVector const &groups) |
| Build a sorted mapping from unique group labels to contiguous indices. | |
| types::FeatureVector | sd (types::FeatureMatrix const &data) |
| Column-wise sample standard deviation of a matrix. | |
| double | sd (types::FeatureVector const &data) |
| Sample standard deviation of a vector. | |
| DataPacket | simulate (int n, int p, int G, RNG &rng, SimulationParams const &params=SimulationParams{}) |
| Generate a simulated dataset with G groups, n rows, and p features. | |
| void | sort (types::FeatureMatrix &x, types::ResponseVector &y) |
| Sort a feature matrix and a response vector by the response values. | |
| Split | split (DataPacket const &data, float train_ratio, RNG &rng) |
| Perform a stratified random train/test split on a DataPacket. | |
| std::set< types::Response > | unique (types::ResponseVector const &column) |
| Unique values of a response vector. | |
Statistical infrastructure for training and evaluation.
Provides the random number generator (pcg32), discrete uniform sampling (Lemire's method), grouped-observation bookkeeping (GroupPartition), confusion matrices, data simulation, and basic descriptive statistics used throughout the training pipeline.
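The bounded-integer sampling mentioned above can be illustrated with a minimal sketch of Lemire's multiply-and-reject method. This is not the library's implementation: `bounded_u32` and `uniform_int` are hypothetical names, and `std::mt19937` stands in for pcg32 only because it ships with the standard library. Any generator that returns uniformly distributed 32-bit values would do.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: uniform integer in [0, range) using Lemire's method.
// Assumes gen() yields uniformly distributed 32-bit values.
template <typename Gen>
std::uint32_t bounded_u32(Gen& gen, std::uint32_t range) {
  assert(range > 0);
  std::uint64_t m =
      static_cast<std::uint64_t>(static_cast<std::uint32_t>(gen())) * range;
  std::uint32_t low = static_cast<std::uint32_t>(m);
  if (low < range) {
    // threshold == 2^32 mod range, computed in 32-bit unsigned arithmetic.
    std::uint32_t const threshold = (0u - range) % range;
    while (low < threshold) {
      // Reject draws that would bias the result, then redraw.
      m = static_cast<std::uint64_t>(static_cast<std::uint32_t>(gen())) * range;
      low = static_cast<std::uint32_t>(m);
    }
  }
  // The high 32 bits of the product are the unbiased result.
  return static_cast<std::uint32_t>(m >> 32);
}

// Hypothetical wrapper: uniform integer in the closed interval [min, max].
template <typename Gen>
int uniform_int(Gen& gen, int min, int max) {
  assert(min <= max);
  std::int64_t const span = static_cast<std::int64_t>(max) - min;
  std::uint32_t const offset =
      (span == 0xFFFFFFFFLL)
          ? static_cast<std::uint32_t>(gen())  // full 32-bit range
          : bounded_u32(gen, static_cast<std::uint32_t>(span + 1));
  return static_cast<int>(min + static_cast<std::int64_t>(offset));
}
```

The method replaces the usual modulo reduction with one 64-bit multiplication and takes the slow rejection path only when the low half of the product falls below `range`.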
| using ppforest2::stats::RNG = pcg32 |
| float ppforest2::stats::accuracy (types::ResponseVector const &predictions, types::ResponseVector const &actual) |
Accuracy of a set of predictions: the proportion of predicted responses that match the actual responses.
| predictions | Predicted response vector. |
| actual | Actual response vector. |
| double ppforest2::stats::error_rate (types::ResponseVector const &predictions, types::ResponseVector const &actual) |
Error rate of a set of predictions: the proportion of predicted responses that differ from the actual responses.
| predictions | Predicted response vector. |
| actual | Actual response vector. |
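As an illustration of the two metrics above, here is a minimal sketch under stated assumptions: `std::vector<int>` stands in for types::ResponseVector, the function names are hypothetical, and the error rate is taken to be the complement of accuracy, which is the conventional definition but is not confirmed by this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for types::ResponseVector (assumption, not the library type).
using ResponseVector = std::vector<int>;

// Hypothetical sketch: fraction of positions where prediction == actual.
double accuracy_sketch(ResponseVector const& predictions,
                       ResponseVector const& actual) {
  assert(predictions.size() == actual.size());
  assert(!actual.empty());
  std::size_t hits = 0;
  for (std::size_t i = 0; i < actual.size(); ++i) {
    if (predictions[i] == actual[i]) ++hits;
  }
  return static_cast<double>(hits) / static_cast<double>(actual.size());
}

// Hypothetical sketch: assumes error rate = 1 - accuracy.
double error_rate_sketch(ResponseVector const& predictions,
                         ResponseVector const& actual) {
  return 1.0 - accuracy_sketch(predictions, actual);
}
```

With predictions {1, 2, 2, 3} and actual values {1, 2, 3, 3}, three of four positions agree, so the sketch gives an accuracy of 0.75 and an error rate of 0.25.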
| std::map< int, int > ppforest2::stats::get_labels_map (types::ResponseVector const &groups) |
Build a sorted mapping from unique group labels to contiguous indices.
| groups | A response vector containing group labels. |
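The mapping described above could look like the following sketch. It assumes that indices start at 0 and follow ascending label order, which the page does not state explicitly; `labels_map_sketch` is a hypothetical name and `std::vector<int>` stands in for types::ResponseVector.

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical sketch: map each distinct label to 0, 1, 2, ... in
// ascending label order (the starting index is an assumption).
std::map<int, int> labels_map_sketch(std::vector<int> const& groups) {
  // std::set both de-duplicates and sorts the labels.
  std::set<int> const labels(groups.begin(), groups.end());
  std::map<int, int> mapping;
  int next_index = 0;
  for (int label : labels) {
    mapping[label] = next_index++;
  }
  return mapping;
}
```

For the labels {10, 3, 7, 7, 3}, the sketch maps 3 to 0, 7 to 1, and 10 to 2.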
| types::FeatureVector ppforest2::stats::sd (types::FeatureMatrix const &data) |
Column-wise sample standard deviation of a matrix.
| data | Feature matrix with at least 2 rows. |
| double ppforest2::stats::sd (types::FeatureVector const &data) |
Sample standard deviation of a vector.
| data | Feature vector with at least 2 rows. |
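For reference, the sample standard deviation is conventionally computed as sqrt(sum((x_i - mean)^2) / (n - 1)), so it is undefined for fewer than two observations. The sketch below shows that formula for a vector and column by column for a matrix. It assumes `std::vector` containers in place of types::FeatureVector and types::FeatureMatrix, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: sample standard deviation with an n - 1 denominator.
double sd_sketch(std::vector<double> const& data) {
  assert(data.size() >= 2);
  double const n = static_cast<double>(data.size());
  double mean = 0.0;
  for (double x : data) mean += x;
  mean /= n;
  double sum_sq = 0.0;
  for (double x : data) sum_sq += (x - mean) * (x - mean);
  return std::sqrt(sum_sq / (n - 1.0));
}

// Hypothetical sketch: one standard deviation per column of a matrix
// stored as a vector of equally sized rows.
std::vector<double> sd_columns_sketch(
    std::vector<std::vector<double>> const& rows) {
  assert(rows.size() >= 2);
  std::vector<double> result;
  for (std::size_t j = 0; j < rows.front().size(); ++j) {
    std::vector<double> column;
    for (auto const& row : rows) column.push_back(row[j]);
    result.push_back(sd_sketch(column));
  }
  return result;
}
```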
| DataPacket ppforest2::stats::simulate (int n, int p, int G, RNG &rng, SimulationParams const &params = SimulationParams{}) |
Generate a simulated dataset with G groups, n rows, and p features.
Each group is drawn from a normal distribution with a shifted mean. The resulting data is sorted by group label.
| n | Number of rows (observations). |
| p | Number of feature columns. |
| G | Number of groups (must be > 1). |
| rng | Random number generator. |
| params | Simulation parameters (mean, separation, sd). |
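To make the description above concrete, here is a rough sketch of one way such a simulation could work. Several details are assumptions rather than documented behavior: group g is given the mean `mean + g * separation`, the rows are divided as evenly as possible among the groups, and the default parameter values are invented for illustration. `SimParamsSketch`, `DataSketch`, and `simulate_sketch` are hypothetical names, and `std::mt19937` stands in for the library's RNG.

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-in for SimulationParams; defaults are illustrative.
struct SimParamsSketch {
  double mean = 0.0;
  double separation = 1.0;
  double sd = 1.0;
};

// Hypothetical stand-in for DataPacket: row-major features plus labels.
struct DataSketch {
  std::vector<std::vector<double>> x;
  std::vector<int> y;
};

DataSketch simulate_sketch(int n, int p, int G, std::mt19937& rng,
                           SimParamsSketch const& params = SimParamsSketch{}) {
  assert(n > 0 && p > 0 && G > 1);
  DataSketch out;
  for (int g = 0; g < G; ++g) {
    // Assumption: near-equal group sizes, remainder goes to the first groups.
    int const size = n / G + (g < n % G ? 1 : 0);
    // Assumption: each group's mean is shifted by g * separation.
    std::normal_distribution<double> dist(params.mean + g * params.separation,
                                          params.sd);
    for (int i = 0; i < size; ++i) {
      std::vector<double> row(static_cast<std::size_t>(p));
      for (double& value : row) value = dist(rng);
      out.x.push_back(std::move(row));
      out.y.push_back(g);  // groups are emitted in order, so y is sorted
    }
  }
  return out;
}
```

Because the groups are generated one after another, the output is already ordered by group label and needs no separate sorting step.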
| void ppforest2::stats::sort (types::FeatureMatrix &x, types::ResponseVector &y) |
Sort a feature matrix and a response vector by the response values.
| x | Feature matrix. |
| y | Response vector. |
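Sorting two containers together is usually done by sorting an index permutation and then applying it to both. The sketch below shows that approach with `std::vector` stand-ins for the matrix and response types. Using a stable sort, so that rows with equal responses keep their relative order, is an assumption and not something this page specifies; `sort_by_response_sketch` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical sketch: reorder the rows of x and the entries of y so that
// y is ascending, keeping each row paired with its response.
void sort_by_response_sketch(std::vector<std::vector<double>>& x,
                             std::vector<int>& y) {
  assert(x.size() == y.size());
  // Build the identity permutation 0, 1, ..., n - 1.
  std::vector<std::size_t> order(y.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  // Sort the indices by response; stable_sort keeps ties in input order.
  std::stable_sort(order.begin(), order.end(),
                   [&y](std::size_t a, std::size_t b) { return y[a] < y[b]; });
  // Apply the permutation to both containers.
  std::vector<std::vector<double>> sorted_x;
  std::vector<int> sorted_y;
  sorted_x.reserve(x.size());
  sorted_y.reserve(y.size());
  for (std::size_t i : order) {
    sorted_x.push_back(std::move(x[i]));
    sorted_y.push_back(y[i]);
  }
  x = std::move(sorted_x);
  y = std::move(sorted_y);
}
```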
| Split ppforest2::stats::split (DataPacket const &data, float train_ratio, RNG &rng) |
Perform a stratified random train/test split on a DataPacket.
Samples indices within each group proportional to train_ratio so that group balance is preserved in both train and test sets.
| data | The full dataset. |
| train_ratio | Proportion of data to use for training, in the open interval (0, 1). |
| rng | Random number generator. |
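The stratified sampling described above can be sketched as follows: collect the indices of each group, shuffle them, and send the first share to the training set. The rounding rule (truncating `train_ratio * group_size`) is an assumption, as are the names `SplitSketch` and `stratified_split_sketch`. The sketch takes only the group labels rather than a full DataPacket, and `std::mt19937` stands in for the library's RNG.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <vector>

// Hypothetical stand-in for Split: row indices for each partition.
struct SplitSketch {
  std::vector<std::size_t> train;
  std::vector<std::size_t> test;
};

SplitSketch stratified_split_sketch(std::vector<int> const& groups,
                                    double train_ratio, std::mt19937& rng) {
  assert(train_ratio > 0.0 && train_ratio < 1.0);
  // Collect the row indices that belong to each group label.
  std::map<int, std::vector<std::size_t>> by_group;
  for (std::size_t i = 0; i < groups.size(); ++i) {
    by_group[groups[i]].push_back(i);
  }
  SplitSketch out;
  for (auto& entry : by_group) {
    std::vector<std::size_t>& indices = entry.second;
    std::shuffle(indices.begin(), indices.end(), rng);
    // Assumption: the per-group training count is truncated, not rounded.
    std::size_t const n_train = static_cast<std::size_t>(
        train_ratio * static_cast<double>(indices.size()));
    for (std::size_t k = 0; k < indices.size(); ++k) {
      (k < n_train ? out.train : out.test).push_back(indices[k]);
    }
  }
  return out;
}
```

Sampling within each group, rather than over the whole dataset, keeps the group proportions in the training and test sets close to those of the full data.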
| std::set< types::Response > ppforest2::stats::unique (types::ResponseVector const &column) |
Unique values of a response vector.
| column | Response vector. |