ppforest2 v0.1.0
Projection Pursuit Decision Trees and Random Forests
Classes
| struct | FeatureSet |
| Result of parsing a feature-only CSV (no response column). | |
Functions
| stats::DataPacket | read (std::string const &filename) |
| Read a CSV file into a DataPacket. | |
| FeatureSet | read_features_from_string (std::string const &content) |
| Parse a feature-only CSV from an in-memory string. | |
| stats::DataPacket | read_sorted (std::string const &filename) |
| Read a CSV file and sort rows ascending by the response column. | |
| void | write (stats::DataPacket const &data, std::string const &filename) |
| Write a DataPacket to a CSV file (features followed by label, no header). | |
stats::DataPacket ppforest2::io::csv::read(std::string const &filename)
Read a CSV file into a DataPacket.
Assumes the last column is the response variable (group label as string) and all preceding columns are features. Categorical feature columns are automatically detected and integer-encoded. String labels are mapped to contiguous integer codes starting at 0.
Parameters
| filename | Path to the CSV file. |
Exceptions
| std::runtime_error | If the file is empty or has inconsistent columns. |
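The label mapping described above can be pictured with a small standalone sketch. It is not ppforest2's implementation: `encode_labels` and `EncodedLabels` are hypothetical names, and assigning codes in order of first appearance is an assumption, since the text only promises contiguous codes starting at 0.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical container, not a ppforest2 type.
struct EncodedLabels {
  std::vector<int> codes;               // one integer code per input row
  std::vector<std::string> group_names; // code -> original label string
};

// Map string labels to contiguous integer codes starting at 0.
// Assumption: codes are handed out in order of first appearance.
EncodedLabels encode_labels(std::vector<std::string> const &labels) {
  EncodedLabels out;
  std::map<std::string, int> seen;
  for (auto const &label : labels) {
    auto it = seen.find(label);
    if (it == seen.end()) {
      // New label: its code is the number of labels seen so far.
      int code = static_cast<int>(out.group_names.size());
      it = seen.emplace(label, code).first;
      out.group_names.push_back(label);
    }
    out.codes.push_back(it->second);
  }
  return out;
}
```

Keeping `group_names` next to the codes lets a caller translate a predicted code back to its original label.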
FeatureSet ppforest2::io::csv::read_features_from_string(std::string const &content)
Parse a feature-only CSV from an in-memory string.
The first row is the header (feature names); every subsequent row is one observation. There is no response column. Used by the serve subcommand to parse POST /predict request bodies. Categorical encoding runs per call, so callers must encode categoricals consistently with the training data.
Exceptions
| UserError | On empty body, missing header, no data rows, or malformed shape. |
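The header-plus-rows layout and its failure cases can be sketched with a standalone parser. This is an illustration under stated assumptions, not the library's code: `ParsedFeatures`, `split_line`, and `parse_features` are hypothetical names, all features are assumed numeric, quoting is not handled, and `std::invalid_argument` stands in for `UserError`.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for FeatureSet; the real members are not shown here.
struct ParsedFeatures {
  std::vector<std::string> names;        // header row
  std::vector<std::vector<double>> rows; // one inner vector per observation
};

// Split one line on commas (no quoting support in this sketch).
std::vector<std::string> split_line(std::string const &line) {
  std::vector<std::string> cells;
  std::stringstream ss(line);
  std::string cell;
  while (std::getline(ss, cell, ',')) cells.push_back(cell);
  return cells;
}

ParsedFeatures parse_features(std::string const &content) {
  std::istringstream in(content);
  std::string line;
  if (!std::getline(in, line) || line.empty())
    throw std::invalid_argument("empty body or missing header");

  ParsedFeatures out;
  out.names = split_line(line);
  while (std::getline(in, line)) {
    if (line.empty()) continue; // tolerate blank lines
    auto cells = split_line(line);
    if (cells.size() != out.names.size())
      throw std::invalid_argument("row width does not match header");
    std::vector<double> row;
    for (auto const &c : cells) row.push_back(std::stod(c)); // throws on non-numeric text
    out.rows.push_back(row);
  }
  if (out.rows.empty()) throw std::invalid_argument("no data rows");
  return out;
}
```

A body such as `"x1,x2\n1.5,2\n3,4.25\n"` yields two named features and two rows, while a header-only body is rejected.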
stats::DataPacket ppforest2::io::csv::read_sorted(std::string const &filename)
Read a CSV file and sort rows ascending by the response column.
Mode is detected from the y column's written form:
If the y values are written with a float marker (., e, E), y is parsed as a continuous float response (regression shape) and group_names is empty.
Otherwise, group_names carries the original label strings (classification shape). This keeps integer-coded label CSVs like Wine and Glass on the classification path.
Rows are sorted ascending by the encoded y, which gives the training routines what they need: contiguous groups for classification, and y-ordered rows for ByCutpoint::init's median split for regression.
Exceptions
| UserError | On any failure (missing file, parse error, malformed shape); CSV reading failures are user-facing by nature. |
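The detection rule and the sort can be sketched as two small standalone helpers. The names `parses_as_number`, `looks_continuous`, and `ascending_order` are hypothetical. The exact rule is also an assumption: here a column counts as continuous only when every value parses as a number and at least one is written with `.`, `e`, or `E`, so text labels such as `setosa` stay on the classification path.

```cpp
#include <algorithm>
#include <cstdlib>
#include <numeric>
#include <string>
#include <vector>

// True when the whole string is consumed by std::strtod.
bool parses_as_number(std::string const &s) {
  if (s.empty()) return false;
  char *end = nullptr;
  std::strtod(s.c_str(), &end);
  return end == s.c_str() + s.size();
}

// Assumed rule: all values numeric, and at least one written with a float
// marker ('.', 'e', 'E'). Plain integers like "1", "2", "3" return false.
bool looks_continuous(std::vector<std::string> const &y_text) {
  bool has_marker = false;
  for (auto const &s : y_text) {
    if (!parses_as_number(s)) return false; // text label: classification
    if (s.find_first_of(".eE") != std::string::npos) has_marker = true;
  }
  return has_marker;
}

// Row order that sorts y ascending; stable, so ties keep their file order.
std::vector<std::size_t> ascending_order(std::vector<double> const &y) {
  std::vector<std::size_t> idx(y.size());
  std::iota(idx.begin(), idx.end(), 0);
  std::stable_sort(idx.begin(), idx.end(),
                   [&](std::size_t a, std::size_t b) { return y[a] < y[b]; });
  return idx;
}
```

Applying the returned index order to both the feature rows and y keeps each observation aligned with its response after sorting.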
void ppforest2::io::csv::write(stats::DataPacket const &data, std::string const &filename)
Write a DataPacket to a CSV file (features followed by label, no header).
Parameters
| data | The DataPacket to write. |
| filename | Output file path. |
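The output layout (features first, label last, no header row) can be illustrated with a standalone sketch that writes to any `std::ostream`. `write_rows` is a hypothetical helper, not the library's `write`. Two further points are assumptions: that the label is emitted as its original string, and that numbers use default stream formatting.

```cpp
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Write one CSV line per observation: feature values, then the label.
// No header row is emitted.
void write_rows(std::ostream &out,
                std::vector<std::vector<double>> const &features,
                std::vector<std::string> const &labels) {
  if (features.size() != labels.size())
    throw std::invalid_argument("features and labels differ in length");
  for (std::size_t i = 0; i < features.size(); ++i) {
    for (double v : features[i]) out << v << ',';
    out << labels[i] << '\n'; // label is always the last column
  }
}
```

Because the sketch targets `std::ostream`, the same function can fill a `std::ostringstream` in tests or a `std::ofstream` opened on the output path.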