ppforest2 v0.1.0
Projection Pursuit Decision Trees and Random Forests
ppforest2::io::csv Namespace Reference

Classes

struct  FeatureSet
 Result of parsing a feature-only CSV (no response column).
 

Functions

stats::DataPacket read (std::string const &filename)
 Read a CSV file into a DataPacket.
 
FeatureSet read_features_from_string (std::string const &content)
 Parse a feature-only CSV from an in-memory string.
 
stats::DataPacket read_sorted (std::string const &filename)
 Read a CSV file and sort rows ascending by the response column.
 
void write (stats::DataPacket const &data, std::string const &filename)
 Write a DataPacket to a CSV file (features followed by label, no header).
 

Function Documentation

◆ read()

stats::DataPacket ppforest2::io::csv::read ( std::string const & filename)

Read a CSV file into a DataPacket.

Assumes the last column is the response variable (group label as string) and all preceding columns are features. Categorical feature columns are automatically detected and integer-encoded. String labels are mapped to contiguous integer codes starting at 0.

Parameters
filename  Path to the CSV file.
Returns
A DataPacket containing the feature matrix and response vector.
Exceptions
std::runtime_error  If the file is empty or has inconsistent columns.
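
As an illustration of the label mapping described for read(), the following is a minimal standalone sketch, not the ppforest2 implementation. The names EncodedLabels and encode_labels are hypothetical, and the sketch assumes codes are assigned in order of first appearance, which the description of read() does not state.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper type, not part of ppforest2.
struct EncodedLabels {
  std::vector<int> codes;                // one integer code per row
  std::vector<std::string> group_names;  // group_names[code] is the original label
};

// Map string labels to contiguous integer codes starting at 0.
// Assumption: codes are handed out in order of first appearance.
EncodedLabels encode_labels(std::vector<std::string> const &labels) {
  EncodedLabels out;
  std::unordered_map<std::string, int> seen;
  out.codes.reserve(labels.size());
  for (auto const &label : labels) {
    auto it = seen.find(label);
    if (it == seen.end()) {
      // First time this label appears: give it the next unused code.
      int code = static_cast<int>(out.group_names.size());
      it = seen.emplace(label, code).first;
      out.group_names.push_back(label);
    }
    out.codes.push_back(it->second);
  }
  return out;
}
```

Keeping group_names alongside the codes lets a caller translate a predicted code back to its original label string.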

◆ read_features_from_string()

FeatureSet ppforest2::io::csv::read_features_from_string ( std::string const & content)

Parse a feature-only CSV from an in-memory string.

The first row is the header (feature names); every subsequent row is one observation. There is no response column. The serve subcommand uses this function to parse POST /predict request bodies. Categorical encoding runs independently on each call, so callers must encode categorical values consistently with the training data.

Exceptions
UserError  On an empty body, a missing header, no data rows, or a malformed shape.
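
The header-plus-rows layout and the failure cases listed above can be sketched as follows. This is a simplified standalone example, not the library code: ParsedFeatures, split_line, and parse_features are hypothetical names, std::invalid_argument stands in for UserError, and the sketch assumes numeric cells with no quoting and no categorical encoding.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for FeatureSet; the real struct's fields may differ.
struct ParsedFeatures {
  std::vector<std::string> names;         // header row (feature names)
  std::vector<std::vector<double>> rows;  // one inner vector per observation
};

// Split one CSV line on commas (no quoting or escaping support).
std::vector<std::string> split_line(std::string const &line) {
  std::vector<std::string> cells;
  std::stringstream ss(line);
  std::string cell;
  while (std::getline(ss, cell, ',')) cells.push_back(cell);
  return cells;
}

// Parse a feature-only CSV held in memory: header first, then observations.
ParsedFeatures parse_features(std::string const &content) {
  std::istringstream in(content);
  std::string line;
  ParsedFeatures out;
  bool have_header = false;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
    if (line.empty()) continue;                                  // skip blank lines
    std::vector<std::string> cells = split_line(line);
    if (!have_header) {
      out.names = cells;
      have_header = true;
      continue;
    }
    if (cells.size() != out.names.size())
      throw std::invalid_argument("row width does not match header");
    std::vector<double> row;
    for (auto const &cell : cells) row.push_back(std::stod(cell));  // throws on non-numeric
    out.rows.push_back(row);
  }
  if (!have_header) throw std::invalid_argument("empty body or missing header");
  if (out.rows.empty()) throw std::invalid_argument("no data rows");
  return out;
}
```

For example, parse_features("x1,x2\n1.5,2\n3,4.25\n") yields two feature names and two rows, while a body containing only the header line throws.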

◆ read_sorted()

stats::DataPacket ppforest2::io::csv::read_sorted ( std::string const & filename)

Read a CSV file and sort rows ascending by the response column.

Mode is detected from the y column's written form:

  • If any value carries fractional or scientific notation (., e, E), y is parsed as a continuous float response (regression shape) and group_names is empty.
  • Otherwise, y is mapped to integer codes in first-appearance order and group_names carries the original label strings (classification shape). This keeps integer-coded label CSVs like Wine and Glass on the classification path.
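
The detection rule in the two bullets above can be sketched as follows. This is an illustrative standalone example, not the library's code: ResponseMode, looks_continuous, and detect_mode are hypothetical names. The sketch also assumes a value only counts as continuous when it parses fully as a number, so that a text label such as "setosa" (which contains an e) stays on the classification path. The description above does not say how the library treats that case.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical enum, not part of ppforest2.
enum class ResponseMode { Classification, Regression };

// True when the written form has '.', 'e', or 'E' AND the whole string is a number.
// The numeric check is an assumption added for this sketch.
bool looks_continuous(std::string const &value) {
  if (value.find_first_of(".eE") == std::string::npos) return false;
  char *end = nullptr;
  std::strtod(value.c_str(), &end);
  return end != value.c_str() && *end == '\0';
}

// Regression if any y value looks continuous; classification otherwise.
ResponseMode detect_mode(std::vector<std::string> const &raw_y) {
  for (auto const &value : raw_y) {
    if (looks_continuous(value)) return ResponseMode::Regression;
  }
  return ResponseMode::Classification;
}
```

Under this rule a y column written as 1, 2, 3 is treated as class labels, whereas writing any single value as 2.0 or 1e3 switches the whole column to a continuous response.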

Rows are sorted ascending by the encoded y, which gives the training routines what they need: contiguous groups for classification, and y-ordered rows for ByCutpoint::init's median split in regression.

Exceptions
UserError  On any failure (missing file, parse error, malformed shape). CSV reading failures are user-facing by nature.
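
To illustrate the row ordering that read_sorted() describes, here is a minimal standalone sketch that computes a sort permutation from the encoded y and applies it to per-row data. The helper names ascending_order and reorder are hypothetical, and the use of a stable sort (ties keep their original order) is an assumption, not something the description above confirms.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Return the row order that sorts y ascending.
// Assumption: ties keep their original relative order (stable sort).
std::vector<std::size_t> ascending_order(std::vector<double> const &y) {
  std::vector<std::size_t> order(y.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&y](std::size_t a, std::size_t b) { return y[a] < y[b]; });
  return order;
}

// Apply a row order to any per-row container (feature rows or responses).
template <typename T>
std::vector<T> reorder(std::vector<T> const &rows, std::vector<std::size_t> const &order) {
  std::vector<T> out;
  out.reserve(rows.size());
  for (std::size_t i : order) out.push_back(rows[i]);
  return out;
}
```

Applying the same permutation to both the feature rows and the response keeps each observation intact while making rows with equal class codes adjacent.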

◆ write()

void ppforest2::io::csv::write ( stats::DataPacket const & data,
std::string const & filename )

Write a DataPacket to a CSV file (features followed by label, no header).

Parameters
data  The DataPacket to write.
filename  Output file path.