partition_data_by_values

hcrystalball.model_selection.partition_data_by_values(df, column, partition_values, default_df=None)

Partition data by one column and a fixed set of values within that column.

If a value is not present in that column, default data can optionally be provided for its partition.
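The behaviour described above can be sketched as follows. This is a hypothetical re-implementation named `partition_sketch`, not hcrystalball's source. It assumes that rows are selected with an equality mask, that the partition column is kept in each subset, and that a missing value with no `default_df` is simply skipped. The real function may differ on any of these points.

```python
import pandas as pd


def partition_sketch(df, column, partition_values, default_df=None):
    """Hypothetical sketch of the documented behaviour (not hcrystalball's code)."""
    labels, data = [], []
    for value in partition_values:
        mask = df[column] == value
        if mask.any():
            # Value present: the matching rows form this partition's data.
            labels.append({column: value})
            data.append(df.loc[mask])
        elif default_df is not None:
            # Value absent: fall back to the caller-supplied default data.
            labels.append({column: value})
            data.append(default_df)
        # Assumption: an absent value with no default_df is skipped.
    return {"labels": tuple(labels), "data": tuple(data)}


sales = pd.DataFrame({"region": ["A", "A", "B"], "y": [1.0, 2.0, 3.0]})
fallback = pd.DataFrame({"region": ["C"], "y": [0.0]})
parts = partition_sketch(sales, "region", ["A", "C"], default_df=fallback)
print(parts["labels"])  # ({'region': 'A'}, {'region': 'C'})
print(len(parts["data"][0]))  # 2
```

Region "B" is present in the data but not requested, so it gets no partition; region "C" is requested but absent, so it receives `fallback`.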

Parameters
  • df (pandas.DataFrame) – Data to be partitioned

  • column (str) – Column whose values define the partitions

  • partition_values (list) – Values of column to partition by, one partition per value

  • default_df (pandas.DataFrame, optional) – Data to use for a partition whose value is not present in column. Defaults to None

Returns

Partition dictionary with keys:

  • labels : Tuple of dictionaries, one per partition, whose keys are the column names and whose values are the actual values in the column

  • data : Tuple of pandas.DataFrame objects holding the subset of the data with the corresponding label values

Return type

dict
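To illustrate how the returned dictionary can be consumed, the sketch below pairs each label with its DataFrame using `zip`. The dictionary is built by hand to match the documented shape; the column names `store` and `y` and the helper `summarize` are hypothetical, and the pairing assumes `labels` and `data` are aligned position by position.

```python
import pandas as pd

# Hypothetical dict shaped like the documented return value, built by hand.
partitions = {
    "labels": ({"store": 1}, {"store": 2}),
    "data": (
        pd.DataFrame({"y": [10, 12]}),
        pd.DataFrame({"y": [7]}),
    ),
}


def summarize(partitions):
    """Pair each label dict with its DataFrame and compute per-partition stats."""
    rows = []
    for label, frame in zip(partitions["labels"], partitions["data"]):
        # label is e.g. {"store": 1}; merge it with statistics of its partition.
        rows.append({**label, "rows": len(frame), "y_sum": frame["y"].sum()})
    return pd.DataFrame(rows)


print(summarize(partitions))
```

In practice `partitions` would come from a call to `hcrystalball.model_selection.partition_data_by_values`; the loop relies only on the two documented keys.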