partition_data

hcrystalball.model_selection.partition_data(df, partition_by)

Partition data by values found in one or more columns.

For each of the selected columns, the unique values are determined, and a selection is made for each element in the cross product of those unique values.

Parameters
  • df (pandas.DataFrame) – Data to be partitioned

  • partition_by (list) – Column names to partition by

Returns

Partition dictionary with keys:

  • labels : Tuple of dictionaries whose keys are the column names and whose values are the actual values in the column

  • data : Tuple of pandas.DataFrame objects, each holding the subset of the data with the corresponding label values

Return type

dict
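
Example

The following is a minimal sketch of the behaviour described above, written with plain pandas and itertools. It is not the library's implementation. The helper name partition_sketch, the sample columns (Region, Store, Sales), and the choice to keep empty selections and to leave the partitioning columns in each subset are assumptions made for illustration only.

```python
import itertools

import pandas as pd


def partition_sketch(df, partition_by):
    """Hypothetical helper illustrating the documented return structure."""
    # Unique values of each selected column, in order of first appearance
    uniques = [df[col].unique() for col in partition_by]

    labels, data = [], []
    # One selection per element of the cross product of the unique values
    for combo in itertools.product(*uniques):
        label = dict(zip(partition_by, combo))

        # Build a boolean mask selecting rows that match every label value
        mask = pd.Series(True, index=df.index)
        for col, val in label.items():
            mask &= df[col] == val

        labels.append(label)
        # Assumption: empty selections are kept rather than dropped
        data.append(df[mask])

    return {"labels": tuple(labels), "data": tuple(data)}


# Illustrative data; column names are made up for this sketch
df = pd.DataFrame(
    {
        "Region": ["A", "A", "B", "B"],
        "Store": [1, 2, 1, 1],
        "Sales": [10, 20, 30, 40],
    }
)

parts = partition_sketch(df, ["Region", "Store"])

# labels and data are parallel tuples: data[i] belongs to labels[i]
for label, subset in zip(parts["labels"], parts["data"]):
    print(label, len(subset))
```

With the real function, the equivalent call would be partition_data(df, ["Region", "Store"]), and the result would be read the same way, by pairing each entry of "labels" with the entry of "data" at the same position.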