partition_data

hcrystalball.model_selection.partition_data(df, partition_by)

Partition data by values found in one or more columns.

The unique values of each selected column are determined, and one subset of the data is selected for each element of the cross product of those unique values.
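The cross-product selection described above can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: `partition_sketch` and the example columns (`region`, `product`, `sales`) are hypothetical names, and details such as whether the real function drops the partitioning columns or skips empty combinations are not stated here, so the sketch keeps both.

```python
import itertools

import pandas as pd


def partition_sketch(df, partition_by):
    """Hypothetical stand-in for partition_data, written with plain pandas."""
    # Unique values of each selected column, in order of first appearance
    uniques = [df[col].unique() for col in partition_by]

    labels, data = [], []
    # One selection per element of the cross product of the unique values
    for combo in itertools.product(*uniques):
        label = dict(zip(partition_by, combo))
        # Build a boolean mask that matches every column/value pair in the label
        mask = pd.Series(True, index=df.index)
        for col, value in label.items():
            mask &= df[col] == value
        labels.append(label)
        data.append(df[mask])

    return {"labels": tuple(labels), "data": tuple(data)}


# Illustrative data only
df = pd.DataFrame(
    {
        "region": ["EU", "EU", "US", "US"],
        "product": ["A", "B", "A", "A"],
        "sales": [10, 20, 30, 40],
    }
)

partitions = partition_sketch(df, ["region", "product"])
for label, subset in zip(partitions["labels"], partitions["data"]):
    print(label, len(subset))
```

Because the selection runs over the full cross product, a combination that never occurs in the data (here `US` with `B`) still gets a label in this sketch, paired with an empty subset.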

Parameters

df (pandas.DataFrame) – Data to be partitioned
partition_by (list) – Column names to partition by

Returns

Partition dictionary with keys:

labels : Tuple of dictionaries whose keys are the column names
and values are the actual values in the column
data : Tuple of pandas.DataFrame objects holding the subset of the data with

Return type