hcrystalball.model_selection.select_model(df, target_col_name, partition_columns, grid_search, parallel_over_dict=None, frequency=None, country_code_column=None)

Find the best model from grid_search for each partition.

Parameters

  • df (pandas.DataFrame) – Data to be used for model selection

  • target_col_name (str) – Name of the target column

  • partition_columns (list) – List of column names in the input dataframe to be used for partitioning the data.

  • grid_search (sklearn.model_selection.GridSearchCV) – Instance compatible with scikit-learn's GridSearchCV

  • parallel_over_dict (dict) – Dictionary with the partition label used for parallelization (e.g. {'Region': 'Africa'})

  • frequency (str) – Frequency at which model selection was done. It is mainly for bookkeeping purposes and is later used to instantiate result objects

  • country_code_column (str) – Name of the column with the ISO code of the country/region, which can be used for supplying holidays, e.g. 'State' with values like 'DE', 'CZ' or 'Region' with values like 'DE-NW', 'DE-HE', etc.
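
The per-partition idea behind these parameters can be sketched in plain Python. This is an illustration only, not hcrystalball's implementation: the helper names `split_into_partitions` and `select_best_per_partition` are hypothetical, rows are plain dicts standing in for a DataFrame, and two toy forecasters with a single hold-out observation stand in for the grid search.

```python
from collections import defaultdict
from statistics import mean


def split_into_partitions(rows, partition_columns, parallel_over_dict=None):
    """Group rows by the values found in partition_columns.

    If parallel_over_dict is given, only rows matching all of its
    key/value pairs are kept (e.g. one worker handles one region).
    """
    parallel_over_dict = parallel_over_dict or {}
    partitions = defaultdict(list)
    for row in rows:
        if any(row[k] != v for k, v in parallel_over_dict.items()):
            continue
        key = tuple(row[col] for col in partition_columns)
        partitions[key].append(row)
    return dict(partitions)


def select_best_per_partition(rows, target_col_name, partition_columns,
                              candidates, parallel_over_dict=None):
    """Return {partition_key: (best_candidate_name, absolute_error)}."""
    results = {}
    parts = split_into_partitions(rows, partition_columns, parallel_over_dict)
    for key, part in parts.items():
        y = [r[target_col_name] for r in part]
        train, actual = y[:-1], y[-1]  # hold out the last observation
        errors = {name: abs(actual - predict(train))
                  for name, predict in candidates.items()}
        best = min(errors, key=errors.get)
        results[key] = (best, errors[best])
    return results


# Toy stand-ins for the models a grid search would compare (assumption).
candidates = {
    "last_value": lambda history: history[-1],
    "mean": lambda history: mean(history),
}

# Rows are listed in time order within each partition.
series = {
    ("Africa", "A"): [10, 12, 14, 16],
    ("Africa", "B"): [5, 9, 5, 7],
    ("Europe", "A"): [1, 2, 3, 4],
}
rows = [
    {"Region": region, "Product": product, "Sales": value}
    for (region, product), values in series.items()
    for value in values
]

all_results = select_best_per_partition(
    rows, "Sales", ["Region", "Product"], candidates
)
africa_results = select_best_per_partition(
    rows, "Sales", ["Region", "Product"], candidates,
    parallel_over_dict={"Region": "Africa"},
)
```

Here `all_results` holds one entry per (Region, Product) combination, while `africa_results` is restricted to the partitions whose Region is 'Africa', which is the role the parameter list gives to parallel_over_dict.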


Returns

List of ModelSelectorResults containing all important information for each partition

Return type