select_model_general

hcrystalball.model_selection.select_model_general(df, grid_search, target_col_name, frequency, partition_columns=None, parallel_over_columns=None, executor=None, include_rules=None, exclude_rules=None, country_code_column=None, output_path='', persist_cv_results=False, persist_cv_data=False, persist_model_reprs=False, persist_best_model=False, persist_partition=False, persist_model_selector_results=False)

Run cross validation on data and select best model

The best model is selected for each time series and, optionally, persisted.

Parameters
  • df (pandas.DataFrame) – Container holding historical data for training

  • grid_search (sklearn.model_selection.GridSearchCV) – Preconfigured grid search definition which determines which models and parameters will be tried

  • target_col_name (str) – Name of target column

  • frequency (str) – Temporal frequency of data. Data with different frequency will be resampled to this frequency.

  • partition_columns (list, tuple) – Column names based on which the data should be split up / partitioned

  • parallel_over_columns (list, tuple) – Subset of partition_columns over which the work is split for parallel execution.

  • executor (prefect.executors) – Provide prefect’s executor. Only valid when parallel_over_columns is set. For more information see https://docs.prefect.io/api/latest/engine/executors.html

  • include_rules (dict) – Dictionary with keys being column names and values being list of values to include in the output.

  • exclude_rules (dict) – Dictionary with keys being column names and values being list of values to exclude from the output.

  • country_code_column (str) – Name of the column holding the country code, which can be used for supplying holidays (e.g. when the grid search contains a HolidayTransformer whose country_code_column argument is set to this column).

  • output_path (str) – Path to directory for storing the output, default behavior is current working directory

  • persist_cv_results (bool) – If True, cv_results of sklearn.model_selection.GridSearchCV will be saved as a pickled pandas DataFrame for each partition

  • persist_cv_data (bool) – If True, the detailed cross-validation data will be saved as a pickled pandas DataFrame for each partition

  • persist_model_reprs (bool) – If True, model reprs will be saved as JSON for each partition

  • persist_best_model (bool) – If True, the best model will be saved as a pickle for each partition

  • persist_partition (bool) – If True, a dictionary of partition labels will be saved as JSON for each partition

  • persist_model_selector_results (bool) – If True, the ModelSelectorResult with all important information will be saved as a pickle for each partition
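
The interplay of include_rules, exclude_rules and partition_columns can be illustrated with a small pure-Python sketch. It is not the library's implementation: the helper names apply_rules and partition, the sample rows, and the assumption that rules on different columns are combined with a logical AND are all hypothetical and serve only to show the intended semantics.

```python
def apply_rules(rows, include_rules=None, exclude_rules=None):
    """Keep rows that match every include rule and no exclude rule.

    Assumption: rules on several columns are combined with AND; the
    library may combine them differently.
    """
    include_rules = include_rules or {}
    exclude_rules = exclude_rules or {}
    kept = []
    for row in rows:
        included = all(row[col] in values for col, values in include_rules.items())
        excluded = any(row[col] in values for col, values in exclude_rules.items())
        if included and not excluded:
            kept.append(row)
    return kept


def partition(rows, partition_columns):
    """Group rows by the tuple of their values in partition_columns."""
    groups = {}
    for row in rows:
        key = tuple(row[col] for col in partition_columns)
        groups.setdefault(key, []).append(row)
    return groups


# Hypothetical sample data standing in for the rows of df
rows = [
    {"Region": "A", "Product": "x", "Sales": 10},
    {"Region": "A", "Product": "y", "Sales": 12},
    {"Region": "B", "Product": "x", "Sales": 7},
    {"Region": "C", "Product": "x", "Sales": 3},
]

kept = apply_rules(
    rows,
    include_rules={"Product": ["x"]},
    exclude_rules={"Region": ["C"]},
)
partitions = partition(kept, partition_columns=["Region", "Product"])
```

Under these assumptions, each key of partitions corresponds to one time series for which a best model would be selected.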

Returns

List of ModelSelectorResult

Return type

list
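
A call might be assembled as in the following sketch. The keyword names are taken from the signature above; the column names, the frequency "D", the output directory "results" and the helpers build_selection_kwargs and run_selection are hypothetical. The preconfigured grid_search and the DataFrame df are assumed to be supplied by the caller.

```python
def build_selection_kwargs(output_path="results"):
    """Collect keyword arguments for select_model_general (illustrative values)."""
    return {
        "target_col_name": "Sales",                 # hypothetical target column
        "frequency": "D",                           # daily data assumed
        "partition_columns": ["Region", "Product"],
        "include_rules": {"Product": ["x"]},
        "exclude_rules": {"Region": ["C"]},
        "output_path": output_path,
        "persist_best_model": True,                 # pickle the best model per partition
        "persist_partition": True,                  # save partition labels as JSON
    }


def run_selection(df, grid_search, **kwargs):
    """Run model selection; df and grid_search are provided by the caller."""
    # Imported lazily so the sketch can be read without hcrystalball installed
    from hcrystalball.model_selection import select_model_general

    return select_model_general(df=df, grid_search=grid_search, **kwargs)


kwargs = build_selection_kwargs()
# results = run_selection(df, grid_search, **kwargs)  # list of ModelSelectorResult
```

Keeping the keyword arguments in a dictionary makes it easy to toggle the persist_* flags per run without touching the call itself.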