select_model_general

hcrystalball.model_selection.select_model_general(df, grid_search, target_col_name, frequency, partition_columns=None, parallel_over_columns=None, executor=None, include_rules=None, exclude_rules=None, country_code_column=None, output_path='', persist_cv_results=False, persist_cv_data=False, persist_model_reprs=False, persist_best_model=False, persist_partition=False, persist_model_selector_results=False)

Run cross-validation on the data and select the best model.

The best model is selected for each time series and, if requested, persisted.

Parameters

df (pandas.DataFrame) – Container holding historical data for training
grid_search (sklearn.model_selection.GridSearchCV) – Preconfigured grid search definition which determines which models and parameters will be tried
target_col_name (str) – Name of target column
frequency (str) – Temporal frequency of the data. Data with a different frequency will be resampled to this frequency.
partition_columns (list, tuple) – Column names by which the data should be split up / partitioned
parallel_over_columns (list, tuple) – Subset of partition_columns over which the work is split for parallel execution.
executor (prefect.executors) – Provide prefect’s executor. Only valid when parallel_over_columns is set. For more information see https://docs.prefect.io/api/latest/engine/executors.html
include_rules (dict) – Dictionary whose keys are column names and whose values are lists of values to include in the output.
exclude_rules (dict) – Dictionary whose keys are column names and whose values are lists of values to exclude from the output.
country_code_column (str) – Name of the column with country codes, which can be used for supplying holidays (for example, a grid search containing a HolidayTransformer whose country_code_column argument is set to this column).
output_path (str) – Path to the directory for storing the output. Defaults to the current working directory.
persist_cv_results (bool) – If True, the cv_results of sklearn.model_selection.GridSearchCV will be saved as a pickled pandas DataFrame for each partition
persist_cv_data (bool) – If True, the pandas DataFrame with detailed cv data will be saved as a pickle for each partition
persist_model_reprs (bool) – If True, model reprs will be saved as JSON for each partition
persist_best_model (bool) – If True, the best model will be saved as a pickle for each partition
persist_partition (bool) – If True, the dictionary of partition labels will be saved as JSON for each partition
persist_model_selector_results (bool) – If True, the ModelSelectorResult with all important information will be saved as a pickle for each partition
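
The interplay of partition_columns, include_rules and exclude_rules can be illustrated with a minimal sketch on plain Python records. This is not hcrystalball's implementation: the helper names apply_rules and partition_rows, the sample columns Region, Product and Sales, and the assumption that a row must satisfy every include rule and no exclude rule are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def apply_rules(rows, include_rules=None, exclude_rules=None):
    """Keep rows that match every include rule and no exclude rule.

    Hypothetical helper; the rule semantics here are an assumption.
    """
    include_rules = include_rules or {}
    exclude_rules = exclude_rules or {}
    kept = []
    for row in rows:
        included = all(row[col] in values for col, values in include_rules.items())
        excluded = any(row[col] in values for col, values in exclude_rules.items())
        if included and not excluded:
            kept.append(row)
    return kept


def partition_rows(rows, partition_columns):
    """Group rows by the tuple of their values in partition_columns.

    Hypothetical helper; each group stands for one time series.
    """
    partitions = defaultdict(list)
    for row in rows:
        label = tuple(row[col] for col in partition_columns)
        partitions[label].append(row)
    return dict(partitions)


# Illustrative data only: two regions, two products.
rows = [
    {"Region": "north", "Product": "A", "Sales": 10},
    {"Region": "north", "Product": "B", "Sales": 7},
    {"Region": "south", "Product": "A", "Sales": 12},
    {"Region": "south", "Product": "B", "Sales": 5},
    {"Region": "north", "Product": "A", "Sales": 11},
]

filtered = apply_rules(
    rows,
    include_rules={"Product": ["A", "B"]},
    exclude_rules={"Region": ["south"]},
)
partitions = partition_rows(filtered, partition_columns=["Region", "Product"])

for label, members in partitions.items():
    print(label, len(members))
```

In this sketch each key of partitions corresponds to one time series for which a best model would be selected, and the persist_* flags would then write one artifact per such partition.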

Returns

List of ModelSelectorResult

Return type

list