run_model_selection

hcrystalball.model_selection.run_model_selection(df, grid_search, target_col_name, frequency, partition_columns, parallel_over_columns, include_rules=None, exclude_rules=None, country_code_column=None, output_path='', persist_cv_results=False, persist_cv_data=False, persist_model_reprs=False, persist_best_model=False, persist_partition=False, persist_model_selector_results=True, visualize_success=False, executor=None)

Run parallel cross-validation on the data and select the best model.

The best model is selected for each time series and, if requested, persisted.

Parameters

df (pandas.DataFrame) – Container holding historical data for training
grid_search (sklearn.model_selection.GridSearchCV) – Preconfigured grid search definition which determines which models and parameters will be tried
target_col_name (str) – Name of target column
frequency (str) – Temporal frequency of data. Data with different frequency will be resampled to this frequency.
partition_columns (list, tuple) – Column names based on which the data should be split up / partitioned
parallel_over_columns (list, tuple) – Subset of partition_columns over which the work is split for parallel execution.
include_rules (dict) – Dictionary mapping column names to lists of values to include in the output.
exclude_rules (dict) – Dictionary mapping column names to lists of values to exclude from the output.
country_code_column (str) – Name of the column with country codes, which can be used to supply holidays (i.e. when the grid search contains a HolidayTransformer with its country_code_column argument set to this column)
output_path (str) – Path to the directory for storing the output; defaults to the current working directory
persist_cv_results (bool) – If True, the cv_results of sklearn.model_selection.GridSearchCV are saved as a pickled pandas DataFrame for each partition
persist_cv_data (bool) – If True, the detailed cross-validation data is saved as a pickled pandas DataFrame for each partition
persist_model_reprs (bool) – If True, the model reprs are saved as JSON for each partition
persist_best_model (bool) – If True, the best model is saved as a pickle for each partition
persist_partition (bool) – If True, a dictionary with the partition label is saved as JSON for each partition
persist_model_selector_results (bool) – If True, a ModelSelectorResult with all important information is saved as a pickle for each partition
visualize_success (bool) – If True, generate a graph of task completion
executor (prefect.engine.executors) – Provide prefect’s executor. For more information see https://docs.prefect.io/api/latest/engine/executors.html
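
How partition_columns, include_rules and exclude_rules fit together can be illustrated with a small standard-library sketch. This is not hcrystalball's implementation: the helper names partition_rows and keep_partition and the sample rows are hypothetical, and the sketch assumes that a partition is kept only when it matches every include rule and no exclude rule.

```python
from collections import defaultdict


def partition_rows(rows, partition_columns):
    """Group row dicts by the tuple of their values in partition_columns."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in partition_columns)
        groups[key].append(row)
    return dict(groups)


def keep_partition(label, include_rules=None, exclude_rules=None):
    """Return True if a partition label (column -> value) passes the rules.

    Assumption: every include rule must match and no exclude rule may match.
    """
    for col, allowed in (include_rules or {}).items():
        if label[col] not in allowed:
            return False
    for col, banned in (exclude_rules or {}).items():
        if label[col] in banned:
            return False
    return True


# Hypothetical sample data: one row per observation.
rows = [
    {"Region": "A", "Product": "P1", "Sales": 10},
    {"Region": "A", "Product": "P2", "Sales": 7},
    {"Region": "B", "Product": "P1", "Sales": 3},
    {"Region": "A", "Product": "P1", "Sales": 12},
]
partition_columns = ["Region", "Product"]

# One group of rows (one time series) per unique combination of values.
partitions = partition_rows(rows, partition_columns)

# Keep only region A and drop product P2.
selected = {
    key: part
    for key, part in partitions.items()
    if keep_partition(
        dict(zip(partition_columns, key)),
        include_rules={"Region": ["A"]},
        exclude_rules={"Product": ["P2"]},
    )
}
```

In this sketch each key of partitions corresponds to one time series for which a best model would be selected, and the rules narrow down which of those series are processed.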

Returns

flow itself
state of computations

Return type

prefect.Flow, prefect.engine.state.State
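
A call could be wrapped as in the following minimal sketch. The wrapper select_models and the check check_parallel_columns are hypothetical helpers, not part of hcrystalball; the check only enforces the documented requirement that parallel_over_columns is a subset of partition_columns, and the keyword arguments follow the signature above. The grid search is expected to be configured by the caller beforehand.

```python
def check_parallel_columns(partition_columns, parallel_over_columns):
    """Raise ValueError unless parallel_over_columns is a subset of partition_columns."""
    extra = [col for col in parallel_over_columns if col not in partition_columns]
    if extra:
        raise ValueError(f"columns not in partition_columns: {extra}")
    return True


def select_models(df, grid_search, target_col_name, frequency,
                  partition_columns, parallel_over_columns, **kwargs):
    """Validate the column arguments, then delegate to run_model_selection."""
    # Imported lazily so the check above can be used without hcrystalball installed.
    from hcrystalball.model_selection import run_model_selection

    check_parallel_columns(partition_columns, parallel_over_columns)
    # Documented to return the prefect flow and the state of computations.
    return run_model_selection(
        df=df,
        grid_search=grid_search,
        target_col_name=target_col_name,
        frequency=frequency,
        partition_columns=partition_columns,
        parallel_over_columns=parallel_over_columns,
        **kwargs,
    )
```

Validating the columns before starting the flow surfaces a misconfiguration immediately instead of after the parallel tasks have been scheduled.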