run_model_selection

hcrystalball.model_selection.run_model_selection(df, grid_search, target_col_name, frequency, partition_columns, parallel_over_columns, include_rules=None, exclude_rules=None, country_code_column=None, output_path='', persist_cv_results=False, persist_cv_data=False, persist_model_reprs=False, persist_best_model=False, persist_partition=False, persist_model_selector_results=True, visualize_success=False, executor=None)

Run parallel cross-validation on the data and select the best model.

The best model is selected for each time series and, if requested, persisted.
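The per-series selection described above can be illustrated with a minimal, self-contained sketch. It does not use hcrystalball, Prefect or GridSearchCV: the helpers `select_best_models` and `backtest_mae`, the two naive candidate models and the expanding-window backtest are hypothetical stand-ins, assumed only to show the idea of scoring every candidate on each partition and keeping the one with the lowest cross-validation error.

```python
import pandas as pd


# Hypothetical candidate "models": each takes a training series and returns
# a list of `horizon` point forecasts.
def naive_last(train, horizon):
    return [float(train.iloc[-1])] * horizon


def naive_mean(train, horizon):
    return [float(train.mean())] * horizon


CANDIDATES = {"naive_last": naive_last, "naive_mean": naive_mean}


def backtest_mae(y, model, n_splits=3, horizon=1):
    """Mean absolute error over expanding-window splits at the end of `y`."""
    errors = []
    for k in range(n_splits, 0, -1):
        cutoff = len(y) - k * horizon
        train, test = y.iloc[:cutoff], y.iloc[cutoff:cutoff + horizon]
        preds = model(train, horizon)
        errors.extend(abs(p - a) for p, a in zip(preds, test))
    return sum(errors) / len(errors)


def select_best_models(df, target_col_name, partition_columns):
    """Return {partition key: name of the candidate with the lowest backtest MAE}."""
    best = {}
    for key, part in df.groupby(list(partition_columns)):
        # Normalize keys so single-column partitions are 1-tuples on all pandas versions.
        key = key if isinstance(key, tuple) else (key,)
        y = part.sort_index()[target_col_name]
        scores = {name: backtest_mae(y, model) for name, model in CANDIDATES.items()}
        best[key] = min(scores, key=scores.get)
    return best


def make_example_data():
    """Two hypothetical stores: a steady trend (A) and an oscillating series (B)."""
    dates = pd.date_range("2021-01-01", periods=10, freq="D")
    return pd.concat([
        pd.DataFrame({"store": "A", "sales": list(range(1, 11))}, index=dates),
        pd.DataFrame({"store": "B", "sales": [10, 0] * 5}, index=dates),
    ])


best = select_best_models(make_example_data(), "sales", ["store"])
print(best)  # {('A',): 'naive_last', ('B',): 'naive_mean'}
```

The trending store favours carrying the last value forward, while the oscillating store favours the mean, so each partition ends up with a different "best model", which is the behaviour the function description refers to.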

Parameters
  • df (pandas.DataFrame) – Container holding historical data for training

  • grid_search (sklearn.model_selection.GridSearchCV) – Preconfigured grid search definition that determines which models and parameters are tried

  • target_col_name (str) – Name of target column

  • frequency (str) – Temporal frequency of data. Data with different frequency will be resampled to this frequency.

  • partition_columns (list, tuple) – Names of the columns used to split the data into partitions (one time series per partition)

  • parallel_over_columns (list, tuple) – Subset of partition_columns used to split the work for parallel execution

  • include_rules (dict) – Dictionary whose keys are column names and whose values are lists of values to include in the output

  • exclude_rules (dict) – Dictionary whose keys are column names and whose values are lists of values to exclude from the output

  • country_code_column (str) – Name of the column containing country codes, used to supply holiday information (e.g. when the grid search contains a HolidayTransformer whose country_code_column argument is set to this column name)

  • output_path (str) – Path to the directory for storing the output; the default is the current working directory

  • persist_cv_results (bool) – If True, the cv_results_ of sklearn.model_selection.GridSearchCV are saved as a pickled pandas DataFrame for each partition

  • persist_cv_data (bool) – If True, the detailed cross-validation data are saved as a pickled pandas DataFrame for each partition

  • persist_model_reprs (bool) – If True, the model representations (reprs) are saved as JSON for each partition

  • persist_best_model (bool) – If True, the best model is saved as a pickle for each partition

  • persist_partition (bool) – If True, a dictionary with the partition label is saved as JSON for each partition

  • persist_model_selector_results (bool) – If True, a ModelSelectorResult holding all important information is saved as a pickle for each partition

  • visualize_success (bool) – If True, generate a graph of task completion

  • executor (prefect.engine.executors) – Prefect executor used to run the flow. For more information see https://docs.prefect.io/api/latest/engine/executors.html
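How partition_columns, include_rules, exclude_rules and parallel_over_columns interact can be sketched with plain pandas. The helpers `filter_partitions` and `group_for_parallel` are hypothetical and not part of hcrystalball; the sketch assumes, from the parameter descriptions alone, that include rules keep only partitions whose column value is listed, exclude rules drop them, and partitions sharing the same values in parallel_over_columns form one parallel work unit.

```python
import pandas as pd


def filter_partitions(df, partition_columns, include_rules=None, exclude_rules=None):
    """Return the distinct partition labels (as dicts) left after applying the rules."""
    labels = df[list(partition_columns)].drop_duplicates()
    # Assumed semantics: keep only labels whose value is listed in include_rules ...
    for col, values in (include_rules or {}).items():
        labels = labels[labels[col].isin(values)]
    # ... and drop labels whose value is listed in exclude_rules.
    for col, values in (exclude_rules or {}).items():
        labels = labels[~labels[col].isin(values)]
    return labels.to_dict(orient="records")


def group_for_parallel(labels, parallel_over_columns):
    """Bucket partition labels by their values in parallel_over_columns (one bucket per work unit)."""
    groups = {}
    for label in labels:
        key = tuple(label[col] for col in parallel_over_columns)
        groups.setdefault(key, []).append(label)
    return groups


# Hypothetical data: five stores in two regions.
df = pd.DataFrame({
    "region": ["EU", "EU", "EU", "US", "US"],
    "store": ["A", "B", "C", "D", "E"],
    "sales": [1, 2, 3, 4, 5],
})

labels = filter_partitions(df, ["region", "store"], exclude_rules={"store": ["B"]})
groups = group_for_parallel(labels, ["region"])
for key, members in groups.items():
    print(key, [m["store"] for m in members])
# ('EU',) ['A', 'C']
# ('US',) ['D', 'E']
```

Choosing a coarser parallel_over_columns (here only region) yields fewer, larger work units, each of which still contains several partitions to be cross-validated.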

Returns

  • the Prefect flow itself

  • the state of the computations

Return type

prefect.Flow, prefect.engine.state.State