ModelSelectorResult

class hcrystalball.model_selection.ModelSelectorResult(best_model, cv_results, cv_data, model_reprs, partition, X_train, y_train, frequency, horizon, country_code_column, best_model_rank)

Bases: object

Consolidate information and methods from cross validation for one time series.

Store all relevant information about model selection and provide utility methods (e.g. plot_result) and data (e.g. df_plot) for easier access to further insights.

Parameters
  • best_model (sklearn compatible estimator) – best model found during model selection

  • cv_results (pandas.DataFrame) – the cv_results_ attribute of sklearn.model_selection.GridSearchCV as a DataFrame

  • cv_data (pandas.DataFrame) – data with model predictions, CV split indicators and true target values

  • model_reprs (dict) – dictionary of model representations used in model selection, in the form {model_hash: model_repr}

  • partition (dict) – dictionary indicating which part of the data the model selection results belong to, e.g. {"Region": "Canada", "Product": "Chips"}

  • X_train (pandas.DataFrame) – training data features

  • y_train (pandas.Series) – training data target

  • frequency (str) – temporal frequency of data on which the model was trained / selected

  • horizon (int) – number of steps ahead for which predictions were made

  • country_code_column (str) – Name of the column with the ISO code of the country/region, which can be used for supplying holidays, e.g. ‘State’ with values like ‘DE’, ‘CZ’ or ‘Region’ with values like ‘DE-NW’, ‘DE-HE’, etc.
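The cv_results parameter is the cv_results_ attribute of a fitted GridSearchCV wrapped in a DataFrame. The sketch below shows that conversion with plain scikit-learn. The Ridge estimator, the toy data and the parameter grid are placeholders chosen only to illustrate the shape of the table; they are not how hcrystalball runs model selection internally.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Toy data standing in for a time series with one lagged feature (assumption).
rng = np.random.default_rng(0)
X = pd.DataFrame({"lag_1": rng.normal(size=60)})
y = pd.Series(2 * X["lag_1"] + rng.normal(scale=0.1, size=60))

# Ridge is a placeholder estimator, not an hcrystalball model wrapper.
grid = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.1, 1.0, 10.0]},
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_mean_absolute_error",
)
grid.fit(X, y)

# One row per parameter combination: the tabular form cv_results expects.
cv_results = pd.DataFrame(grid.cv_results_)
print(cv_results[["param_alpha", "mean_test_score", "rank_test_score"]])
```

Each row carries per-split scores (split0_test_score, ...) plus their mean and rank, which is what makes the best model identifiable afterwards.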

Attributes Summary

cv_splits_overlap

Indicator for cv_splits overlap in training data

df_plot

Training data suitable for plotting.

Methods Summary

persist([attribute_name, path])

Persist whole object or particular object attributes

plot_error(**plot_params)

Plot model absolute error during model selection

plot_result([plot_from])

Plot model performance from given plot_from timestamp

Attributes Documentation

cv_splits_overlap

Indicator for cv_splits overlap in training data

Returns

Whether cv_splits in training data contain overlap

Return type

bool
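One plausible reading of "cv_splits overlap" is that the test windows of consecutive splits share time points, which happens when the horizon is longer than the step between splits. The helper below is a hypothetical sketch of that check on plain index lists; it is not hcrystalball's implementation.

```python
from typing import Iterable, Sequence, Set


def splits_overlap(test_indices: Iterable[Sequence[int]]) -> bool:
    """Return True if any index appears in the test window of more than one split.

    Hypothetical helper for illustration only.
    """
    seen: Set[int] = set()
    for window in test_indices:
        current = set(window)
        if seen & current:  # a shared position means the windows overlap
            return True
        seen |= current
    return False


# Horizon of 3 with a step of 2 between splits: consecutive windows share a point.
print(splits_overlap([[10, 11, 12], [12, 13, 14]]))  # True
# Horizon of 2 with a step of 2: windows are disjoint.
print(splits_overlap([[10, 11], [12, 13]]))  # False
```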

df_plot

Training data suitable for plotting.

Utility that prepares data from model selection for further model performance analysis.

Returns

Data suitable for plotting

Return type

pandas.DataFrame
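A plotting-ready frame typically aligns the training target and the out-of-sample CV predictions on one time index. The sketch below builds such a frame with pandas; the column names actuals and cv_forecast and the split layout of the stand-in cv_data are assumptions for illustration, not the exact columns df_plot returns.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: a daily target and CV predictions for the last 4 days,
# tagged with the split they come from.
index = pd.date_range("2021-01-01", periods=8, freq="D")
y_train = pd.Series(np.arange(8, dtype=float), index=index, name="actuals")
cv_data = pd.DataFrame(
    {"split": [0, 0, 1, 1], "cv_forecast": [4.2, 5.1, 5.8, 7.3]},
    index=index[-4:],
)

# Align on the time index: rows before the first split get NaN forecasts,
# which plotting libraries draw as gaps.
df_plot = pd.concat([y_train, cv_data["cv_forecast"]], axis=1)
print(df_plot)
```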

Methods Documentation

persist(attribute_name=None, path='')

Persist whole object or particular object attributes

Parameters
  • attribute_name (str) – Name of the attribute to be stored. If None (the default), the whole object is stored.

  • path (str) – Where to store the object or object attribute. By default, creates a file named {partition_hash}.{attribute_name} in the current working directory.

Raises

ValueError – If attribute_name is not a valid option. The error message lists the available ones.
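The persistence behavior described above (one file per partition, named {partition_hash}.{attribute_name}, with a ValueError for unknown attributes) can be mimicked with pickle. The sketch below is a hypothetical re-implementation: the function persist_sketch, the SHA-256-of-sorted-JSON partition hash and the file suffix used for the whole object are all assumptions, and hcrystalball's actual hashing and serialization may differ.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace


def persist_sketch(obj, partition, attribute_name=None, path=""):
    """Pickle `obj` or one of its attributes to {partition_hash}.{suffix}.

    Hypothetical helper; not the library's implementation.
    """
    # Assumption: hash the partition dict deterministically via sorted JSON.
    partition_hash = hashlib.sha256(
        json.dumps(partition, sort_keys=True).encode()
    ).hexdigest()
    if attribute_name is None:
        # Hypothetical suffix for the whole object.
        payload, suffix = obj, "model_selector_result"
    else:
        available = [name for name in vars(obj) if not name.startswith("_")]
        if attribute_name not in available:
            raise ValueError(
                f"Unknown attribute {attribute_name!r}; choose one of {available}"
            )
        payload, suffix = getattr(obj, attribute_name), attribute_name
    target = Path(path) / f"{partition_hash}.{suffix}"
    with target.open("wb") as fh:
        pickle.dump(payload, fh)
    return target


# Usage with a stand-in object instead of a real ModelSelectorResult.
result = SimpleNamespace(horizon=3, frequency="D")
with tempfile.TemporaryDirectory() as tmp:
    stored = persist_sketch(result, {"Region": "Canada"}, "horizon", path=tmp)
    with stored.open("rb") as fh:
        print(stored.suffix, pickle.load(fh))  # .horizon 3
```

Hashing the partition keeps file names stable and filesystem-safe regardless of the partition values.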

plot_error(**plot_params)

Plot model absolute error during model selection

Parameters

plot_params (kwargs) – plotting parameters passed down to pandas.DataFrame.plot(); valid options depend on your plotting backend, e.g. figsize=(16, 9), title='Performance of Model'

Returns

Plot object produced by your plotting backend (matplotlib by default)

Return type

pandas.DataFrame.plot()
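The quantity plot_error visualizes is the absolute difference between each model's CV predictions and the true target. The pandas sketch below computes it for a hypothetical cv_data layout (a y_true column plus one column per model; these names are assumptions) and forwards plot_params to DataFrame.plot in the same way the method's keyword arguments are described.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import pandas as pd

# Hypothetical cv_data layout: true target plus one prediction column per model.
cv_data = pd.DataFrame(
    {
        "y_true": [10.0, 12.0, 11.0, 13.0],
        "model_a": [9.0, 12.5, 11.0, 14.0],
        "model_b": [10.5, 11.0, 12.0, 13.0],
    },
    index=pd.date_range("2021-01-01", periods=4, freq="D"),
)

# Absolute error of each model against the true target, row by row.
abs_error = cv_data.drop(columns="y_true").sub(cv_data["y_true"], axis=0).abs()

# Extra keyword arguments go straight to DataFrame.plot, like plot_params.
plot_params = {"figsize": (16, 9), "title": "Performance of Model"}
ax = abs_error.plot(**plot_params)
print(abs_error)
```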

plot_result(plot_from=None, **plot_params)

Plot model performance from given plot_from timestamp

Parameters
  • plot_from (str) – Date from which to show actuals, cv_forecast and forecast. By default, dates are not filtered.

  • plot_params (kwargs) – plotting parameters passed down to pandas.DataFrame.plot(); valid options depend on your plotting backend, e.g. figsize=(16, 9), title='Performance of Model'

Returns

Plot object produced by your plotting backend (matplotlib by default)

Return type

pandas.DataFrame.plot()
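The plot_from filter amounts to label-based slicing on a time index. The sketch below shows that behavior on a hypothetical frame with the three series named above (actuals, cv_forecast, forecast); the helper filter_from is illustrative and not part of hcrystalball.

```python
from typing import Optional

import pandas as pd

# Hypothetical plotting frame with the three series named in plot_result's docs.
index = pd.date_range("2021-01-01", periods=6, freq="D")
df = pd.DataFrame(
    {
        "actuals": [1.0, 2.0, 3.0, 4.0, None, None],
        "cv_forecast": [None, None, 2.8, 4.1, None, None],
        "forecast": [None, None, None, None, 5.0, 6.0],
    },
    index=index,
)


def filter_from(frame: pd.DataFrame, plot_from: Optional[str] = None) -> pd.DataFrame:
    """Keep rows on or after `plot_from`; return everything when it is None."""
    return frame if plot_from is None else frame.loc[plot_from:]


print(filter_from(df, "2021-01-03"))  # rows from 2021-01-03 onwards
```

The filtered frame can then be passed to .plot() with the same keyword arguments as plot_params.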