ModelSelectorResult

class hcrystalball.model_selection.ModelSelectorResult(best_model, cv_results, cv_data, model_reprs, partition, X_train, y_train, frequency, horizon, country_code_column, best_model_rank)

Bases: object

Consolidate information and methods from cross validation for a single time series

Store all relevant information about model selection and provide utility methods (e.g. plot_model_performance) and data (e.g. df_plot) for easier access to further insights.

Parameters

best_model (sklearn compatible estimator) – best model found during model selection
cv_results (pandas.DataFrame) – cv_results of sklearn.model_selection.GridSearchCV in form of DataFrame
cv_data (pandas.DataFrame) – data with model predictions, cv split indication and true target values
model_reprs (dict) – dictionary of model representations used in model selection in form of {model_hash : model_repr}
partition (dict) – dictionary indicating which part of the data the model selection results belong to, e.g. {"Region": "Canada", "Product": "Chips"}
X_train (pandas.DataFrame) – training data features
y_train (pandas.Series) – training data target
frequency (str) – temporal frequency of data on which the model was trained / selected
horizon (int) – how many steps ahead predictions were made
country_code_column (str) – Name of the column with the ISO code of the country/region, which can be used for supplying holidays, e.g. 'State' with values like 'DE', 'CZ' or 'Region' with values like 'DE-NW', 'DE-HE', etc.

Attributes Summary

`cv_splits_overlap`	Indicator for cv_splits overlap in training data
`df_plot`	Training data suitable for plotting.

Methods Summary

`persist`([attribute_name, path])	Persist whole object or particular object attributes
`plot_error`(**plot_params)	Plot model absolute error during model selection
`plot_result`([plot_from])	Plot model performance from given `plot_from` timestamp

Attributes Documentation

cv_splits_overlap

Indicator for cv_splits overlap in training data

Returns: Whether cv_splits in training data contain overlap
Return type: bool
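
To illustrate what such an indicator could check, the sketch below treats splits as overlapping when the same timestamp is covered by more than one cv split. The helper name `splits_overlap` and the `(timestamp, split_id)` row layout are assumptions made for this example, not hcrystalball's implementation.

```python
from collections import Counter
from datetime import date


def splits_overlap(cv_rows):
    """Return True if any timestamp is covered by more than one cv split.

    cv_rows is an iterable of (timestamp, split_id) pairs; this layout is
    an assumption made for the example.
    """
    # Deduplicate (timestamp, split) pairs, then count splits per timestamp
    splits_per_timestamp = Counter(ts for ts, _split in set(cv_rows))
    return any(count > 1 for count in splits_per_timestamp.values())


disjoint = [(date(2020, 1, 1), 0), (date(2020, 1, 2), 0), (date(2020, 1, 3), 1)]
overlapping = disjoint + [(date(2020, 1, 2), 1)]

print(splits_overlap(disjoint))     # False
print(splits_overlap(overlapping))  # True
```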

df_plot

Training data suitable for plotting.

Utility that prepares data from model selection for further model performance analysis

Returns: Data suitable for plotting
Return type: pandas.DataFrame

Methods Documentation

persist(attribute_name=None, path='')

Persist whole object or particular object attributes

Parameters

attribute_name (str) – Name of the attribute to be stored. If None (the default), the whole object is stored
path (str) – Where to store the object or object attribute. Creates a file named {partition_hash}.{attribute_name}, by default in the current working directory

Raises

ValueError – If attribute_name is not a valid option. The error message lists the available ones
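
The naming convention and the error behaviour described above can be sketched with the standard library alone. Everything here is hypothetical and for illustration only: the `persist_attribute` and `partition_hash` helpers, the SHA-256 hashing scheme, the use of pickle, the `VALID_ATTRIBUTES` set, and the plain dict standing in for a result object. None of it is taken from hcrystalball's source.

```python
import hashlib
import json
import os
import pickle
import tempfile

# Hypothetical set of persistable attributes, chosen for this example only
VALID_ATTRIBUTES = {"best_model", "cv_results", "cv_data", "partition"}


def partition_hash(partition):
    """Stable hash of a partition dict (the hashing scheme is an assumption)."""
    encoded = json.dumps(partition, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def persist_attribute(result, attribute_name, path=""):
    """Pickle one entry of `result` to {partition_hash}.{attribute_name}."""
    if attribute_name not in VALID_ATTRIBUTES:
        raise ValueError(
            f"'{attribute_name}' is not a valid option. "
            f"Available: {sorted(VALID_ATTRIBUTES)}"
        )
    file_name = f"{partition_hash(result['partition'])}.{attribute_name}"
    # An empty path resolves to the current working directory
    full_path = os.path.join(path, file_name)
    with open(full_path, "wb") as handle:
        pickle.dump(result[attribute_name], handle)
    return full_path


# A plain dict stands in for a ModelSelectorResult in this sketch
result = {
    "partition": {"Region": "Canada", "Product": "Chips"},
    "cv_results": [0.1, 0.2],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    saved = persist_attribute(result, "cv_results", path=tmp_dir)
    with open(saved, "rb") as handle:
        restored = pickle.load(handle)
    print(os.path.basename(saved).endswith(".cv_results"))  # True
    print(restored == result["cv_results"])                 # True
```

Hashing the partition with sorted keys keeps the file name stable regardless of the order in which the partition keys were supplied.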

plot_error(**plot_params)

Plot model absolute error during model selection

Parameters: plot_params (kwargs) – plotting parameters passed down to pandas.DataFrame.plot(), dependent on your plotting backend, e.g. figsize=(16, 9), title='Performance of Model'
Returns: plot depending on your plotting backend, by default plot from matplotlib
Return type: pandas.DataFrame.plot()
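
For reference, the quantity being plotted is the absolute error, |actual - predicted|, per model and time step. The sketch below computes it in plain Python. The helper `absolute_errors` and the list/dict data layout are assumptions for this example and do not mirror the structure of cv_data.

```python
def absolute_errors(y_true, predictions):
    """Absolute error per model and time step.

    y_true: list of actual values.
    predictions: dict mapping a model name to its list of predictions.
    Both layouts are assumptions made for this example.
    """
    return {
        model: [abs(actual - predicted) for actual, predicted in zip(y_true, y_pred)]
        for model, y_pred in predictions.items()
    }


y_true = [10.0, 12.0, 11.0]
predictions = {"model_a": [9.0, 12.5, 11.0], "model_b": [11.0, 10.0, 14.0]}

errors = absolute_errors(y_true, predictions)
print(errors["model_a"])  # [1.0, 0.5, 0.0]
print(errors["model_b"])  # [1.0, 2.0, 3.0]
```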

plot_result(plot_from=None, **plot_params)

Plot model performance from given plot_from timestamp

Parameters

plot_from (str) – date from which to show actuals, cv_forecast and forecast. The default behavior does not filter dates
plot_params (kwargs) – plotting parameters passed down to pandas.DataFrame.plot(), dependent on your plotting backend, e.g. figsize=(16, 9), title='Performance of Model'

Returns

plot depending on your plotting backend, by default plot from matplotlib

Return type

pandas.DataFrame.plot()
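
The effect of `plot_from` on the plotted rows can be sketched as a simple date filter. The helper `rows_from`, the `(date, value)` row layout, the ISO date string format, and the inclusive lower bound are all assumptions for this example rather than documented behaviour.

```python
from datetime import date


def rows_from(rows, plot_from=None):
    """Keep rows dated on or after plot_from; None keeps every row."""
    if plot_from is None:
        return list(rows)
    start = date.fromisoformat(plot_from)  # assumes an ISO 'YYYY-MM-DD' string
    return [row for row in rows if row[0] >= start]


rows = [(date(2020, 1, 1), 10.0), (date(2020, 1, 2), 12.0), (date(2020, 1, 3), 11.0)]

print(len(rows_from(rows)))                # 3
print(len(rows_from(rows, "2020-01-02")))  # 2
```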