get_sales_data
- hcrystalball.utils.get_sales_data(n_dates=100, n_assortments=2, n_states=3, n_stores=3)
Load a subset of the Rossmann store sales dataset.
This function loads a subset of the Rossmann store sales dataset from https://www.kaggle.com/c/rossmann-store-sales with the 100 stores with the highest sales overall. The data is for stores in Germany, in the date range 2015-04-23 to 2015-07-31. The data is returned as a pandas.DataFrame:
Date - DataFrame index, date of recorded sales numbers
Store - a unique Id for each store
Sales - the turnover for any given day (this is what you are predicting)
Open - an indicator for whether the store was open: 0 = closed, 1 = open
Promo - indicates whether a store is running a promo on that day
SchoolHoliday - indicates if the (Store, Date) was affected by the closure of public schools
StoreType - differentiates between 4 different store models: a, b, c, d
Assortment - describes an assortment level: a = basic, b = extra, c = extended
Promo2 - Promo2 is a continuing and consecutive promotion for some stores: 0 = store is not participating, 1 = store is participating
State - String code for state in Germany that the store is in (see https://en.wikipedia.org/wiki/States_of_Germany)
HolidayCode - the State prefixed with DE-.
The Assortment, State and Store serve as data partitioning columns. HolidayCode will provide country specific holidays for the given Date. Open, Promo, Promo2 and SchoolHoliday serve as exogenous variables. Sales is the target column we will predict.
- Parameters
Example
>>> get_sales_data()
            Store  Sales  Open  Promo  SchoolHoliday  StoreType  Assortment  Promo2  State  HolidayCode
Date
2015-04-23    906   8162  True  False          False          a           a   False     HE        DE-HE
2015-04-23    251  16573  True  False          False          a           c   False     NW        DE-NW
2015-04-23    320  13114  True  False          False          a           c   False     SH        DE-SH
2015-04-23    335  11189  True  False          False          b           a    True     NW        DE-NW
2015-04-23    336  10184  True  False          False          a           a   False     HE        DE-HE
...           ...    ...   ...    ...            ...        ...         ...     ...    ...          ...
2015-07-31    817  23093  True   True           True          a           a   False     BE        DE-BE
2015-07-31    831  15152  True   True           True          a           a   False     NW        DE-NW
2015-07-31    906  15131  True   True           True          a           a   False     HE        DE-HE
2015-07-31    586  17879  True   True           True          a           c   False     NW        DE-NW
2015-07-31    251  22205  True   True           True          a           c   False     NW        DE-NW
- Returns
Rossmann store sales subset, see description above.
- Return type
pandas.DataFrame
- Raises
ValueError – Raised if the requested number of assortments is higher than what the dataset holds, if any assortment contains fewer than the requested number of states, or if there are not enough valid combinations of assortments, states and stores.
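The column roles described above (partitioning columns, exogenous variables, target) can be illustrated with a minimal sketch. It is not part of hcrystalball: the helper names `holiday_code` and `split_by_partition` are hypothetical, and the rows are plain dictionaries holding a few values copied from the example output, so the snippet runs without pandas or the dataset itself.

```python
from collections import defaultdict

# A few rows copied from the example output, stored as plain dicts
# (an assumption made so that the sketch has no dependencies).
rows = [
    {"Date": "2015-04-23", "Store": 906, "Sales": 8162, "Open": True,
     "Promo": False, "SchoolHoliday": False, "Assortment": "a",
     "Promo2": False, "State": "HE"},
    {"Date": "2015-04-23", "Store": 251, "Sales": 16573, "Open": True,
     "Promo": False, "SchoolHoliday": False, "Assortment": "c",
     "Promo2": False, "State": "NW"},
    {"Date": "2015-07-31", "Store": 906, "Sales": 15131, "Open": True,
     "Promo": True, "SchoolHoliday": True, "Assortment": "a",
     "Promo2": False, "State": "HE"},
    {"Date": "2015-07-31", "Store": 251, "Sales": 22205, "Open": True,
     "Promo": True, "SchoolHoliday": True, "Assortment": "c",
     "Promo2": False, "State": "NW"},
]

# Column roles as stated in the description.
PARTITION_COLS = ("Assortment", "State", "Store")
EXOG_COLS = ("Open", "Promo", "Promo2", "SchoolHoliday")
TARGET_COL = "Sales"


def holiday_code(state):
    """Hypothetical helper: HolidayCode is the State prefixed with 'DE-'."""
    return "DE-" + state


def split_by_partition(rows):
    """Hypothetical helper: group rows into one series per partition key,
    keeping exogenous variables and the target apart."""
    series = defaultdict(lambda: {"dates": [], "exog": [], "target": []})
    for row in rows:
        key = tuple(row[col] for col in PARTITION_COLS)
        series[key]["dates"].append(row["Date"])
        series[key]["exog"].append({col: row[col] for col in EXOG_COLS})
        series[key]["target"].append(row[TARGET_COL])
    return dict(series)


series = split_by_partition(rows)
for key, data in series.items():
    print(key, holiday_code(key[1]), data["target"])
```

With the actual DataFrame returned by `get_sales_data`, the same grouping would typically be written as `df.groupby(["Assortment", "State", "Store"])`, giving one time series per (Assortment, State, Store) combination.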