How to bring StatsForecast Models to MindsDB
Before creating a model, you need to create an ML engine for StatsForecast using the `CREATE ML_ENGINE` statement:
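As a minimal sketch, the statement might look like the following. The engine name on the first line (`statsforecast`) is our own choice and is assumed here; the name after `FROM` refers to the StatsForecast handler.

```sql
-- Register an ML engine backed by the StatsForecast handler.
-- The engine name (first `statsforecast`) is an illustrative choice.
CREATE ML_ENGINE statsforecast
FROM statsforecast;
```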
Then use the `CREATE MODEL` statement to create the StatsForecast model in MindsDB.
To ensure the model is built with the StatsForecast engine, include the `USING` clause at the end.
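The general shape of such a statement can be sketched as below. Every identifier (`model_name`, `data_source`, `table_name`, `target_column`, `date_column`, `group_column`) is a hypothetical placeholder, and the engine is assumed to be registered under the name `statsforecast`.

```sql
-- Template only: replace the placeholder identifiers with your own.
CREATE MODEL model_name
FROM data_source
  (SELECT * FROM table_name)
PREDICT target_column
ORDER BY date_column        -- time column of the series
GROUP BY group_column       -- one series per group
HORIZON 3                   -- forecast the next 3 rows
USING ENGINE = 'statsforecast';
```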
Example
Let’s go through an example of how to use Nixtla’s StatsForecast with MindsDB to forecast monthly expenditures. Note that before using the StatsForecast engine, you must create it with the `CREATE ML_ENGINE` statement from the MindsDB editor or any other client you use to interact with MindsDB.
Tutorial using SQL
In this tutorial, we create a model to predict expenditures based on historical data using the StatsForecast engine. We use a table from our MySQL public demo database, so let’s start by connecting MindsDB to it.
The `historical_expenditures` table stores monthly expenditure data for various categories, such as `food`, `clothing`, `industry`, and more.
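Connecting to the demo database and previewing the table might look like the sketch below. The connection name `mysql_demo_db` is a hypothetical choice, and all connection parameters are placeholders to be replaced with the values published for the demo database; `3306` is simply the MySQL default port.

```sql
-- Connection values are placeholders (assumptions), not real credentials.
CREATE DATABASE mysql_demo_db
WITH ENGINE = 'mysql',
PARAMETERS = {
    "user": "<user>",
    "password": "<password>",
    "host": "<host>",
    "port": "3306",
    "database": "<database>"
};

-- Preview a few rows of the table used in this tutorial.
SELECT *
FROM mysql_demo_db.historical_expenditures
LIMIT 3;
```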
Let’s create a model table to predict the expenditures:
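A sketch of such a statement follows. It assumes the demo database is connected under the hypothetical name `mysql_demo_db` and that the StatsForecast engine was registered as `statsforecast`; the column names and the model name are the ones used throughout this tutorial.

```sql
-- Train a forecaster on monthly expenditures, one series per category.
CREATE MODEL quarterly_expenditure_forecaster
FROM mysql_demo_db
  (SELECT * FROM historical_expenditures)
PREDICT expenditure
ORDER BY month
GROUP BY category
HORIZON 3                        -- three months, i.e. one quarter
USING ENGINE = 'statsforecast';
```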
Please visit our docs on the `CREATE MODEL` statement to learn more.
The `WINDOW` clause is not required because StatsForecast automatically calculates the best window as part of hyperparameter tuning.
The `ENGINE` parameter in the `USING` clause specifies the ML engine used to make predictions.
We can check the training status with the following query:
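A minimal sketch of such a query, assuming the model was created under the name `quarterly_expenditure_forecaster` in the current project:

```sql
-- Returns model metadata, including its training status.
DESCRIBE quarterly_expenditure_forecaster;
```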
Once the model status is `complete`, the behavior is the same as with any other AI table: you can query for batch predictions by joining it with a data table.
The `historical_expenditures` table is used to make batch predictions. Upon joining the `quarterly_expenditure_forecaster` model with the `historical_expenditures` table, we get predictions for the next quarter as defined by the `HORIZON 3` clause.
Please note that the output `month` column contains both the date and the time. This format is used by default because the time component is required when dealing with hourly data.
MindsDB provides the `LATEST` keyword that marks the latest training data point. In the `WHERE` clause, we specify the `month > LATEST` condition to ensure the predictions are made for data after the latest training data point.
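Putting the join and the `LATEST` condition together, a batch prediction query could be sketched as follows. It assumes the model lives in the default `mindsdb` project and the data source is connected under the hypothetical name `mysql_demo_db`; the filter on `food` is only an illustration of selecting one category.

```sql
-- Forecast the rows that follow the latest training data point.
SELECT m.month AS month, m.expenditure AS forecasted
FROM mindsdb.quarterly_expenditure_forecaster AS m
JOIN mysql_demo_db.historical_expenditures AS t
WHERE t.month > LATEST
AND t.category = 'food';
```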
Let’s consider our `quarterly_expenditure_forecaster` model. We train the model using data until the third quarter of 2017, and the predictions come for the fourth quarter of 2017 (as defined by `HORIZON 3`).
Tutorial using MQL
In this tutorial, we create a model to predict expenditures based on historical data using the StatsForecast engine. Before we start, visit our docs to learn how to connect Mongo Compass and Mongo Shell to MindsDB. We use a collection from our Mongo public demo database, so let’s start by connecting MindsDB to it from Mongo Compass or Mongo Shell.
The `historical_expenditures` collection stores monthly expenditure data for various categories, such as `food`, `clothing`, `industry`, and more.
Let’s create a model to predict the expenditures:
Please visit our docs on the `insertOne` statement to learn more.
The `window` clause is not required because StatsForecast automatically calculates the best window as part of hyperparameter tuning.
The `engine` parameter in the `training_options` clause specifies the ML engine used to make predictions.
We can check the training status with the following query:
Once the model status is `complete`, the behavior is the same as with any other AI collection: you can query for batch predictions by joining it with a data collection.
The `historical_expenditures` collection is used to make batch predictions. Upon joining the `quarterly_expenditure_forecaster` model with the `historical_expenditures` collection, we get predictions for the next quarter as defined by the `horizon: 3` clause.
Please note that the output `month` column contains both the date and the time. This format is used by default because the time component is required when dealing with hourly data.
MindsDB provides the `latest` keyword that marks the latest training data point. In the `where` clause, we specify the `month > latest` condition to ensure the predictions are made for data after the latest training data point.
Let’s consider our `quarterly_expenditure_forecaster` model. We train the model using data until the third quarter of 2017, and the predictions come for the fourth quarter of 2017 (as defined by `horizon: 3`).
StatsForecast + HierarchicalForecast
The StatsForecast handler also supports hierarchical reconciliation via Nixtla’s HierarchicalForecast package. Hierarchical reconciliation may improve prediction accuracy when the data has a hierarchical structure. In this example, there may be a hierarchy because total expenditure is made up of seven different categories.
For example, if spending on `food` rises in October 2017, it may be more likely that spending on `cafes` also rises in October 2017. Hierarchical reconciliation can account for this shared information.
Here is how we can create a model:
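The sketch below shows one way such a model might be declared. It assumes the data source is connected under the hypothetical name `mysql_demo_db`, uses a hypothetical model name, and assumes that the handler accepts a `HIERARCHY` option listing the grouping columns; check that option name against the StatsForecast handler documentation before relying on it.

```sql
-- Hierarchical variant of the expenditure forecaster.
-- HIERARCHY is an assumed option name; verify it in the handler docs.
CREATE MODEL hierarchical_expenditure_forecaster
FROM mysql_demo_db
  (SELECT * FROM historical_expenditures)
PREDICT expenditure
ORDER BY month
GROUP BY category
HORIZON 3
USING
  ENGINE = 'statsforecast',
  HIERARCHY = ['category'];
```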
The `CREATE MODEL` statement creates, trains, and deploys the model. Here, we predict the `expenditure` column values. As it is a time series model, we order the data by the `month` column. Additionally, we group data by the `category` column, so the predictions are made for each group independently (here, for each category). The `HORIZON` clause defines how many rows the predictions are made for (here, the next 3 rows).
You can use the `DESCRIBE [MODEL]` command to check the model’s details.