In this tutorial, we’ll create and train a machine learning model, or as we call
it, an AI Table or a predictor. By querying the model, we’ll predict the
probability of churn for new customers of a telecoms company.

Make sure you have access to a working MindsDB installation, either locally or
at MindsDB Cloud.

If you want to learn how to set up your account at MindsDB Cloud, follow
this guide. Another way is to set up MindsDB locally using Docker or Python.

Let’s get started.
There are a couple of ways to get the data you need to follow along with this
tutorial.
You can connect to a demo database that we’ve prepared for you. It contains the data used throughout this tutorial (the example_db.demo_data.customer_churn table).
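Connecting a database in MindsDB is typically done with a CREATE DATABASE statement. The sketch below assumes a PostgreSQL source (an assumption suggested by the example_db.demo_data.customer_churn naming). The host, user, password, port, and database values are placeholders, not real credentials, so substitute the connection details published for the MindsDB demo database.

```sql
-- Sketch only: every value inside PARAMETERS is a placeholder (assumption).
-- Replace them with the connection details of the demo database.
CREATE DATABASE example_db
WITH ENGINE = 'postgres',
PARAMETERS = {
    "user": "<demo_user>",
    "password": "<demo_password>",
    "host": "<demo_host>",
    "port": "5432",
    "database": "<demo_database>"
};
```

Once the connection exists, the table is addressed as example_db.demo_data.customer_churn.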
Pay Attention to the Queries
From now on, we’ll use the
files.churn file as a table. Make sure you replace it with
example_db.demo_data.customer_churn if you connect the data as a database.
We use the customer churn dataset, where each row is one customer, to predict
whether the customer is going to stop using the company’s products.

Below is the sample data stored in the files.churn table.
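To preview the data yourself, a query along these lines should work. It assumes the dataset was uploaded as a file named churn; if you connected the demo database instead, query example_db.demo_data.customer_churn. The LIMIT value is arbitrary.

```sql
-- Preview the first rows of the churn dataset.
SELECT *
FROM files.churn
LIMIT 10;
```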
| Column | Description | Data Type | Usage |
| --- | --- | --- | --- |
| SeniorCitizen | It indicates whether the customer is a senior citizen (1) or not (0). | integer | Feature |
| Partner | It indicates whether the customer has a partner (Yes) or not (No). | character varying | Feature |
| Dependents | It indicates whether the customer has dependents (Yes) or not (No). | character varying | Feature |
| Tenure | Number of months the customer has been staying with the company. | integer | Feature |
| PhoneService | It indicates whether the customer has a phone service (Yes) or not (No). | character varying | Feature |
| MultipleLines | It indicates whether the customer has multiple lines (Yes) or not (No, No phone service). | character varying | Feature |
| InternetService | Customer’s internet service provider (DSL, Fiber optic, No). | character varying | Feature |
| OnlineSecurity | It indicates whether the customer has online security (Yes) or not (No, No internet service). | character varying | Feature |
| OnlineBackup | It indicates whether the customer has online backup (Yes) or not (No, No internet service). | character varying | Feature |
| DeviceProtection | It indicates whether the customer has device protection (Yes) or not (No, No internet service). | character varying | Feature |
| TechSupport | It indicates whether the customer has tech support (Yes) or not (No, No internet service). | character varying | Feature |
| StreamingTv | It indicates whether the customer has streaming TV (Yes) or not (No, No internet service). | character varying | Feature |
| StreamingMovies | It indicates whether the customer has streaming movies (Yes) or not (No, No internet service). | character varying | Feature |
| Contract | The contract term of the customer (Month-to-month, One year, Two year). | character varying | Feature |
| PaperlessBilling | It indicates whether the customer has paperless billing (Yes) or not (No). | character varying | Feature |
| PaymentMethod | Customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic)). | character varying | Feature |
| MonthlyCharges | The monthly charge amount. | money | Feature |
| TotalCharges | The total amount charged to the customer. | money | Feature |
| Churn | It indicates whether the customer churned (Yes) or not (No). | character varying | Label |
Labels and Features
A label is a column whose values will be predicted (the y variable in simple
linear regression).
A feature is a column used to train the model (the x variable in simple
linear regression).
Let’s create and train the machine learning model. For that, we use the
CREATE MODEL statement. The FROM clause specifies the input data used for
training (the features), and the PREDICT clause names the column whose values
we want to predict (the label).
CREATE MODEL mindsdb.customer_churn_predictor
FROM files
  (SELECT * FROM churn)
PREDICT Churn;
We use all of the columns as features, except for the Churn column, whose
values will be predicted.
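Training takes some time, so it helps to check the model’s status before querying it. The statement below is a minimal sketch that assumes a MindsDB version supporting DESCRIBE for models; the exact output columns vary between versions.

```sql
-- Check the training status of the model.
-- The model is ready to query once its status is reported as complete.
DESCRIBE customer_churn_predictor;
```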
You can make predictions by querying the predictor as if it were a table. The
SELECT statement lets you make predictions for the label
based on the chosen features.
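As a sketch of what such queries can look like, the two statements below request a prediction for one hypothetical customer, first from a few features and then from a longer list. The feature values are made up for illustration, and the Churn_confidence and Churn_explain output columns are assumed to follow MindsDB’s usual <target>_confidence and <target>_explain naming. No output is shown because the results depend on the trained model.

```sql
-- Prediction from a handful of features (values are illustrative).
SELECT Churn, Churn_confidence, Churn_explain
FROM mindsdb.customer_churn_predictor
WHERE Partner = 'Yes'
AND Dependents = 'No'
AND Tenure = 1
AND PhoneService = 'No'
AND Contract = 'Month-to-month';

-- The same hypothetical customer with more features supplied.
SELECT Churn, Churn_confidence, Churn_explain
FROM mindsdb.customer_churn_predictor
WHERE Partner = 'Yes'
AND Dependents = 'No'
AND Tenure = 1
AND PhoneService = 'No'
AND MultipleLines = 'No phone service'
AND InternetService = 'DSL'
AND OnlineSecurity = 'No'
AND Contract = 'Month-to-month'
AND PaperlessBilling = 'Yes'
AND PaymentMethod = 'Electronic check'
AND MonthlyCharges = 29.85
AND TotalCharges = 29.85;
```

Features left out of the WHERE clause are simply not provided to the model, which is why the two queries can return different confidence values.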
MindsDB predicted the probability of this customer churning with a confidence of
around 82%. The previous query predicted it with a confidence of around 79%. So,
in this case, providing more data improved the confidence level of the prediction.