Customer Expenditure Prediction Application Using Linear Regression
In this article, I will show how to create a web application that predicts how much e-commerce customers will spend. I previously built a similar model in a Jupyter Notebook and summarized it in an earlier Medium article in Bahasa Indonesia: Pengenalan Linear Regresi dan Implementasi Dengan Python.
The application is written in Python using a library called Streamlit. This open-source library helps turn machine learning code into a ready-to-use web application. The source code for this application is available in the following repository: Ecommerce-Customers-Prediction-Apps.
Install Streamlit
To use Streamlit, first install the library in your Anaconda environment. Open Anaconda Prompt and run this command:
pip install streamlit
Once it is installed, try running Streamlit with the command:
streamlit hello
A new tab will then open in your default browser showing Streamlit's demo application.
Creating Applications
With Streamlit installed and running, the next step is to create the customer expense prediction application.
- Preparation
Create a new folder. I created mine in Documents and named it Ecommerce-Customer-Prediction-App.
Then copy the dataset from the previous article into the folder and open the folder in a text editor.
- Code
Next, write the program. I created a Python file named main.py.
Import Libraries
The first step is to import the libraries that the application needs.
# Import Libraries
import streamlit as st
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
Next, create a title for the application.
st.title("Predicting Customer Spent")
To see the result, open Anaconda Prompt, navigate to the application folder, and run the command:
streamlit run main.py
The application will open in the browser and will display the application title.
Load Dataset
Next, create a function that loads the dataset file, then display a preview of the dataset. Save the file and check the result in the browser.
# load dataset
def load_dataset():
    df = pd.read_csv('Ecommerce Customers.csv')
    return df

# call the function so that df exists before it is displayed
df = load_dataset()

# show dataset
st.header("Dataset")
st.write(df)
Drop unnecessary columns like Email, Address, and Avatar.
df.drop(['Email', 'Address', 'Avatar'], axis=1, inplace=True)
Tune Parameters
The two parameters to set are test_size and random_state. Each one gets a slider; the three numbers passed to st.slider are the minimum, the maximum, and the initial value.
st.header("Tune Parameters")
test_size = st.slider('Test Size', 0.1, 0.5, 0.1)
random_state = st.slider('Random State', 0, 200, 1)
Splitting Data Into Training Set and Testing Set
The dataset is divided into a training set and a testing set, using the values chosen with the sliders in the previous step.
X = df.drop('Yearly Amount Spent', axis=1)
y = df['Yearly Amount Spent']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=random_state)
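To make the role of the two sliders more concrete, here is a simplified sketch of what a shuffled split does conceptually, using only the standard library. The function name simple_split and the toy rows are made up for illustration. scikit-learn's train_test_split has its own implementation and will not produce the same row order.

```python
import math
import random


def simple_split(rows, test_size, random_state):
    """Shuffle row indices with a fixed seed, then hold out a test portion.

    Illustrative only: this is not scikit-learn's implementation.
    """
    indices = list(range(len(rows)))
    random.Random(random_state).shuffle(indices)  # same seed -> same order
    n_test = math.ceil(len(rows) * test_size)     # number of rows held out
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(10))  # toy stand-in for 10 customers
train_a, test_a = simple_split(rows, test_size=0.2, random_state=42)
train_b, test_b = simple_split(rows, test_size=0.2, random_state=42)

print(len(train_a), len(test_a))  # 8 2
print(test_a == test_b)           # True: the seed makes the split repeatable
```

A larger test_size leaves fewer rows for training, and changing random_state changes which rows end up in each set. This is why the scores in the app move when the sliders move.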
Training & Testing Model
To build the machine learning model and run training and testing, create the following functions:
# training
def train_model(X_train, y_train):
    linear_regression = LinearRegression()
    linear_regression.fit(X_train, y_train)
    return linear_regression

# testing
def testing_model(model, X_test, y_test):
    y_pred = model.predict(X_test)
    return y_pred

# training model
model = train_model(X_train, y_train)

# testing model
y_pred = testing_model(model, X_test, y_test)
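For intuition about what fit and predict do, the sketch below fits a least-squares line with a single feature in plain Python. It is a simplification: the app's model uses four features, and the function names and toy numbers here are made up for illustration, not taken from the dataset.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) that minimize the squared error for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_line(slope, intercept, xs):
    """Apply the fitted line to new inputs."""
    return [intercept + slope * x for x in xs]


# Toy data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)                        # 2.0 1.0
print(predict_line(slope, intercept, [5.0]))   # [11.0]
```

LinearRegression solves the same kind of least-squares problem, but for several features at once.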
Model Evaluation
To assess how good the model is, an evaluation function is needed. It reports the mean absolute error (MAE), the root mean squared error (RMSE), and the R-squared score.
# evaluation
def evaluate(y_pred, y_test):
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    r2 = r2_score(y_test, y_pred)
    e = ['MAE', 'RMSE', 'R-Squared']
    eval = pd.DataFrame([mae, rmse, r2], index=e, columns=['Score'])
    return eval

# show evaluation
st.header('Model Performance')
eval = evaluate(y_pred, y_test)
st.write(eval)
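As a worked example of what the three scores measure, the sketch below computes them by hand from their textbook definitions. The function name regression_metrics and the numbers are made up for illustration. The app itself relies on the scikit-learn functions shown above.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE and R-squared from their definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n             # average absolute error
    rmse = math.sqrt(sum(e ** 2 for e in errors) / n) # root of the average squared error
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2


# Toy values: every prediction is off by exactly 10
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(mae, rmse, round(r2, 3))  # 10.0 10.0 0.992
```

MAE and RMSE are in the same unit as the target (here, dollars spent), so lower is better. R-squared is unitless, and values closer to 1 mean the model explains more of the variation.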
Predict New Data
The model has been trained and evaluated. The next step is to predict new data. The model predicts the amount spent from the value of each attribute, which is entered with a slider.
The result can then be viewed by clicking the “Show Result” button.
st.header('Predict New Value')
avg_session_length = st.slider('Average Session Length', 0, 90, 1)
time_app = st.slider('Time on App', 0, 90, 1)
time_web = st.slider('Time on Web', 0, 90, 1)
length_member = st.slider('Length of Membership', 0, 10, 1)

# prediction
predictions = model.predict([[avg_session_length, time_app, time_web, length_member]])
if st.button("Show Result"):
    # predict returns an array with one value, so take the first element
    st.header("The predicted Amount Spent: ${}".format(int(predictions[0])))
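Behind model.predict, a linear regression prediction is simply the intercept plus a weighted sum of the inputs. The sketch below shows that arithmetic with made-up numbers: the intercept and coefficients are illustrative placeholders, not the values the app's model actually learns, and linear_predict is a hypothetical helper name. In scikit-learn, the fitted values are available as model.intercept_ and model.coef_.

```python
def linear_predict(intercept, coefficients, features):
    """Linear-regression prediction: intercept plus the weighted sum of features."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))


# Made-up numbers for illustration only, not the fitted model's values
intercept = 10.0
coefficients = [1.0, 2.0, 0.5, 3.0]  # one weight per slider, in column order
features = [30, 12, 36, 4]           # session length, app time, web time, membership

amount = linear_predict(intercept, coefficients, features)
print("Predicted Amount Spent: ${}".format(int(amount)))  # Predicted Amount Spent: $94
```

The order of the slider values matters: it has to match the order of the columns the model was trained on.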
A library such as Streamlit makes it easier for data analysts and data scientists with no web development experience to deploy their machine learning models as a website. Streamlit provides interactive features for “tweaking” models, and its visualization support is quite good. It still has shortcomings, but the developers and the Streamlit community continue to improve it.
Thank you for taking the time to read this article. I hope you find it useful.