Electricity Demand Modeling Based on Weather Data Using ML


Every summer, rising temperatures cause electricity shortages in many regions, including the city of Tabriz in Iran. Predicting the load (electricity consumption) would help utility providers as well as consumers manage the system better. One of the factors contributing to electricity consumption is the weather, especially in summer. In this project, I have tried to explain (and predict) the load profile of Tabriz in summer using weather data and some other features.

Methodology

There are various techniques and approaches to predict electricity consumption. These methods include, but are not limited to:

Time Series
Component Decomposition
Artificial Intelligence (ML, DL, RL)

In this project, I decided to use supervised machine learning to predict the load profile.

Weather Data

Using Python and the www.weather.com API, I extracted hourly weather variables into an Excel file. These variables included temperature, wind velocity, cloud cover, and relative humidity.

After retrieving the raw data, I needed to clean it. Cleaning consisted of looking for missing data, duplicates, invalid data, and outliers. Linear interpolation was used to fill in the missing data. It's worth noting that all the cleaning took place in Excel.
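The cleaning itself was done in Excel, but the duplicate removal and linear interpolation steps could be sketched in pandas roughly as follows. This is only an illustration: the function name, the `temperature` column, and the sample readings are assumptions, and the checks for invalid values and outliers are not covered.

```python
import pandas as pd


def clean_hourly_weather(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate timestamps, rebuild a full hourly index, fill gaps linearly."""
    # Keep the first record for any timestamp that appears more than once.
    df = df[~df.index.duplicated(keep="first")].sort_index()
    # Reindex onto a complete hourly grid so missing hours show up as NaN rows.
    full_index = pd.date_range(df.index.min(), df.index.max(), freq=pd.Timedelta(hours=1))
    df = df.reindex(full_index)
    # Linear interpolation between the nearest known neighbours.
    return df.interpolate(method="linear")


# Made-up readings for illustration: 01:00 is duplicated and 02:00 is missing.
idx = pd.to_datetime(
    ["2022-07-01 00:00", "2022-07-01 01:00", "2022-07-01 01:00", "2022-07-01 03:00"]
)
raw = pd.DataFrame({"temperature": [20.0, 22.0, 22.0, 26.0]}, index=idx)
clean = clean_hourly_weather(raw)
```

In this toy input the missing 02:00 reading is filled with the midpoint of its two neighbours.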

Load Data

With the help of a friend, I obtained load data for the specified period (90 days). It came as a set of CSV files for 48 substations. Using the pandas library in Python, I summed the hourly load of all substations and wrote the result to an Excel file.
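A minimal sketch of how the substation files could be summed with pandas is shown below. The original file layout is not documented here, so the `timestamp` and `load` column names are assumptions, and two small in-memory CSVs with made-up values stand in for the 48 substation files.

```python
import io

import pandas as pd


def total_load(csv_sources):
    """Read one CSV per substation and sum the load for each hourly timestamp."""
    frames = [pd.read_csv(src, parse_dates=["timestamp"]) for src in csv_sources]
    stacked = pd.concat(frames, ignore_index=True)
    # One row per timestamp, holding the sum over all substations.
    return stacked.groupby("timestamp")["load"].sum().sort_index()


# Two made-up substations, for illustration only.
sub_a = io.StringIO("timestamp,load\n2022-07-01 00:00,10.0\n2022-07-01 01:00,12.0\n")
sub_b = io.StringIO("timestamp,load\n2022-07-01 00:00,5.0\n2022-07-01 01:00,6.5\n")
total = total_load([sub_a, sub_b])
```

With real files, `csv_sources` could be a list of file paths instead of in-memory buffers, and the result could be written out with `total.to_excel(...)`.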

Final Dataset

Now that both the weather data and the load data were ready, I combined them in a third Excel file. The final file contained all the features: date, hour, temperature, load, etc.
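The combination was done in Excel; an equivalent join in pandas might look like the sketch below. The column names and the sample values are assumptions made for illustration. An inner join keeps only the hours present in both tables.

```python
import pandas as pd

# Made-up hourly tables, for illustration only.
weather = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2022-07-01 00:00", "2022-07-01 01:00", "2022-07-01 02:00"]
        ),
        "temperature": [21.0, 20.5, 20.0],
    }
)
load = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2022-07-01 01:00", "2022-07-01 02:00", "2022-07-01 03:00"]
        ),
        "load": [310.0, 300.0, 295.0],
    }
)

# validate="one_to_one" raises an error if either table has duplicate timestamps.
dataset = weather.merge(load, on="timestamp", how="inner", validate="one_to_one")
dataset["date"] = dataset["timestamp"].dt.date
dataset["hour"] = dataset["timestamp"].dt.hour
```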

Feature Engineering

After extraction, cleaning, and integration, I added some features to help reveal the relations hidden in the data. First, I replaced the date with the day of the week. Then I labeled each record by day type: weekday or weekend. Also, based on some quick visualizations, the 24 hours of a day were grouped into 3 categories: 1. Night 2. Working Hour 3. Evening. The load profile showed quite distinct trends in these three periods.
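These three features could be derived in pandas along the following lines. The exact hour boundaries and the set of weekend days used in the project are not stated above, so `WEEKEND_DAYS` and the cut-offs in `hour_type` are placeholder assumptions that would need to be adjusted.

```python
import pandas as pd

# Assumption: which days count as the weekend (adjust to the local calendar).
WEEKEND_DAYS = {"Friday"}


def hour_type(hour: int) -> str:
    """Map an hour (0-23) to one of three groups; the boundaries are hypothetical."""
    if hour < 7:
        return "Night"
    if hour < 14:
        return "Working Hour"
    return "Evening"


def add_calendar_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add week_day, day_type and hour_type columns derived from 'timestamp'."""
    out = df.copy()
    out["week_day"] = out["timestamp"].dt.day_name()
    out["day_type"] = out["week_day"].map(
        lambda day: "Weekend" if day in WEEKEND_DAYS else "Week Day"
    )
    out["hour_type"] = out["timestamp"].dt.hour.map(hour_type)
    return out


sample = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2022-07-01 03:00", "2022-07-02 10:00", "2022-07-02 20:00"]
        )
    }
)
features = add_calendar_features(sample)
```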

Visualization

Weather Data

*Data Distribution*
Temperature Distribution	Wind Velocity Distribution	Relative Humidity Distribution

The histograms above depict the distributions of temperature, wind velocity, and relative humidity. While the temperature distribution is approximately normal, those of wind velocity and relative humidity are right-skewed and look more like a chi-squared distribution. Although the temperature ranges from 11 to 39 degrees Celsius, it is in the interval 25-30 degrees most of the time.

Temperature is widely spread: in some ranges it is ascending and in others it is descending. The relationship will become clearer later on with the grouped plots.

There is a strong, inverse, linear relationship between relative humidity and temperature.
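One way to put a number on such a relationship is the Pearson correlation coefficient, where values near -1 indicate a strong inverse linear relationship. The sketch below uses made-up readings, not the Tabriz measurements, and the variable names are assumptions.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


# Made-up readings, for illustration only.
temperature_c = [20.0, 25.0, 30.0, 35.0]
relative_humidity_pct = [60.0, 45.0, 35.0, 20.0]
r = pearson_r(temperature_c, relative_humidity_pct)
```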

Demand

*Box Plots*
Demand Box plot Grouped by Hour	Demand Box Plot Grouped by Day

The box plots above show different patterns of demand when grouped by day type (Week Day/Weekend) and hour type (Night, Work, Evening). As expected, the median demand on weekdays is higher, and its IQR (Interquartile Range) is larger as well. Similarly, the demand at night tends to be lower than during working hours. In the evenings, the demand is the highest: although the offices are closed, people at home consume electricity for cooling, watching TV, etc.
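The median and IQR that each box summarizes can also be computed directly. The following is a small sketch with made-up demand values; the `demand` and `day_type` column names are assumptions.

```python
import pandas as pd


def demand_summary(df: pd.DataFrame, by: str) -> pd.DataFrame:
    """Median, quartiles and IQR of 'demand' for each group in column `by`."""
    grouped = df.groupby(by)["demand"]
    summary = pd.DataFrame(
        {
            "median": grouped.median(),
            "q1": grouped.quantile(0.25),
            "q3": grouped.quantile(0.75),
        }
    )
    summary["iqr"] = summary["q3"] - summary["q1"]
    return summary


# Made-up values, for illustration only.
toy = pd.DataFrame(
    {
        "day_type": ["Week Day"] * 4 + ["Weekend"] * 2,
        "demand": [100.0, 110.0, 120.0, 130.0, 80.0, 90.0],
    }
)
summary = demand_summary(toy, by="day_type")
```

The same function could be called with `by="hour_type"` to compare the Night, Working Hour and Evening groups.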

Wind Speed & Cloud Cover

Demand vs Wind Speed Scatter Plot	Demand vs Cloud Cover Scatter Plot

There seems to be no significant relationship between demand and either wind speed or cloud cover.

Demand vs Hour & Temperature

Demand vs Hour Scatter Plot	Demand vs Hour Grouped Scatter Plot	Demand vs Temperature Scatter Plot

According to the grouped scatter plot above, demand shows 3 different patterns across these groups. For example, at night, it is descending and less dispersed. As expected, the demand for electricity in summer has a strong relationship with temperature, mainly because of the widespread use of air conditioners (ACs).

Curve Fitting

A cubic polynomial is fitted to the average hourly data. The curve for the average temperature fits the actual data pretty well. Due to the cooling demand, electricity consumption reaches its daily peak at around 2 p.m. After that, although the temperature continues to rise, electricity demand drops. This occurs because most offices close at 2 p.m., which in turn reduces consumption.
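A cubic fit of this kind can be sketched with NumPy's `polyfit`. The tool used for the original fit is not stated, and the hourly values below are synthetic, generated from a known cubic purely so the fit can be checked; they are not the Tabriz averages.

```python
import numpy as np


def fit_cubic(hours, values) -> np.poly1d:
    """Least-squares cubic polynomial through (hour, average value) points."""
    coefficients = np.polyfit(hours, values, deg=3)
    return np.poly1d(coefficients)


# Synthetic hourly averages from a known cubic, for illustration only.
hours = np.arange(24, dtype=float)
values = 0.01 * hours**3 - 0.5 * hours**2 + 6.0 * hours + 50.0

curve = fit_cubic(hours, values)
fitted = curve(hours)
# Hour at which the fitted curve is highest on the hourly grid.
peak_hour = int(hours[np.argmax(fitted)])
```

With real averages, comparing `fitted` against `values` hour by hour gives a quick sense of how well the cubic follows the data.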

Modelling

Because the dataset is small, I preferred to use MATLAB, which provides a toolbox with a handy user interface. After importing the final Excel file into the MATLAB workspace, we simply pass it to the app. The data is divided into two datasets:

Training set
Test set

Only the training set is used in modelling, while the test set is used to evaluate the performance of the model. I used 75 percent of the records for training and the remaining 25 percent for testing.
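The split was done inside the MATLAB app. As a rough illustration of a 75/25 hold-out split, the sketch below shuffles row indices with NumPy; the random shuffling, the seed, and the function name are assumptions, not a description of what the toolbox does internally. The row count assumes 90 days of hourly records (90 x 24 = 2160).

```python
import numpy as np


def split_indices(n_rows: int, train_fraction: float = 0.75, seed: int = 0):
    """Return shuffled (train, test) row indices for a simple hold-out split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    n_train = int(round(train_fraction * n_rows))
    return order[:n_train], order[n_train:]


n_rows = 90 * 24  # assumed: one record per hour for 90 days
train_idx, test_idx = split_indices(n_rows)
```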

All the available algorithms in the toolbox are applied to the data. In the end, the performance (accuracy) of each of them is calculated. We focus on three criteria:

RMSE (Root Mean Square Error)
MAE (Mean Absolute Error)
R^2 (R-Squared)

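Note that these are calculated on the TEST data, so they reflect performance on unseen records rather than how closely the model fits (or overfits) the training set. We want RMSE and MAE to be as small as possible, and R^2 to be as close to 1 as possible.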
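For reference, the three criteria can be computed from the actual and predicted test values as in the sketch below. It uses the standard definitions of the metrics; the function name and the sample numbers are made up for illustration and are not results from this project.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """RMSE, MAE and R^2 of predictions against actual values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = float(np.sqrt(np.mean(residuals**2)))
    mae = float(np.mean(np.abs(residuals)))
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = float(np.sum(residuals**2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "r2": r2}


# Made-up test values, for illustration only.
metrics = regression_metrics([100, 110, 120, 130], [102, 108, 123, 129])
```

RMSE penalizes large errors more heavily than MAE because the residuals are squared before averaging, which is why RMSE is never smaller than MAE.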

Results

Model Name	RMSE	MAE	Model Name	RMSE	MAE
Linear Regression--Interactions Linear	15.9	11.7	Support Vector Machine--Medium Gaussian	12.2	8.5
Linear Regression--Linear	17.09	13.2	Support Vector Machine--Cubic	12.8	8.6
Linear Regression--Robust Linear	17.1	13.2	Support Vector Machine--Quadratic	13.3	9.4
Tree--Medium Tree	12.3	9.0	Support Vector Machine--Coarse Gaussian	15.8	12.0
Tree--Fine Tree	12.6	9.2	Support Vector Machine--Linear	17.2	13.2
Tree--Coarse Tree	13.6	9.9	Gaussian Process Regression--Exponential	11.5	8.1
Ensemble--Bagged Trees	12.0	8.5	Gaussian Process Regression--Matern 5/2	11.5	8.1
Ensemble--Boosted Trees	13.2	9.7	Gaussian Process Regression--Rational Quadratic	11.6	8.1
Stepwise Linear Regression--Stepwise Linear	15.1	11.4	Gaussian Process Regression--Squared Exponential	11.7	8.2

The table above summarizes the results. It's evident that GPR (Gaussian Process Regression), Ensemble--Bagged Trees, and the Medium Gaussian SVM (Support Vector Machine) have done quite a good job of predicting the demand, with RMSE values between 11.5 and 12.2.

The toolbox provides hyperparameter optimisation as well. So, I optimised GPR in order to reach even smaller values of RMSE (and MAE). The result is as follows:

RMSE	10.5
MAE	7.3
R^2	0.92

R-squared shows how well the model fits the data. Its value normally ranges from 0 to 1. The closer it is to 1, the better the model explains the data.

The plot shows that the points are concentrated around the red line. The red line represents all points for which the actual demand equals the demand predicted by the model.

Future Work

Since we are dealing with time series data, we could also add lagged features such as temp(t-1), temp(t-2), Demand(t-1), etc. This may improve the model performance, since temperature and demand may have delayed effects.
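Such lagged features could be built with pandas `shift`, as in the sketch below. This step was not part of the project; the column names, the lags, and the sample values are assumptions for illustration, and the rows are assumed to be sorted by time with one row per hour.

```python
import pandas as pd


def add_lag_features(df: pd.DataFrame, columns, lags) -> pd.DataFrame:
    """Add <column>_lag<k> columns holding the value from k rows (hours) earlier."""
    out = df.copy()
    for column in columns:
        for lag in lags:
            out[f"{column}_lag{lag}"] = out[column].shift(lag)
    # The first max(lags) rows have no history, so they are dropped.
    return out.dropna().reset_index(drop=True)


# Made-up hourly values, for illustration only.
hourly = pd.DataFrame(
    {
        "temperature": [20.0, 21.0, 22.0, 23.0],
        "demand": [100.0, 105.0, 110.0, 120.0],
    }
)
lagged = add_lag_features(hourly, columns=["temperature", "demand"], lags=[1, 2])
```

Note that using Demand(t-1) as a feature means the previous hour's demand has to be known at prediction time.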

Page data
Keywords	Machine Learning, Weather, Data, Electricity, Demand, Prediction, Model, Regression, Supervised Learning, Visualization, curve fitting
License	CC-BY-SA-4.0
Organizations	Amirkabir University of Technology (Tehran Polytechnic)
Language	English (en)
Created	September 11, 2022 by MIRALI GHASSEMI
Last edit	January 8, 2026 by MetadescriptionsBot
Cite as	MirAli Ghassemi (2022–2026). "Electricity Demand Modeling Based on Weather Data Using ML". Appropedia. Retrieved July 17, 2026.