Demand vs Temp. scatter plot

Every summer, higher temperatures cause electricity shortages in many regions, including the city of Tabriz in Iran. Predicting the load (electricity consumption) would help both utility providers and consumers manage the system better. Weather conditions are one of the factors driving electricity consumption, especially in summer. In this project, I tried to explain (and predict) the summer load profile of Tabriz using weather data and some other features.

Methodology

There are various techniques and approaches for predicting electricity consumption. They include, but are not limited to:

  • Time Series
  • Component Decomposition
  • Artificial Intelligence (ML, DL, RL)

In this project, I used supervised machine learning to predict the load profile.

Weather Data

Using Python and the API of a weather data service, I extracted hourly weather variables into an Excel file. These variables included temperature, wind velocity, cloud cover, and relative humidity.
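The extraction step can be sketched with Python's standard library. The original does not name the weather API or describe its response, so the JSON layout (an `hourly` list) and the field names `temp_c`, `wind_kph`, `cloud_pct`, and `rh_pct` are hypothetical; the helpers only show the general pattern of downloading a response, flattening it to one row per hour, and saving it in a format Excel can open.

```python
import csv
import io
import json
from urllib.request import urlopen

# Hypothetical response layout: the original does not name the weather API,
# so the "hourly" key and these field names are assumptions for illustration.
FIELDS = ["time", "temp_c", "wind_kph", "cloud_pct", "rh_pct"]


def fetch_payload(url):
    """Download a JSON response from a (hypothetical) weather API URL."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def payload_to_rows(payload_text):
    """Flatten the 'hourly' list of a JSON response into one dict per hour."""
    data = json.loads(payload_text)
    # A missing field becomes None so it can be cleaned or interpolated later.
    return [{field: record.get(field) for field in FIELDS} for record in data["hourly"]]


def rows_to_csv_text(rows):
    """Serialize rows as CSV text, which Excel can open directly."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty cells
    return buffer.getvalue()
```

To save the result, write the returned text to a file opened with `open("weather.csv", "w", newline="")`. CSV is used here instead of `.xlsx` so the sketch needs no third-party packages.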

After retrieving the raw data, I needed to clean it. Cleaning consisted of checking for missing values, duplicates, invalid values, and outliers. Missing values were filled by linear interpolation. It's worth noting that all the cleaning took place in Excel.
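The cleaning itself was done in Excel. As a minimal sketch of equivalent steps in Python (not the author's procedure), the helpers below drop duplicate hours, treat physically impossible readings as missing, and fill interior gaps by linear interpolation. The range limits passed to `mark_invalid` are illustrative assumptions, and outlier detection beyond simple range checks is not shown.

```python
def drop_duplicate_hours(rows, key="time"):
    """Keep only the first row for each timestamp."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def mark_invalid(values, low, high):
    """Treat values outside a plausible physical range as missing (None)."""
    return [v if v is not None and low <= v <= high else None for v in values]


def interpolate_linear(values):
    """Fill interior gaps (None) by linear interpolation between the nearest known values.

    Leading or trailing gaps have only one known neighbor and are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            fraction = (i - left) / (right - left)
            filled[i] = filled[left] + fraction * (filled[right] - filled[left])
    return filled
```

For example, relative humidity could be cleaned with `interpolate_linear(mark_invalid(humidity, 0, 100))`, since values outside 0 to 100 percent cannot be valid readings.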

Load Data

With the help of a friend, I obtained load data for the study period (90 days) as a set of CSV files covering 48 substations. Using the pandas library in Python, I summed the hourly load of all substations and saved the result to an Excel file.
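Summing the substation files can be sketched with the standard library alone. This assumes, hypothetically, that every file has a `timestamp` column and a `load` column with one row per hour. The actual layout of the 48 files is not described.

```python
import csv
from collections import defaultdict


def sum_substation_loads(sources):
    """Sum hourly load across all substations.

    Each source is an open CSV file (or any iterable of CSV lines) with, by
    assumption, a 'timestamp' column and a 'load' column, one row per hour.
    Returns {timestamp: total_load} in timestamp order.
    """
    totals = defaultdict(float)
    for source in sources:
        for row in csv.DictReader(source):
            totals[row["timestamp"]] += float(row["load"])
    # ISO-formatted timestamps sort chronologically as plain strings.
    return dict(sorted(totals.items()))
```

To run it on files, open each one with `open(path, newline="")` and pass the file objects. With pandas, the same aggregation can be written as `pd.concat([pd.read_csv(p) for p in paths]).groupby("timestamp")["load"].sum()`, under the same column-name assumption.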

Final Dataset

With both the weather data and the load data ready, I combined them in a third Excel file. The final file contained all the features: date, hour, temperature, load, etc.
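Combining the two sources amounts to an inner join on the hourly timestamp. Below is a minimal sketch. It assumes both sources use identical timestamp strings; real files may first need their formats or time zones aligned. Hours missing from the load data are returned instead of silently dropped, so gaps can be checked.

```python
def merge_on_timestamp(weather_rows, load_by_time, key="time"):
    """Inner-join hourly weather rows with hourly load totals on the timestamp.

    Returns the merged rows plus the timestamps that had no load record,
    so gaps are reported instead of silently dropped.
    """
    merged, unmatched = [], []
    for row in weather_rows:
        timestamp = row[key]
        if timestamp in load_by_time:
            merged.append({**row, "load": load_by_time[timestamp]})
        else:
            unmatched.append(timestamp)
    return merged, unmatched
```

An empty `unmatched` list confirms that every weather hour has a matching load value before the data is passed on to modelling.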

Feature Engineering

After extraction, cleaning, and integration, I added some features to help reveal the relationships hidden in the data. First, I replaced the date with the day of the week. Then, I labeled each record by day type: weekday or weekend. Finally, based on some quick visualizations, I grouped the 24 hours of the day into three categories: Night, Working Hour, and Evening. The load profile showed quite distinct trends in these three periods.
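These features can be derived from each timestamp. This is a minimal sketch with two loudly labeled assumptions, because the original gives neither detail. Only Friday, the main weekly holiday in Iran, is treated as the weekend. The hour cut-offs for Night, Working Hour, and Evening are hypothetical.

```python
from datetime import datetime

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Assumption: only Friday counts as the weekend; the original does not list
# which days were labeled as weekends. Pass another set if they differ.
WEEKEND_DAYS = {"Friday"}


def hour_category(hour):
    """Map an hour (0-23) to a category; these cut-offs are hypothetical."""
    if hour < 7:
        return "Night"
    if hour < 15:
        return "Working Hour"
    return "Evening"


def calendar_features(timestamp, weekend_days=WEEKEND_DAYS):
    """Derive weekday, day type, hour, and hour type from an ISO timestamp."""
    moment = datetime.fromisoformat(timestamp)
    weekday = WEEKDAY_NAMES[moment.weekday()]  # weekday() is 0 for Monday
    return {
        "weekday": weekday,
        "day_type": "Weekend" if weekday in weekend_days else "Weekday",
        "hour": moment.hour,
        "hour_type": hour_category(moment.hour),
    }
```

Weekday names come from a fixed list, not `strftime("%A")`, so the labels do not depend on the system locale.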

Visualization

Weather Data

Data Distribution
Temperature Distribution
Wind Velocity Distribution
Relative Humidity Distribution

The histograms above show the distributions of temperature, wind velocity, and relative humidity. While the temperature distribution is roughly normal, those of wind velocity and relative humidity look more like chi-squared distributions, with a longer right tail. Although the temperature ranges from 11 to 39 °C, it falls between 25 and 30 °C most of the time.
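One way to put a number on "roughly normal" versus "more like chi-squared" is sample skewness. The original does not report it. A normal distribution has skewness 0, while a chi-squared distribution with k degrees of freedom is right-skewed, with skewness sqrt(8/k). Below is a minimal sketch of the adjusted Fisher-Pearson estimator.

```python
import statistics


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (G1): about 0 for symmetric data,
    positive when the distribution has a long right tail."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return g1 * (n * (n - 1)) ** 0.5 / (n - 2)
```

Applied to the hourly temperature, wind velocity, and humidity columns, a value near 0 for temperature and clearly positive values for the other two would be consistent with the histograms.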

Temperature vs Hour Scatter Plot

Temperature varies widely within each hour of the day: during some hours it rises, and during others it falls. The relationship becomes clearer in the grouped plots later on.

Humidity vs Temperature Scatter Plot

There is a strong, inverse, linear relationship between relative humidity and temperature.
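The strength of a linear relationship such as humidity vs temperature is usually summarized by Pearson's correlation coefficient r. It ranges from -1 (perfect inverse linear relationship) through 0 (no linear relationship) to +1. Below is a minimal sketch; the original does not report r. On Python 3.10 and later, `statistics.correlation` computes the same quantity.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

A strong inverse linear relationship would show up as r close to -1. The same function can also be applied to demand vs wind speed or cloud cover, where an r near 0 would support the lack of relationship seen in those scatter plots.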

Demand

Box Plots
Demand Box Plot Grouped by Hour
Demand Box Plot Grouped by Day

The box plots above show different demand patterns when grouped by day type (Weekday/Weekend) and hour type (Night, Work, Evening). As expected, the median demand on weekdays is higher, and its IQR (interquartile range) is larger as well. Similarly, demand at night tends to be lower than during working hours. Demand is highest in the evenings: although offices are closed, people at home use electricity for cooling, watching TV, etc.
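The medians and IQRs that the box plots display can also be computed directly. Below is a minimal sketch using Python's `statistics` module. Quartiles here use its default "exclusive" method. Plotting tools may interpolate quartiles slightly differently, so the numbers can differ a little from the drawn boxes.

```python
import statistics
from collections import defaultdict


def box_stats_by_group(groups, values):
    """Median and IQR of the values for each group label (e.g. Weekday/Weekend)."""
    buckets = defaultdict(list)
    for group, value in zip(groups, values):
        buckets[group].append(value)
    stats = {}
    for group, vals in buckets.items():
        q1, _, q3 = statistics.quantiles(vals, n=4)  # needs at least 2 values
        stats[group] = {"median": statistics.median(vals), "iqr": q3 - q1}
    return stats
```

Calling it once with the day-type labels and once with the hour-type labels reproduces the two comparisons made above as numbers.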

Wind Speed & Cloud Cover
Demand vs Wind Speed Scatter Plot
Demand vs Cloud Cover Scatter Plot

There seems to be no significant relationship between demand and either wind speed or cloud cover.

Demand vs Hour & Temperature
Demand vs Hour Scatter Plot
Demand vs Hour Grouped Scatter Plot
Demand vs Temperature Scatter Plot

According to the grouped scatter plot above, demand follows a different pattern in each of the three hour groups. At night, for example, it decreases and is less spread out. As expected, electricity demand in summer is strongly related to temperature, mainly because of the widespread use of air conditioners (ACs).

Curve Fitting

Fitted Curves

A cubic polynomial was fitted to the hourly averaged data. The fitted curve for average temperature matches the actual data well. Driven by cooling demand, electricity consumption reaches its daily peak at around 2 p.m. After that, although the temperature keeps rising, electricity demand drops, because most offices close at 2 p.m.
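The curve fitting can be sketched without MATLAB or NumPy. The steps are: average the readings for each hour, fit a cubic by least squares, and read off the hour where the fitted curve peaks. This is a minimal sketch, assuming integer hours 0 to 23. It is not the exact routine behind the plots above. With NumPy, `numpy.polyfit(hours, means, 3)` is the usual one-line alternative.

```python
from collections import defaultdict


def hourly_means(hours, values):
    """Average the values that share the same hour of day."""
    sums, counts = defaultdict(float), defaultdict(int)
    for hour, value in zip(hours, values):
        sums[hour] += value
        counts[hour] += 1
    ordered = sorted(sums)
    return ordered, [sums[h] / counts[h] for h in ordered]


def _solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    b = list(rhs)
    n = len(b)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_cubic(xs, ys):
    """Least-squares cubic fit; returns a function that evaluates the curve."""
    center = sum(xs) / len(xs)
    scale = max(abs(x - center) for x in xs) or 1.0
    us = [(x - center) / scale for x in xs]  # rescale to [-1, 1] for numerical stability
    # Normal equations (V^T V) c = V^T y for the basis 1, u, u^2, u^3.
    matrix = [[sum(u ** (i + j) for u in us) for j in range(4)] for i in range(4)]
    rhs = [sum(y * u ** i for u, y in zip(us, ys)) for i in range(4)]
    c = _solve(matrix, rhs)

    def curve(x):
        u = (x - center) / scale
        return c[0] + c[1] * u + c[2] * u ** 2 + c[3] * u ** 3

    return curve


def peak_hour(curve, hours=range(24)):
    """Hour at which the fitted curve is highest."""
    return max(hours, key=curve)
```

Rescaling the hours to [-1, 1] before building the normal equations keeps the 4x4 system well conditioned. Raw hours up to 23 raised to the sixth power would make it needlessly ill conditioned.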

Modelling

Because the dataset is small, I used MATLAB, which provides a toolbox with a handy user interface. We simply import the final Excel file into the MATLAB workspace and pass it to the app. The data is divided into two sets:

  1. Training set
  2. Test set

Only the training set is used to fit the models, while the test set is held out to evaluate their performance. I used 75 percent of the records for training and the remaining 25 percent for testing.
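A 75/25 split can be sketched as follows. The original does not state whether the toolbox shuffled the records or split them in time order, so both variants are shown. The chronological variant is a common, stricter choice for time-series data: adjacent hours are strongly correlated, and a random split lets test hours sit between training hours. With 90 days of hourly records (2,160 rows, assuming no gaps), a 75/25 split gives 1,620 training rows and 540 test rows.

```python
import random


def split_train_test(records, train_fraction=0.75, seed=0):
    """Randomly split records into training and test sets (75/25 by default)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def split_chronologically(records, train_fraction=0.75):
    """Alternative for time series: train on the earliest records, test on the latest."""
    cut = round(len(records) * train_fraction)
    return list(records[:cut]), list(records[cut:])
```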

All the algorithms available in the toolbox were applied to the data, and the performance (accuracy) of each was then calculated. We focus on three criteria:

  1. RMSE (Root Mean Square Error)
  2. MAE (Mean Absolute Error)
  3. R^2 (R-Squared)

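Note that these metrics are calculated on the TEST data, which the models never see during training, so an overfitted model cannot score well just by memorizing the training set. RMSE and MAE should be as small as possible, while R^2 should be as close to 1 as possible.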
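The three criteria have short definitions, sketched below as plain functions. RMSE squares the errors before averaging, so it penalizes large misses more than MAE does and is never smaller than MAE. R^2 compares the model's squared error with that of always predicting the mean demand.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 - SS_res / SS_tot: 1 is a perfect fit, 0 matches predicting the mean."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

On held-out data, R^2 can fall below 0 when a model does worse than simply predicting the mean.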

Results

Model                                              RMSE   MAE
Linear Regression (Interactions Linear)            15.9   11.7
Linear Regression (Linear)                         17.09  13.2
Linear Regression (Robust Linear)                  17.1   13.2
Tree (Medium Tree)                                 12.3   9.0
Tree (Fine Tree)                                   12.6   9.2
Tree (Coarse Tree)                                 13.6   9.9
Ensemble (Bagged Trees)                            12.0   8.5
Ensemble (Boosted Trees)                           13.2   9.7
Stepwise Linear Regression (Stepwise Linear)       15.1   11.4
Support Vector Machine (Medium Gaussian)           12.2   8.5
Support Vector Machine (Cubic)                     12.8   8.6
Support Vector Machine (Quadratic)                 13.3   9.4
Support Vector Machine (Coarse Gaussian)           15.8   12.0
Support Vector Machine (Linear)                    17.2   13.2
Gaussian Process Regression (Exponential)          11.5   8.1
Gaussian Process Regression (Matern 5/2)           11.5   8.1
Gaussian Process Regression (Rational Quadratic)   11.6   8.1
Gaussian Process Regression (Squared Exponential)  11.7   8.2

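The table above summarizes the results. Gaussian Process Regression (GPR) achieves the lowest errors (RMSE 11.5 to 11.7, MAE 8.1 to 8.2). It is followed closely by the bagged-trees ensemble (RMSE 12.0, MAE 8.5) and the Medium Gaussian Support Vector Machine (SVR; RMSE 12.2, MAE 8.5). All of these did quite a good job of predicting demand.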

The toolbox also provides hyperparameter optimisation, so I optimised the GPR model to push RMSE (and MAE) even lower. The results are as follows:

RMSE 10.5
MAE 7.3
R^2 0.92

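R-squared shows how well the model fits the data, as the fraction of the variance in demand that it explains. Its value is at most 1, and the closer it is to 1, the better the model explains the data. It usually lies between 0 and 1, but on test data it can be negative if the model does worse than always predicting the mean.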

Actual vs Predicted Data

The plot shows that the points cluster around the red line, which marks where actual demand equals the demand predicted by the model.

Future Work

Since we are dealing with time-series data, we could also add lagged features such as temp(t-1), temp(t-2), Demand(t-1), etc. This may improve model performance, since temperature and demand may have delayed effects.
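Lagged features can be sketched as extra columns copied from earlier rows. This assumes the rows are sorted by hour with no missing hours, and uses hypothetical column names `temp` and `demand`. The first rows have no history and get None, so they would be dropped before training. A feature such as Demand(t-1) is only available when forecasting one hour ahead. Forecasting further ahead requires lags at least as long as the forecast horizon.

```python
def add_lag_features(rows, columns=("temp", "demand"), lags=(1, 2)):
    """Return copies of the rows with columns like temp_lag1 = temp at t-1.

    Assumes rows are in hourly order with no gaps; the column names are hypothetical.
    """
    out = []
    for i, row in enumerate(rows):
        new_row = dict(row)  # copy so the input rows are not modified
        for col in columns:
            for lag in lags:
                new_row[f"{col}_lag{lag}"] = rows[i - lag][col] if i >= lag else None
        out.append(new_row)
    return out
```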
