STAT 5307 – Time Series Analysis Methods

Forecasting the Electricity Price to Compare in the Pennsylvania Market Using SARIMA
Modelling Methods
John C. Henderson, Rachel Salaiz, Wyatt Wu
UHD MDA program
Author Note
Contact information:
hendersonj44@gator.uhd.edu
wyattwu1@gmail.com
rachelsalaiz@gmail.com
Introduction
Electric utility companies in Pennsylvania publish a Utility Price to Compare, the
“PTC”, on a calendar cycle depending on the utility. This is the rate at which residential
customers can buy electricity from the utility. Alternatively, customers can shop for themselves
from other competitive retail electricity providers. To determine the PTC, the utilities conduct
RFPs (Requests for Pricing) asking wholesale suppliers to bid on a weighted average of
electricity blocks of different durations of 1, 3, 6 and 12 months and possibly longer.
Natural gas is taken as a proxy for electricity prices since most of the power generated in
PA is natural gas-fired and wholesale electricity prices are highly correlated with natural gas
prices. Our hypothesis is that the PTC can be modelled as a weighted lagged moving average
of historical natural gas forward prices, Heating Degree Days (HDD), Cooling Degree Days
(CDD), and natural gas storage levels. HDD and CDD are used as indicators of electricity and
natural gas demand (since people use more energy when it is hot or cold) and natural gas storage
is used as an indicator for natural gas supply (since natural gas is more abundant when storage
levels are high). Degree days are based on the assumption that when the outside temperature is
65°F, no heating or cooling is needed to be comfortable. Degree days are the difference
between the daily mean temperature (the high temperature plus the low temperature, divided by two)
and 65°F. If the mean temperature is above 65°F, we subtract 65 from the mean and the result is
Cooling Degree Days. If the mean temperature is below 65°F, we subtract the mean from 65 and
the result is Heating Degree Days.
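The degree-day definition above can be illustrated with a minimal R sketch. The helper name `degree_days` and the sample temperatures are assumptions made for illustration only; they are not taken from the report's data.

```r
# Daily degree days from daily high/low temperatures (degrees F).
# `degree_days` is a hypothetical helper, not a function used in the report.
degree_days <- function(high, low, base = 65) {
  mean_temp <- (high + low) / 2
  data.frame(
    mean_temp = mean_temp,
    HDD = pmax(base - mean_temp, 0),  # heating: mean below the base
    CDD = pmax(mean_temp - base, 0)   # cooling: mean above the base
  )
}

# Three made-up days: cold, mild, hot
dd <- degree_days(high = c(40, 70, 95), low = c(20, 60, 75))
print(dd)
```

Monthly HDD and CDD values, such as those used in this report, would then be sums of the daily values within each month.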
If the hypothesis is true then a short-term forecast of the PTC can be generated which
would be useful for electricity retailers planning sales and marketing campaigns in PA. In this
report we focus on the Metropolitan Edison (METED) electricity price and we employ time
series methods to model the PTC.
The largest Electric Utilities in PA are Metropolitan Edison, Duquesne Light,
Pennsylvania Electric, Pennsylvania Power, PPL Electric, PECO Energy and West Penn Power.
Figure 1 shows a map of the Electric Utility service areas in PA, and Figure 2 shows the service
area for METED.
Figure 1. Electric utility service areas in Pennsylvania
Figure 2. Metropolitan Edison Service Territory
Organization of the Report
We first briefly explore the research literature on time-series modelling of electricity and
natural gas prices. We then describe the data sets, explore, transform, and select a subset of
potential predictor variables. Next, we fit a “best” univariate SARIMA model to the METED
series and use the model to make a prediction with confidence bands.
We then fit a lagged regression on a predictor index, PRIndex1, comprising natural gas,
HDD, CDD and storage variables and model the error term using a SARIMA model. The lagged
regression SARIMA model is examined for fit and used to make a prediction for the next few
months. We observe that since the PRIndex1 is composed of lagged variables it should be
possible to predict the METED price using actual observed values for the variables over the
lagged period. Beyond this time-frame it is necessary to make a prediction of the PRIndex1 and
to do this we construct a univariate SARIMA model for PRIndex1 as input to the regression
model.
We conclude with some summary thoughts and potential next steps.
Review of Literature
A review of the literature shows that most applications of time-series models have been
towards forecasts of electricity and gas demand. Mauro et al. (2015) give an example of
applications of ARIMA models to electricity demand in Italy. There are some references to
short-term price modelling of electricity prices using ARIMA and wavelet-ARIMA methods.
Carpio et al. (2012), for example, discuss applications to short-term electricity prices in
Singapore, and Jin et al. (2015) discuss applications of wavelet, artificial neural networks and
time-series methods for modelling prices for Henry Hub natural gas in the United States. No
mention was found in the literature of models for the PTC in retail electricity markets.
Exploratory Data Analysis
In this section we describe the data and perform preliminary analysis of the variables in
the data set to identify time trends and correlations and suggested transformations prior to model
fitting.
Data and Sources
Four data sets are used for this analysis. The data is collected from 2012 to 2017. The files
used are as follows and contain the following information:
• MNGprices.csv: averaged monthly natural gas 1, 3, 6 and 12 month forward prices (source:
Nymex natural gas futures exchange)
• PAHDDCDD.csv: monthly HDD and CDD data for PA (source: http://www.degreedays.net/)
• PAPTC2.csv: monthly Electric Utility posted electricity price to compare (i.e. PTC) (source:
PA utility web sites – see http://www.papowerswitch.com/ for links to websites)
• ngstorage.csv: weekly natural gas storage levels (source: weekly EIA storage report,
http://ir.eia.gov/ngs/ngs.html)
The data was loaded into R and converted into monthly time series for the analysis. No missing
observations were detected.
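As a rough sketch of the weekly-to-monthly conversion, the following R code averages weekly observations within each calendar month and wraps the result as a monthly `ts` object. The storage values are synthetic, and averaging (rather than, say, taking the last weekly value of the month) is an assumption; the report does not state which aggregation was used.

```r
# Synthetic weekly storage levels (made-up values), one per week in 2012
weeks <- seq(as.Date("2012-01-06"), by = "week", length.out = 52)
storage <- 3000 + 500 * sin(2 * pi * seq_along(weeks) / 52)

# Average the weekly observations within each calendar month (assumed rule)
month_id <- format(weeks, "%Y-%m")
monthly_mean <- tapply(storage, month_id, mean)

# Wrap as a monthly time series starting January 2012
storage_ts <- ts(as.numeric(monthly_mean), start = c(2012, 1), frequency = 12)
print(round(storage_ts))
```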
Plots and transformations
Figure 3 shows a plot of the Utility electricity prices (PTCs) against natural gas forward
prices. The METED series is highlighted in bold. Note that the METED price is reset
approximately every 3 months reflecting the quarterly calendar cycle of the RFP process whereas
natural gas prices change every month. Inspecting the graphs, it appears that natural gas leads
METED by about 6 months. Figure 4 shows METED versus natural gas lagged by 6 months
which seems to confirm the observation.
The stacked time plots in Figure 5 provide another look at this relationship. Natural gas lagged 6
months seems to line up with METED spikes, and natural gas spikes lagged 6 months seem to
line up with high HDD (lagged 6) when storage (lagged 3) is low. This suggests that lagged natural
gas, HDD and storage are related to the spikes and that a regression with an ARIMA error term may be a
suitable model.
Figure 3. Electric Utility price to compare (PTC) versus natural gas prices
Figure 4. METED price to compare (PTC) versus natural gas prices lagged 6 months
Figure 5. Stacked chart of power and gas prices and lagged HDD, NG, storage
The CCF of METED versus natural gas, shown in Figure 6, also indicates a lag of about 5
or 6 months. For this analysis we selected 5 as the appropriate lag for the natural gas time series.
Figure 6. CCF of METED versus 3 and 6 month natural gas
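To show how a lead of this kind appears in a cross-correlation function, here is a small R sketch on simulated data. The series are synthetic stand-ins (not the METED or Nymex data), and the 5-month delay is built in by construction. With `ccf(x, y)`, the value at lag k estimates the correlation between x[t+k] and y[t], so a peak at a negative lag means x leads y.

```r
set.seed(1)
n <- 72

# Synthetic "gas" series with some persistence
gas <- as.numeric(arima.sim(list(ar = 0.7), n = n + 5))

gas_lag5 <- gas[1:n]          # gas at month t - 5
gas_now  <- gas[6:(n + 5)]    # gas at month t

# Synthetic "price" that follows gas with a 5-month delay plus noise
price <- gas_lag5 + rnorm(n, sd = 0.3)

cc <- ccf(gas_now, price, lag.max = 12, plot = FALSE)
best_lag <- cc$lag[which.max(abs(cc$acf))]
print(best_lag)  # negative lag: gas leads price
```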
The scatter plot analysis shown in Figure 7 shows reasonable correlations between
METED and most of the variables. Note the highly skewed nature of HDD and CDD. For the
analysis we decided to create HDD and CDD dummy variables equal to 1 only for very high
values and zero otherwise. The reasoning is that only very hot or very cold temperatures are
likely to contribute to exceptionally high demand and spikes in electricity and gas prices.
Figure 7. Scatterplot and correlation analysis for METED and predictors
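A high-value dummy variable of the kind described above might be built as in the following R sketch. The monthly HDD values are made up, and the 90th-percentile cutoff is an assumption; the report does not state the threshold it used.

```r
# Made-up monthly HDD values for one year (January to December)
hdd <- c(1100, 950, 700, 350, 120, 10, 0, 0, 60, 300, 650, 1000)

# Assumed cutoff: the 90th percentile of the observed values
threshold <- quantile(hdd, 0.90)

# Indicator equal to 1 only for very high HDD months, 0 otherwise
dHDD <- as.integer(hdd > threshold)
print(data.frame(hdd = hdd, dHDD = dHDD))
```

The same construction would apply to CDD.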
Time Domain Analysis
In this section we develop a univariate SARIMA model and lagged regression with
ARIMA modelled error for the METED Series. We perform diagnostics on each of the models
and use the models to make predictions for the METED PTC.
Univariate SARIMA Model for METED Series
We first fit a univariate SARIMA model to the METED series. To do this we took the
following steps. We first examined the series for stationarity and applied a first difference. The
differenced METED series appears to be stationary as shown in Figure 8.
Figure 8. Plot of METED and diff(METED)
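The effect of first differencing can be sketched in R on a simulated series. The random walk below is a synthetic stand-in for a non-stationary price series, not the METED data.

```r
set.seed(2)

# Simulated random walk standing in for a non-stationary monthly price series
x <- ts(7 + cumsum(rnorm(66, sd = 0.3)), start = c(2012, 1), frequency = 12)

# First difference: x[t] - x[t-1]
dx <- diff(x)

# Compare the first few autocorrelations before and after differencing
acf_x  <- acf(x,  lag.max = 12, plot = FALSE)
acf_dx <- acf(dx, lag.max = 12, plot = FALSE)
print(round(cbind(level = acf_x$acf[2:4], differenced = acf_dx$acf[2:4]), 2))
```

A slowly decaying ACF in the level series alongside a quickly decaying ACF in the differenced series is the usual informal sign that one difference is enough.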
The ACFs and PACFs, shown in Figure 9, were then examined to establish potential orders
(p,d,q) x (P,D,Q)[s] for the SARIMA model. Examination of the ACF and PACF indicates
possible seasonal and within-season patterns.
• Seasonal: ACF cuts off at lag 18 (s=18); PACF tails off.
• Within season: both ACF and PACF tail off.
This suggests potential models with d=1, D=0, s=18, p as high as 6, q as high as 6, P=0, Q=1.
Figure 9. ACF and PACF of diff(METED)
Several models were attempted, and we settled on SARIMA (6,1,3) x (0,0,1) [18] as the best-fit
model. Models were eliminated if the estimated parameters were not significant or if the residual
diagnostics failed the ACF, normality, or Ljung-Box checks. AIC was used as the tie
breaker between candidate models. The diagnostics for the final model, shown in
Figure 10, looked reasonable.
Figure 10. Diagnostics for the best fit model, SARIMA (6,1,3) x (0,0,1) [18]
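The selection procedure described above (fit candidates, compare AIC, check residuals with a Ljung-Box test) can be sketched with base R's `arima()` and `Box.test()`. The data are simulated and the candidate orders are deliberately small so the example runs quickly; they are not the candidates tried in the report. The commented line shows one way the report's chosen orders could be written for `stats::arima`, which is an assumption since the report does not show its fitting call.

```r
set.seed(10)

# Simulated integrated series standing in for the price series
x <- 7 + cumsum(as.numeric(arima.sim(list(ar = 0.5), n = 120, sd = 0.3)))

# Small illustrative set of candidate (p, d, q) orders
candidates <- list(c(1, 1, 0), c(0, 1, 1), c(1, 1, 1))
fits <- lapply(candidates, function(ord) arima(x, order = ord, method = "ML"))

# AIC as the tie breaker between candidates
aics <- sapply(fits, AIC)
best <- fits[[which.min(aics)]]

# Ljung-Box test on the residuals; fitdf = number of estimated ARMA terms
n_par <- length(coef(best))
lb <- Box.test(residuals(best), lag = 12, type = "Ljung-Box", fitdf = n_par)
print(aics)
print(lb$p.value)

# Assumed form of the report's final model in stats::arima:
# arima(x, order = c(6, 1, 3), seasonal = list(order = c(0, 0, 1), period = 18))
```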
The model was used to predict the METED PTC 12 months forward and this is shown in
Figure 11. Note that this model predicts a near-term drop in the METED PTC. Overall, this
model feels somewhat unsatisfactory, which motivates the SARIMA lagged
regression model discussed next.
Figure 11. Six Month prediction for METED PTC using univariate SARIMA model
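A forecast with approximate 95% bands can be produced from a fitted `arima` object with `predict()`, as in this minimal R sketch. The series is simulated and the ARIMA(1,1,0) order is chosen only to keep the example small; it is not the report's model.

```r
set.seed(11)

# Simulated monthly series standing in for the price series
x <- ts(7 + cumsum(as.numeric(arima.sim(list(ar = 0.5), n = 66, sd = 0.3))),
        start = c(2012, 1), frequency = 12)

fit <- arima(x, order = c(1, 1, 0), method = "ML")

# 12-month-ahead point forecasts and their standard errors
fc <- predict(fit, n.ahead = 12)

# Approximate 95% bands: forecast +/- 1.96 standard errors
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se
print(round(cbind(forecast = fc$pred, lower = lower, upper = upper), 2))
```

For an integrated model the standard errors grow with the horizon, which is why the bands widen month by month.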
SARIMA Lagged Regression Model for the METED Series
We now examine whether we can fit a better model to the METED series using lagged
regression with ARIMA modelled error term. To do this we first look at the distribution of the
potential predictor variables of natural gas prices, HDD, CDD and natural gas storage to see if
transformations are required in the predictors prior to running the linear regression. After
transforming variables, HDD and CDD, we fit a linear regression model on the lagged variables
and select the “best” model using stepwise forward and backward variable selection, including
interaction terms. We simplify the regression model by combining the selected predictors into a
prediction index, PRIndex1, and we fit a SARIMA regression model for METED using
PRIndex1 and an ARIMA modelled error term. The simple regression fit and the SARIMA
regression fit are compared to show the improvement to fit resulting from the inclusion of the
ARIMA error term.
To perform predictions a univariate SARIMA model is fit to the PRIndex1 series and
predictions from this model are used as input to the SARIMA regression model for METED.
Additionally, since the PRIndex1 is composed of lagged variables, we know the actual value of
PRIndex1 for the lag period and we use this fact as input to the SARIMA regression model to
predict METED for the next 3 months.
Exploratory Data Analysis of Regression Predictor Variables
As a first step prior to fitting a simple linear regression we examine the distribution of the
natural gas, HDD, CDD and storage variables for any large deviations from normality and
required transformations. Figure 11 summarizes this analysis. All the variables have reasonable
distributions except for HDD and CDD which are highly skewed. For this reason, we take the
square root of HDD and CDD and create indicator variables for HDD and CDD that take on a
value of 1 for very high values and 0 otherwise and use these dummy variables in the regression.
The natural gas price variables, g1, g3, g6 and g12 and HDD variables are lagged 5 months as
discussed previously, and CDD and NG Store are lagged 3 months.
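The transformation and lagging steps can be sketched in R as follows. The helper `lag_vec` and all data values are hypothetical. The column names follow those seen in Exhibit 1, where the leading "l" is assumed to mean "lagged". Taking the square root before lagging is also an assumption about the order of operations.

```r
# Hypothetical helper: shift a vector back by k periods, padding with NA
lag_vec <- function(x, k) {
  stopifnot(k >= 1, k < length(x))
  c(rep(NA, k), head(x, -k))
}

# Made-up monthly values
g1  <- c(3.1, 2.9, 2.7, 2.8, 3.0, 3.4, 3.6, 3.3)
hdd <- c(1100, 950, 700, 350, 120, 10, 0, 0)
cdd <- c(0, 0, 5, 40, 150, 300, 420, 390)

dat <- data.frame(
  lg1  = lag_vec(g1, 5),          # gas price lagged 5 months
  lHDD = lag_vec(sqrt(hdd), 5),   # sqrt(HDD) lagged 5 months
  lCDD = lag_vec(sqrt(cdd), 3)    # sqrt(CDD) lagged 3 months
)

# Rows with incomplete lags are dropped before regression
dat_complete <- na.omit(dat)
print(dat_complete)
```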
Figure 11. Distributions for input variables for the regression analysis (left to right):
METED, g1, g3, g6, g12, HDD, CDD, NG Store
Fitting the Linear Regression Model
The potential predictors, gas prices and HDD lagged 5 months, CDD and natural gas
storage lagged 3 months, and the HDD and CDD high-level dummy indicators were regressed
against METED. Stepwise forward and backward variable selection using the R step() function
was used to select predictors with significant coefficients. The resulting “best” linear regression
is shown below in Exhibit 1. Note that the regression has a reasonably high adjusted R² of 0.74.
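A stepwise search with `step()` over main effects and two-way interactions might look like the following R sketch. The data are simulated, the variable names `x1`, `x2`, `x3` are placeholders, and starting from the intercept-only model is an assumption; the report does not show its exact `step()` call.

```r
set.seed(3)
n <- 66

# Simulated predictors; y depends on x1, x2 and their interaction, not on x3
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 7 + 1.5 * dat$x1 - 1.0 * dat$x2 + 0.8 * dat$x1 * dat$x2 +
  rnorm(n, sd = 0.5)

# Start from the intercept-only model and search in both directions
null_fit <- lm(y ~ 1, data = dat)
best <- step(null_fit,
             scope = list(lower = ~ 1, upper = ~ (x1 + x2 + x3)^2),
             direction = "both", trace = 0)

selected <- attr(terms(best), "term.labels")
print(selected)
print(summary(best)$adj.r.squared)
```

Note that `step()` selects by AIC rather than by coefficient p-values, so a term can survive with a p-value slightly above 0.05.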
The coefficients for this regression are then applied to the regression predictors to construct a
single predictor index PRIndex1 for the METED series. This will allow us to fit a univariate
SARIMA model to PRIndex1 for predicting PRIndex1 as an input to the SARIMA regression on
METED. The linear regression of METED versus PRIndex1 is shown in Exhibit 2. As
expected the intercept is the same as in Exhibit 1 and the coefficient for PRIndex1 is equal to 1.
Exhibit 1. Linear regression fit of significant predictors and interaction terms vs METED
Call:
lm(formula = METED ~ lg1 + lg12 + lHDD + lCDD + ldHDD + ldCDD +
    lHDD:lnstor + lg12:ldHDD + lg1:lnstor + lg1:lg12 + lg12:lCDD +
    lg1:lCDD + lg12:ldCDD, data = dat, na.action = NULL)

Residuals:
     Min       1Q   Median       3Q      Max
-1.04859 -0.33131  0.03086  0.30736  1.01775

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.154e+01  1.160e+00   9.949 1.24e-13 ***
lg1         -2.904e+00  4.955e-01  -5.860 3.20e-07 ***
lg12        -1.786e+00  4.309e-01  -4.144 0.000126 ***
lHDD         3.098e-01  5.518e-02   5.615 7.76e-07 ***
lCDD        -3.150e-01  8.876e-02  -3.549 0.000831 ***
ldHDD       -2.963e+00  9.614e-01  -3.082 0.003284 **
ldCDD        3.761e+00  1.726e+00   2.178 0.033931 *
lHDD:lnstor -1.094e-04  1.697e-05  -6.448 3.75e-08 ***
lg12:ldHDD   8.316e-01  2.821e-01   2.948 0.004777 **
lg1:lnstor   6.097e-04  1.025e-04   5.946 2.35e-07 ***
lg1:lg12     5.008e-01  9.808e-02   5.106 4.76e-06 ***
lg12:lCDD    1.266e-01  3.532e-02   3.586 0.000742 ***
lg1:lCDD    -5.278e-02  1.886e-02  -2.798 0.007194 **
lg12:ldCDD  -9.989e-01  5.041e-01  -1.982 0.052828 .

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4999 on 52 degrees of freedom
Multiple R-squared: 0.7892, Adjusted R-squared: 0.7365
F-statistic: 14.97 on 13 and 52 DF, p-value: 3.031e-13
Exhibit 2. Linear regression fit of PRIndex1 against METED
Call:
lm(formula = y ~ PRindex1, data = dat, na.action = NULL)

Residuals:
     Min       1Q   Median       3Q      Max
-1.04683 -0.33077  0.02896  0.30995  1.01597

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.54945    0.26425   43.71   <2e-16 ***
PRindex1     1.00036    0.06463   15.48   <2e-16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4506 on 64 degrees of freedom
Multiple R-squared: 0.7892, Adjusted R-squared: 0.7859
F-statistic: 239.6 on 1 and 64 DF, p-value: < 2.2e-16
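The construction of a single predictor index from fitted regression coefficients, and the reason the follow-up regression returns the same intercept and a slope of 1, can be checked with a short R sketch. The data and variable names are simulated placeholders, and defining the index as the sum of coefficient times predictor (excluding the intercept) is an assumption consistent with the result described in the text.

```r
set.seed(4)
n <- 66

# Simulated predictors and response (placeholders, not the report's data)
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 11.5 - 2.9 * dat$x1 + 0.3 * dat$x2 + rnorm(n, sd = 0.5)

fit <- lm(y ~ x1 + x2, data = dat)

# Index = sum of coefficient * predictor, leaving the intercept out
X <- model.matrix(fit)[, -1, drop = FALSE]
dat$index <- as.numeric(X %*% coef(fit)[-1])

# Regressing y on the index recovers the same intercept and a slope of 1
fit_index <- lm(y ~ index, data = dat)
print(coef(fit))
print(coef(fit_index))
```

Because the index already carries the least-squares weights, the second regression has nothing left to rescale, which is why its slope is 1 up to rounding.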
The residual diagnostics, shown in Figure 12, look reasonable, but the ACF and PACF indicate possible AR(1)
and MA(1) structures. An ARIMA (1,0,1) model for the residuals was found to fit best; its
diagnostics are shown in Figure 13.
Figure 12. Regression residuals diagnostics
Figure 13. Diagnostics for regression residual errors modelled as ARIMA (1,0,1)
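The residual check described above can be sketched in R: fit the linear regression, inspect the ACF and PACF of its residuals, then fit an ARMA model to those residuals and test what is left with a Ljung-Box test. The data are simulated with ARMA(1,1) errors by construction; nothing here comes from the report's data set.

```r
set.seed(6)
n <- 66

# Simulated regression with autocorrelated ARMA(1,1) errors
x <- rnorm(n)
err <- as.numeric(arima.sim(list(ar = 0.8, ma = -0.4), n = n, sd = 0.4))
y <- 11.5 + x + err

# Residuals of the plain linear regression
res <- residuals(lm(y ~ x))

# Sample ACF and PACF of the regression residuals
res_acf  <- acf(res,  lag.max = 12, plot = FALSE)
res_pacf <- pacf(res, lag.max = 12, plot = FALSE)

# ARMA(1,1) model for the residuals (zero mean, since lm has an intercept)
res_fit <- arima(res, order = c(1, 0, 1), include.mean = FALSE, method = "ML")

# Ljung-Box test on what remains; fitdf = 2 estimated ARMA parameters
lb <- Box.test(residuals(res_fit), lag = 12, type = "Ljung-Box", fitdf = 2)
print(coef(res_fit))
print(lb$p.value)
```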
The final SARIMA lagged regression fit for METED versus PRindex1 is shown in
Exhibit 13. Figure 14 compares the fit of the linear regression with that of the SARIMA
regression. Note how the modelled error term provides a better fit to the METED data.
Exhibit 13. SARIMA lagged regression fit for METED vs PRindex1 with ARIMA
(1,0,1) error model.
Call:
arima(x = dat2$METED, order = c(1, 0, 1), xreg = dat2$PRindex1)

Coefficients:
         ar1      ma1  intercept    xreg
      0.8675  -0.4603    10.5835  0.7801
s.e.  0.1073   0.2077     0.4695  0.1047

sigma^2 estimated as 0.1542: log likelihood = -32.27, aic = 72.54

Training set error measures:
                      ME      RMSE       MAE MPE MAPE       MASE        ACF1
Training set 0.005247754 0.3921202 0.3106539 NaN  Inf 0.04734422 -0.02645303
Figure 14. Fit of the linear regression and SARIMA regression to the METED series
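A regression with ARMA errors of this form can be fitted in one step by passing the predictor through the `xreg` argument of `arima()`. The R sketch below uses simulated data; the name `pr_index` is a placeholder for the predictor index, and `method = "ML"` is a choice made for this example rather than something stated in the report.

```r
set.seed(5)
n <- 66

# Simulated predictor index and ARMA(1,1) regression errors
pr_index <- as.numeric(arima.sim(list(ar = 0.7), n = n, sd = 0.6)) - 4
err <- as.numeric(arima.sim(list(ar = 0.8, ma = -0.4), n = n, sd = 0.4))
y <- 10.5 + 0.8 * pr_index + err

# Plain regression versus regression with ARMA(1,1) errors
ols_fit <- lm(y ~ pr_index)
reg_fit <- arima(y, order = c(1, 0, 1), xreg = pr_index, method = "ML")

# In-sample RMSE: regression residuals versus one-step-ahead innovations
rmse_ols <- sqrt(mean(residuals(ols_fit)^2))
rmse_arma <- sqrt(mean(residuals(reg_fit)^2))
print(coef(reg_fit))
print(c(ols = rmse_ols, arma_errors = rmse_arma))
```

The two RMSE figures are not strictly like for like, because the second uses one-step-ahead predictions that condition on past values of the series, but the comparison mirrors the visual comparison made in Figure 14.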
Model Predictions
Univariate SARIMA Model for PRIndex1 and 12-Month Forecast for METED
To predict METED using the SARIMA regression model we need to provide a prediction
of PRIndex1 to the regression model. To do this we first construct a univariate SARIMA model
for PRIndex1. Exhibit 14 shows the ARIMA (1,0,0) model that was selected, and Figure 15
shows the diagnostics for the model fit. The twelve-month forecast for PRIndex1 is shown in
Figure 16. Note how the model forecasts a general upward trend in PRIndex1.
The resulting METED 12 months forecast, and 95% confidence bands are also shown in
Figure 16. Note that the mean forecast is upward trending, but the confidence band is very wide.
For this analysis the confidence bands were approximated by adding the prediction errors of the
PRIndex1 forecast to the prediction errors of the METED SARIMA error term, since the predictor
index is uncertain and not a known input to the regression. This probably overestimates the error
band since the METED and PRIndex1 prediction errors are unlikely to be perfectly correlated (i.e. sd(x+y) <=
sd(x)+sd(y), with equality only under perfect positive correlation).
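The two-stage forecast and the conservative error band described above can be sketched in R as follows. All data are simulated and the name `pr_index` is a placeholder. The comparison with a root-sum-of-squares band, which would apply if the two error sources were independent, is added here for illustration only and is not part of the report's method.

```r
set.seed(7)
n <- 66

# Simulated predictor index and response with ARMA(1,1) errors
pr_index <- as.numeric(arima.sim(list(ar = 0.7), n = n, sd = 0.6)) - 4
err <- as.numeric(arima.sim(list(ar = 0.8, ma = -0.4), n = n, sd = 0.4))
y <- 10.5 + 0.8 * pr_index + err

# Stage 1: AR(1) model and 12-month forecast for the index
index_fit <- arima(pr_index, order = c(1, 0, 0), method = "ML")
index_fc <- predict(index_fit, n.ahead = 12)

# Stage 2: regression with ARMA errors, fed with the forecast index
reg_fit <- arima(y, order = c(1, 0, 1), xreg = pr_index, method = "ML")
y_fc <- predict(reg_fit, n.ahead = 12, newxreg = as.numeric(index_fc$pred))

# Band by adding the two standard errors (the conservative approach)
se_added <- as.numeric(index_fc$se) + as.numeric(y_fc$se)
# For comparison: what the band would be if the errors were independent
se_indep <- sqrt(as.numeric(index_fc$se)^2 + as.numeric(y_fc$se)^2)

lower <- as.numeric(y_fc$pred) - 1.96 * se_added
upper <- as.numeric(y_fc$pred) + 1.96 * se_added
print(round(cbind(forecast = as.numeric(y_fc$pred), lower, upper), 2))
```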
Exhibit 14. ARIMA (1,0,0) model fit to the PRIndex1 predictor series
Call:
sigma^2 estimated as 0.3504: log likelihood = -59.4, aic = 124.81
t table
      Estimate     SE  t.value p.value
ar1     0.7163 0.0833   8.5998       0
xmean  -4.0282 0.2479 -16.2525       0

Figure 15. Diagnostics for ARIMA (1,0,0) fit to PRIndex1
Figure 16. ARIMA (1,0,0) 12 month forecast for PRIndex1 and METED
Regression Forecast for METED Using Known Lagged PRIndex1 values
Since the SARIMA regression uses lagged natural gas, HDD, CDD and storage values,
these values are already known in advance for the lag period. These known values can be used
to construct a deterministic value for PRIndex1 that can be used to forecast 3-months of METED
with smaller confidence bands since PRIndex1 is known. Figure 17 shows the model forecasts
using the known PRindex1 values. This is very valuable from a sales and marketing perspective
because even a 3-month advance forecast of the PTC can significantly improve the return on
marketing investments. Note from the figure that the forecast shows METED PTC prices
increasing in the near-term to between $6 and $7.5/MWh. These numbers look reasonable and
follow the expected winter increase observed over the past years. Overall, the SARIMA lagged
regression model appears to be producing more reasonable forecasts than the univariate
SARIMA model for METED.
Figure 17. Forecast of METED using known lagged values of PRIndex1
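When the future values of the predictor are already known, they can be passed directly to `predict()` through `newxreg`, so the only forecast uncertainty comes from the ARMA error term. The R sketch below uses simulated data; `pr_index` and the three "known" index values are made-up placeholders.

```r
set.seed(8)
n <- 66

# Simulated predictor index and response with ARMA(1,1) errors
pr_index <- as.numeric(arima.sim(list(ar = 0.7), n = n, sd = 0.6)) - 4
err <- as.numeric(arima.sim(list(ar = 0.8, ma = -0.4), n = n, sd = 0.4))
y <- 10.5 + 0.8 * pr_index + err

reg_fit <- arima(y, order = c(1, 0, 1), xreg = pr_index, method = "ML")

# Made-up index values for the next 3 months, treated as already observed
known_index <- c(-3.6, -3.4, -3.3)

fc <- predict(reg_fit, n.ahead = 3, newxreg = known_index)
lower <- as.numeric(fc$pred) - 1.96 * as.numeric(fc$se)
upper <- as.numeric(fc$pred) + 1.96 * as.numeric(fc$se)
print(round(cbind(forecast = as.numeric(fc$pred), lower, upper), 2))
```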
Conclusion
These results are very encouraging. As of the time of this writing, the December
PTC for METED was published at $6.816 whereas the SARIMA regression model predicted
$6.74 which is quite close. More importantly, the model correctly predicted the upward trend in
prices. It appears that it may be possible to create near-term forecasts for the METED PTC using
a relatively small set of easily obtained predictors on natural gas, temperatures and natural gas
storage levels. This insight can help retail electricity company marketing and sales departments
better plan the timing of marketing campaigns and budget expenditures.
Next steps include applying other time series modelling techniques such as frequency
domain models and multivariate impulse response methods to refine and extend the model. A
back-test of the model against a hold-out sample should also be conducted to assess the model's
predictive accuracy. Further applications of the model include extending the analysis to the other
PA PTC markets besides METED and even to utility PTCs in other states.
Also, recall that for this analysis we used natural gas forward prices as a proxy for
electricity wholesale prices. The model accuracy can probably be improved by using electricity
forward prices instead of natural gas forward prices and this should be considered.
References
Mauro, B., Petrella, L., (2015) Multiple seasonal cycles forecasting model: the Italian electricity
demand. Statistical Methods & Applications.
Carpio, K.J.E., Go, A.M.L., Ronca, C.K.M., (2012) Forecasting Day-Ahead Electricity Prices of
Singapore through ARIMA and Wavelet ARIMA. DLSU Business & Economics Review.
Jin, J., Kim, J., (2015) Forecasting Natural Gas Prices Using Wavelets, Time Series,
and Artificial Neural Network. Public Library of Science.
Shumway, R.H., Stoffer, D.S. (2017) Time Series Analysis and its Applications. Springer,
Fourth Edition.
Cryer, Johnathan D., Chan, Kung-Sik, (2008) Time Series Analysis with Applications in R.
Springer, Second Edition.
Sheather, Simon J., (2009) A Modern Approach to Regression with R. Springer.
Pennsylvania Public Utilities Commission, (August 2017) Electric Power Outlook for
Pennsylvania, 2016-2021.
Pennsylvania Power to Choose website: http://www.papowerswitch.com/
Pennsylvania Public Utilities Commission website: http://www.puc.state.pa.us/
Weekly EIA natural gas storage report: http://ir.eia.gov/ngs/ngs.html
Natural gas futures prices: http://www.cmegroup.com/trading/energy/natural-gas/naturalgas.html
Historical information on heating degree days (HDD) and cooling degree days (CDD):
http://www.degreedays.net/