In a previous installment, we described a condo association that needed to know how much to budget for electricity in an upcoming budget cycle. In this post, we’ll develop a model for electricity usage, leaving electricity expenses for the next installment.

I like to have pretty pictures above the fold, so let’s take a look at the data and the resulting model, all in one convenient and colorful graph. The graph shows each month’s average daily electricity usage (in kilowatt hours, kWh) versus the month’s average temperature at nearby Ronald Reagan Airport (DCA). Each month’s bill is a point at the center of the month name:

From the graph you can see that the buildings (four high-rises) have a point of minimal electricity usage when the month averages around 54 degrees (Fahrenheit), with increasing usage as the months average cooler or warmer. The steeper increase in warmer months is mainly due to the fact that the primary source of heat for the common areas is natural gas, not electricity. The blue line is the result of fitting a linear, mixed-effects model using lme4’s `lmer`. So now that you’ve seen the attractive graph, let’s wrap back to the beginning of the process…

The first thing to do was to get the data. The general manager had filed away a couple of years of Dominion Virginia Power’s monthly bills, so I got copies and started entering them into a spreadsheet. It turns out the entire complex has only three monthly bills: one for “the pool area”, one for the two residential towers on one side of the street (11 stories each, with three below-ground garage levels), and one for the two residential towers on the other side of the street (10 stories each, with two below-ground garage levels). So we won’t be able to dig too deeply, but it is what we have.

As in any real-world analysis, there was a lot of messy variability. First, the billing “months” vary from 28 days to 34 days, depending on when the meter reader comes by. Second, Dominion is regulated and that means they engage in an elaborate dance with the State Corporation Commission (SCC). For example, Dominion is allowed to raise rates on a provisional basis, and the SCC can later decide that the increase was not justified, forcing Dominion to issue refunds. It turns out that it’s very hard to credit refunds back to their originating months, so I chose to work with the average daily usage (monthly usage divided by the length of the billing period), and the average daily expenditures ignoring refunds.
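As a minimal sketch of that normalization, the R snippet below divides each bill's usage by the length of its billing period. The data frame `bills`, its column names, and the figures are made up for illustration; they are not the Association's bills.

```r
# Hypothetical bills: total kWh and the number of days in each billing period.
bills <- data.frame(
  kwh  = c(182700, 207400),
  days = c(29, 34)
)

# Average daily usage puts a 29-day bill and a 34-day bill on the same footing.
bills$daily <- bills$kwh / bills$days
print(bills)
```

In this made-up example the 34-day bill has the larger total but the smaller daily average, which is the kind of distortion the normalization removes.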

I started recording just these few values into a spreadsheet, but as time went on other questions came up, and in the end I decided to go back and record every number on the bill: days, surcharges, fees, taxes, refunds, all of it. Each line of the spreadsheet totaled it all up so I could double-check against the bill total when I was done, to catch typos. I also found that Dominion lets you set up an online account from which you can download PDFs of your bills, and that the Association had not done so, so I worked with them to set it up, making the ongoing process much easier. (It also avoids things like trying to guess a number that had a “PAID” stamp over it and was then photocopied.) The online data only goes back 18 months, though, so paper records were important.
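The spreadsheet cross-check translates directly to R. The sketch below is only an illustration: the data frame `entered`, its columns, and the dollar amounts are hypothetical, and a real bill has many more line items.

```r
# Hypothetical transcription of two bills: a few line items plus the printed total.
entered <- data.frame(
  usage      = c(100.00, 120.10),
  demand     = c(50.25, 48.00),
  taxes      = c(7.50, 8.40),
  bill_total = c(157.75, 175.60)  # second row has a deliberate data-entry typo
)

# Sum the line items for each row and compare against the printed total.
items <- c("usage", "demand", "taxes")
entered$computed <- rowSums(entered[, items])

# Allow half a cent of slack for floating-point rounding.
entered$mismatch <- abs(entered$computed - entered$bill_total) > 0.005

# Flagged rows need to be re-checked against the paper bill.
print(entered[entered$mismatch, ])
```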

All of those numbers from the bill are nice, but you have to be able to understand what they are. So the next step was to find the tariff that covers the Association. The Association has 42 above-ground floors, 5 below-ground floors, a pool, a gym, and lobby areas, and the bills are for these common areas. (Each owner is billed directly by Dominion for the usage in their condo.) This is a mid-sized infrastructure, which falls under Dominion’s GS-2, Intermediate General Service schedule. The two key points of this schedule are: 1) there is no daily or weekly peak/off-peak, but there is a peak season (Jun-Sep), and 2) this bill includes not only usage but also demand. (I did check if it would be advantageous to switch schedules, but Dominion’s analysis was that it would not.)

Most of us consumers are not familiar with “demand billing” because we’re only billed for usage. Demand is applicable to commercial customers and is a measure of the maximum amount of power they use in any 15-minute period during the month. That is, there’s an important difference between continuously using 150 watts for a month (roughly 720 hours) and using the equivalent 108,000 watts for one hour. Since Dominion has to build enough power-generating capacity to handle peaks and commercial customers can generate huge peaks, they’re allowed to bill commercial customers for their peak demand.
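The arithmetic behind that comparison can be checked in a few lines of R. This is just a sketch of the two idealized loads described above; the function and variable names are my own.

```r
# Energy (kWh) used by a constant load of `watts` running for `hours`.
energy_kwh <- function(watts, hours) watts * hours / 1000

steady_kwh <- energy_kwh(150, 720)     # small load running all month
burst_kwh  <- energy_kwh(108000, 1)    # large load running for one hour

# Peak demand in kW is the largest load drawn. For a constant load this is
# just the load itself (the 15-minute measurement window does not change it).
steady_peak_kw <- 150 / 1000
burst_peak_kw  <- 108000 / 1000

cat("usage (kWh):", steady_kwh, "vs", burst_kwh, "\n")
cat("peak demand (kW):", steady_peak_kw, "vs", burst_peak_kw, "\n")
```

The two loads use the same energy, so a usage-only bill would treat them alike, while a demand charge would see a peak 720 times larger in the second case.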

As a final step in the initial data gathering, we need to think about exogenous data. For usage, it makes sense that temperature would have an effect, and daylight as well. I got temperature data from NOAA for Ronald Reagan Airport (DCA), which is about four miles from the condos, though at a lower altitude and by a river.

The most explanatory graph I came up with is the one shown above, which illustrates the relationship between temperature and electricity usage. I created the model using `lmer`, from `R`’s `lme4` package, with a formula:

`usage.lmer <- lmer (daily ~ 1 + (1 + I(dca - 54) | temp), data=elect5)`

where `daily` is the average daily usage (bill usage divided by number of days in the bill), `dca` is the month’s average temperature at DCA, and `temp` is an indicator of whether `dca` is cooler or warmer than 54 degrees. This is essentially a piecewise linear regression which pools across the two pieces.
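To make the piecewise structure concrete, here is a rough sketch in base R. It is not the `lmer` fit: it swaps in ordinary least squares (`lm`) with a shared intercept and one slope on each side of 54 degrees, and it runs on synthetic data generated inside the snippet. The data frame `sim`, the columns `cooler` and `warmer`, and all the numbers are assumptions for illustration.

```r
# Synthetic monthly average temperatures (deg F); not the DCA data.
sim <- data.frame(dca = seq(32, 84, by = 2))

# Indicator for which side of 54 degrees each month falls on (like `temp`).
sim$temp <- factor(ifelse(sim$dca < 54, "cooler", "warmer"))

# Degrees below / above the 54-degree hinge; zero on the other side.
sim$cooler <- pmax(54 - sim$dca, 0)
sim$warmer <- pmax(sim$dca - 54, 0)

# Made-up usage: a baseline plus a different slope on each side, with a
# small deterministic wiggle standing in for noise.
sim$daily <- 6000 + 60 * sim$cooler + 130 * sim$warmer +
  40 * sin(seq_along(sim$dca))

# Ordinary least squares with one intercept and two slopes.
fit <- lm(daily ~ cooler + warmer, data = sim)
print(round(coef(fit), 1))
```

The fitted coefficients should land close to the values used to generate the data. Unlike the mixed-effects model, this version has no random effects, so it does no partial pooling of the two pieces.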

If you’re not familiar with `lmer` and `R` formulas, what the formula is setting up is a model with a global (season-independent) intercept, an intercept for each `temp` (*warmer* and *cooler*), and a slope for each `temp`, with `dca` centered at 54 degrees (Fahrenheit). The global intercept and the two temp-dependent slopes are statistically significant, while the temp-dependent intercepts are essentially zero, which fits well with theory and keeps things neat. (I could have removed the temp-dependent intercepts from the formula by making the second “1” a “0”, but I kept them so that if I gather more data and the intercepts begin to move away from each other, I’ll see it.)

The season-independent baseline usage (at ~54 degrees) is roughly 6090 kWh per day: in cooler weather it increases roughly 66 kWh per degree cooler, and in warmer weather it increases roughly 137 kWh per degree warmer. The baseline would include the (indoor) lighting, various pumps, air circulation, elevators, etc., that run year-round regardless of weather. The mean absolute percentage error (MAPE) of the in-sample forecast is 3.6%.
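Those rounded coefficients are enough to sketch a back-of-the-envelope predictor, along with the MAPE calculation. The function names `predict_daily` and `mape` are my own, the sketch ignores the near-zero temp-dependent intercepts, and the "actual" values are hypothetical rather than taken from the bills.

```r
# Coefficients as reported in the text (rounded).
baseline       <- 6090  # kWh/day at about 54 deg F
per_deg_cooler <- 66    # extra kWh/day per degree below 54
per_deg_warmer <- 137   # extra kWh/day per degree above 54

# Predicted average daily usage (kWh) for a monthly average temperature.
predict_daily <- function(dca) {
  baseline + per_deg_cooler * pmax(54 - dca, 0) +
    per_deg_warmer * pmax(dca - 54, 0)
}

# Mean absolute percentage error, in percent.
mape <- function(actual, predicted) {
  100 * mean(abs((actual - predicted) / actual))
}

temps     <- c(44, 54, 74)
predicted <- predict_daily(temps)   # 6750, 6090, 8830
actual    <- c(7000, 6000, 9000)    # hypothetical observed values
print(predicted)
print(mape(actual, predicted))
```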

In the next part of the series, we’ll look at modeling and forecasting expenditures with a more complicated fixed-effects model.