In a previous installment, we described a condo association that needed to know what its electricity budget should be for an upcoming budget. In this posting, we’ll develop a model for electricity utilization, leaving electricity expenses for the next installment.
I like to have pretty pictures above the fold, so let’s take a look at the data and the resulting model, all in one convenient and colorful graph. The graph shows each month’s average daily electricity usage (in kilowatt hours, kWh) versus the month’s average temperature at nearby Ronald Reagan Airport (DCA). Each month’s bill is a point at the center of the month name:
This is the story of a condominium association in Northern Virginia, which was in the midst of transforming their budget. In the early days of the association they didn’t have an operational cash reserve built up, so they had to make budget categories a bit oversized, “just in case”. As time went on, they saved the cash from their over-estimates, and eventually arrived at the place where they could set tighter budgets and depend on their cash savings if the budget were exceeded.
In the process of reviewing various categories, they came to “Utilities”, which lumped water, sewage, gas, and electricity all together. They decided to break it down into individual utilities, but when they started with electricity, no one actually knew how much was used nor how much it cost. The General Manager had dutifully filed away monthly bills from Dominion, but didn’t have a spreadsheet. They needed a data analyst, and I was all over it.
In the next four or five postings, I want to show some of the details of my investigation of electricity expenses. I hope it will be an interesting look at the kinds of things that happen in the real world, not just textbooks. Oh, by the way, it turns out that electricity was the single largest operating expense of the association. Here’s a graph of average expenditures, brought up to the present:
The reshape function in R is very useful, but poorly documented. Wingfeet, over at Wiekvoet, has written a well-illustrated article on reshape that steps through SAS’ transpose examples and shows how that’s done with R’s reshape.
This is a quick introduction to using R, the R package `knitr`, and the markdown language to automatically generate monthly reports.
Let’s say I track electricity usage and expenses for a client. I initially spent a lot of time analyzing and making models for the data, but at this point I’m now in more of a maintenance mode and simply want an updated report to email each month. So, when their new electricity bill arrives I scrape the data into R, save the data in an R workspace, and then use knitr to generate a simplistic HTML report.
A Brief Tour of Tree Packages in R
Good news about R is it has thousands of packages. Bad news about R is that it has thousands of packages, and often it’s difficult to know which wavelet packages, for example, do what.
On his blog, Wes has provided a concise, illustrated tour of the major tree/forest packages in R. Go and gain clarity.
Everyone’s interested in the global climate these days, so I’ve been looking at the GISTEMP temperature series, from the GISS (NASA’s Goddard Institute for Space Studies). I was recently analyzing the data and it turned into an interesting data forensics operation that I hope will inspire you to dig a little deeper into your data.
Let’s start with the data. GISS has a huge selection of data for the discerning data connoisseur, so which to choose? The global average seems too coarse — the northern and southern hemispheres are out of phase and dominated by different geography. On the other hand, the gridded data is huge and requires all kinds of spatially-saavy processing to be useful. (We may go there in a future post, but not today.) So let’s start with the two hemisphere monthly average datasets, which I’ll refer to as GISTEMP NH and GISTEMP SH.
To be specific, these time series are GISTEMP LOTI (Land Ocean Temperature Index) which means that they cover both land and sea. GISS has land-only data and combines this with NOAA’s sea-only data from ERSST (Extended Reconstructed Sea Surface Temperature). I’d also point out that all of the temperature data I’ll use is measured as an anomaly from the average temperature over the years 1951-1980, which was approximately 14 degrees Celsius (approximately 57 degrees Farenheit). So let’s plot the GISTEMP LOTI NH and SH data and see what we have.