I’m working on a scheduling project and have so far been using linear programming tools. As I look down the road, I’m beginning to think that I’ll need to supplement or move beyond LP, perhaps with something like constraint programming (CP) techniques. Unfortunately, the CP arena seems to be littered with tools that are highly-spoken of but are no longer (or barely) supported. Anyone out there who can address CP tools?
So far, I’m leaning towards OscaR, because it seems fairly complete and it’s in Scala which is the Official General-Purpose Language of Thinkinator. I’ve looked at a variety of CP tools like Zinc (seems dead) and MiniZinc (seems barely alive), Geocode (C++), JaCoP (java-based), Choco (java-based), Comet (seems dead), etc. It’s a bit worrisome when I compare the level of activity and the apparent long-term stability of these projects to linear programming (Symphony, glpsol, lpsolv, etc).
Any experiences or insights on CP tools? (I’d prefer standalone tools with high-level syntax or Scala/Java-based tools, and not C++/Prolog-based tools.) Thanks!
There are many free online training opportunities, and many of them are reasonable experiences. For example, you can watch videos from Stanford on Youtube, and I’m sure other services come to mind. Today I want to recommend Coursera, which I’m pretty impressed with.
When I first heard of SSA (Singular Spectrum Analysis) and the EMD (Empirical Mode Decomposition) I though surely I’ve found a couple of magical methods for decomposing a time series into component parts (trend, various seasonalities, various cycles, noise). And joy of joys, it turns out that each of these methods is implemented in R packages:
In this posting, I’m going to document some of my explorations of the two methods, to hopefully paint a more realistic picture of what the packages and the methods can actually do. (At least in the hands of a non-expert such as myself.)
In a previous series of postings, I described a model that I developed to predict monthly electricity usage and expenditure for a condo association. I based my model on the average monthly temperature at a nearby NOAA weather station at Ronald Reagan Airport (DCA), because the results are reasonable and more importantly because I can actually obtain forecasts from NOAA up to a year out.
The small complication is that the NOAA forecasts cover three-month periods rather than single month: JFM (Jan-Feb-Mar), FMA (Feb-Mar-Apr), MAM (Mar-Apr-May), etc. So, in this posting, we’ll briefly describe how to turn a series of these overlapping three-month forecasts into a series of monthly approximations.
In Part 4 of this series, I created a Bayesian model in Stan. A member of the Stan team, Bob Carpenter, has been so kind as to send me some comments via email, and has given permission to post them on the blog. This is a great opportunity to get some expert insights! I’ll put his comments as block quotes:
That’s a lot of iterations! Does the data fit the model well?
I hadn’t thought about this. Bob noticed in both the call and the summary results that I’d run Stan for 300,000 iterations, and it’s natural to wonder, “Why so many iterations? Was it having trouble converging? Is something wrong?” The
stan command defaults to four chains, each with 2,000 iterations, and one of the strengths of
Stan‘s HMC algorithm is that the iterations are a bit slower than other methods but it mixes much faster so you don’t need nearly as many iterations. So 300,000 is a bit excessive. It turns out that if I run for 2,000 iterations, it takes 28 seconds on my laptop and mixes well. Most of the 28 seconds is taken up by compiling the model, since I can get 10,000 iterations in about 40 seconds.
So why 300,000 iterations? For the silly reason that I wanted lots of samples to make the CI’s in my plot as smooth as possible. Stan is pretty fast, so it only took a few minutes, but I hadn’t thought of implication of appearing to need that many iterations to converge.
Last time, we modeled the Association’s electricity expenditure using Bayesian Analysis. Besides the fact that MCMC and Bayesian are sexy and resume-worthy, what have we gained by using
Stan? MCMC runs more slowly than alternatives, so it had better be superior in other ways, and in this posting, we’ll look at an example of how. I’d recommend pulling the previous posting up in another browser window or tab, and position the “Inference for Stan model” table so that you can quickly consult it in the following discussion.
If you look closely at the numbers, you may notice that the high season (warmer-high, ratetemp 3, beta) appears to have a lower slope than the mid season (warmer-low, ratetemp 2, beta), as was the case in an earlier model. This seems backwards: the high season should cost more per additional kWh, and thus should have a higher slope. This raises two questions: 1) is the apparent slope difference real, and 2) if it is real, is there some real-world basis for this counter-intuitive result?
There’s a new R-related website out there: Rdocumentation, which gathers the documentation for most R packages (that are hosted on CRAN) in one place and allows you to search through it.
There are a couple major benefits to the site:
- It also provides a comments area where you can share thoughts, questions, and tips on various packages or even individual functions. (Though we’ll have to see if a critical mass builds for this.)
- If you search in your own R installation with ?? you will only search the packages you have installed. I’m a package junkie, so have a huge number installed, but it’s still nice to have a place where I can discover new packages more easily.
So check it out and make suggestions to the creators. This should really be rolled into CRAN at some point.