There are several reasons why everyone isn’t using Bayesian methods for regression modeling. One reason is that Bayesian modeling requires more thought: you need pesky things like priors, and you can’t assume that if a procedure runs without throwing an error that the answers are valid. A second reason is that MCMC sampling — the bedrock of practical Bayesian modeling — can be slow compared to closed-form or MLE procedures. A third reason is that existing Bayesian solutions have either been highly-specialized (and thus inflexible), or have required knowing how to use a generalized tool like BUGS, JAGS, or Stan. This third reason has recently been shattered in the R world by not one but two packages:
rstanarm. Interestingly, both of these packages are elegant front ends to Stan, via
This article describes
rstanarm, how they help you, and how they differ.
Longitudinal Structural Equation Modeling, Todd D. Little, Guilford Press 2013.
Let me start by saying that this is one of the best textbooks I’ve ever read. It was written as if the author was our mentor, and I really get the feeling that he’s sharing his wisdom with us rather than trying to be pedagogically correct. The book is full of insights on how he thinks about building and applying SEMs, and the lessons he’s learned the hard way.
I’ve just discovered a unique app on the Mac App Store called Calca. It’s like a simple word-processor, except you can define variables and functions and do arithmetic with them, and it understands units and currencies and it handles matrices and vectors, and supports basic Markdown, and … it’s pretty amazing.
There are many free online training opportunities, and many of them are reasonable experiences. For example, you can watch videos from Stanford on Youtube, and I’m sure other services come to mind. Today I want to recommend Coursera, which I’m pretty impressed with.
In a previous series of postings, I described a model that I developed to predict monthly electricity usage and expenditure for a condo association. I based my model on the average monthly temperature at a nearby NOAA weather station at Ronald Reagan Airport (DCA), because the results are reasonable and more importantly because I can actually obtain forecasts from NOAA up to a year out.
The small complication is that the NOAA forecasts cover three-month periods rather than single month: JFM (Jan-Feb-Mar), FMA (Feb-Mar-Apr), MAM (Mar-Apr-May), etc. So, in this posting, we’ll briefly describe how to turn a series of these overlapping three-month forecasts into a series of monthly approximations.
I’m always intrigued by techniques that have cool names: Support Vector Machines, State Space Models, Spectral Clustering, and an old favorite Hidden Markov Models (HMM’s). While going through some of my notes, I stumbled onto a fun experiment with HMM’s where you feed a bunch of English text into a two-state HMM and it will (tend to) discover what letters are vowels.
I did a quick Google search on “Stata for R users” (both as separate words and as a quoted phrase) and there really isn’t much out there. At best, there are a couple of equivalence guides that show you how to do certain tasks in both programs. (Plus a whole lot of “R for (ex-) Stata users” articles.) I’m writing this post, as a long-term R user who recently bought Stata, because I believe that Stata is a good complement to R, and many R users should consider adding it to their toolbox.
I’m going to write this in two parts. Part one will describe why an R user might be interested in Stata — with various Stata examples. Part Two will give specific tips and warnings to R users who do decide to use Stata.