Deep Learning Dude pt 1

You’ve probably noticed that Deep Learning is all the rage right now. AlphaGo has beaten the world champion at Go, you can google cat photos and be sure you won’t accidentally get photos of canines, and many other near-miraculous feats: all enabled by Deep Learning with neural nets. (I am thinking of coining the phrase “laminar learning” to add some panache to old-school non-deep learning.)

I do a lot of my work in R, and it turns out that not one but two R packages have recently been released that enable R users to use the famous Python-based deep learning package, Keras: keras and kerasR.


Continue reading

BRMS continues its streak

As you probably know, I’m a big fan of R’s brms package, available from CRAN. In case you haven’t heard of it, brms is an R package by Paul-Christian Buerkner that implements Bayesian regression of all types using an extension of R’s formula specification that will be familiar to users of lm, glm, and lmer. Under the hood, it translates the formula into Stan code, Stan translates this to C++, your system’s C++ compiler is used to compile the result and it’s run.


brms is impressive in its own right. But also¬†impressive is how it continues to add capabilities and the breadth of Buerkner’s vision for it. I last posted something way back on version 0.8, when brms gained the ability to do non-linear regression, but now we’re up to version 1.1, with 1.2 around the corner. What’s been added since 0.8, you may ask? Here are a few highlights:

Continue reading

Donoho nails Data Science! Mostly.

Just a quick note: In his recent (when I wrote this but neglected to publish it) paper, 50 Years of Data Science, David Donaho pretty much nails key foundations of Data Science and how it’s different from (just) Statistics or even (just) Machine Learning. I highly recommend that you read it.

It’s full of great quotes like this:

“… In those less-hyped times, the skills being touted today were unnecessary. Instead, scientists developed skills to solve the problem they were really interested in, using elegant mathematics and powerful quantitative programming environments modeled on that math. Those environments were the result of 50 or more years of continual refinement, moving ever closer towards the ideal of enabling immediate translation of clear abstract thinking to computational results.

“The new skills attracting so much media attention are not skills for better solving the real problem of inference from data; they are coping skills for dealing with organizational artifacts of large-scale cluster computing. …”

Great stuff.

BRMS 0.8 adds non-linear regression

Just a quick posting following up on the brms/rstanarm posting. In brms 0.8, they’ve added non-linear regression. Non-linear regression is fraught with peril, and when venturing into that realm you have to worry about many more issues than with linear regression. It’s not unusual to hit roadblocks that prevent you from getting answers. (Read the Wikipedia links Non-linear regression and Non-linear least squares to get an idea.)

BRMS Diagnostic
Continue reading

R Users Will Now Inevitably Become Bayesians

There are several reasons why everyone isn’t using Bayesian methods for regression modeling. One reason is that Bayesian modeling requires more thought: you need pesky things like priors, and you can’t assume that if a procedure runs without throwing an error that the answers are valid. A second reason is that MCMC sampling — the bedrock of practical Bayesian modeling — can be slow compared to closed-form or MLE procedures. A third reason is that existing Bayesian solutions have either been highly-specialized (and thus inflexible), or have required knowing how to use a generalized tool like BUGS, JAGS, or Stan. This third reason has recently been shattered in the R world by not one but two packages: brms and rstanarm. Interestingly, both of these packages are elegant front ends to Stan, via rstan and shinystan.

This article describes brms and rstanarm, how they help you, and how they differ.

BRMS Diagnostic 1
Continue reading

Map Projections

The Earth is round, and maps are flat. That’s a problem for map makers. And a source of endless entertainment for geeks.

World Map

Carlos A. Furuti has an excellent website with many projections and clear explanations of the tradeoffs of each. The main projection page has links to all types, including two of my favorites: Other Interesting Projections, and Projections on 3D Polyhedra. Enjoy!

In R, the packages maps and mapproj are your entrée to this world. I created the above map (a Mollweide projection, which is a useful favorite), with:

library (maps)
library (mapproj)

map ("world", projection="mollweide", regions="", wrap=TRUE, fill=TRUE, col="green")
map.grid (labels=FALSE, nx=36, ny=18)

Real Data Science pt1: Review of Numbersense

So far, when I’ve written on Data Science topics I’ve written about the fun part: the statistical analysis, graphs, conclusions, insights, etc. For this next series of postings, I’m going to concentrate more on what we can call Real Data Science®: the less glamorous side of the job, where you have to beat your data and software into submission, where you don’t have access to the tools or data you need, and so on. In other words, where you spend the vast majority of your time as a Data Scientist.

I’ll start the series with a review of Kaiser Fung’s Numbersense, published in 2013. It’s not mainly about Real Data Science, but I’ll start with it because it’s a great book that illustrate several common data pitfalls, and in the epilogue Kaiser shares one of his own Real Data Science stories and I found myself nodding my head and saying, “Yup, that’s how I spent several days in the last couple of weeks!”

Continue reading