So far, when I’ve written on Data Science topics I’ve written about the fun part: the statistical analysis, graphs, conclusions, insights, etc. For this next series of postings, I’m going to concentrate more on what we can call Real Data Science®: the less glamorous side of the job, where you have to beat your data and software into submission, where you don’t have access to the tools or data you need, and so on. In other words, where you spend the vast majority of your time as a Data Scientist.
I’ll start the series with a review of Kaiser Fung’s Numbersense, published in 2013. It’s not mainly about Real Data Science, but I’ll start with it because it’s a great book that illustrate several common data pitfalls, and in the epilogue Kaiser shares one of his own Real Data Science stories and I found myself nodding my head and saying, “Yup, that’s how I spent several days in the last couple of weeks!”