So far, when I’ve written on Data Science topics I’ve written about the fun part: the statistical analysis, graphs, conclusions, insights, etc. For this next series of postings, I’m going to concentrate more on what we can call Real Data Science®: the less glamorous side of the job, where you have to beat your data and software into submission, where you don’t have access to the tools or data you need, and so on. In other words, where you spend the vast majority of your time as a Data Scientist.
I’ll start the series with a review of Kaiser Fung’s Numbersense, published in 2013. It’s not mainly about Real Data Science, but I’ll start with it because it’s a great book that illustrates several common data pitfalls. In the epilogue, Kaiser shares one of his own Real Data Science stories, and I found myself nodding my head and saying, “Yup, that’s how I spent several days in the last couple of weeks!”
I like to read various Stack Exchange websites, and one of them has a wonderful discussion of how you might divide a sandwich fairly between three people. Most of us are familiar with the two-person version: one person cuts and the other person gets the first choice. But what if there are three or more people?
Longitudinal Structural Equation Modeling, Todd D. Little, Guilford Press 2013.
Let me start by saying that this is one of the best textbooks I’ve ever read. It was written as if the author were our mentor, and I really get the feeling that he’s sharing his wisdom with us rather than trying to be pedagogically correct. The book is full of insights on how he thinks about building and applying SEMs, and the lessons he’s learned the hard way.
I’ve just discovered a unique app on the Mac App Store called Calca. It’s like a simple word-processor, except you can define variables and functions and do arithmetic with them, and it understands units and currencies and it handles matrices and vectors, and supports basic Markdown, and … it’s pretty amazing.
I just read about a website, accidental aRt, that shows how artistic R graphics can look when things go bad. Wonderful!
Percolation is the ability of a liquid-like substance to get through a solid-like lattice. An interesting question is how the likelihood of a material allowing percolation changes as the average density of the lattice changes from 100% (i.e. solid, with no percolation) to 0% (i.e. empty, with total percolation). I read an interesting article that looks at the case of square lattices using R: Percolation Threshold on a Square Lattice
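
To make the question concrete, here is a minimal Monte Carlo sketch in Python (the linked article uses R, and this is not its code). It assumes site percolation on an n-by-n grid: each cell is blocked independently with probability `density`, and the lattice "percolates" if open cells connect the top row to the bottom row through up/down/left/right neighbours. The function names, the grid size, and the number of trials are all my own choices for illustration.

```python
import random
from collections import deque


def percolates(grid):
    """Return True if open cells (True) connect the top row to the bottom row."""
    n = len(grid)
    seen = [[False] * n for _ in range(n)]
    queue = deque()
    # Start a breadth-first search from every open cell in the top row.
    for c in range(n):
        if grid[0][c]:
            seen[0][c] = True
            queue.append((0, c))
    while queue:
        r, c = queue.popleft()
        if r == n - 1:
            return True  # reached the bottom row
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < n and 0 <= cc < n and grid[rr][cc] and not seen[rr][cc]:
                seen[rr][cc] = True
                queue.append((rr, cc))
    return False


def percolation_probability(density, n=20, trials=200, seed=0):
    """Estimate the chance that a random n-by-n lattice percolates.

    density is the fraction of blocked cells: 1.0 is solid, 0.0 is empty.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = 0
    for _ in range(trials):
        # A cell is open (True) when its random draw is at or above the density.
        grid = [[rng.random() >= density for _ in range(n)] for _ in range(n)]
        if percolates(grid):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for d in (0.0, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0):
        print(f"density {d:.1f}: P(percolates) ~ {percolation_probability(d):.2f}")
```

Running it prints an estimate for each density. On a small grid like this the change from "almost always percolates" to "almost never" is gradual; it sharpens as the lattice grows, and for site percolation on a square lattice the known threshold is an open fraction of roughly 0.593 (a density of about 0.407 in the terms used here).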