Just a quick note: In his recent paper, "50 Years of Data Science" (recent when I wrote this, anyway; I neglected to publish it), David Donoho pretty much nails the key foundations of Data Science and how it differs from (just) Statistics or even (just) Machine Learning. I highly recommend that you read it.
It’s full of great quotes like this:
“… In those less-hyped times, the skills being touted today were unnecessary. Instead, scientists developed skills to solve the problem they were really interested in, using elegant mathematics and powerful quantitative programming environments modeled on that math. Those environments were the result of 50 or more years of continual refinement, moving ever closer towards the ideal of enabling immediate translation of clear abstract thinking to computational results.
“The new skills attracting so much media attention are not skills for better solving the real problem of inference from data; they are coping skills for dealing with organizational artifacts of large-scale cluster computing. …”
Great stuff.