(Asterisks indicate “must reads”!)
Seminal work by Breiman on the distinction between the classical statistical modeling approach and the more recent “algorithmic” modeling approach.
Excellent introduction to concepts and issues in using machine learning for epidemiologists.
Attempt to demonstrate the fundamentals behind the super learner.
Detailed resource on using the super learner in real data settings.
Important example of some fundamental constraints on using data with algorithms to predict outcomes fairly.
Excellent introduction to machine learning (emphasis on econometrics but very useful for epidemiologists).
Important example of how ML algorithms can yield very misleading predictions when deeper aspects of the data-modeling complex are not taken into account.
Technical Skills
*Burkov (2019) The Hundred-Page Machine Learning Book
Burkov (2021) Machine Learning Engineering
Kuhn and Johnson (2016) Applied Predictive Modeling
Wasserman (2006) All of Nonparametric Statistics
Shalev-Shwartz and Ben-David (2014) Understanding Machine Learning: From Theory to Algorithms
Efron and Hastie (2017) Computer Age Statistical Inference: Algorithms, Evidence, and Data Science
Hastie, Tibshirani, Friedman (2009) The Elements of Statistical Learning
James, Witten, Hastie, Tibshirani (2017) An Introduction to Statistical Learning