This course covers a number of statistical methods that are commonly referred to as "machine learning." These include methods for both regression and classification. Topics include nonparametric regression, including KNN, kernel regression, and splines; regularized regression, including lasso, ridge regression, and elastic net; methods based on logistic regression; tree-based methods, including random forests, bagging, and boosting; double machine learning; generalized random forests; support vector machines; and neural networks.
Students will be expected to understand the key ideas of the methods that are discussed and to become proficient at using several of them to analyze actual datasets. There will be three empirical assignments and a more substantial empirical project.
The course will make considerable use of the books:
"An Introduction to Statistical Learning, with Applications in R",
by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani,
Second Edition, Springer, 2021.
and
"An Introduction to Statistical Learning, with Applications in Python",
by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, and Jonathan Taylor,
Springer, 2023.
The statistical content of the two books is largely identical, but the examples and exercises use two different programming languages. Students may use either R or Python, both of which can be installed at no cost. At the time of writing, the instructor and the T.A. are more familiar with R, but Python appears to be gaining in relative popularity.
Note that PDF versions of both books can be obtained (legally) online. How to do so is left as an exercise.
Some use will also be made of the recent book:
"Applied Causal Inference Powered by ML and AI",
by Victor Chernozhukov, Christian Hansen, Nathan Kallus, Martin Spindler, and Vasilis Syrgkanis.
The book and associated materials can be obtained from this website.