STATISTICS 216 - Winter 2018


Overview of supervised learning, with a focus on regression and classification methods. Syllabus includes: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and lasso); nonlinear models, splines and generalized additive models; tree-based methods, random forests and boosting; support-vector machines; some unsupervised learning: principal components and clustering (k-means and hierarchical). Computing is done in R, through tutorial sessions and homework assignments. This math-light course is offered via video segments (MOOC style) and in-class problem-solving sessions. Prerequisites: first courses in statistics, linear algebra, and computing.

Objectives

The objectives of this course are:

The course covers the entire contents of the textbook "An Introduction to Statistical Learning, with Applications in R". This book is available in hardcover at the bookstore or from Springer or Amazon, or in PDF form through the Stanford libraries or from the book website.


Logistics

Mondays and Wednesdays 3:00-4:20
First meeting: January 8, 2018
Last meeting: March 14, 2018
Location: Gates B01

This course is in the "flipped" format. The lectures are pre-recorded, and students will watch them on their own time. Students should register for this online content at
https://suclass.stanford.edu/courses/course-v1:Statistics+Stats216+Winter2018/about
Before you can enroll in the course, you need to create an account by signing in to https://suclass.stanford.edu
with your SUNet ID and creating a username. The course material consists of recorded video chunks (typically around 10-12 minutes long), as well as quiz and review questions. Each week new material will become available, and students are expected to keep up. The lectures follow the course textbook, and the schedule is:

Week 1 starting January 7: Chapters 1 & 2; Introduction and Statistical Learning
Week 2 starting January 14: Chapter 3; Linear regression
Week 3 starting January 21: Chapter 4; Classification
Week 4 starting January 28: Chapter 5; Resampling methods
Week 5 starting February 4: Chapter 6; Linear model selection and regularization
Week 6 starting February 11: Chapter 7; Moving beyond linearity
Week 7 starting February 18: Chapter 8; Tree-based methods
Week 8 starting February 25: Chapter 9; Support vector machines
Week 9 starting March 4: Chapter 10; Unsupervised learning
Week 10 starting March 11: review

(These starting dates are Sundays; the material on EdX will be available at 8am PST each Sunday.)

All the slides and R-session scripts used in these videos are available from that website as well.

The in-class sessions will be a more hands-on experience, where we will solve problems both with pen and paper and in R. Attendance at these sessions is required for all on-campus students. Generally these sessions will be held on Wednesdays, and Mondays will be a free day for students to watch videos etc. We may occasionally use the Monday session for in-class work, and will certainly use it for the opening class on January 8.


Instructors


Rob Tibshirani. Sequoia Hall, room 106, <tibs@stanford.edu>, 5-5989 (much more reachable by email); office hours: Fri 2-3, Sequoia Hall 106

Teaching Assistants

Claire Donnat <cdonnat@stanford.edu>; office hours: Wed 5-6, 7-8 (SCPD), Sequoia 207 (SCPD students: Zoom meeting ID 792517472)

Michael Feldman <feldman6@stanford.edu>; office hours: Mon 3-5, 420-286

Kenneth Tay <kjytay@stanford.edu>; office hours: Thu 1-3, Sequoia Hall 105 (library)

David Walsh <dwalsh@stanford.edu>; office hours: Wed 1-3, Sequoia 207

Qian Zhao <qzhao1@stanford.edu>; office hours: Tue 6-7pm, 7-8pm (SCPD, Zoom meeting ID 200922369), Sequoia Hall 105 (library)

You can reach the teaching staff by sending email to
     stats216-win1718@lists.stanford.edu

 


Course materials

Course materials, homework, etc. will all be available from the Canvas site.


Piazza

We have set up a Piazza site for the course, and all students enrolled in the class should already be enrolled on the site. We encourage students to use Piazza to discuss issues related to the course material.

For the sake of the class, please use Piazza carefully. Check whether your question has already been asked before posting it.

Students may earn up to 5% extra credit by participating in the Piazza discussion forum. Each time a student asks a question about course material (that has not yet been asked) or answers an unanswered student question, that student gets 1 percentage point of extra credit added to the course grade, with a maximum of one point per week. No credit is awarded for Piazza participation after the final exam.

We will moderate the discussion and emphasize that students should NOT ask or answer homework questions, or otherwise use the site inappropriately.

Note that the TAs can answer only a fixed number of questions per day. The professor will join the discussion occasionally, when needed.
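As an illustration only, the extra-credit rule above can be sketched in a few lines of Python. The function name and the input format (a list of the week numbers in which a student made a qualifying post before the final exam) are assumptions made for this sketch, not part of the course policy.

```python
def piazza_extra_credit(post_weeks, cap=5):
    """Extra-credit percentage points under the rule sketched above.

    post_weeks: week numbers of qualifying posts made before the final
    exam (hypothetical input format, assumed for this illustration).
    At most one point counts per week, and the total is capped at `cap`.
    """
    distinct_weeks = set(post_weeks)  # several posts in one week count once
    return min(len(distinct_weeks), cap)
```

For example, two qualifying posts in week 1 and one in week 2 would count as 2 points under this reading.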

Computing

Students will be required to use R in this course, and the lectures include instruction in the use of R. R is free, excellent, and available on most platforms; it can be downloaded from CRAN. There are many introductory documents on the CRAN website (click on the Contributed button under Documentation). The first in the list, by John Maindonald, is especially recommended.

In the course, we also introduce you to the RStudio IDE (integrated development environment), a user-friendly environment for developing, running, and documenting R code.

Our in-class sessions will often require the use of computing, so please bring your laptops to class. If you don't have a laptop, you can team up with someone who does.


Homework

There will be regular biweekly homework assignments - 4 in all. Homework will include analysis of datasets, analytical and conceptual problems, and programming assignments. Students must do the assignments on their own, since they count toward the final grade for the course. Some parts of the assignments involve computing. Where appropriate, we will indicate that students MAY do the COMPUTING part of the assignments in groups (of size at most 4). If so, students must indicate the membership of the groups.

 

Homework and other handouts will be distributed via the Canvas homepage. All of the homework assignments will be graded, and solutions will be made available.
LATE HOMEWORK will be penalized at 5% of the maximum score per day. Homework turned in more than 7 days late will not be graded. The final homework has a hard deadline: it must be turned in by the due date.
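A minimal sketch of the late-penalty arithmetic, for illustration only. The function name is hypothetical, and several details are assumptions not stated in the policy: that lateness is counted in whole days, that an ungraded submission is treated as a score of zero, and that a penalized score cannot go below zero. The sketch does not model the final homework, which has no late period.

```python
def late_homework_score(raw_score, max_score, days_late):
    """Score after the late penalty: 5% of max_score per day late.

    Assumptions for this sketch: days_late is a whole number of days,
    submissions more than 7 days late score 0 (they are not graded),
    and the penalized score is floored at 0.
    """
    if days_late <= 0:
        return float(raw_score)  # on time: no penalty
    if days_late > 7:
        return 0.0  # more than 7 days late: not graded
    penalty = max_score * 5 * days_late / 100
    return max(raw_score - penalty, 0.0)
```

For instance, a homework scored 90 out of 100 and turned in 2 days late would lose 10 points under these assumptions.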

Exams

There will be a midterm and a final exam. The final grade will be determined according to quizzes (10%), HWs (35%), midterm (15%), and final (40%).
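The weighting can be read as a weighted average of the four components. The following is a rough sketch under the assumption that each component is first expressed as a percentage from 0 to 100; the names `WEIGHTS` and `course_grade` are hypothetical, and the sketch ignores any extra credit or curving.

```python
# Component weights in percent, taken from the grading breakdown above.
WEIGHTS = {"quizzes": 10, "homework": 35, "midterm": 15, "final": 40}


def course_grade(scores):
    """Weighted course grade on a 0-100 scale.

    scores: dict mapping each component name in WEIGHTS to a
    percentage between 0 and 100 (assumed input format).
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
```

With component scores of 80 (quizzes), 90 (homework), 70 (midterm), and 85 (final), this gives 84.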

Important dates


Jan 17 HW1 assigned
Jan 31 HW1 Due, HW2 assigned
Feb 14 HW2 Due
Feb 19 HW3 assigned
Feb 21 Midterm (1hr, in-class)
Mar 5 HW4 assigned
Mar 7 HW3 Due
Mar 16 HW4 Due
Mar 19 Final Exam 3:30-6:30PM Gates B01


Text

An Introduction to Statistical Learning, with Applications in R by G. James, D. Witten, T. Hastie and R. Tibshirani (Springer, 2013).

The book is available in hardcover at the bookstore or from Springer or Amazon, or in PDF form through the Stanford libraries or from the book website. Springer offers a discount if you buy it at springer.com; if so, use the discount code 3Ncaa8eNq33efzG