Syllabus

Downloadable Version

We provide an annotated downloadable (draft) pdf version of the syllabus. It is also linked from other sections of the site. If the website version and the pdf version disagree, the website version may be newer. Let us know if you see differences. (See below for email etiquette.)

The core content of the downloadable syllabus follows below for easier on-line browsing. Note that some of the listed dates and times may still correspond to the Fall 2020 instance of the predecessor course STAT 430 “DSPM”.

Overview

Data science is emerging as a field that is revolutionizing science and industries alike. Work across nearly all domains is becoming more data driven, affecting both the jobs that are available and the skills that are required. As more data and ways of analyzing them become available, more aspects of the economy, society, and daily life will become dependent on data.

Source: Envisioning the Data Science Discipline: The Undergraduate Perspective. National Academies, 2018.

This course provides the principal programming foundations for working with data at scale.

Data analysts are in demand, and particularly those who can walk the walk and not only talk the talk. This course aims for a “hands-on, roll-up-your-sleeves” learning-by-doing approach which can be highly rewarding to those willing to put in the required effort.

Learning Objectives

After this course, students should be able to …

  • analyze code in multiple data science programming languages;
  • write short programs in several relevant languages;
  • manipulate files and data using the command-line;
  • utilize git for version control, collaboration and publishing;
  • use R as a language and environment for programming with data;
  • solve new data science problems using these tools;
  • know a variety of tools based on first-hand experience;
  • show their skills via a group project on a topic of their choice.

Credits

The course counts for three credits for undergraduate students, and for four credits for graduate students. Graduate students are required to submit more extensive homework assignments.

Key Facts

Core Content

  • shell for managing files, commands, information flow, …
  • git for modern version control supporting social computing
  • sql as a base layer for data management and control
  • markdown for programmatic control of html, pdf, … communication
  • R for programming with data, and our core building block
  • plus some extras such as Docker and more

Instructional Staff

The setup and timetable are as follows:

Title       Name               Location                 Hours             Type / Booking
Instructor  Dirk Eddelbuettel  Zoom                     Mon 7pm - 8pm     Open
                               Zoom via booking system  Thu 7pm - 8:30pm  15m, one-on-one
TA          Alton Barbehenn    Zoom                     Wed 10am - 11am   Open
                               Zoom via booking system  Fri 9am - 12pm    15m, one-on-one

We offer two types of office hours. The first type is open, with an open door where you can walk in and out, attend every week, or never, as you see fit. The second type consists of individual one-on-one office hours that last fifteen minutes each, and which you book via the calendly link above. We expect that you limit your use of these to two per term. The booking system only allows booking one week out, so please be considerate of your fellow students. Under genuinely exceptional circumstances, additional visits can be scheduled on demand. (Note that the Zoom links above differ per time slot. Make sure you pick the correct one.)

Lecture Location

What      When
Location  online
Times     no fixed times, aiming for weekly availability
Hours     Office hours as scheduled, see below

Homework Schedule

Homework assignments (generally) cover the preceding four lectures, and are not cumulative. They prepare for the quiz (see next section) covering the same period, and permit students to do rigorous exercises which are graded electronically using PrairieLearn and CBTF.

Homework    Week     Given        Due
Homework 1  Week 2   Thu, Sep 9   Thu, Sep 16
Homework 2  Week 4   Thu, Sep 23  Thu, Sep 30
Homework 3  Week 6   Thu, Oct 7   Thu, Oct 14
Homework 4  Week 8   Thu, Oct 21  Thu, Oct 28
Homework 5  Week 10  Thu, Nov 4   Thu, Nov 11
Homework 6  Week 12  Thu, Nov 18  Thu, Dec 2

Homeworks are generally released at 10:00am and due the following week at 10:00am. Graduate students receive (generally two) additional required questions. These questions are typically more substantial in nature and require more effort than the regular questions assigned to both undergraduate and graduate students. Undergraduates may opt to answer one or both of these questions for additional points, or as a challenge. Scoring is, however, capped at 100%.

Computer-Based Testing Quiz Schedule

Quizzes follow the bi-weekly schedule of the homework, and cover the same (typically two week) set of lectures, and are also not cumulative.

Quiz    First Date   Last Date    Late         Weeks Covered
Quiz 1  Thu, Sep 16  Sat, Sep 18  Sun, Sep 19  Weeks 1 and 2
Quiz 2  Thu, Sep 30  Sat, Oct 2   Sun, Oct 3   Weeks 3 and 4
Quiz 3  Thu, Oct 14  Sat, Oct 16  Sun, Oct 17  Weeks 5 and 6
Quiz 4  Thu, Oct 28  Sat, Oct 30  Sun, Oct 31  Weeks 7 and 8
Quiz 5  Thu, Nov 11  Sat, Nov 13  Sun, Nov 14  Weeks 9 and 10
Quiz 6  Thu, Dec 2   Sat, Dec 4   Sun, Dec 5   Weeks 11 and 12

You can schedule your exam time within the window with the CBTF site. Each exam will be a session of 50 minutes. Upon written request to the instructors (see below for email etiquette) documenting the need, accommodation for a STAT 447 ONLINE section can be made on a case-by-case basis. Note that this must be need-based; it is not an elective choice for Urbana-Champaign based students, who are expected to test at the CBTF facility in person. Requesting online testing when you were able to attend the CBTF in person may be treated as an academic integrity violation with its full consequences.

Prerequisites

Prior to taking this course, students should have:

  • Taken a rigorous Statistics course such as STAT 410 (and we state this as one example rather than a strict requirement).
  • Motivation for participation in an online class: readings, exercises, …
  • Basic computer skills

Online Access and Identification

The course is delivered primarily online and tested online. Students use Single-Sign-On with the University of Illinois ‘netid’ to access

  • all lectures and videos stored on the U of I Box account
  • access to RStudio Cloud for computing resources
  • GitHub via a U of I-administered SSO using GitHub ‘cloud’ resources
  • CBTF and PrairieLearn to access homeworks and quizzes
  • compass2g for grade and other course information

In addition, CBTF Online quizzes use CBTF proctors for student identity verification. The group project requires (recorded) group presentations also identifying each student.

Office Hours

This course offers office hours from different members of the course staff, held throughout the week at pre-scheduled times.

GitHub Forum

For class discussion, we will use a GitHub repository and its issue system. This forum will be private and restricted to those in the course.

It is very important that each student

  • registers a GitHub account (unless they already have one); since the Fall 2020 term we have been using a University of Illinois Single-Sign-On administered GitHub instance.
  • lets the instructors know the GitHub id so that we can invite the student to the (private, controlled via Single-Sign-On with NetId) course discussion project

Email

Before you start writing an e-mail to a member of the course staff please make sure your question is not:

  • Already answered in this syllabus or course FAQ: the syllabus serves as the guiding document for the course.
  • About exercises or homework: Questions should be asked via GitHub issues so that all students have access to the answer.
  • A technical issue or code error
    • Try to google the error verbatim (e.g. copy and paste into Google).

But please ensure your e-mails meet the following criteria:

  • The e-mail must be sent from an @illinois.edu account.
  • The start of the subject line should contain the tag: [STAT 447]
  • It should be followed by a space and a brief description.
  • Good: ‘[STAT 447] Cannot load data file: error …’
  • Bad: ‘[STAT 447] Need help’ or ‘[STAT 447] Code not working…’.
  • Use the course staff e-mail address, instructors@stat447.com (or, if you prefer, help@stat447.com).

We try our best to respond within 24 hours.

Do not post homework code. The campus rules on academic integrity apply to all communication, including email.

External Tutors

Please see the FAQ item on for-hire tutors.

Assessments

Attendance

As an on-line course there is no attendance count. You are strongly encouraged to follow all the lecture slides and videos, study the readings and possibly some or most of the extra readings. Most importantly, you need to try the examples and code we show, and experiment with them. As a proxy for class participation, we consider participation in the GitHub issue topic discussion which, for an online class, is the closest we have to class discussions.

Homework

Homework assignments serve as a way to interact with the material outside of the classroom. Homework will be due at 10:00 AM on the assigned due date, which should generally be a Thursday. We score the mean of the top five homeworks, i.e. the lowest homework score is dropped. As this gives one automatic “out”, late homework will generally not be accepted.

In general, there will be no exceptions to this policy. Please start early, make sure your environment is working correctly, and that you are able to produce a working document. We have two course assistants with on-campus office hours, but in order to ask meaningful questions you need to try answering the material first.

Collaboration Policy

While working on homework, students are encouraged to study in groups. But students should strive to independently supply answers to the homework problems. As we use an automated platform, submissions can be compared easily. Academic integrity standards apply.

Distribution Policy

Each homework will be distributed via PrairieLearn, and will be stored as a combination of your NetId and the question.

Assignment Submission

Here are a few do and don’t tips for the PrairieLearn web submission. Consider the following stanza from an actual homework:

# Enter your code below: Do not alter the function signature:
# ensure it remains named 'iris_summary' and takes one argument.
# Ensure you return a data.frame as indicated in the question.

iris_summary <- function(irisdata) {

  # Enter code here

}

Consider the following recommendations carefully:

  • Do follow the structure of the provided function.
  • Do enter code where it says # Enter code here.
  • Do not write code before the opening brace.
  • Do not write code after the closing brace.
  • Do use the supplied irisdata object. The function signature clearly states that that is the (only) input you need and are given.
  • Do not load other data. You do not need data(something). You do not need to load anything (unless specifically asked when a question is about data loading or saving).
  • Do use the stated variable names: when the interface (or our instructions) say irisdata, do not deviate to iris or iris_data or any other form. Do write code to match the name exactly.
  • Do not load other packages unless asked to do so. We generally expect you to use an explicitly named package, or just the functions already in R, i.e. what is called ‘base R’.
  • Do follow the instructions. When it asks to return a data.frame do not return a matrix or data.table. Return a data.frame.
  • Do use the GitHub issue ticket linked to each question.
  • Do not post code or (partial or complete) answers at GitHub.
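As an illustration of these recommendations, here is a minimal sketch of what a completed stanza might look like. The actual question is not reproduced in this syllabus, so the task assumed here (the mean of every numeric column per species, with a column named Species in the input) is purely hypothetical. Only the function name, the single irisdata argument, and the data.frame return type come from the stanza above.

```r
# Hypothetical task (an assumption, not the actual homework question):
# return the mean of each numeric column for every species.

iris_summary <- function(irisdata) {

  # Work only with the supplied 'irisdata' argument and base R.
  # aggregate() with a formula returns a data.frame, as required.
  res <- aggregate(. ~ Species, data = irisdata, FUN = mean)

  res
}
```

For a local check before submitting, one could call iris_summary(iris) in an R session using the iris data that ships with R. Note that the call sits outside the function: nothing is loaded, and no code is added, between or around the braces of the submitted stanza.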

Grading

Each homework assignment will be worth a variable number of points; however, each homework assignment will have equal weight towards your final grade.

As stated above, we count best five out of six.
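The scoring rule can be sketched in a few lines of R. This is only an illustration under stated assumptions: the function name homework_score and all point values are made up, and the actual computation used by the course staff may differ in detail. Each homework is first converted to a percentage so that all assignments carry equal weight, the percentage is capped at 100% as described in the Homework Schedule section, and the lowest of the six scores is dropped.

```r
# Illustrative sketch (hypothetical helper, made-up numbers).
homework_score <- function(earned, possible, keep = 5) {
  stopifnot(length(earned) == length(possible), length(earned) >= keep)
  # Convert to percentages (equal weight) and cap each at 100%
  pct <- pmin(100 * earned / possible, 100)
  # Keep the best 'keep' scores and average them
  mean(sort(pct, decreasing = TRUE)[seq_len(keep)])
}

earned   <- c(46, 30, 66,  0, 44, 38)   # 66 of 60 includes optional extra questions
possible <- c(50, 40, 60, 50, 50, 40)
homework_score(earned, possible)
# [1] 90
```

In this example the missed fourth homework is the one that gets dropped, and the third homework counts as 100% rather than 110%.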

Quizzes

Instead of examinations, there will be six bi-weekly quizzes; see the section Computer-Based Testing Quiz Schedule. The quizzes, just like the homework, will (generally) focus on the preceding (two weeks of) lectures and are (generally) not cumulative over the full course content.

And just like with the homework, you can drop one quiz grade over the course of the semester. We aim for six quizzes in total, and with the lowest quiz score being dropped the score will be the mean of the top five quiz scores.

Because of Covid-19, the Fall 2020 instance of the course now uses CBTF Online (instead of in-person testing at the CBTF Facility). CBTF Online is proctored by CBTF staff over Zoom, and depends on the availability of the proctors. We will have one default time for each quiz along with two alternate times each. Use the CBTF ‘Conflict Request’ form to request an alternate time.

The policies of the CBTF are the policies of this course, and academic integrity infractions related to the CBTF are infractions in this course.

If you have accommodations identified by the Disability Resources and Educational Services (DRES) for exams, please take your Letter of Accommodation (LOA) to the CBTF proctors in person before you make your first quiz reservation. The proctors will advise you as to whether the CBTF provides your accommodations or whether you will need to make other arrangements with your instructor.

Any problem with testing in the CBTF must be reported to CBTF staff at the time the problem occurs. If you do not inform a proctor of a problem during the test then you forfeit all rights to redress.

Group Project

There are several components associated with the group final project:

  • Project Proposal: The repository should contain an outline of what is planned, the sources of the data, possible transformation and possible modeling strategies and/or possible data visualizations. This can be provided via the README.md file of the repository.

  • Project Report: The project report can be thought of as an (informal) paper. Guided by the format of an academic paper, it describes the project in a succinct yet complete fashion along with references. Markdown should be used to write it; the result can be in either html or pdf format.

  • Project Presentation and Slides: At the end of term, a short recorded group video presentation, akin to a lightning talk, should introduce, present and summarize the work of the project in a form that is suitable for a general audience. A length of five minutes is a goal. The presentation should be supported by five to six slides, also produced in Markdown.

  • Evaluation of Peers, and Evaluations from Peers: We require a short informal statement from each team member briefly stating who within the team did (roughly) what percentage of the work.

The Group Project provides an excellent opportunity to “shine” and to demonstrate your passion, skill, and capabilities for data science programming work. It provides a great chance to make a mark and create something special and distinguished.

The group projects have to be finalized by noon (12:00h, Central time) on December 9, 2021 (aka “reading day”).

Exams

There are no midterm or final examinations in this course. Instead, we have homework, quizzes, and a group project.

Late or Missing Work

Late work will not be accepted for either homework or the group project. Watch the deadlines, and plan accordingly.

As the date and time of an exam are chosen by a student over an examination window, there will be no make-up exams administered once the window closes.

Course Grades

Type           Weight
Homework       One Third
Quizzes        One Third
Group Project  One Third

Grading is discretionary, and performed by the instructor and the course assistant(s). There are no retakes; we mark ‘best five out of six’ for homework and quizzes so everybody gets to drop one each.

Grading Scale

Grade     Points
A- to A+  90 to 100
B- to B+  80 to 89.99
C- to C+  70 to 79.99
D- to D+  60 to 69.99
F         below 60

Each ten point range is equally split over the three components (i.e. from minus to plus). Grades may be curved at the end of term before being finalized.
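To make the arithmetic concrete, the following R sketch combines the three equally weighted components and maps the result onto the scale above. It is an illustration under assumptions, not the official grading procedure: the function name course_grade is hypothetical, the exact cut points within each ten point range (thirds at roughly 3.33 and 6.67 points) are our reading of “equally split”, and any end-of-term curve or instructor discretion is not modelled.

```r
# Illustrative sketch only (hypothetical helper; cut points are assumed).
course_grade <- function(homework, quizzes, project) {
  total <- mean(c(homework, quizzes, project))   # one third each
  if (total < 60) return(list(total = total, letter = "F"))

  decade <- min(floor(total / 10), 9)            # 6, 7, 8 or 9 (100 stays in the A range)
  base   <- c("D", "C", "B", "A")[decade - 5]

  # Position within the ten point range, split into three equal parts
  pos    <- (total - 10 * decade) / 10
  suffix <- c("-", "", "+")[min(floor(pos * 3), 2) + 1]

  list(total = total, letter = paste0(base, suffix))
}

course_grade(homework = 90, quizzes = 84, project = 93)$letter
# [1] "B+"
```

Here the three component scores average to 89, which falls in the upper third of the 80 to 89.99 range.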

University Policies

Academic Integrity

The official University of Illinois policy related to academic integrity can be found in Article 1, Part 4 of the Student Code. Section 1-402 in particular outlines behavior which is considered an infraction of academic integrity. These sections of the Student Code will be upheld in the STAT 447 classroom. Any violations will be dealt with in a swift, fair and strict manner.

You may discuss methods for completing assignments with other students, but the execution of these methods and the preparation of the document must be done independently. Furthermore, there can be no discussion with other students or collaboration of any kind on exams. Sufficient evidence of sharing results, collaborating on written assignments, or simply relying on internet resources will generally result in:

  • First offense: receiving an undroppable zero on the assignment and being written up for an academic integrity violation.
  • Second offense: receiving an F in the course, an academic integrity violation, and recommendation for expulsion from the University.

If the evidence is indicative of a larger pattern, then the harshest penalty will be pursued.

Note that cheating includes both obtaining others' work and distributing your own work.

  • You may discuss the assignment with your classmates, but your final answers must be your own. Your final document should be created independently.
  • To avoid any issues, do not copy and paste code. (With an exception for code provided for the course.)
  • Do not share RMarkdown or other submission files.

If we detect academic integrity violations, we will contact you through the FAIR system.

In short, please do not cheat.

Support resources and supporting fellow students in distress

As members of the Illinois community, we each have a responsibility to express care and concern for one another. If you come across a classmate whose behavior concerns you, whether in regard to their well-being or yours, we encourage you to refer this behavior to the Student Assistance Center (333-0050) or online. Based upon your report, staff in the Student Assistance Center reach out to students to make sure they have the support they need to be healthy and safe.

Further, we understand the impact that struggles with mental health can have on your experience at Illinois; significant stress, strained relationships, anxiety, excessive worry, alcohol/drug problems, a loss of motivation, or problems with eating and/or sleeping can all interfere with optimal academic performance. We encourage all students to reach out to talk with someone, and want to make sure you are aware that you can access mental health support at the Counseling Center or McKinley Health Center. For mental health emergencies, you can call 911 or walk in to the Counseling Center, no appointment needed.

Accessibility

To obtain disability-related academic adjustments and/or auxiliary aids, students with disabilities must contact the course instructor and the Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, you may visit 1207 S. Oak St., Champaign, call 333-4603, e-mail disability@illinois.edu or go to the DRES website.

Disclaimer

The instructor reserves the right to make changes that are academically advisable. Such changes, if any, will be announced in class. Please note that it is your responsibility to attend the class and keep track of the proceedings.