We provide an annotated downloadable (draft) pdf version of the syllabus. It is also linked to from other sections of the site. If there is a disagreement between the website version and the pdf version, the website version may be newer. Let us know if you see differences. (See below for email etiquette.)
The core content of the downloadable syllabus follows below for easier on-line browsing. Note that some of the listed dates and times may still correspond to the Fall 2020 instance of the predecessor course STAT 430 “DSPM”.
Data science is emerging as a field that is revolutionizing science and industries alike. Work across nearly all domains is becoming more data driven, affecting both the jobs that are available and the skills that are required. As more data and ways of analyzing them become available, more aspects of the economy, society, and daily life will become dependent on data.
Source: Envisioning the Data Science Discipline: The Undergraduate Perspective. National Academies, 2018.
This course provides the principal programming foundations for working with data at scale.
Data analysts are in demand, and particularly those who can walk the walk and not only talk the talk. This course aims for a “hands-on, roll-up-your-sleeves” learning-by-doing approach which can be highly rewarding to those willing to put in the required effort.
After this course, students should be able to …
The course counts for three credits for undergraduate students, and for four credits for graduate students. Graduate students are required to submit more extensive homework assignments.
- shell: for managing files, commands, information flow, …
- git: for modern version control supporting social computing
- sql: as a base layer for data management and control
- markdown: for programmatic control of html, pdf, … communication
- R: for programming with data, and our core building block

The setup and timetable is as follows:
Title | Name | Location | Hours | Type / Booking |
---|---|---|---|---|
Instructor | Dirk Eddelbuettel | Zoom | Mon 7pm - 8pm | Open |
Instructor | Dirk Eddelbuettel | Zoom via booking system | Thu 7pm - 8:30pm | 15m, one-on-one |
TA | Alton Barbehenn | Zoom | Wed 10am - 11am | Open |
TA | Alton Barbehenn | Zoom via booking system | Fri 9am - 12pm | 15m, one-on-one |
We offer two types of office hours. The first type is open, with an open door where you can walk in and out, attend every week, or never, as you see fit. The second type is individual one-on-one office hours that last fifteen minutes each, and which you book via the Calendly link above. We expect that you limit your use of these to two per term. The booking system only allows booking one week out, so please be considerate of your fellow students. Under genuinely exceptional circumstances, additional visits can be scheduled on demand. (Note that the Zoom links above differ per time slot. Make sure you pick the correct one.)
What | When |
---|---|
Location | online |
Times | no fixed times, aiming for weekly availability |
Hours | Office hours as scheduled, see below |
Homework assignments (generally) cover the preceding four lectures, and are not cumulative. They prepare for the quiz (see next section) covering the same period, and permit students to do rigorous exercises which are graded electronically using PrairieLearn and CBTF.
Week | Given | Due |
---|---|---|
Homework 1 – Week 2 | Thu, Sep 9 | Thu, Sep 16 |
Homework 2 – Week 4 | Thu, Sep 23 | Thu, Sep 30 |
Homework 3 – Week 6 | Thu, Oct 7 | Thu, Oct 14 |
Homework 4 – Week 8 | Thu, Oct 21 | Thu, Oct 28 |
Homework 5 – Week 10 | Thu, Nov 4 | Thu, Nov 11 |
Homework 6 – Week 12 | Thu, Nov 18 | Thu, Dec 2 |
Homeworks are generally released at 10:00am and due the following week at 10:00am. Graduate students receive (generally two) additional required questions. These questions are typically more substantial in nature and require more effort than the regular questions for both undergraduate and graduate students. Undergraduates may opt to answer one or both of these questions for additional points, or as a challenge. Scoring is however capped at 100%.
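The release pattern in the table can be reproduced with a few lines of R. This is only an illustrative sketch: the year 2021 is an assumption taken from the project deadline stated elsewhere in this syllabus, and the two-week window for Homework 6 simply mirrors the table.

```r
# Assumption: the term year is 2021 (taken from the project deadline).
given <- seq(as.Date("2021-09-09"), by = "14 days", length.out = 6)

# Due one week after release, except Homework 6, which the table lists
# as due two weeks after release (Dec 2 rather than Nov 25).
due <- given + 7
due[6] <- given[6] + 14

schedule <- data.frame(homework = paste("Homework", 1:6),
                       given = given,
                       due = due)

# Every release and due date should fall on a Thursday
# (POSIXlt weekday 4, with Sunday counted as 0).
stopifnot(all(as.POSIXlt(schedule$given)$wday == 4),
          all(as.POSIXlt(schedule$due)$wday == 4))
print(schedule)
```

The table above remains the authoritative schedule; the code only shows how such dates can be generated and checked.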
Quizzes follow the bi-weekly schedule of the homework, and cover the same (typically two week) set of lectures, and are also not cumulative.
Quiz | First Date | Last Date | Late | Weeks Covered |
---|---|---|---|---|
Quiz 1 | Thu, Sep 16 | Sat, Sep 18 | Sun, Sep 19 | Weeks 1 and 2 |
Quiz 2 | Thu, Sep 30 | Sat, Oct 2 | Sun, Oct 3 | Weeks 3 and 4 |
Quiz 3 | Thu, Oct 14 | Sat, Oct 16 | Sun, Oct 17 | Weeks 5 and 6 |
Quiz 4 | Thu, Oct 28 | Sat, Oct 30 | Sun, Oct 31 | Weeks 7 and 8 |
Quiz 5 | Thu, Nov 11 | Sat, Nov 13 | Sun, Nov 14 | Weeks 9 and 10 |
Quiz 6 | Thu, Dec 2 | Sat, Dec 4 | Sun, Dec 5 | Weeks 11 and 12 |
You can schedule your exam time within the window with the CBTF site. Each exam will be a session of 50 minutes. Upon written request to the instructors (see below for email etiquette) documenting the need, accommodation for a STAT 447 ONLINE section can be made on a case by case basis. Note that this must be need-based; it is not an elective choice for Urbana-Champaign based students, who are expected to test at the CBTF facility in person. Requesting online testing when you are able to attend the CBTF in person may be treated as an academic integrity violation with its full consequences.
Prior to taking this course, students should have:
The course is delivered primarily online and tested online. Students use Single-Sign-On with the University of Illinois ‘netid’ to access
In addition, CBTF Online quizzes use CBTF proctors for student identity verification. The group project requires (recorded) group presentations also identifying each student.
This course offers office hours from different members of the course staff that are held throughout the week at pre-scheduled times.
For class discussion, we will use a GitHub repository and its issue system. This forum will be private and restricted to those in the course.
It is very important that each student
Before you start writing an e-mail to a member of the course staff please make sure your question is not:
But please ensure your e-mails meet the following criteria:
- @illinois.edu account.
- [STAT 447]
- instructors@stat447.com (or, if you prefer, help@stat447.com).

We try our best to respond within 24 hours.
Do not post homework code. The campus rules on academic integrity apply to all communication, including email.
Please see the FAQ item on for-hire tutors.
As this is an on-line course, there is no attendance count. You are strongly encouraged to follow all the lecture slides and video, study the readings and possibly some or most of the extra readings. Most importantly, you need to try the examples and code we show, and experiment with them. As a proxy for class participation, we consider participation in the GitHub issue topic discussion which, for an online class, is the closest we have to class discussions.
Homework assignments serve as a way to interact with the material outside of the classroom. Homework will be due at 10:00 AM on the assigned due date, which should generally be a Thursday. We score the mean of the top five homeworks, i.e. the lowest homework score is dropped. As this gives one automatic “out”, late homework will generally not be accepted.
In general, there will be no exceptions to this policy. Please start early, make sure your environment is working correctly, and that you are able to produce a working document. We have two course assistants with on-campus office hours, but in order to ask meaningful questions you need to try answering the material first.
While working on homework, students are encouraged to study in groups. But students should strive to independently supply answers to the homework problems. As we use an automated platform, submissions can be compared easily. Academic integrity standards apply.
Each homework will be distributed via PrairieLearn, and will be stored as a combination of your NetId and the question.
Here are a few do and don’t tips for the PrairieLearn web submission. Consider the following stanza from an actual homework:
# Enter your code below: Do not alter the function signature:
# ensure it remains named 'iris_summary' and takes one argument.
# Ensure you return a data.frame as indicated in the question.
iris_summary <- function(irisdata) {
    # Enter code here
}
Consider the following recommendations carefully:
- Do write your code where the stanza says # Enter code here.
- Do use the irisdata object. The function signature clearly states that this is the (only) input you need and are given.
- Do not call data(something). You do not need to load anything (unless specifically asked when a question is about data loading or saving).
- When the argument is named irisdata, do not deviate to iris or iris_data or any other form. Do write code to match the name exactly.
- When asked to return a data.frame, do not return a matrix or data.table. Return a data.frame.

Each homework assignment will be a variable number of points; however, each homework assignment will have equal weight towards your final grade.
As stated above, we count best five out of six.
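Returning to the PrairieLearn stanza shown earlier, a filled-in submission could look roughly like the sketch below. The task itself (per-species column means) is a made-up stand-in, not the actual homework question; only the function name, the single irisdata argument, and the data.frame return type come from the stanza.

```r
# Hypothetical task (NOT the actual homework question): return the mean of
# each numeric column per species, as a data.frame.
iris_summary <- function(irisdata) {
    # Work only with the argument we are given; no data() call is needed.
    res <- aggregate(. ~ Species, data = irisdata, FUN = mean)
    # The formula method of aggregate() already returns a data.frame.
    res
}

# Local check before submitting, using the iris data set that ships with R.
out <- iris_summary(iris)
stopifnot(is.data.frame(out), nrow(out) == 3)
```

Note that iris appears only in the local check outside the function; inside the function the code refers to irisdata exclusively, matching the signature.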
Instead of examinations, there will be up to six bi-weekly quizzes (see the section Schedule). The quizzes, just like the homework, will (generally) focus on the preceding (two weeks of) lectures and are (generally) not cumulative over the full course content.
And just like with the homework, you can drop one quiz grade over the course of the semester. We aim for six quizzes in total, and with the lowest quiz score being dropped the score will be the mean of the top five quiz scores.
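The "best five out of six" rule, which applies to homework and quizzes alike, amounts to dropping the single lowest score and averaging the rest. A minimal sketch in R follows; the helper name best_of and the scores are invented for illustration and are not part of any course tooling.

```r
# Mean of the best scores after dropping the 'drop' lowest ones.
# 'best_of' is a hypothetical helper name used only for this illustration.
best_of <- function(scores, drop = 1) {
    stopifnot(is.numeric(scores), length(scores) > drop)
    kept <- sort(scores, decreasing = TRUE)[seq_len(length(scores) - drop)]
    mean(kept)
}

quiz_scores <- c(80, 95, 70, 100, 90, 85)   # made-up percentages
best_of(quiz_scores)                        # 90: the 70 is dropped
```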
Because of Covid-19, the Fall 2020 instance of the course now uses CBTF Online (instead of in-person testing at the CBTF Facility). CBTF Online is proctored by CBTF staff over Zoom, and depends on the availability of the proctors. We will have one default time for each quiz along with two alternate times each. Use the CBTF ‘Conflict Request’ form to request an alternate time.
The policies of the CBTF are the policies of this course, and academic integrity infractions related to the CBTF are infractions in this course.
If you have accommodations identified by the Division of Rehabilitation-Education Services (DRES) for exams, please take your Letter of Accommodation (LOA) to the CBTF proctors in person before you make your first quiz reservation. The proctors will advise you as to whether the CBTF provides your accommodations or whether you will need to make other arrangements with your instructor.
Any problem with testing in the CBTF must be reported to CBTF staff at the time the problem occurs. If you do not inform a proctor of a problem during the test then you forfeit all rights to redress.
There are several components associated with the group final project:
Project Proposal: The repository should contain an outline of what is planned, the sources of the data, possible transformations and possible modeling strategies and/or possible data visualizations. This can be provided via the README.md file of the repository.
Project Report: The project report can be thought of as an (informal) paper. Guided by the format of an academic paper, it describes the project in a succinct yet complete fashion along with references. Markdown should be used to write it; the result can be either in html or pdf format.
Project Presentation and Slides: At the end of term, a short recorded group video presentation, akin to a lightning talk, should introduce, present and summarize the work of the project in a form that is suitable for a general audience. A length of five minutes is a goal. The presentation should be supported by five to six slides, also produced in Markdown.
Evaluation of Peers, and Evaluations from Peers: We require a short informal statement from each team member briefly stating who within the team did (roughly) what percentage of the work.
The Group Project provides an excellent opportunity to “shine” and to demonstrate your passion, skill, and capabilities for data science programming work. It provides a great chance to make a mark and to create something special and distinguished.
The group projects have to be finalized by noon (12:00h, Central time) on December 9, 2021 (aka “reading day”).
There are no midterm or final examinations in this course. Instead, we have homework, quizzes, and a group project.
Late work will not be accepted for either homework or the group project. <!-- Watch the deadlines, and plan accordingly.
As the date and time of an exam is chosen by a student over an examination window, there will be no make-up exams administered once the window closes. -->
Type | Weight |
---|---|
Homework | One Third |
Quizzes | One Third |
Group Project | One Third |
Grading is discretionary, and performed by the instructor and the course assistant(s). There are no retakes; we mark ‘best five out of six’ for homework and quizzes so everybody gets to drop one each.
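As a rough illustration of the equal weighting, the overall score can be computed as a weighted mean of the three component scores. The component values below are made up, and each is assumed to be on a 0 to 100 scale (with the best-five rule already applied to homework and quizzes).

```r
# Equal one-third weights, as in the table above
weights <- c(homework = 1/3, quizzes = 1/3, project = 1/3)

# Made-up component scores on an assumed 0-100 scale
components <- c(homework = 92, quizzes = 85, project = 90)

# Match weights to components by name, then take the weighted mean
final_score <- weighted.mean(components, weights[names(components)])
final_score   # 89
```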
Grade | Points |
---|---|
A- to A+ | 90 to 100 |
B- to B+ | 80 to 89.99 |
C- to C+ | 70 to 79.99 |
D- to D+ | 60 to 69.99 |
F | below 60 |
Each ten point range is equally split over the three components (i.e. from minus to plus). Grades may be curved at the end of term before being finalized.
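One possible reading of the grade bands can be sketched in R as follows. The cut points at one third and two thirds of each ten-point band, and the choice to make lower bounds inclusive, are assumptions made for this illustration; the function name letter_grade is hypothetical, and any end-of-term curve is ignored.

```r
# Map a 0-100 score to a letter grade (illustrative sketch, no curve applied).
letter_grade <- function(points) {
    stopifnot(is.numeric(points), length(points) == 1,
              points >= 0, points <= 100)
    if (points < 60) return("F")
    # Ten-point band: 6 = D, 7 = C, 8 = B, 9 = A (a score of 100 counts as band 9)
    band <- min(floor(points / 10), 9)
    letter <- c("D", "C", "B", "A")[band - 5]
    # Position within the band, between 0 and 10
    within <- points - 10 * band
    # Assumption: thirds are cut at 10/3 and 20/3, lower bound inclusive
    modifier <- if (within < 10 / 3) "-" else if (within < 20 / 3) "" else "+"
    paste0(letter, modifier)
}

letter_grade(95)      # "A"
letter_grade(89.99)   # "B+"
letter_grade(60)      # "D-"
```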
The official University of Illinois policy related to academic integrity can be found in Article 1, Part 4 of the Student Code. Section 1-402 in particular outlines behavior which is considered an infraction of academic integrity. These sections of the Student Code will be upheld in the STAT 447 classroom. Any violations will be dealt with in a swift, fair and strict manner.
You may discuss methods for completing assignments with other students, but the execution of these methods and the preparation of the document must be done independently. Furthermore, there can be no discussion with other students or collaboration of any kind on exams. Sufficient evidence of sharing results, collaborating on written assignments, or simply relying on internet resources will generally result in:
If the evidence is indicative of a larger pattern, then the harshest penalty will be pursued.
Note that cheating includes both obtaining others' work, as well as distributing your own work.
If we detect academic integrity violations, we will contact you through the FAIR system.
In short, please do not cheat.
As members of the Illinois community, we each have a responsibility to express care and concern for one another. If you come across a classmate whose behavior concerns you, whether with regard to their well-being or yours, we encourage you to refer this behavior to the Student Assistance Center (333-0050) or online. Based upon your report, staff in the Student Assistance Center reach out to students to make sure they have the support they need to be healthy and safe.
Further, we understand the impact that struggles with mental health can have on your experience at Illinois; significant stress, strained relationships, anxiety, excessive worry, alcohol/drug problems, a loss of motivation, or problems with eating and/or sleeping can all interfere with optimal academic performance. We encourage all students to reach out to talk with someone, and want to make sure you are aware that you can access mental health support at the Counseling Center or McKinley Health Center. For mental health emergencies, you can call 911 or walk in to the Counseling Center, no appointment needed.
To obtain disability-related academic adjustments and/or auxiliary aids, students with disabilities must contact the course instructor and the Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, you may visit 1207 S. Oak St., Champaign, call 333-4603, e-mail disability@illinois.edu or go to the DRES website.
The instructor reserves the right to make changes that are academically advisable. Such changes, if any, will be announced in class. Please note that it is your responsibility to attend the class and keep track of the proceedings.