DS1300 at SMU

Recently, I was honored to be asked to teach as part of SMU’s new undergraduate degree in Data Science. Together with another member of my team, Dr. Robert Kalescky, I taught an 11-day intensive May Term course that covered topics ranging from an introduction to programming in Python, to using high-performance computing resources, to data ethics. Due to COVID-19 restrictions, the course was held in a synchronous but primarily virtual modality: students attended remotely while Dr. Kalescky and I traded off lecturing throughout the six-hour class days.

Teaching DS1300 at SMU during May Term 2021 in SMUFlex modality. Course material was delivered via GitHub and run in JupyterLab on ManeFrame II, SMU’s high-performance computing cluster.

Class Structure

Each day of the course was divided into roughly four sections: Discussion, Lecture, Exercises, and Project. Given the accelerated nature of the course, we thought it was important for the students to have time to work in class, including on their “homework,” so that lengthy days of lecture were not followed by long hours of programming at home.

  • Discussion - As a class, we read through Dr. Hannah Fry’s book “Hello World: Being Human in the Age of Algorithms.” The book covers many topics relevant to introductory students as they enter the world of data and programming, such as data ethics, privacy, and data ownership.

  • Lecture - The core of the class was teaching the students to ask and answer questions about, and using, data. The lectures therefore covered a broad range of topics, including using ManeFrame II (SMU’s HPC cluster) to host a Python development environment, project management in GitHub, basic data visualization, practical statistics, and more.

  • Exercises - As the semester progressed, the students were tasked with various exercises and assignments to reinforce and evaluate the learning outcomes. These were primarily coding exercises in which they were asked to replicate or expand on an analysis demonstrated in class, such as changing the colors on a graph or pulling a new data set and mirroring a data-cleaning pipeline. Some exercises were non-technical and focused on data literacy and identifying reliable methodologies in data journalism.

  • Project - Over the course of the short semester, students completed a group project in which they had to choose one or more data sets, pose a research question, perform the requisite data cleaning, complete an analysis, and present the results with visualizations.

Class Resources

The class was managed internally in our SMU Canvas learning management system, but to provide content to the students, we designed a workflow centered around GitHub. The instructors (Dr. Kalescky and I) set up a private GitHub repository to which we pushed all lecture content, in-class exercises, homework, etc. as Markdown (.md) files. This allowed us to seamlessly use GitHub to track changes, even to content delivered as Jupyter (IPython) notebooks. We then used GitHub Actions to automatically build the course website (as a Jupyter Book) and a public-facing “notebooks” repository with downloadable class materials for the students. Students were then taught how to fork the repo, create a working directory for their assignments, and push their work back to GitHub for grading.
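The student side of that workflow boils down to a handful of git commands. The sketch below simulates it end to end, with a local bare repository standing in for the student’s GitHub fork so the commands run without GitHub credentials; the paths, repository name, and file contents are hypothetical, not the actual course materials:

```shell
# Simulate the student's GitHub fork with a local bare repository
# (all names and paths here are hypothetical).
rm -rf /tmp/ds1300-notebooks.git /tmp/ds1300-work
git init --bare /tmp/ds1300-notebooks.git

# Clone the "fork" to the student's machine
git clone /tmp/ds1300-notebooks.git /tmp/ds1300-work
cd /tmp/ds1300-work
git config user.name  "DS1300 Student"
git config user.email "student@example.edu"

# Create a personal working directory and add an assignment to it
mkdir -p student_work
echo "print('hello, DS1300')" > student_work/exercise1.py
git add student_work/exercise1.py
git commit -m "Add exercise 1 solution"

# Push the work back to the fork, where it can be collected for grading
git push origin HEAD
```

In the real course the clone would point at the student’s fork on GitHub rather than a local path, but the add/commit/push cycle is the same.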