CS 598XU: Reliability of Cloud-Scale Systems

Spring 2022

Tianyin Xu
4108 Siebel Center
Xu's virtual office can be found on Piazza

Tu/Th 15:30–16:45
2233 Everitt Laboratory
Note: The first-week classes will be online (per university policy).

Teaching Assistant
Siyuan Chai

Office Hours
Tu 16:45-17:45


Course Overview

The purpose of this course is to teach the principles and practices of reliability engineering in modern "cloud-scale" systems, and to expose students to research on software and system reliability. We will look at how large-scale systems fail in the real world, and we will study state-of-the-art reliability techniques and practices, including those widely adopted in industry and new ideas proposed by academia.

We will be going over the following topics:

This is a research-oriented seminar course with a major course project.

Prerequisite: CS 423, ECE 391, CS 425, CS 433, CS 523, or CS 525 (or equivalent).

Reading List

The course does not have a textbook. Instead, the course material will come from seminal, noteworthy, or representative papers and articles from the literature. Each lecture (except the first) will have two assigned papers to read, typically one from academia and one from industry. You should read these papers before coming to class, and be prepared to discuss them. Occasionally I will also list recommended readings; you are encouraged, but not required, to read those.

I highly recommend that you read Griswold's advice on how to read a research paper. The take-home message is that until you can answer a bunch of questions, you are not done reading a paper.

I also strongly encourage you to discuss the papers with other students in the class — you may have insights that others do not, and vice versa. Often students form reading groups, which I heartily encourage. Note that group discussion, however, is not an effective substitute for actually reading the paper.

You are required to write reviews for the assigned papers. The review form (which consists of a number of questions) will be posted on Piazza. Reviews are due at 11:59pm on Mon/Wed (the day before the class day). The paper reviews account for 10% of your overall grade.

Class Participation

Since this is a discussion-based course, class participation is required. We will discuss the papers and articles that we will all have read before each class. I will lead discussions by asking questions of students at random in class. Note that your answers to these questions form 10% of your overall grade, so it is important that you both show up to class and read the papers.

Research Project

The best way to learn is by doing. You will undertake your own research project individually or in a group of two. A group of more than two is not encouraged, but is possible if you have a strong justification that the project needs more members. I will provide a list of ideas to get you started, but I highly encourage you to pursue your own ideas, which typically lead to better results. You will write a project report and present it at the end of the course. The details of the research project are described at the following link.

You can find a list of other projects that were subsequently published in workshops, conferences, and journals.

To protect students' work (much of which is closely connected to their thesis research), we do not release information that is not already public.

Note: You are expected to be aware of the Academic Integrity Guidelines of the University of Illinois. Any violation of the course or university policies will be treated seriously, and could lead to severe repercussions. Please don't cheat. It's not worth it.


There is no homework, no midterm, and no final exam. The course is about discussing Systems research (10%) and doing Systems research (90%). Note that the Project Proposal, Checkpoint Report 1, and Checkpoint Report 2 account for 5%, 5%, and 5%, respectively.

CS Values and Code of Conduct

All members of the Illinois Computer Science department -- faculty, staff, and students -- are expected to adhere to the CS Values and Code of Conduct. The CS CARES Committee is available to serve as a resource to help people who are concerned about or experience a potential violation of the Code. If you experience such issues, please contact the CS CARES Committee. I am also available for issues related to this class.