Course Description

Safety-critical applications require highly reliable computing and electronic systems, and that reliability is achieved through fault-tolerant design. This course presents the fundamentals of fault-tolerant systems and basic fault-tolerance techniques at both the hardware and software levels, e.g., redundancy and re-execution. Students will learn how to detect faults, design computing components that tolerate faults, and measure system reliability. The course prepares students to design and analyze reliable computing systems.

Course Resources