Large Systems Summary

In this course we teach you about the large, distributed systems that have become commonplace and that we use every day. We focus on three aspects:

  1. Their design
  2. The implementation of that design
  3. Their administration

Distributed systems are much harder to deal with than single-computer systems. We teach you about the fundamental problems, the general solutions that exist (or not!), and how particular applications in use today solve them. Also, you can imagine that administering a system that consists of tens of thousands of servers and is used 24/7 is much more complicated than administering a few servers in a rack. Finally, we now have the Cloud, which claims it can solve everything for you, but is that true, and at what cost?

Some topics that will be discussed:

  • Scaling techniques: Replication, Partitioning/Sharding, Asynchronous Communication
  • Virtualization: techniques behind Xen, KVM, Containers and Unikernels
  • Fault Tolerance
  • Designs of Google Search, Hadoop, Facebook
  • Configuration Management: Ansible, Puppet, IaC
  • DevOps: changing the way we develop and operate software systems
  • Infrastructure management at a real-life large organization

The course has lectures in the morning and practical assignments in the afternoon. A number of lectures are given by guest speakers with extensive experience in their field. In the afternoon, you will work on your own hypervisor, run a small cluster with fellow students, and get hands-on experience with a real Cloud.
