Class on "Large Scale Computing Infrastructures"

This is class MINF4526, taught in the fall semester 2011 at UZH.

This class provides an introductory overview of large scale computing infrastructures. Taking both an engineering and an end-user perspective, it focuses on how scientific use cases can be enabled on such systems, which main technical challenges have to be addressed, and which solutions are currently available.

The class will survey the current approaches to large scale computing infrastructures in both academic and industrial environments. Grids and Clouds will be treated as different approaches to the same category of problems: integrating large scale, high-throughput scientific use cases. The first part will cover an in-depth study of the actual technologies used in modern distributed infrastructures, focusing on security, access protocols, data handling and usability.

The second part will provide the basic tools needed to understand how to enable large scale scientific pipelines: a basic introduction to the Python programming language and examples taken from actual use cases.

The national grid computing infrastructure SMSCG will be used as a reference model as well as for the practical activities.

At the end of this class, students should understand how a large scale computing infrastructure is composed, operated and used. Students will also learn the main challenges that scientists face when porting their scientific use cases. Basic knowledge of Distributed Systems and of the Object Oriented Programming paradigm is required for a full understanding of all modules of the class.

Outline of the Lectures

  1. Principles of distributed computing (Sep. 26, 2011)
  2. Batch-systems concepts and abstractions, I (Oct. 5, 2011)
  3. Introduction to Grids, the ARC middleware, and SMSCG (Oct. 12, 2011)
  4. Examples of High-Throughput Computing use cases (Oct. 19, 2011)
  5. Security, Authorization and Authentication in Grid environments (guest lecture by Andres Aeschlimann, SWITCH)
  6. Discussion on the HTC use cases (Nov. 2, 2011)
  7. Introduction to Python programming, I (Nov. 9, 2011)
  8. Introduction to Python programming, II (Nov. 16, 2011)
  9. Grid scheduling and the information system (Nov. 23, 2011)
  10. Pilot job model (Nov. 30, 2011)
  11. AppPot and virtualization support on the Grid (Dec. 7, 2011)
  12. Introduction to Cloud infrastructures
  13. Final Recap

Lab sessions

  1. Introduction to shell scripting (Sep. 27, 2011)
  2. Introduction to batch systems (Oct. 6, 2011)
  3. Introduction to batch systems, continued -- see last lab session's slides (Oct. 13, 2011)
  4. Installation of the ARC client software (Oct. 20, 2011)
  5. Introduction to the ARC Grid middleware (Oct. 27, 2011)
  6. Discussion of HTC use cases (Nov. 3, 2011)
  7. Python programming exercises, I (Nov. 10, 2011)
  8. Introduction to ARClib with Python, I (Nov. 17, 2011)
  9. Introduction to ARClib with Python, II (Nov. 17, 2011)
  10. Introduction to ARClib with Python, III (Dec. 1, 2011)
  11. Introduction to AppPot, I (Dec. 8, 2011)
  12. Introduction to AppPot, II (Dec. 15, 2011)

Recommended reading

References and further reading suggestions are provided at the end of each set of slides, or on each lecture page.

This is a list of references on high-throughput distributed computing that do not fit in the scope of any particular lecture.

Additional material