Machine Learning Systems (ML Sys)



Course Description

The Machine Learning Systems (ML Sys) course focuses on integrating algorithms, system software, and hardware to build efficient and scalable machine learning services. It covers the evolution of hardware architectures for machine learning, including GPUs and specialized AI accelerators. It also explores optimization techniques such as CUDA programming, parallel training, efficient inference, and model optimization. The course examines the practical challenges of developing, training, and running inference on large-scale neural networks. Throughout, it emphasizes a holistic view that jointly optimizes communication, storage, computation, and parallelism. Ethical considerations, including privacy, safety, and sustainability, are also addressed. Through hands-on programming and project work, students gain practical experience with ML systems from design to deployment. They learn to build machine learning systems that are efficient, scalable, and reliable for fast-evolving and diverse AI applications.


Weekly Lectures

Week | Topic | Materials
1 | Introduction: The Intellectual Map of Machine Learning Systems | Book Chapter 1, Slides
2 | CPU Foundations and GPU Emergence | Book Chapter 2, Slides
3 | GPU Architecture for Machine Learning Systems | Book Chapter 3, Slides
4 | CUDA Programming Begins with Architecture | Book Chapter 4, Slides
5 | TBD | TBD
6 | TBD | TBD
7 | TBD | TBD
8 | TBD | TBD
9 | TBD | TBD
10 | TBD | TBD
11 | TBD | TBD
12 | TBD | TBD
13 | TBD | TBD
14 | TBD | TBD
15 | TBD | TBD
16 | TBD | TBD

Course Projects

Students will complete a course project focusing on the design, optimization, or analysis of machine learning systems. Projects may involve GPU programming, distributed training systems, model efficiency techniques, or system–hardware co-design.


Contact

For course-related questions, please contact the instructor.