The Machine Learning Systems (ML Sys) course focuses on integrating algorithms, system software, and hardware to build efficient and scalable machine learning services. It covers the evolution of hardware architectures for machine learning, including GPUs and specialized AI accelerators, and explores optimization techniques such as CUDA programming, parallel training, efficient inference, and model optimization. The course examines the practical challenges of large-scale neural network development, training, and inference, emphasizing a holistic view of parallelism and of optimization across communication, storage, and computation. Ethical considerations, including privacy, safety, and sustainability, are also addressed. Through hands-on programming and project work, students gain a real-world understanding of ML systems, from design to deployment, and learn to build systems that are efficient, scalable, and reliable for fast-evolving and diverse AI applications.
Instructors in Charge: Professor Li Shang, Professor Yuedong Xu
Instructor Emails: lishang@fudan.edu.cn, ydxu@fudan.edu.cn
| Week | Topic | Materials |
|---|---|---|
| 1 | Introduction: The Intellectual Map of Machine Learning Systems | Book Chapter 1, Slides |
| 2 | CPU Foundations and GPU Emergence | Book Chapter 2, Slides |
| 3 | GPU Architecture for Machine Learning Systems | Book Chapter 3, Slides |
| 4 | CUDA Programming Through the Lens of GPU Architecture | Book Chapter 4, Slides |
| 5 | CUDA Programming as Hardware-Software Co-Optimization | Book Chapter 5, Slides |
| 6 | Multi-Agent Infrastructure as LLM Scaffolding | Multi-agent Infra, Slides |
| 7 | Deep Learning Compilers: From “AI Everywhere” to the LLM Era | Book Chapter 7, Slides |
| 8 | Data Parallelism in LLM Training | Book Chapter 8, Slides1, Slides2, Slides3 |
| 9 | Model Parallelism in LLM Training | Book Chapter 9, Slides1 |
| 10 | TBD | TBD |
| 11 | TBD | TBD |
| 12 | TBD | TBD |
| 13 | TBD | TBD |
| 14 | TBD | TBD |
| 15 | TBD | TBD |
| 16 | TBD | TBD |
Students will complete a course project focusing on the design, optimization, or analysis of machine learning systems. Projects may involve GPU programming, distributed training systems, model efficiency techniques, or system–hardware co-design.
For course-related questions, please contact the course coordinator, Mrs. Wu (ping_wu@fudan.edu.cn).