Machine Learning Systems (ML Sys)


Course Description

The Machine Learning Systems (ML Sys) course focuses on integrating algorithms, system software, and hardware to build efficient and scalable machine learning services. It covers the evolution of machine learning architectures, including GPUs and specialized AI accelerators, and explores techniques such as CUDA programming, parallel training, efficient inference, and model optimization. The course examines the practical challenges of large-scale neural network development, training, and inference, emphasizing a holistic view of optimization and parallelism across communication, storage, and computation. Ethical considerations, including privacy, safety, and sustainability, are also addressed. Through hands-on programming and project work, students gain a practical understanding of ML systems from design to deployment, and learn to build systems that are efficient, scalable, and reliable for fast-evolving and diverse AI applications.


Administrative

Instructors in Charge: Professor Li Shang and Professor Yuedong Xu

Instructor Emails: lishang@fudan.edu.cn, ydxu@fudan.edu.cn

Weekly Lectures

Week | Topic | Materials
1 | Introduction: The Intellectual Map of Machine Learning Systems | Book Chapter 1, Slides
2 | CPU Foundations and GPU Emergence | Book Chapter 2, Slides
3 | GPU Architecture for Machine Learning Systems | Book Chapter 3, Slides
4 | CUDA Programming Through the Lens of GPU Architecture | Book Chapter 4, Slides
5 | CUDA Programming as Hardware-Software Co-Optimization | Book Chapter 5, Slides
6 | Multi-Agent Infra. as LLM Scaffolding | Multi-agent Infra, Slides
7 | Deep Learning Compilers: From “AI Everywhere” to the LLM Era | Book Chapter 7, Slides
8 | Data Parallelism in LLM Training | Book Chapter 8, Slides1, Slides2, Slides3
9 | Model Parallelism in LLM Training | Book Chapter 9, Slides1
10 | TBD | TBD
11 | TBD | TBD
12 | TBD | TBD
13 | TBD | TBD
14 | TBD | TBD
15 | TBD | TBD
16 | TBD | TBD

Course Projects

Students will complete a course project on the design, optimization, or analysis of machine learning systems. Projects may involve GPU programming, distributed training systems, model efficiency techniques, or system–hardware co-design.

Course project phase 1

Course project phase 2


Contact

For course-related questions, please contact the course coordinator, Mrs. Wu (ping_wu@fudan.edu.cn).