Energy-Efficient and Flexible LLM Systems

Minlan Yu – Harvard University

Large language model inference is rapidly becoming one of the most resource-intensive workloads, placing increasing pressure on the electric grid. In this talk, I present our recent work on making LLM serving more energy-efficient by leveraging learning-augmented algorithms, and more flexible by meeting target power usage through runtime reconfiguration.
