In this talk, we examine the key cost considerations when deploying LLMs in real-world applications, covering token usage, infrastructure costs, human resources, and ancillary expenses. We then survey optimization methods, including model architecture optimization, fine-tuning, quantization, caching, prefetching, parallelization, and distributed computing. Finally, we address practical techniques for estimating costs, such as cost modeling, cost monitoring and management, and budgeting and planning. These insights aim to help organizations navigate the financial landscape of LLM deployment, ensuring optimized resource allocation and sustainable operations.
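To make the cost-modeling topic concrete, the sketch below is a minimal token-based cost estimate in Python (an illustrative example, not material from the talk itself); the per-1K-token prices, request volumes, and function name are hypothetical placeholders to be replaced with your provider's actual rates and your own traffic figures.

```python
# Illustrative sketch (not from the talk): a minimal token-based cost model.
# All prices below are hypothetical placeholders, not real provider rates.

def estimate_monthly_cost(
    requests_per_day: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    price_per_1k_input: float = 0.0005,   # hypothetical USD per 1K input tokens
    price_per_1k_output: float = 0.0015,  # hypothetical USD per 1K output tokens
    days_per_month: int = 30,
) -> float:
    """Estimate monthly API spend from average token usage per request."""
    cost_per_request = (
        avg_input_tokens / 1000 * price_per_1k_input
        + avg_output_tokens / 1000 * price_per_1k_output
    )
    return cost_per_request * requests_per_day * days_per_month


if __name__ == "__main__":
    # Example: 10,000 requests/day, ~800 input and ~300 output tokens each.
    print(f"Estimated spend: ${estimate_monthly_cost(10_000, 800, 300):,.2f} per month")
```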
Room: Room 3
Tue, Oct 28th, 14:00 - 14:30