It Takes a Village to Build a Good Hebrew Speech-to-Text Model

In this talk we will share the story of ivrit.ai, a zero-budget, non-profit community project that set out to build an open, high-quality Hebrew speech-to-text (STT) model. We will walk through the hurdles that stand between an idea and a production-ready model. Our story begins with a WhatsApp chat between four strangers asking, “How hard can it be to train a decent Hebrew STT model?”, a question that turned into a two-year adventure. Step by step, we discovered the power of a language community to build its own AI, pushing Hebrew toward first-class-citizen status in the global AI landscape. Join us to hear how we navigated the legal minefield, rallied content creators and volunteer annotators, released free public transcription tools, and built a community model that cut word-error rate by 25%. All of this would likely have failed had we tried to monetise the project.

Room: Room 3

Tue, Oct 28th, 13:20 - 13:50

Speakers

Yanir Marmor