Reversim 2025

When Labels Are Missing: Building ML Pipelines Without Annotated Data

Access to labeled data remains one of the biggest bottlenecks in deploying machine learning in real-world domains, especially in healthcare, where annotations are costly, slow, and require domain expertise. In this talk, we’ll present how we use a rule-based linguistic approach to generate high-quality supervision signals when manual labels aren’t available. We start by capturing domain knowledge in clear, interpretable rules and apply them to unlabeled data to produce pseudo-labels. These rule-based labels give us a reliable starting point for training models, and we continue to use the rules throughout the lifecycle: re-training, flagging edge cases, and adapting to new data. Rather than viewing rule-based systems and ML as competing approaches, we design them to work in tandem: rules provide structure, consistency, and interpretability, while models bring generalization and scalability. This approach helps us improve accuracy over time while reducing our dependence on costly human annotation.
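To make the idea of rule-based pseudo-labeling concrete, here is a minimal sketch in Python. It is not the speakers' actual system: the task (detecting whether a clinical note mentions fever), the three rules, the label names, and the `pseudo_label` helper are all hypothetical and chosen only for illustration. Each rule either votes for a label or abstains; texts where the rules disagree or stay silent are not labeled and are instead marked for review, which is one simple way rules could flag edge cases.

```python
import re

ABSTAIN = None  # a rule returns this when it has no opinion


def rule_negated_fever(text):
    # Hypothetical rule: a negation cue shortly before "fever".
    if re.search(r"\b(no|denies|without)\s+(\w+\s+){0,2}fever\b", text, re.I):
        return "absent"
    return ABSTAIN


def rule_affirmed_fever(text):
    # Hypothetical rule: an affirmation cue shortly before "fever".
    if re.search(r"\b(has|reports|presents with)\s+(\w+\s+){0,2}fever\b", text, re.I):
        return "present"
    return ABSTAIN


def rule_temperature(text):
    # Hypothetical rule: a Fahrenheit reading, using 100.4 F as the cutoff.
    m = re.search(r"\btemp(?:erature)?\s*(?:of\s*)?(\d{2,3}(?:\.\d)?)\s*F\b", text, re.I)
    if m:
        return "present" if float(m.group(1)) >= 100.4 else "absent"
    return ABSTAIN


RULES = [rule_negated_fever, rule_affirmed_fever, rule_temperature]


def pseudo_label(text):
    """Return (label, status). The label is None unless all firing rules agree."""
    votes = [v for v in (rule(text) for rule in RULES) if v is not ABSTAIN]
    if not votes:
        return None, "no_rule_fired"
    if len(set(votes)) > 1:
        return None, "conflict"  # candidate edge case for human review
    return votes[0], "ok"


notes = [
    "Patient denies fever or chills.",
    "Presents with high fever, temp 102.3 F.",
    "Reports fever, temp 98.6 F.",
    "Follow-up visit for knee pain.",
]
for note in notes:
    print(pseudo_label(note), "<-", note)
```

Only the notes with status "ok" would be used as training examples in this sketch; the "conflict" and "no_rule_fired" cases are the ones where a model, or a human reviewer, would need to add coverage that the rules lack.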

Time & Room

Tue, Oct 28th, 15:50 - 16:20 • Room: A4+A5

Speakers

Amit Avni Yaccobi

Nym Health, Head of R&D