I'm currently at a stealth startup training domain-specific LLMs, working on data across
the training stack. My recent work has centered on post-training, where I've developed reward
signals, graders, and evaluations for improving factuality and faithfulness in agentic
systems. I've also worked on mid-training, building data pipelines to curate and filter
large-scale corpora, and running training and mixture ablations to study how data choices
affect model behavior. In practice, I sit close to the full model-development loop: data,
experiments, training, evaluation, and iteration. Prior to that, I interned at Google
Research, where I worked on video understanding in Gemini and presented my work at the
annual Google Research conference.
I’m also completing my PhD at Bar-Ilan’s NLP lab, advised by Yoav Goldberg, where I
research underspecification: what happens when language, tasks, or evaluations leave part
of the intended meaning implicit. I study how models fill in those gaps, why
they often default to behavior that looks correct but misses human intent, and how to build
evaluations that expose these failures more directly. The goal is to make models more
reliable in realistic settings, where the right answer is not fully specified in the input.
Selected publications
Linguistic Binding in Diffusion Models
Ask a text-to-image model for “a pink bench and a yellow dog” and you'll often get a yellow bench.
We traced this to cross-attention maps that don't agree on which tokens modify which objects, and
proposed SynGen: a training-free fix that aligns them using the prompt's dependency parse.
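
To make the dependency-parse idea concrete, here is a minimal sketch (not the SynGen implementation) of the first step: pulling modifier-to-noun bindings out of the prompt with spaCy. The pipeline name and the dependency labels filtered on are illustrative assumptions.

```python
# Sketch only: recover which adjective modifies which noun, the binding signal
# a SynGen-style alignment needs before it ever looks at an attention map.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed small English pipeline

def modifier_noun_pairs(prompt: str):
    """Return (modifier, noun) pairs, e.g.
    'a pink bench and a yellow dog' -> [('pink', 'bench'), ('yellow', 'dog')]."""
    doc = nlp(prompt)
    pairs = []
    for token in doc:
        # adjectival modifiers and compounds attach directly to their head noun
        if token.dep_ in {"amod", "compound"} and token.head.pos_ in {"NOUN", "PROPN"}:
            pairs.append((token.text, token.head.text))
    return pairs

print(modifier_noun_pairs("a pink bench and a yellow dog"))
# [('pink', 'bench'), ('yellow', 'dog')]
```

At generation time, SynGen turns pairs like these into a loss that pulls each modifier's cross-attention map toward its noun's map and pushes it away from the other nouns', applied during the denoising steps with no retraining.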
GRADE: Quantifying Sample Diversity in Text-to-Image Models
For an underspecified prompt like “a dog,” a good model should produce a variety of reasonable
images. GRADE uses a VLM to propose prompt-relevant attributes (breed, color, setting)
and measures entropy over them. Most production models are more mode-collapsed than people think —
and it's getting worse, not better.
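
As a rough illustration of the measurement (a sketch, not GRADE's scoring code), diversity over a single attribute can be scored as normalized entropy of the values a VLM assigns to each generated image; the labels below are hypothetical.

```python
# Sketch: score diversity of one prompt-relevant attribute (here, breed) as
# normalized entropy over the labels a VLM assigned to the generated images.
from collections import Counter
import math

def normalized_entropy(labels):
    counts = Counter(labels)
    n = sum(counts.values())
    probs = [c / n for c in counts.values()]
    h = -sum(p * math.log(p) for p in probs)
    # 1.0 means labels are spread uniformly, 0.0 means every image got the same label
    return h / math.log(len(counts)) if len(counts) > 1 else 0.0

# hypothetical VLM labels for 10 images generated from the prompt "a dog"
breeds = ["labrador"] * 8 + ["poodle", "beagle"]
print(f"{normalized_entropy(breeds):.2f}")  # ~0.58: most of the mass on one breed
```

GRADE aggregates scores like this across many prompts and attributes; the point of the sketch is just that low entropy over reasonable attributes is a direct signature of mode collapse.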
D-MERIT: Rewarding Retrievers that Are Actually Right
Retrieval benchmarks mark one gold document per query — but many queries have several valid
answers, so retrievers that surface a different-but-correct document get penalized. D-MERIT annotates
the full answer set for each query, and rankings shift meaningfully once you evaluate against it.
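
To see why the single-gold convention bites, here is a tiny sketch (hypothetical document IDs, not D-MERIT's evaluation code) comparing hit@k against one annotated gold versus the full answer set.

```python
# Sketch: the same ranking scored against a single gold document
# vs. against every document that actually answers the query.
def hit_at_k(ranking, relevant, k=10):
    """1.0 if any relevant document appears in the top-k, else 0.0."""
    return float(any(doc in relevant for doc in ranking[:k]))

ranking = ["doc_17", "doc_03", "doc_42"]   # retriever's top results
single_gold = {"doc_99"}                   # the one annotated answer
full_answer_set = {"doc_99", "doc_03"}     # every document that answers the query

print(hit_at_k(ranking, single_gold))      # 0.0 -> penalized despite being right
print(hit_at_k(ranking, full_answer_set))  # 1.0 -> credited under exhaustive labels
```

The ranking is identical in both cases; only the labels change, which is exactly the kind of shift that reorders retriever comparisons once the answer set is annotated exhaustively.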
An early look at a strange failure mode: asking for two different concepts often produces two of
the same thing. We characterized the behavior and traced it to how the text encoder maps
nouns to concepts; the paper later found its way into public conversations about what these
models actually understand.