GRADE: Quantifying Sample Diversity
in Text-to-Image Models

Bar-Ilan University1 Allen Institute for AI2 University of Washington3 ETH Zürich4

We introduce GRADE, a method for assessing the output diversity of images generated by text-to-image models. Using LLMs and Visual-QA systems, GRADE quantifies diversity across concept-specific attributes by estimating attribute distributions and calculating normalized entropy.
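The normalized-entropy score at the core of this description can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name, the default normalization by the number of distinct observed values, and the cake-shape example are all assumptions made here for clarity.

```python
import math
from collections import Counter

def normalized_entropy(values, num_possible=None):
    """Normalized Shannon entropy of an empirical attribute distribution.

    `values` is a list of attribute labels (e.g. cake shapes predicted by a
    VQA system across many generated images). The score is the entropy of
    the empirical distribution divided by its maximum, log(K), so it lies
    in [0, 1]: 0 means every image shares one value, 1 means a uniform
    spread over K values.
    """
    counts = Counter(values)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    # K defaults to the number of distinct values observed; GRADE may
    # instead use the full set of attribute values proposed by the LLM.
    k = num_possible if num_possible is not None else len(counts)
    return entropy / math.log(k) if k > 1 else 0.0
```

For example, a model that renders nine round cakes and one square cake scores well below 1.0, while an even split of round and square cakes scores exactly 1.0.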

Our findings reveal substantial homogeneity in T2I outputs: all 12 models we test exhibit low diversity. Surprisingly, larger and more prompt-adherent models are even less diverse.

Finally, we hypothesize that low diversity stems from reporting bias in the training data, and show that the diversity in LAION closely corresponds to that of Stable Diffusion 1 and 2, the models trained on it.


Select a concept to explore its generated diversity 🌟 🖼️

We identified the following visual attributes for a cake. Click an attribute to see its measured diversity across models.


Explore images generated by different models 🤖 🎨

Click one of the models to view a sample of the images it generated for this concept.

Cite Us 📜🖊️


@misc{rassin2024gradequantifyingsamplediversity,
  title={GRADE: Quantifying Sample Diversity in Text-to-Image Models}, 
  author={Royi Rassin and Aviv Slobodkin and Shauli Ravfogel and Yanai Elazar and Yoav Goldberg},
  year={2024},
  eprint={2410.22592},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2410.22592}, 
}