Text-to-Image Speed
LaViDa-O (18s / image)
BAGEL (45s / image)
Object Detection
LaViDa-O (1-step detection)
a cute dog: [mask] [mask] [mask] [mask]
a boy: [mask] [mask] [mask] [mask]
ships: [mask] [mask] [mask] [mask]
Qwen2.5-VL (autoregressive)
Text-to-Image with Planning
Image-Editing with Planning
Self-Reflection
*Demos are accelerated for easier viewing.