Computer Science PhD Student at UCLA
Original Works
Publications
-
LaViDa: A Large Diffusion Model for Vision-Language Understanding
Shufan Li, Konstantinos Kallidromitis, Hritik Bansal, Akash Gokul, Yusuke Kato, Kazuki Kozuka, Jason Kuen, Zhe Lin, Kai-Wei Chang, Aditya Grover
Universal Multimodal Diffusion Language Model for Vision-Language Understanding
-
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Shufan Li*, Konstantinos Kallidromitis*, Akash Gokul*, Arsh Koneru, Yusuke Kato, Kazuki Kozuka, Aditya Grover
In-Context Reflection Improves Text-to-Image Generation
-
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Shufan Li*, Konstantinos Kallidromitis*, Akash Gokul*, Zichun Liao, Yusuke Kato, Kazuki Kozuka, Aditya Grover
Universal Multimodal Diffusion Generative Model for Image, Audio and Text
-
InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following.
Shufan Li*, Harkanwar Singh, Aditya Grover
Image editing following multi-modal instructions.
-
SegLLM: Multi-round Reasoning Segmentation
XuDong Wang*, Shaolun Zhang*, Shufan Li*, Konstantinos Kallidromitis, Kehan Li, Yusuke Kato, Kazuki Kozuka, Trevor Darrell
Multi-round interactive segmentation using large language models
-
Aligning Diffusion Models by Optimizing Human Utility.
Shufan Li*, Konstantinos Kallidromitis, Akash Gokul, Yusuke Kato, Kazuki Kozuka
Aligning text-to-image models with human feedback.
-
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data.
Shufan Li*, Harkanwar Singh, Aditya Grover
Modeling multi-dimensional data with linear complexity.
-
xT: Nested Tokenization for Larger Context in Large Images.
Ritwik Gupta*, Shufan Li*, Tyler Zhu*, Jitendra Malik, Trevor Darrell, Karttikeya Mangalam
Long-context visual perception on large images.
-
Hierarchical Open-vocabulary Universal Image Segmentation.
Xudong Wang*, Shufan Li*, Konstantinos Kallidromitis*, Yusuke Kato, Kazuki Kozuka, Trevor Darrell
Segmenting arbitrary objects and object parts using text prompts
-
Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning.
Colorado Reed*, Shufan Li*, Ritwik Gupta*, Sarah Brockman, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, Trevor Darrell
Scale-aware representation learning using prior information about image scale.
Preprints
-
Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
Shufan Li, Jiuxiang Gu, Kangning Liu, Zhe Lin, Zijun Wei, Aditya Grover, Jason Kuen
Universal Multimodal Diffusion Model for Image Understanding, Generation, and Editing
-
PopAlign: Population-Level Alignment for Fair Text-to-Image Generation
Shufan Li*, Harkanwar Singh, Aditya Grover
Aligning diffusion models for fairness
-
Refine and Represent: Region-to-Object Representation Learning
Akash Gokul*, Konstantinos Kallidromitis*, Shufan Li*, Yusuke Kato, Kazuki Kozuka, Trevor Darrell, Colorado J Reed
Representation learning benefits from learning first on image regions and then on actual objects.
Technical Reports
-
Mercury: Ultra-Fast Language Models Based on Diffusion
Samar Khanna*, Siddhant Kharbanda*, Shufan Li*, Harshit Varma*, Eric Wang*, Sawyer Birnbaum, Ziyang Luo, Yanis Miraoui, Akash Palrecha, Stefano Ermon, Aditya Grover, Volodymyr Kuleshov
Frontier Diffusion Language Models (with Inception Labs)
-
Interpreting Audiograms with Multi-stage Neural Networks
Shufan Li*, Congxi Lu, Linkai Li, Jirong Duan, Xinping Fu, Haoshuai Zhou
Accelerating hearing aid fitting using computer vision (with Orka Labs)
-
Chart-RCNN: Efficient Line Chart Data Extraction from Camera Images
Shufan Li*, Congxi Lu, Linkai Li, Haoshuai Zhou
Line Chart Data Extraction in the wild.