Computer Science PhD Student at UCLA
Original Works
Publications
-
Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning.
Colorado Reed*, Shufan Li*, Ritwik Gupta*, Sarah Brockman, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, Trevor Darrell
Scale-aware representation learning using prior information about image scale.
-
Hierarchical Open-vocabulary Universal Image Segmentation.
Xudong Wang*, Shufan Li*, Konstantinos Kallidromitis*, Yusuke Kato, Kazuki Kozuka, Trevor Darrell
Segmenting arbitrary objects and object parts using text prompts.
-
xT: Nested Tokenization for Larger Context in Large Images.
Ritwik Gupta*, Shufan Li*, Tyler Zhu*, Jitendra Malik, Trevor Darrell, Karttikeya Mangalam
Long context visual perception on large images.
-
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data.
Shufan Li*, Harkanwar Singh, Aditya Grover
Modeling multi-dimensional data with linear complexity.
-
Aligning Diffusion Models by Optimizing Human Utility.
Shufan Li*, Konstantinos Kallidromitis, Akash Gokul, Yusuke Kato, Kazuki Kozuka
Aligning text-to-image models with human feedback.
-
SegLLM: Multi-round Reasoning Segmentation
Xudong Wang*, Shaolun Zhang*, Shufan Li*, Konstantinos Kallidromitis, Kehan Li, Yusuke Kato, Kazuki Kozuka, Trevor Darrell
Multi-round interactive segmentation using large language models.
-
InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following.
Shufan Li*, Harkanwar Singh, Aditya Grover
Image editing following multi-modal instructions.
-
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Shufan Li*, Konstantinos Kallidromitis*, Akash Gokul*, Zichun Liao, Yusuke Kato, Kazuki Kozuka, Aditya Grover
A universal multimodal diffusion generative model for image, audio, and text.
Preprints
-
LaViDa: A Large Diffusion Model for Vision-Language Understanding
Shufan Li, Konstantinos Kallidromitis, Hritik Bansal, Akash Gokul, Yusuke Kato, Kazuki Kozuka, Jason Kuen, Zhe Lin, Kai-Wei Chang, Aditya Grover
A universal multimodal diffusion language model for vision-language understanding.
-
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Shufan Li*, Konstantinos Kallidromitis*, Akash Gokul*, Arsh Koneru, Yusuke Kato, Kazuki Kozuka, Aditya Grover
In-context reflection improves text-to-image generation.
-
PopAlign: Population-Level Alignment for Fair Text-to-Image Generation
Shufan Li*, Harkanwar Singh, Aditya Grover
Aligning diffusion models for fairness.
-
Refine and Represent: Region-to-Object Representation Learning
Akash Gokul*, Konstantinos Kallidromitis*, Shufan Li*, Yusuke Kato, Kazuki Kozuka, Trevor Darrell, Colorado J Reed
Representation learning benefits from first learning from image regions and then from objects.
Technical Reports
-
Mercury: Ultra-Fast Language Models Based on Diffusion
Samar Khanna*, Siddhant Kharbanda*, Shufan Li*, Harshit Varma*, Eric Wang*, Sawyer Birnbaum, Ziyang Luo, Yanis Miraoui, Akash Palrecha, Stefano Ermon, Aditya Grover, Volodymyr Kuleshov
Frontier diffusion language models (with Inception Labs).
-
Interpreting Audiograms with Multi-stage Neural Networks
Shufan Li*, Congxi Lu, Linkai Li, Jirong Duan, Xinping Fu, Haoshuai Zhou
Accelerating hearing aid fitting using computer vision (with Orka Labs).
-
Chart-RCNN: Efficient Line Chart Data Extraction from Camera Images
Shufan Li*, Congxi Lu, Linkai Li, Haoshuai Zhou
Line chart data extraction in the wild.