I am currently a first-year M.S. student in Computer Science at the University of Massachusetts Amherst, working with Prof. Chuang Gan. Previously, I was an undergraduate at Wuhan University.
My research interests lie in multimodal foundation models and (embodied) agents.
🔥 News
- 2024.09: I serve as a reviewer for ICLR 2025.
- 2024.08: FlexAttention is accepted by ECCV 2024. Check our Project Page and GitHub.
- 2024.07: I serve as a reviewer for WACV 2025.
- 2024.02: I serve as a reviewer for ACM MM 2024.
- 2024.01: CoVLM is accepted by ICLR 2024. Check our Project Page and GitHub.
- 2023.10: One paper is accepted at MMMI@MICCAI 2023 and another at ACM MM 2023.
- 2023.10: I attended the China National Computer Congress (CNCC) and received the CCF (China Computer Federation) Elite Collegiate Award (102 students nationwide).
- 2023.08: GSS, a novel self-distillation algorithm for multimodal data, is accepted by ICCV 2023.
- 2023.06: We introduce TransRef, the first transformer-based model for reference-guided image inpainting. Check our GitHub.
📝 Publications
† Equal Contribution
Autonomous Agents from Automatic Reward Modeling and Planning
Zhenfang Chen†, Delin Chen†, Rui Sun†, Wenjun Liu†, Chuang Gan
- A new framework for LLM-based agents using automatic reward models.
- We build effective, scalable, and controllable reward models that can be integrated with planning methods such as Reflexion and MCTS.
FlexAttention for Efficient High-Resolution Vision-Language Models
Junyan Li, Delin Chen, Tianle Cai, Peihao Chen, Yining Hong, Zhenfang Chen, Yikang Shen, Chuang Gan
Paper | GitHub | Project Page
- FlexAttention is a plug-and-play attention module that efficiently enhances VLMs' ability to perceive details in high-resolution images.
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
Junyan Li, Delin Chen, Yining Hong, Zhenfang Chen, Peihao Chen, Yikang Shen, Chuang Gan
Paper | GitHub | Project Page
- CoVLM guides the VLM to explicitly compose visual entities and relationships in the text and to communicate dynamically with the vision encoder and detection network, achieving vision-language communicative decoding. It boosts the compositional reasoning ability of VLMs and achieves SoTA performance on various tasks involving compositional analysis.
Scratch Each Other’s Back: Incomplete Multi-Modal Brain Tumor Segmentation via Category Aware Group Self-Support Learning
Yansheng Qiu†, Delin Chen†, Hongdou Yao, Yongchao Xu, Zheng Wang
- We propose a Group Self-Support Learning framework that leverages the dominant characteristics of each modality to direct the distillation of mutual knowledge between modalities without increasing the complexity of the baseline network. The method achieves SOTA results on the BraTS 2015, 2018, and 2020 datasets.
TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting
Liang Liao†, Taorong Liu†, Delin Chen, Jing Xiao, Zheng Wang, Chia-Wen Lin, and Shin’ichi Satoh
- TransRef utilizes a reference image that depicts a scene similar to the corrupted image to facilitate image inpainting.
🎡 Service
- Reviewer for ICLR’2025
- Reviewer for WACV’2025
- Reviewer for ACM MM’2024
🎖 Honors and Awards
- 2024.04 Outstanding Graduate, Wuhan University
- 2023.11 Leijun Undergraduate Computer Science Scholarship, Wuhan University
- 2023.10 CCF (China Computer Federation) Elite Collegiate Award (102 students nationwide), China Computer Federation
- 2023.09 First-Class Scholarship (award rate: 5% school-wide), Wuhan University
- 2021, 2022, 2023 Excellent Student Award, Wuhan University
📖 Education
- University of Massachusetts Amherst
  M.S., Computer Science, 2024.09 - present
  Advisor: Prof. Chuang Gan
- Wuhan University
  Undergraduate, Computer Science, 2020.09 - 2024.06
  Advisor: Prof. Zheng Wang