Delin Chen

My name is Delin Chen (陈德林). I'm currently a first-year master's student at the University of Massachusetts Amherst, advised by Prof. Chuang Gan. Previously, I was an undergraduate student majoring in Computer Science at Wuhan University (WHU), China, where I was fortunate to work with Prof. Yu Wu and Prof. Zheng Wang.

My research interests lie in multimodal foundation models and embodied AI.

I'm open to discussions and collaboration opportunities.

Email / CV / GitHub / Google Scholar / Twitter

Research

I am excited to work in the field of artificial intelligence, specifically on multimodal data. My research interests currently center on two main areas: a) foundation models and multimodal reasoning, and b) AI for healthcare.

Publications

(*=equal contribution)

FlexAttention for Efficient High-Resolution Vision-Language Models
Junyan Li, Delin Chen, Tianle Cai, Peihao Chen, Yining Hong, Zhenfang Chen, Yikang Shen, Chuang Gan
European Conference on Computer Vision (ECCV), 2024
[Paper][Code][Project Page]

We present FlexAttention, a novel attention mechanism that can be seamlessly plugged into most vision-language models, enabling them to perceive higher-resolution images efficiently.

CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
Junyan Li, Delin Chen, Yining Hong, Zhenfang Chen, Peihao Chen, Yikang Shen, Chuang Gan
International Conference on Learning Representations (ICLR), 2024
[Paper][Code][Project Page]

A remarkable ability of human beings resides in compositional reasoning, i.e., the capacity to make "infinite use of finite means". We propose Compositional VLM, which guides the LLM to explicitly compose visual entities and relationships within the text and to communicate dynamically with the vision encoder and detection network, achieving vision-language communicative decoding. Specifically, we devise a set of novel communication tokens for the LLM that enable dynamic communication between the visual detection system and the language system.

Scratch Each Other's Back: Incomplete Multi-modal Brain Tumor Segmentation Via Category Aware Group Self-Support Learning
Yansheng Qiu*, Delin Chen*, Hongdou Yao, Yongchao Xu, Zheng Wang
IEEE International Conference on Computer Vision (ICCV), 2023
[Paper][Code]

We propose a Group Self-Support Learning framework that exploits the dominant characteristics of each modality to guide the distillation of mutual knowledge between modalities without increasing the complexity of the original network. Our method achieves state-of-the-art results on the BraTS 2015, 2018, and 2020 datasets.

Modal-aware Visual Prompting for Incomplete Multi-modal Brain Tumor Segmentation
Yansheng Qiu, Ziyuan Zhao, Hongdou Yao, Delin Chen, Zheng Wang
ACM International Conference on Multimedia (ACM MM), 2023
[Paper]

In this work, we introduce a novel incomplete multi-modal segmentation framework called Modal-aware Visual Prompting (MAVP), which draws inspiration from the pre-training and prompt-tuning protocol widely used in natural language processing (NLP). We use embeddings generated by a modality state classifier, which focuses on the missing modality states, as prompts. Additionally, we integrate the modality state prompts into both the feature extraction stage of each modality and the modality fusion stage to facilitate intra-/inter-modal adaptation.

Query Re-Training for Modality-Gnostic Incomplete Multi-modal Brain Tumor Segmentation
Delin Chen, Yansheng Qiu, Zheng Wang
International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) Workshop on Multiscale Multimodal Medical Imaging, 2023
[Paper]

We propose a Modality-Gnostic transformer module with learnable modality combination embeddings as queries to effectively handle all modality-missing states. Furthermore, we adopt a query re-training mechanism to help the model converge to a better local minimum on small datasets.

TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting
Liang Liao*, Taorong Liu*, Delin Chen, Jing Xiao, Zheng Wang, Chia-Wen Lin, Shin'ichi Satoh
IEEE Transactions on Image Processing (TIP), 2022 (Under Review)
[Paper] [Code] [Data]

In this paper, we investigate reference-guided image inpainting as a means of completing complex scenes with insufficient information, and propose a transformer-based encoder-decoder network with a multi-scale reference embedding procedure to address the issues of image alignment and content restoration in the presence of large missing regions.

Service
  • Reviewer for ACM MM'2024
  • Reviewer for WACV'2025
Academic Performance
  • GPA: 92/100 (3.93/4.00)
  • Related courses: Artificial Intelligence (96), Computer Graphics (100), Machine Learning and Pattern Recognition (96), Data Structure (93), Computer Vision (93), Embedded System (92), Discrete Mathematics (93), Linear Algebra (92), Probability and Mathematical Statistics (95), Combinatorial Mathematics (93), Advanced Mathematics I (90) & II (91), ...
Honours and Membership
  • Wuhan University Excellent Student Award, 2021, 2022, 2023
  • Leijun Undergraduate Computer Science Scholarship
  • A-Class Academic Excellence Scholarship (top 5% in WHU), 2023
  • B-Class Academic Excellence Scholarship, 2021, 2022
  • CCF (China Computer Federation) Elite Collegiate Award, 2023 (102 students nationwide)


The source code of this website is adapted from Jon Barron's public academic website.