Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
Region-level multimodal understanding with a focus on precise pixel grounding and contextual reasoning.
My work spans multimodal LLMs, computer vision, biometric security, and applied multimedia understanding. For citation counts and the most up-to-date publication list, please refer to my public research profiles.
A compact view of the topics that have shaped my research trajectory, from multimodal reasoning to biometric security.
Region-level multimodal understanding with a focus on precise pixel grounding and contextual reasoning.
Template protection for face recognition with strong attention to privacy and security requirements.
A study of the reconstruction risks behind deep face templates and their security implications.