Multimodal LLMs
Fine-grained visual grounding, contextual reasoning, and precise region understanding for multimodal models.
This page acts as a topic map for the research areas that connect my publications and engineering work.
Large-scale understanding for video and livestream products, especially in real-time and high-traffic settings.
Face template protection, reconstruction risk, and privacy-aware biometric representation learning.
Building practical systems that connect model capability with deployment constraints and measurable user value.
If you arrive here from an old search result, the best next stop is usually the publications page.