This page is a topic map of the research areas that connect my publications and engineering work.

Multimodal LLMs

Fine-grained visual grounding, contextual reasoning, and precise region understanding for multimodal models.

Audio-video understanding

Large-scale audio and video understanding for video and livestream products, especially in real-time, high-traffic settings.

Biometric security

Face template protection, reconstruction risk, and privacy-aware biometric representation learning.

Applied computer vision systems

Building practical systems that connect model capability with deployment constraints and measurable user value.

Representative Keywords

If you arrived here from an old search result, the best next stop is usually the publications page.

  • Multimodal large language models
  • Pixel understanding
  • Region grounding
  • Computer vision
  • Audio-video understanding
  • Biometric security
  • Template protection
  • Privacy
  • Applied AI systems