Research + engineering

I am a researcher and algorithm engineer working on large-scale, real-time audio-video understanding for video and livestream products. My recent work has focused on multimodal understanding, real-time perception, and selected generative methods for media applications.

I am particularly interested in problems that sit between research depth and practical impact: designing algorithms that stay accurate and robust under scale, latency, and product constraints. Much of my work is motivated by live and interactive media, where perception quality and robustness matter as much as model novelty.

Before joining TikTok, I worked on computer vision and multimedia understanding at YY Live (Baidu Group) and on applied research at Lenovo Machine Intelligence Center. I received my Ph.D. in Computer Science from Hong Kong Baptist University and was a visiting scholar at Michigan State University.

Current Focus

I focus on algorithms for understanding live and recorded media under real-time and large-scale constraints, with particular interest in multimodal reasoning, audio-visual perception, and research directions that hold up beyond offline benchmarks.

Algorithms for live media understanding

Algorithm design for large-scale livestream and video understanding under real-time constraints.

Multimodal and generative directions

Multimodal reasoning and selective exploration of generative methods for media understanding and creation.

Algorithm direction and ownership

Shaping algorithm direction, evaluation standards, and technical priorities for important AI capabilities.

Experience Snapshot

Roles spanning livestream AI, audio-video understanding, and applied computer vision.

Current

Algorithm Engineer, TikTok

Leading core algorithm work in large-scale audio-video understanding and multimodal intelligence for global content products.

Previously

Senior Computer Vision Algorithm Engineer, YY Live (Baidu Group)

Led key algorithm directions in livestream understanding, real-time media AI, and creator-facing visual intelligence.

Earlier

Staff Researcher, Lenovo Machine Intelligence Center

Worked on applied computer vision research, reusable vision platforms, and intelligent system prototyping.

Academic foundation

Hong Kong Baptist University and Michigan State University

Ph.D. training in computer science, with research collaborations spanning biometric security, computer vision, and machine perception.

Selected Work

A compact selection across recent multimodal work and earlier research on secure biometrics and visual understanding.

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

ICLR 2026 Poster / arXiv 2025

A step toward fine-grained, context-aware region understanding for multimodal large language models.

SecureFace: Face Template Protection

IEEE Transactions on Information Forensics and Security, 2021

Face template protection with a focus on privacy, security, and deployable biometric systems.

On the Reconstruction of Face Images from Deep Face Templates

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019

A widely cited study on reconstructing face images from deep templates and the security implications for biometrics.

LeapDetect: An Agile Platform for Inspecting Power Transmission Lines from Drones

IEEE ICDMW Demo, 2019

An applied computer vision system for transmission-line inspection using drone imagery.

Community

I regularly review for leading journals and conferences in computer vision, multimedia, and AI.

  • IEEE TPAMI
  • IEEE TIFS
  • IEEE TIP
  • IEEE TCYB
  • Pattern Recognition
  • IEEE/CAA Journal of Automatica Sinica
  • ICME
  • ICPR
  • ICASSP

I occasionally review resumes for internship and full-time positions. You can reach me by email.