Algorithms for live media understanding
Algorithm design for large-scale livestream and video understanding under real-time constraints.
I am a researcher and algorithm engineer working on large-scale real-time audio-video understanding for video and livestream products. My recent work has focused on multimodal understanding, real-time perception, and selected generative directions for media applications.
I am particularly interested in problems that sit between research depth and practical impact: designing algorithms that remain strong under scale, latency, and complex product constraints. Much of my work is motivated by live and interactive media, where perception quality and robustness matter as much as model novelty.
Before joining TikTok, I worked on computer vision and multimedia understanding at YY Live (Baidu Group) and on applied research at the Lenovo Machine Intelligence Center. I received my Ph.D. in Computer Science from Hong Kong Baptist University and was a visiting scholar at Michigan State University.
I focus on algorithms for understanding live and recorded media under real-time and large-scale constraints, with particular interest in multimodal reasoning, visual-audio perception, and research directions that can hold up beyond offline benchmarks.
Algorithm design for large-scale livestream and video understanding under real-time constraints.
Multimodal reasoning and selective exploration of generative methods for media understanding and creation.
Shaping algorithm direction, evaluation standards, and technical priorities for important AI capabilities.
Roles spanning livestream AI, audio-video understanding, and applied computer vision.
Leading core algorithm work in large-scale audio-video understanding and multimodal intelligence for global content products.
Led key algorithm directions in livestream understanding, real-time media AI, and creator-facing visual intelligence.
Worked on applied computer vision research, reusable vision platforms, and intelligent system prototyping.
Ph.D. training in computer science, with research collaborations spanning biometric security, computer vision, and machine perception.
A compact selection of recent multimodal work and earlier research on secure biometrics and visual understanding.
A step toward fine-grained, context-aware region understanding for multimodal large language models.
Face template protection with a focus on privacy, security, and deployable biometric systems.
A widely cited study on reconstructing face images from deep templates and the security implications for biometrics.
An applied computer vision system for transmission-line inspection using drone imagery.
I regularly review for leading journals and conferences in computer vision, multimedia, and AI.
I occasionally review resumes for internship and full-time positions. You can reach me by email.