Real-time audio-video understanding
Algorithms for understanding live media under strict real-time constraints, with emphasis on robustness and practical utility.
I am a researcher and algorithm engineer whose background spans academic research, industrial R&D, and product-facing algorithm development in large-scale real-time audio-video understanding, computer vision, and multimodal AI.
Leading core algorithm work in large-scale audio-video understanding and multimodal intelligence for global content products.
Led key algorithm directions in livestream understanding, real-time media AI, and creator-facing visual intelligence.
Applied computer vision research, reusable vision platforms, and intelligent system prototyping.
A high-level summary of the algorithm areas I have worked on, without exposing sensitive implementation details.
Methods for richer interpretation of audio-video content through multimodal reasoning and perception.
Selected work on problems where generation, understanding, and visual media enhancement need to work together.
Research and development across visual understanding, multimedia intelligence, and related perception problems.
Ph.D. in Computer Science, 2018
B.Eng. in Computer Science and Technology, 2013
Visiting Scholar, Pattern Recognition and Image Processing Lab
Real-time audio-video understanding, multimodal LLMs, livestream intelligence, generative design, and applied computer vision.
Courses include Web Intelligence and Information & Knowledge Management at Hong Kong Baptist University.
Excellent Teaching Assistant Performance Award, 2017.
Third Runner-up Award, ACM Hong Kong Collegiate Programming Contest, 2014.
Google Scholar, OpenReview, ORCID, and LinkedIn provide the most current public record.