Guangcan Mai

Research + engineering

I am a researcher and algorithm engineer working on large-scale, real-time audio-video understanding for video and livestream products. My recent work focuses on multimodal understanding, real-time perception, and selected generative modeling directions for media applications.

I am particularly interested in problems that sit between research depth and practical impact: designing algorithms that remain strong under scale, latency, and complex product constraints. Much of my work is motivated by live and interactive media, where perception quality and robustness matter as much as model novelty.

Before joining TikTok, I worked on computer vision and multimedia understanding at YY Live (Baidu Group) and on applied research at Lenovo Machine Intelligence Center. I received my Ph.D. in Computer Science from Hong Kong Baptist University and was a visiting scholar at Michigan State University.


Current Focus

I focus on algorithms for understanding live and recorded media under real-time and large-scale constraints. I am particularly interested in multimodal reasoning, audio-visual perception, and research that holds up beyond offline benchmarks.

Experience Snapshot

Roles spanning livestream AI, audio-video understanding, and applied computer vision.

Current

Algorithm Engineer, TikTok

Leading core algorithm work in large-scale audio-video understanding and multimodal intelligence for global content products.

Previously

Senior Computer Vision Algorithm Engineer, YY Live (Baidu Group)

Led key algorithm directions in livestream understanding, real-time media AI, and creator-facing visual intelligence.

Earlier

Staff Researcher, Lenovo Machine Intelligence Center

Worked on applied computer vision research, reusable vision platforms, and intelligent system prototyping.

Academic foundation

Hong Kong Baptist University and Michigan State University

Ph.D. training in computer science, with research collaborations spanning biometric security, computer vision, and machine perception.

Selected Work

A compact selection across recent multimodal work and earlier research on secure biometrics and visual understanding.

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

ICLR 2026 Poster / arXiv 2025

A step toward fine-grained, context-aware region understanding for multimodal large language models.

OpenReview arXiv

SecureFace: Face Template Protection

IEEE Transactions on Information Forensics and Security, 2021

Face template protection with a focus on privacy, security, and deployable biometric systems.

DOI

On the Reconstruction of Face Images from Deep Face Templates

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019

A widely cited study on reconstructing face images from deep templates and the security implications for biometrics.

DOI arXiv

LeapDetect: An Agile Platform for Inspecting Power Transmission Lines from Drones

IEEE ICDMW Demo, 2019

An applied computer vision system for transmission-line inspection using drone imagery.

DOI

Community

I regularly review for leading journals and conferences in computer vision, multimedia, and AI.

IEEE TPAMI
IEEE TIFS
IEEE TIP
IEEE TCYB
Pattern Recognition
IEEE/CAA Journal of Automatica Sinica
ICME
ICPR
ICASSP

I occasionally review internship and full-time resumes. You can reach me by email.