Zhizhang "Kevin" Hu

About Me

Hi, welcome to my page! I am a research data scientist at Microsoft AI. I work on open-world multi-agent spatial intelligence and multimodal large language models. I earned my Ph.D. in Electrical Engineering and Computer Science at the University of California, Merced, with a focus on multimodal learning and ubiquitous computing. Please feel free to contact me if you want to chat, collaborate, or ask for my latest resume :-D.

Interests
  • Multimodal Large Language Model
  • Multi-agent Reasoning
Education
  • Ph.D. in Computer Science, 2024

    University of California, Merced

  • M.Sc in Building Science, 2020

    Carnegie Mellon University

  • B.Eng in Mechanical Engineering, 2018

    Southwest Jiaotong University

Experience

Microsoft AI
Research Data Scientist II
Jan 2025 – Present Redmond, Washington
Multi-agent spatial intelligence and open-world visual language reasoning.

University of California, Merced
Ph.D. Researcher
Aug 2020 – Sep 2024 Merced, California

Worked in the following areas:

  • Physics-informed multimodal sensing and learning algorithms.
  • Causal inference for deep learning bias reduction.
  • Smart healthcare with heterogeneous Internet of Things (IoT) systems.
  • Representation learning and transfer learning algorithms.

Microsoft AI
Data Scientist Intern
May 2024 – Aug 2024 Redmond, Washington
Worked on knowledge distillation and fine-tuning of multimodal large language models for referral visual grounding tasks. Advised by Dr. Jianguo Long and Pak Kiu Chung. Supervised and advised by Dr. Ming Tan.

Amazon Inc.
Applied Scientist Intern
May 2023 – Aug 2023 Palo Alto, California
Worked on visual-cue-guided online text denoising for robust multimodal learning. Advised by Dr. Shasha Li and Dr. Ming Du. Supervised and advised by Dr. Arnab Dhua. Output paper: MM-LTP @ CVPRW 2024.

Amazon Inc.
Applied Scientist Intern
May 2022 – Aug 2022 Palo Alto, California
Worked on vision-language multimodal learning for language-guided compositional image-image retrieval. Advised by Dr. Xinliang Zhu, Dr. Son Tran, and Dr. Rene Vidal. Supervised and advised by Dr. Arnab Dhua. Output paper: ProVLA @ ICCVW 2023.

Recent Publications

(2024). De-noised Vision-language Fusion Guided by Visual Cues for E-commerce Product Search. 2024 IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) Workshops.

(2024). IOTeeth: Intra-Oral Teeth Sensing System for Dental Occlusal Diseases Recognition. IMWUT 2024.

(2023). ProVLA: Compositional Image Search with Progressive Vision-language Alignment and Multimodal Fusion. 2023 IEEE/CVF International Conference on Computer Vision (ICCV) Workshops.

(2022). CIPhy: Causal Intervention with Physical Confounder from IoT Sensor Data for Robust Occupant Information Inference. SenSys 2022 AIoT Workshop.

(2022). MODES: multi-sensor occupancy data-driven estimation system for smart buildings. ACM e-Energy.

(2022). Demo: Real-Time Teeth Functional Occlusion Monitoring via In-Mouth Vibration Sensing. IPSN 2022.

(2022). Poster: Sedentary Posture Muscle Monitoring via Active Vibratory Sensing. IPSN 2022 (Best Poster Award).

(2021). AutoQual: task-oriented structural vibration sensing quality assessment leveraging co-located mobile sensing context. CCF Transactions on Pervasive Computing and Interaction.

(2021). Footstep-Induced Floor Vibration Dataset: Reusability and Transferability Analysis. SenSys 2021 DATA Workshop.

(2021). Poster: Vibration-based Indoor Occupant Gait Monitoring with Robot Vacuum Cleaners. IoTDI 2021.

(2021). Vibration-based indoor human sensing quality reinforcement via Thompson sampling. Proceedings of the First International Workshop on Cyber-Physical-Human System Design and Implementation.

(2020). A window-based sequence-to-one approach with dynamic voting for nurse care activity recognition using acceleration-based wearable sensor. UbiComp/ISWC 2020 Workshop (Best Paper Award).

(2020). Fine-grained activities recognition with coarse-grained labeled multi-modal data. UbiComp/ISWC 2020 CML-IoT Workshop.

(2020). Improving the Interoperability of gbXML Data Model through Redefining Data Mapping Rules of HVAC Systems. ASHRAE Transactions.

(2019). Device-free sleep stage recognition through bed frame vibration sensing. BuildSys 2019 DFHS Workshop.

Contact Me

  • 515010 NE 36th St, Redmond, WA 98052