Westlake Engineering Lecture Series No. 79 | Chunhua Shen 沈春华: An overview of recent work in large multimodal models: video generation and perception

Time

Friday, November 29, 2024
10:00-11:30

Venue

E10-211, Yungu Campus, Westlake University

Host

Chi Zhang, Assistant Professor, School of Engineering, Westlake University

Audience

All faculty and students

Category

Academics and Research

Time: 10:00-11:30, Friday, November 29, 2024

Venue: E10-211, Yungu Campus, Westlake University

Host: Dr. Chi Zhang, Assistant Professor, School of Engineering, Westlake University

Language: English

Speaker:

Prof. Chunhua Shen 沈春华

Qiushi Chair Professor

School of Computer Science and Technology

Zhejiang University

Biography:

Chunhua Shen is a Chair Professor at the College of Computer Science & Technology, Zhejiang University, a position he has held since 2022. Before that, he held various roles in Australia from 2002 to 2021, including Principal Applied Scientist at Amazon Australia; Full Professor at the Australian Institute for Machine Learning, The University of Adelaide; Adjunct Professor at Monash University; and Researcher at the Australian Centre for Robotic Vision, NICTA (National ICT Australia), and the Australian National University.

His research focuses on the intersection of computer vision and statistical machine learning. Professor Shen holds a PhD from The University of Adelaide and also studied at the Australian National University (MPhil) and Nanjing University, China (BSc and MSc). Notable awards and honors include an Australian Research Council Future Fellowship (2012) and a Distinguished Professorship of the Changjiang Scholars Programme (2021). His Google Scholar citation count is ~80,000, with an h-index of 128.

Abstract:

In this talk, I will give an overview of some of my recent work in the area of large multimodal models. In particular, I am interested in video generation and multi-modality perception. First, we propose a novel hierarchical framework that integrates the strengths of autoregressive models with diffusion-based rendering to pioneer long-duration video generation with intricate plot progressions and high visual fidelity. Second, we propose a method termed Framer for interactive frame interpolation, which targets producing smoothly transitioning frames between two images as per user creativity. I will also briefly discuss some relevant work we did in multi-modal understanding.

Contact:

符丁文

fudingwen@westlake.edu.cn