
Science Forum | Understanding Deep-Learning as a physicist: What would Einstein do?
Time
14:00-15:30, Thursday, March 28, 2024
Venue
Room E10-222, Yungu Campus, Westlake University
Host
Dr. Leihan Tang, Chair Professor, Center for Interdisciplinary Studies, Westlake University
Audience
All faculty and students
Category
Academics & Research
Science Forum | Understanding Deep-Learning as a physicist: What would Einstein do?
Time: 14:00-15:30, Thursday, March 28, 2024
Host: Dr. Leihan Tang, Chair Professor, Center for Interdisciplinary Studies, Westlake University
Venue: Room E10-222, Yungu Campus, Westlake University
Speaker:
Prof. Yuhai Tu, Research Staff Member, IBM Thomas J. Watson Research Center
Yuhai Tu received his PhD in theoretical physics from UCSD in 1991. He was a Division Prize Fellow at Caltech from 1991 to 1994. He joined the IBM Watson Research Center as a Research Staff Member in 1994 and served as head of the Theory group from 2003 to 2015. He has been an APS Fellow since 2004 and served as Chair of the APS Division of Biological Physics (DBIO) in 2017. He is also a Fellow of the AAAS. For his work in theoretical statistical physics, he was awarded (together with John Toner and Tamas Vicsek) the 2020 Lars Onsager Prize from the APS: "For seminal work on the theory of flocking that marked the birth and contributed greatly to the development of the field of active matter."
Abstract:
Despite the great success of deep learning, it remains largely a black box. For example, the main search engine in deep neural networks is the Stochastic Gradient Descent (SGD) algorithm; however, little is known about how SGD finds "good" solutions (low generalization error) in the high-dimensional weight space. In this talk, we will first give a general overview of SGD, followed by a more detailed description of our recent work [1-3] on the SGD learning dynamics, the loss function landscape, and their relationship. If time permits, we will discuss more recent work on trying to understand why flat solutions are more generalizable and whether there are other measures for better generalization, based on an exact duality relation we found between neuron activity and network weights [4].
[1] “The inverse variance-flatness relation in Stochastic-Gradient-Descent is critical for finding flat minima”, Y. Feng and Y. Tu, PNAS 118 (9), 2021.
[2] “Phases of learning dynamics in artificial neural networks: in the absence and presence of mislabeled data”, Y. Feng and Y. Tu, Machine Learning: Science and Technology (MLST), 2021. https://iopscience.iop.org/article/10.1088/2632-2153/abf5b9/pdf
[3] “Stochastic Gradient Descent Introduces an Effective Landscape-Dependent Regularization Favoring Flat Solutions”, N. Yang, C. Tang, and Y. Tu, Phys. Rev. Lett. 130, 237101, 2023.
[4] “Activity-weight duality in feed-forward neural networks reveals two co-determinants for generalization”, Y. Feng, W. Zhang, D. Zhang, and Y. Tu, Nature Machine Intelligence, 2023.
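For readers unfamiliar with SGD, here is a minimal illustrative sketch (not taken from the talk or the papers above) of the mini-batch SGD update rule the abstract refers to; the toy quadratic loss, synthetic data, learning rate, and batch size are assumptions chosen only for illustration.

```python
import numpy as np

# Minimal sketch of mini-batch SGD on a toy quadratic loss
# L(w) = 0.5 * ||X w - y||^2 / n. All quantities below are illustrative.
rng = np.random.default_rng(0)
n_samples, n_features = 256, 10
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = X @ w_true + 0.1 * rng.normal(size=n_samples)   # noisy targets

w = np.zeros(n_features)    # weights to be learned
lr, batch_size = 0.05, 16   # learning rate and mini-batch size (assumed)

for epoch in range(100):
    perm = rng.permutation(n_samples)               # reshuffle each epoch
    for start in range(0, n_samples, batch_size):
        idx = perm[start:start + batch_size]
        # gradient of the loss estimated on the current mini-batch
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
        w -= lr * grad                              # SGD update: w <- w - lr * grad

print("final training loss:", 0.5 * np.mean((X @ w - y) ** 2))
```

Each update moves the weights along the negative gradient estimated from a random mini-batch; the resulting mini-batch noise is the stochasticity whose role in steering SGD toward flat, generalizable minima the talk examines.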
Contact:
School of Science, Kaiyin Xu, Email: xukaiyin@westlake.edu.cn