Center for Interdisciplinary Studies (CIS) Seminar Series | Physics Meets Machine Learning: Past, Present, and Future
Time
Tuesday, November 12, 2024
16:10-17:40
Venue
E10-405, Academic Ring, Yungu Campus, Westlake University
Host
Dr. Leihan Tang, Chair Professor, Center for Interdisciplinary Studies, Westlake University
Audience
All faculty and students
Category
Academics & Research
Time: 16:10-17:40, Tuesday, November 12, 2024
Host: Dr. Leihan Tang, Chair Professor, Center for Interdisciplinary Studies, Westlake University
Venue: E10-405, Academic Ring, Yungu Campus
Lecture Language: English
Speaker: Yuhai Tu, Ph.D., IBM T. J. Watson Research Center
Yuhai Tu received his PhD in theoretical physics from UCSD in 1991. He was a Division Prize Fellow at Caltech from 1991 to 1994. He joined the IBM T. J. Watson Research Center as a Research Staff Member in 1994 and served as head of its Theory group from 2003 to 2015. He has been an APS Fellow since 2004 and served as Chair of the APS Division of Biophysics (DBIO) in 2017. He is also a Fellow of the AAAS. For his work in theoretical statistical physics, he was awarded (together with John Toner and Tamas Vicsek) the 2020 Lars Onsager Prize from the APS: "For seminal work on the theory of flocking that marked the birth and contributed greatly to the development of the field of active matter."
Abstract:
Most modern machine learning algorithms are based on artificial neural network (ANN) models that originated from the marriage of two natural science disciplines: statistical physics and neuroscience. At their core, ANNs describe the collective behavior of a group of highly abstracted “neurons” that interact with each other adaptively in a network bearing a certain resemblance to the real neural networks in the brain. The dynamics of ANNs effectively implement various computing algorithms, which allow them to memorize, compute, make associations, and learn.
In the first part of this talk, we will give a brief historical account of the development of ANN-based machine learning paradigms. We will focus on explaining the foundational discoveries and inventions rooted in statistical physics, as exemplified by the Hopfield model and the Boltzmann machine, which are the works cited for the recent Nobel Prize in Physics awarded to Hopfield and Hinton.
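To make the statistical-physics connection above concrete, here is a minimal sketch of a Hopfield network, assuming the textbook Hebbian storage rule and asynchronous sign updates; the pattern size, helper names, and demo values are illustrative choices, not material from the talk.

import numpy as np

def store_patterns(patterns):
    """Hebbian rule: W = (1/N) * sum_mu xi_mu xi_mu^T, with zero diagonal."""
    n = patterns.shape[1]
    w = patterns.T @ patterns / n
    np.fill_diagonal(w, 0.0)
    return w

def recall(w, state, steps, rng):
    """Asynchronous updates s_i <- sign(sum_j W_ij s_j); each update can only
    lower (or keep) the energy E = -(1/2) s^T W s, so the state settles into
    an energy minimum, which is a stored pattern if the cue is close enough."""
    s = state.copy()
    for _ in range(steps):
        i = rng.integers(len(s))
        s[i] = 1 if w[i] @ s >= 0 else -1
    return s

# Toy demo: store two random +/-1 patterns, then recover one from a noisy cue.
rng = np.random.default_rng(1)
patterns = rng.choice([-1, 1], size=(2, 64))
w = store_patterns(patterns)
cue = patterns[0].copy()
cue[:10] *= -1                                   # corrupt 10 of the 64 spins
overlap = np.mean(recall(w, cue, 2000, rng) == patterns[0])
print(f"fraction of spins matching the stored pattern: {overlap:.2f}")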
Next, we will describe our recent work on developing, with a statistical physics approach, a theoretical foundation for the feedforward deep-learning neural networks underlying the current AI revolution. We will discuss our recent work [1-2] on the learning dynamics driven by stochastic gradient descent (SGD) and how SGD leads to flat minima in the loss-function landscape. We will also discuss another recent work explaining why flat solutions generalize better and whether there are other measures for better generalization, based on an exact duality relation we discovered between neuron activity and network weights [3].
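As a toy illustration of the kind of question studied in [1-2], the sketch below trains a small one-hidden-layer network with minibatch SGD and then probes the "flatness" of the minimum it finds by measuring the average loss increase under small random weight perturbations; the network size, synthetic data, and this flatness proxy are assumptions made for illustration, not the quantities analyzed in the papers.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
y = np.sin(X @ rng.normal(size=5))               # toy regression target

# One-hidden-layer network: y_hat = V . tanh(W x)
W = rng.normal(scale=0.5, size=(16, 5))
V = rng.normal(scale=0.5, size=16)

def loss(W, V):
    return np.mean((np.tanh(X @ W.T) @ V - y) ** 2)

lr, batch = 0.05, 16
for step in range(5000):                          # minibatch SGD
    idx = rng.integers(0, len(X), batch)
    xb, yb = X[idx], y[idx]
    h = np.tanh(xb @ W.T)                         # hidden activity, shape (batch, 16)
    err = h @ V - yb                              # prediction error, shape (batch,)
    V -= lr * 2 * h.T @ err / batch
    W -= lr * 2 * ((err[:, None] * V) * (1 - h ** 2)).T @ xb / batch

# Crude flatness proxy: average loss increase under random weight perturbations.
eps, trials = 0.05, 50
base = loss(W, V)
bumps = [loss(W + eps * rng.normal(size=W.shape),
              V + eps * rng.normal(size=V.shape)) - base
         for _ in range(trials)]
print(f"training loss {base:.4f}, mean loss increase under perturbation {np.mean(bumps):.4f}")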
Finally, we will discuss a few future directions that are worth pursuing with physics-based approaches, e.g., the neural scaling laws observed in large language models and in-context learning in transformer-based ANN models [4].
[1] “The inverse variance-flatness relation in stochastic gradient descent is critical for finding flat minima”, Y. Feng and Y. Tu, PNAS 118 (9), 2021.
[2] “Stochastic Gradient Descent Introduces an Effective Landscape-Dependent Regularization Favoring Flat Solutions”, N. Yang, C. Tang, and Y. Tu, Phys. Rev. Lett. 130, 237101, 2023.
[3] “Activity–weight duality in feed-forward neural networks reveals two co-determinants for generalization”, Y. Feng, W. Zhang, D. Zhang, and Y. Tu, Nature Machine Intelligence, 2023.
[4] “Physics Meets Machine Learning: A Two-Way Street”, H. Levine and Y. Tu, PNAS 121 (27), e240358012, 2024.
Contact: Kaiyin Xu, School of Science, Email: xukaiyin@westlake.edu.cn