Basic Information
Biography
My research focuses on scalable machine learning algorithms and systems over relational data. Specifically, it explores the fundamental connections of data preparation, data integration, and knowledge management with statistical machine learning and probabilistic inference:
Generative Models for Data Quality: We are exploring the fundamental connections between data cleaning and generative machine learning. The HoloClean project introduced generative machine learning to the problem of data cleaning: we showed how to model data cleaning as a statistical learning problem, how attention-based mechanisms and self-supervised learning can automate data cleaning, and introduced multiple theoretical results on learning from noisy and dirty data. More recently, we have been exploring the synergies between data cleaning and machine learning deployments in the Picket project. This talk at the Stanford MLSys Seminar provides an overview.
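The core idea of casting data cleaning as statistical learning can be illustrated with a deliberately small sketch. This is not the HoloClean implementation (which uses learned probabilistic models and attention); it only shows the shape of the approach: model each attribute's value conditioned on the rest of the tuple, then propose the maximum-likelihood repair for a suspicious cell. All function names here are illustrative.

```python
# Minimal sketch (not HoloClean's actual model): estimate an attribute's
# conditional distribution via co-occurrence counts with the other
# attributes, then repair a cell with the most likely value.
from collections import Counter, defaultdict

def fit_cooccurrence(rows, target_col):
    """Count how often each (context_col, context_val) pair co-occurs with target values."""
    model = defaultdict(Counter)
    for row in rows:
        for col, val in row.items():
            if col != target_col:
                model[(col, val)][row[target_col]] += 1
    return model

def propose_repair(row, target_col, model):
    """Score candidate values by summed co-occurrence evidence from the tuple's context."""
    scores = Counter()
    for col, val in row.items():
        if col != target_col:
            scores.update(model[(col, val)])
    return scores.most_common(1)[0][0] if scores else row[target_col]

rows = [
    {"city": "Madison", "state": "WI"},
    {"city": "Madison", "state": "WI"},
    {"city": "Madison", "state": "WI"},
    {"city": "Seattle", "state": "WA"},
]
model = fit_cooccurrence(rows, "state")
dirty = {"city": "Madison", "state": "W1"}  # typo in the state cell
print(propose_repair(dirty, "state", model))  # "WI"
```

The statistical-learning framing means repairs come from a learned distribution over clean data rather than from hand-written rules; the real system replaces these counts with a trained model.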
Neural Relational Engines over Billion-scale Data: We are developing a new paradigm of systems to make the use of deep learning models over billion-scale structured data easier, faster, and cheaper. We have started with the Marius project, which focuses on a key bottleneck in the development of machine learning systems over large-scale graph data: data movement during training. Marius addresses this bottleneck with a novel data flow architecture that maximizes resource utilization of the entire memory hierarchy (including disk, CPU, and GPU memory). Marius is under active development and available as an open-source project. You can learn more about Marius from our recent OSDI '21 and MLOpsWorld talks.
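The data-movement idea can be sketched as a scheduling problem. This is an assumed simplification, not Marius's actual partition-ordering algorithm: node embeddings are split into partitions, only a few partitions fit in fast memory, and training iterates over "edge buckets" (pairs of partitions). Ordering buckets so that consecutive buckets reuse buffered partitions cuts disk-to-memory traffic. All names below are illustrative.

```python
# Minimal sketch (assumed simplification of the partition-buffer idea):
# greedily order edge buckets so each step loads as few partitions as
# possible from slow storage into the fixed-size fast buffer.
from itertools import combinations

def load_bucket(buffer, bucket, buffer_size):
    """Bring a bucket's partitions into the buffer; return number of disk loads."""
    loads = 0
    for p in bucket:
        if p not in buffer:
            loads += 1
            if len(buffer) == buffer_size:
                # evict a buffered partition the current bucket does not need
                buffer.discard(next(iter(buffer - set(bucket))))
            buffer.add(p)
    return loads

def greedy_bucket_order(num_parts, buffer_size):
    """Order all partition-pair buckets to favor reuse of buffered partitions."""
    remaining = set(combinations(range(num_parts), 2))
    buffer, order, total_loads = set(), [], 0
    while remaining:
        # pick the bucket that needs the fewest new partitions loaded
        nxt = min(remaining, key=lambda b: len(set(b) - buffer))
        remaining.remove(nxt)
        order.append(nxt)
        total_loads += load_bucket(buffer, nxt, buffer_size)
    return order, total_loads

order, loads = greedy_bucket_order(8, 3)
print(len(order), loads)  # 28 buckets processed; loads well below the 56 of a reuse-free schedule
```

The same principle, applied with a carefully designed ordering and pipelined async transfers across disk, CPU, and GPU memory, is what lets a single machine train over graphs whose embeddings far exceed GPU memory.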
Research Interests
Papers (81 in total)
- ACM / IMS Journal of Data Science, no. 2 (2024): 1-27
- CoRR, no. 2 (2023): 197:1-197:25
- CoRR (2023): 253-259
- Proc. VLDB Endow., no. 11 (2023): 2962-2975
- Kun Qian, Anton Belyi, Fei Wu, Samira Khorshidi, Azadeh Nikfarjam, Rahul Khot, Yisi Sang, Katherine Luna, Xianqi Chu, Eric Choi, Yash Govind, Chloe Seivwright, CoRR (2023)
- Ali Mousavi, Xin Zhan, He Bai, Peng Shi, Theo Rekatsinas, Benjamin Han, Yunyao Li, Jeff Pound, Josh Susskind, Natalie Schluter, Ihab Ilyas, Navdeep Jaitly. CoRR (2023)
- Proceedings of the 2nd ACM Workshop on Sustainable Computer Systems (HotCarbon 2023): 4:1-4:8
- arXiv (Cornell University) (2023): 87:1-87:15