Posted: 2016-06-01
Probabilistic Models for Aggregate Analysis of Large Scale Non-Gaussian Data
Time: June 3, 9:00 a.m.
Venue: Classroom A, 3rd floor, Teaching Building 25, Tianjin University
Title: Probabilistic Models for Aggregate Analysis of Large Scale Non-Gaussian Data
Speaker: 吕萌 (Meng Lu)

About the speaker:
Meng Lu received her B.S. degree in computer science from China University of Mining and Technology, Beijing, in 2007, and her Ph.D. in computer engineering from Texas A&M University, College Station, in 2015. Her research interests include data mining, statistical machine learning, and their applications in electronic commerce and bioinformatics.

Abstract:
A central objective of many big data applications is to perform association analysis and build predictive models using data mining methods. However, the high dimensionality and complex data types of large-scale data pose great challenges. In this talk, I will introduce my research efforts towards providing effective and scalable tools for dimension reduction and aggregate association analysis of large-scale non-Gaussian data.

A sparse exponential family PCA (SePCA) method is developed to perform sparse dimension reduction for non-Gaussian data, e.g., binary data, categorical data, and counts, that can be assumed to follow distributions in the exponential family. The model regularizes the principal component loading vectors. We derived closed-form updating rules to solve the formulated optimization problem, leading to high computational efficiency. A key contribution of my research is a scalable and effective dimension reduction method for large-scale complex data.

SePCA can also be extended to supervised learning. The supervised SePCA provides a hierarchical understanding of the mechanism in the association analysis of large-scale data. As a matrix factorization method, SePCA also has wide applications in electronic commerce, bioinformatics, and other areas.