Berkeley engineers win 2018 best paper in data mining


IEOR PhD student Salar Fattahi and EECS assistant professor in residence Somayeh Sojoudi have won the INFORMS 2018 Data Mining Best Paper Award. Fattahi and Sojoudi presented their paper “Graphical Lasso and Thresholding: Equivalence and Closed-form Solutions” at the INFORMS Annual Meeting in Phoenix.

The paper can be found here.

Abstract: Graphical Lasso (GL) is a popular method for learning the structure of an undirected graphical model, which is based on an ℓ1 regularization technique. The first goal of this work is to study the behavior of the optimal solution of GL as a function of its regularization coefficient. We show that if the number of samples is not too small compared to the number of parameters, the sparsity pattern of the optimal solution of GL changes gradually when the regularization coefficient increases from 0 to infinity. The second objective of this paper is to compare the computationally-heavy GL technique with a numerically-cheap heuristic method for learning graphical models that is based on simply thresholding the sample correlation matrix. To this end, two notions of sign-consistent and inverse-consistent matrices are developed, and then it is shown that the thresholding and GL methods are equivalent if: (i) the thresholded sample correlation matrix is both sign-consistent and inverse-consistent, and (ii) the gap between the largest thresholded and the smallest un-thresholded entries of the sample correlation matrix is not too small. By building upon this result, it is proved that the GL method, as a conic optimization problem, has an explicit closed-form solution if the thresholded sample correlation matrix has an acyclic structure. This result is then generalized to arbitrary sparse support graphs, where a formula is found to obtain an approximate solution of GL. The closed-form solution approximately satisfies the KKT conditions for the GL problem and, more importantly, the approximation error decreases exponentially fast with respect to the length of the minimum-length cycle of the sparsity graph. The developed results are demonstrated on synthetic data, electrical circuits, functional MRI data, and traffic flows for transportation networks.
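To make the thresholding heuristic mentioned in the abstract concrete, here is a minimal Python sketch. It is not the authors' code: the function name `threshold_support`, the synthetic chain data, and the threshold value 0.7 are all assumptions chosen for illustration. It only computes the sparsity pattern obtained by thresholding the sample correlation matrix; it does not solve GL or check the paper's sign-consistency and inverse-consistency conditions.

```python
import numpy as np


def threshold_support(samples, lam):
    """Threshold the sample correlation matrix at level lam.

    Hypothetical helper for illustration only. Returns the sample
    correlation matrix and a boolean matrix marking the off-diagonal
    entries whose magnitude exceeds lam.
    """
    corr = np.corrcoef(samples, rowvar=False)  # columns are variables
    support = np.abs(corr) > lam
    np.fill_diagonal(support, False)  # keep only off-diagonal edges
    return corr, support


# Synthetic data (an assumption, not from the paper): a chain
# x0 -> x1 -> x2 plus an independent variable x3.
rng = np.random.default_rng(0)
n = 5000
x0 = rng.standard_normal(n)
x1 = 0.8 * x0 + 0.6 * rng.standard_normal(n)
x2 = 0.8 * x1 + 0.6 * rng.standard_normal(n)
x3 = rng.standard_normal(n)
samples = np.column_stack([x0, x1, x2, x3])

corr, support = threshold_support(samples, lam=0.7)

# List each undirected edge once (upper triangle only).
edges = sorted((int(i), int(j)) for i, j in zip(*np.nonzero(np.triu(support))))
print(edges)
```

In this setup the population correlations are 0.8 for the pairs (x0, x1) and (x1, x2) and 0.64 for (x0, x2), so a threshold of 0.7 should keep only the two chain edges. The resulting support graph is a path, which is an example of the acyclic structure the abstract refers to.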