We consider a popular optimization formulation arising from variable selection in statistical modeling. Such problems involve two criteria: a loss function that measures how well the model (to be estimated) fits the available data, and a ``complexity'' measure of the model. Well-studied cases include sparse regression, where many convex (e.g., the lasso) and nonconvex methods have greatly advanced high-dimensional data analysis. However, in many applications, additional logic conditions are necessary to obtain easily interpretable models; for example, certain variables must enter the model before others, or the selected variables must follow certain group structures.
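To fix notation, a standard two-criteria formulation (a common instance, not necessarily the authors' exact setup) is the lasso, which balances a least-squares loss against an $\ell_1$ complexity penalty:

```latex
\min_{x \in \mathbb{R}^n} \;
\underbrace{\tfrac{1}{2}\,\lVert A x - b \rVert_2^2}_{\text{loss: fit to data}}
\;+\;
\underbrace{\lambda \,\lVert x \rVert_1}_{\text{complexity measure}}
```

Here $A$, $b$, and $\lambda > 0$ denote the design matrix, response vector, and regularization weight, respectively.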
We propose a flexible constraint framework that can model (almost any kind of) logic conditions. It consists of a composition of linear systems of inequalities with binary indicator functions. Such constraints are not only nonconvex but also discontinuous.
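As an illustrative sketch (our notation; the matrices $C$ and $d$ below are hypothetical placeholders for the linear system): with the binary indicator $\mathbf{1}(t) = 1$ if $t \neq 0$ and $0$ otherwise, a hierarchy condition such as ``variable $x_j$ may be selected only if variable $x_i$ is selected'' can be composed with linear inequalities as

```latex
\mathbf{1}(x_j) \,\le\, \mathbf{1}(x_i),
\qquad \text{and more generally} \qquad
C\,\mathbf{1}(x) \,\le\, d,
```

where $\mathbf{1}(x)$ is applied componentwise. The composition is nonconvex and discontinuous because $\mathbf{1}$ jumps at the origin.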
We address several fundamental questions, such as closedness, set convergence (in the Pompeiu-Hausdorff distance) of continuous approximations, and (algebraic) characterizations of tangent cones. Our results lay a mathematical foundation for algorithm design for this class of problems.
This is (very) recent joint work with Jong-Shi Pang (USC) and Miju Ahn (USC).