Jieping Ye

Mining Brain Region Connectivity for Alzheimer's Disease Study via Sparse Inverse Covariance Estimation
July 1 10:30AM
Effective diagnosis of Alzheimer's disease (AD), the most common type of dementia in elderly patients, is of primary importance in biomedical research. Recent studies have demonstrated that AD is closely related to the structure change of the brain network, i.e., the connectivity among different brain regions. The connectivity patterns will provide useful imaging-based biomarkers to distinguish
Normal Controls (NC), patients with Mild Cognitive Impairment (MCI), and patients with AD. In this paper, we investigate the sparse inverse covariance estimation technique for identifying the connectivity among different brain regions. In particular, a novel algorithm based on the block coordinate descent approach is proposed for the direct estimation of the inverse covariance matrix. One appealing feature of the proposed algorithm is that it allows the user feedback (e.g., prior domain knowledge) to be incorporated into the estimation process, while the connectivity patterns can be discovered automatically. We apply the proposed algorithm to a collection of FDG-PET images from 232 NC, MCI, and AD subjects. Our experimental results demonstrate that the proposed algorithm is promising in revealing the brain region connectivity differences among these groups.
Large-Scale Sparse Logistic Regression
June 29 2:50PM
Logistic Regression is a well-known classification method that has been used widely in many applications of data mining, machine learning, computer vision, and bioinformatics. Sparse logistic regression embeds feature selection in the classification framework using the L1-norm regularization, and is attractive in many applications involving high-dimensional data. In this paper, we propose Lassplore for solving large-scale sparse logistic regression. Specifically, we formulate the problem as the L1-ball constrained smooth convex optimization, and propose to solve the problem using the Nesterov's method, an optimal first-order black-box method for smooth convex optimization. One of the critical issues in the use of the Nesterov's method is the estimation of the step size at each of the optimization iterations. Previous approaches either applies the constant step size which assumes that the Lipschitz gradient is known in advance, or requires a sequence of decreasing step size which leads to slow convergence in practice. In this paper, we propose an adaptive line search scheme which allows to tune the step size adaptively and meanwhile guarantees the optimal convergence rate. Empirical comparisons with several state-of-the-art algorithms demonstrate the efficiency of the proposed Lassplore algorithm for large-scale problems.
Mining Discrete Patterns via Binary Matrix Factorization
June 30 10:55AM
Mining discrete patterns in binary data is important for subsampling, compression, and clustering. We consider rank-one binary matrix approximations that identify the dominant patterns of the data, while preserving its discrete property. A best approximation on such data has a minimum set of inconsistent entries, i.e., mismatches between the given binary data and the approximate matrix. Due to the hardness of the problem, previous accounts of such problems employ heuristics and the resulting approximation may be far away from the optimal one. In this paper, we show that the rank-one binary matrix approximation can be reformulated as a 0-1 integer linear program (ILP). However, the ILP formulation is computationally expensive even for small-size matrices. We propose a linear program (LP) relaxation, which is shown to achieve a guaranteed approximation error bound. We further extend the proposed formulations using the regularization technique, which is commonly employed to address overfitting. The LP formulation is restricted to medium-size matrices, due to the large number of variables involved for large matrices. Interestingly, we show that the proposed approximate formulation can be transformed into an instance of the minimum s-t cut problem, which can be solved efficiently by finding maximum flows. Our empirical study shows the efficiency of the proposed algorithm based on the maximum flow. Results also confirm the established theoretical bounds.
Drosophila Gene Expression Pattern Annotation Using Sparse Features and Term-term Interactions
July 1 12:00PM
The Drosophila gene expression pattern images document the spatial and temporal dynamics of gene expression and they are valuable tools for explicating the gene functions, interaction, and networks during Drosophila embryogenesis. To provide text-based pattern searching, the images in the Berkeley Drosophila Genome Project (BDGP) study are annotated with ontology terms manually by human curators. We present a systematic approach for automating this task, because the number of images needing text descriptions is now rapidly increasing. We consider both improved feature representation and novel learning formulation to boost the annotation performance. For feature representation, we adapt the bag-of-words scheme commonly used in visual recognition problems so that the image group information in the BDGP study is retained. Moreover, images from multiple views can be integrated naturally in this representation. To reduce the quantization error caused by the bag-of-words representation, we propose an improved feature representation scheme based on the sparse learning technique. In the design of learning formulation, we propose a local regularization framework that can incorporate the correlations among terms explicitly. We further show that the resulting optimization problem admits an analytical solution. Experimental results show that the representation based on sparse learning outperforms the bag-of-words representation significantly. Results also show that incorporation of the term-term correlations improves the annotation performance consistently.